Hamsa Bastani (Wharton School, University of Pennsylvania) and her co-authors have published a study reporting the results of a controlled experiment on how using AI for learning affects actual knowledge. The results are as expected: using AI gives students the illusion that they can solve problems, but it does not give them the knowledge. Students who had to solve problems the “hard way”, using textbooks and no AI, scored substantially higher on tests where AI was not allowed than those who “learned” with the help of AI. The study was published in the reputable journal PNAS (Proceedings of the National Academy of Sciences) in June 2025 (DOI: 10.1073/pnas.2422633122).
A Wharton professor ran a randomized controlled trial on almost a thousand high school students in Turkey.
The result was so brutal for the AI-in-education narrative that it had to be peer-reviewed by PNAS before people would believe it.
Her name is Hamsa Bastani. She teaches in the Operations, Information and Decisions department at the Wharton School of the University of Pennsylvania. The study she published in 2025 with her co-authors is one of the cleanest experiments anyone has run on what AI actually does to learning when you remove it from the equation and check what is left.
The setup was a randomized controlled trial, the same methodology used in clinical drug trials. Nearly a thousand high school math students in Turkey were split into three groups and put through four sessions of ninety minutes each. One group practiced with GPT Base, a standard ChatGPT-style interface built on GPT-4 that could answer any question directly. One group practiced with GPT Tutor, a version of the same model that had been prompted to guide students with hints rather than hand them the answer. One group practiced with nothing but their textbooks and their own heads.
During the practice sessions, the AI groups looked like a miracle. The GPT Base group solved 48% more problems than the students working alone. The GPT Tutor group solved 127% more. Every administrator looking at those numbers would have written a press release about the transformative power of AI in education and moved on.
Then the actual exam came, and AI was not allowed.
The students who had practiced with GPT Base scored 17% worse than the students who had practiced alone. Seventeen percent worse, despite having solved nearly half again as many problems in the sessions leading up to it. According to the paper, the hint-only safeguards in GPT Tutor largely mitigated that drop. The students who had struggled the most, who had sat with the confusion and worked through it without a tool to rescue them, could actually do the math when it counted.
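To see how far apart the practice and exam numbers sit, here is a minimal sketch in Python. It assumes a hypothetical baseline of 100 for the textbook-only group on both measures. That baseline is an illustrative normalization, not a figure from the paper, and the function name `apply_percent_change` is likewise made up for this example. Only the percentages (+48%, +127%, -17%) come from the text above.

```python
def apply_percent_change(baseline: float, percent: int) -> float:
    """Return the baseline shifted by a relative change given in percent."""
    return baseline * (100 + percent) / 100


# Hypothetical normalization: the textbook-only group is set to 100
# on both measures. This is NOT a number reported in the study.
CONTROL = 100.0

# Practice-session performance relative to the textbook-only group.
practice = {
    "textbook_only": CONTROL,
    "gpt_base": apply_percent_change(CONTROL, 48),
    "gpt_tutor": apply_percent_change(CONTROL, 127),
}

# Exam performance (no AI allowed) relative to the textbook-only group.
exam = {
    "textbook_only": CONTROL,
    "gpt_base": apply_percent_change(CONTROL, -17),
}

# Gap between how GPT Base looked in practice and how it did on the exam,
# in points on the normalized scale.
practice_to_exam_gap = practice["gpt_base"] - exam["gpt_base"]

if __name__ == "__main__":
    print("practice:", practice)
    print("exam:", exam)
    print("GPT Base practice-to-exam gap:", practice_to_exam_gap)
```

On this normalized scale the GPT Base group sits at 148 in practice and 83 on the exam, a swing of 65 points relative to the same comparison group. Practice and exam performance are different measures, so the gap is only a rough illustration of the reversal, not a quantity the paper reports.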
Bastani’s team read through the chat logs to understand what had actually been happening during the practice sessions, and the answer was exactly what the exam results had already implied. The GPT Base group had not been learning. They had been extracting answers and moving on, and every moment that felt like understanding was actually the model doing the cognitive work while the student’s brain waited for the next problem to arrive. The paper describes it precisely: without guardrails, students attempt to use GPT-4 as a crutch during practice, and subsequently perform worse on their own.
The detail that should follow every conversation about AI in education is the one buried in the post-test survey results. The students who had relied on AI the most during practice were also the most confident they had understood the material. The tool had not just failed to teach them. It had convinced them they had learned something they had not, which is a different kind of failure entirely and a much harder one to correct because the student has no idea it is happening.
The crutch had made them confident and weak at the same time.