Apple published a paper in June 2025 that called out the entire AI industry.
And the industry has not recovered since.
The paper is called “The Illusion of Thinking.” Six Apple researchers. Months of controlled experiments. One conclusion that landed like a grenade.
Frontier reasoning models face a complete accuracy collapse beyond certain complexities.
Complete. Not partial. Not gradual. Complete.
Here is what that actually means.
For two years, every major AI lab has been racing to build reasoning models. OpenAI’s o1 and o3. Anthropic’s Claude 3.7 Sonnet Thinking. DeepSeek-R1. Google’s Gemini Thinking. These models do not just answer questions; they visibly think first. They show their work. They reason step by step through a problem before arriving at an answer. The entire industry marketed this as the next evolution of intelligence.
Apple tested whether it was real.
They did not use math benchmarks or coding tests, the standard evaluations every AI company optimizes against during training. They built clean, controllable puzzle environments. Tower of Hanoi. River Crossing. Checker Jumping. Blocks World. Problems with precise, verifiable correct answers and minimal risk of training-data contamination.
Then they systematically turned up the complexity. And watched what happened.
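To make that setup concrete, here is a minimal sketch of what a controllable puzzle environment like this looks like, using Tower of Hanoi. The function names and structure are mine, not Apple’s. The point is the complexity dial: the number of disks fully determines difficulty, the optimal solution has exactly 2^n − 1 moves, and any answer a model produces can be checked move by move with no judgment calls.

```python
# Illustrative sketch of a controllable puzzle environment (not Apple's code).
# Complexity is one dial: the number of disks n. The optimal solution length
# grows exponentially (2**n - 1 moves), and any candidate move sequence can
# be verified mechanically.

def solve_hanoi(n, src=0, aux=1, dst=2):
    """Generate the optimal move sequence for n disks (the ground truth)."""
    if n == 0:
        return []
    return (solve_hanoi(n - 1, src, dst, aux)
            + [(src, dst)]
            + solve_hanoi(n - 1, aux, src, dst))

def verify_hanoi(n, moves):
    """Check a candidate move list: only legal moves, ending in the goal state."""
    pegs = [list(range(n, 0, -1)), [], []]   # peg 0 holds disks n..1, largest at bottom
    for src, dst in moves:
        if not pegs[src]:
            return False                      # illegal: moving from an empty peg
        disk = pegs[src][-1]
        if pegs[dst] and pegs[dst][-1] < disk:
            return False                      # illegal: larger disk onto a smaller one
        pegs[dst].append(pegs[src].pop())
    return pegs[2] == list(range(n, 0, -1))   # solved: every disk on the goal peg

# Turning up the complexity dial: each extra disk doubles the required work.
for n in range(1, 11):
    moves = solve_hanoi(n)
    assert verify_hanoi(n, moves)
    print(f"{n} disks -> {len(moves)} moves (2^{n} - 1 = {2**n - 1})")
```

That mechanical verifiability is what makes the experiment clean: there is no grader to fool and no benchmark to memorize, so you can turn the dial one notch at a time and see exactly where accuracy falls off.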
At low complexity, standard LLMs were actually more accurate and more efficient; the reasoning models were beaten by regular models that do not think at all. At moderate complexity, reasoning models gained an advantage. But when problems reached high complexity, both model types suffered complete performance collapse.
The thinking models, the ones that cost more, take longer, and are marketed as more intelligent, lost to basic models on easy tasks. Then both collapsed completely on hard ones.
But the finding that truly alarmed researchers was not the collapse itself.