Study finds AI doesn't understand sports like humans do
Researchers from UNC Chapel Hill and Northeastern tested top AI models on 35,000 hours of sports footage
Sports commentators can relax. A new study from researchers at the University of North Carolina at Chapel Hill and Northeastern University tested the most widely used AI models against 35,000 hours of professional sports footage and found that even the best performers collapsed when asked to do anything more demanding than describe what was happening on screen.
The researchers introduced a new benchmark, SVI-bench (short for strategic video intelligence), to test four skills that previous AI assessments have struggled with: perception, causation, simulation, and agency.
The data contained videos of basketball, soccer, and hockey games; 15 million tagged game plays; 15,000 hours of professional analysis; 23,000 post-match reports; and 103,000 statistics.
Tests were done using ChatGPT, Google's Gemini and the open-source model Qwen. This study has not yet been peer-reviewed.
On perception, the task of describing what is happening on screen, success rates were no better than 74%, an accuracy level that wouldn't last long in even one youth-league broadcast, according to the researchers.
When it came to causal reasoning, which means explaining why a particular play happened, average accuracy dipped to just 40%. ChatGPT, when asked what was unusual about a shot taken from the top of the backboard, said it was the "player's first made three of the game".
In the area of simulation, where the models were asked to anticipate a player's physical action after analysing his trajectory, the best-performing model showed success rates close to the flip of a coin.
The agency tests, which asked models to conduct the kind of complex post-game statistical analysis a professional broadcaster performs routinely, produced the study's most striking result: accuracy fell to just 5%.
"A good sportscaster does much more than describe what's on screen; they explain why a play worked, anticipate what's next, and decide which moments matter," said Lorenzo Torresani, a computer science researcher at Northeastern and co-author of the study. "Our study shows AI is already reasonably good at the descriptive part but collapses on the rest."
"The same gap shows up in any job whose value lies not in describing what's visible, but in understanding why events unfold, anticipating what comes next, deciding what matters, and recommending what to do about it," Torresani said. At a moment when anxiety about AI-driven job displacement is high across many industries, the study offers a more precise picture of where the technology's current limits actually sit.
