Can AI really understand scientific papers? New study raises doubts

Keeping up with the thousands of research papers published in their fields is a constant challenge for scientists. Researchers are increasingly using large language models (LLMs), including ChatGPT and Claude, to search and summarise scientific literature. According to a new study, however, these tools still have some way to go before they can interpret scientific research as experts do.

Researchers from Cornell University and Google tested several AI systems to check whether they can analyse scientific papers as well as experts do. The study, published in the Proceedings of the National Academy of Sciences, examined whether these systems could give precise answers to complex questions about superconducting materials by drawing on large collections of scientific papers.

Can AI read research papers like scientists?

The researchers built a database of 1,726 scientific articles on high-temperature cuprates, materials that scientists have studied for more than thirty years. They then wrote 67 difficult questions, each demanding advanced knowledge of the subject, to test the AI systems.

The AI systems tested included ChatGPT-4, Claude 3.5, Perplexity AI, Gemini Advanced Pro 1.5 and NotebookLM. Twelve human experts assessed the responses without knowing which system had produced them.

The study found that AI systems that drew on the scientific literature outperformed those that relied on internet searches. Systems that read user-uploaded documents, such as NotebookLM, produced superior results.

While the models were effective at extracting text-based information from research papers, they struggled to interpret charts and figures. According to the researchers, reading data visualisations is a key skill for scientists, which makes this a major limitation of current AI systems.

The study also found that some models occasionally invented references or failed to capture the full complexity of scientific debates. The researchers say improvements in citation accuracy, reasoning across multiple studies, and visual analysis will be essential for future AI systems.