Science, once defined by human-led experiments, hypotheses, and trial and error, is being transformed by artificial intelligence. Today, AI-generated insights often take center stage, leading some to fear that the future of academic research and the rigor of human-based experimentation are at stake.
However, a recent study offers the scientific community a sobering warning: while AI is writing more science than ever, it may not be getting it right. In a nutshell, you cannot yet trust AI with science.
The research, conducted by Washington State University (WSU) professor Mesut Cicek and his team, tested whether ChatGPT (versions 3.5 and 5 mini) could correctly judge the truth of 700 scientific hypotheses drawn from research papers.
The team also asked each question 10 times to measure consistency.
According to findings published in the Rutgers Business Review, the AI appeared accurate on the surface, with 73 percent consistency and 80 percent accuracy.
But when the researchers adjusted those scores for random guessing, ChatGPT's true reliability plunged: it performed only about 60 percent better than chance, a level closer to a low D grade.
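The article does not spell out the study's correction method, but the numbers are consistent with a standard chance correction on a binary true/false task, where chance accuracy is 50 percent. A minimal sketch, assuming that formula:

```python
# Hedged sketch: a standard chance correction for a binary true/false task.
# The study's exact method isn't given in the article; this only shows that
# 80% raw accuracy corresponds to roughly 60% performance above chance.
def above_chance(accuracy: float, chance: float = 0.5) -> float:
    """Fraction of the possible above-chance range actually achieved:
    (accuracy - chance) / (1 - chance)."""
    return (accuracy - chance) / (1.0 - chance)

print(round(above_chance(0.80), 2))  # 0.6 -> about 60% better than chance
```

Under this reading, an impressive-sounding 80 percent accuracy shrinks once the half of the answers any coin flip would get right is discounted.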
Cicek, an associate professor in the Department of Marketing and International Business at WSU and lead author of the study, said, "We are not just talking about accuracy, we're talking about inconsistency, because if you ask the same question again and again, you come up with different answers."
ChatGPT also struggled to identify false statements, correctly labelling them only 16.4 percent of the time.
The researchers also highlight the “fluency trap” and “illusion of understanding” generated by ChatGPT.
AI chatbots can produce smooth, convincing language that nudges the brain to accept the information as truthful and fact-based. This cognitive bias is what the researchers call the "fluency trap."
According to Cicek, the findings suggest that AI's capability to truly think with reliable accuracy is further off than many expect.
The researchers also argue that ChatGPT lacks a conceptual "brain": it relies on memorized patterns rather than genuine reasoning or world knowledge.
Given the significant inconsistency in scientific answers generated by ChatGPT, the researchers urge the scientific community to use AI responsibly and verify the information.
The best approach, they suggest, is skepticism coupled with better training for AI systems.
"Always be skeptical. I'm not against AI. I'm using it. But you need to be very careful," Cicek said.