ChatGPT Fails at Evaluating Scientific Hypotheses: AI Just Guesses, and Poorly

Focus
Publication date: 19.03.2026 14:28

Scientists tested ChatGPT by asking it to evaluate whether hundreds of scientific hypotheses were true or false, and the results were not very encouraging. The study shows that AI has modest reasoning abilities.

Researchers from Washington State University, USA, repeatedly asked ChatGPT to judge whether scientific hypotheses taken from research articles were true or false. The aim was to determine whether the AI could accurately tell if each statement was supported by the research. In total, the scientists submitted more than 700 hypotheses for evaluation. According to the results, AI responses should be approached with considerable skepticism, and all information generated by AI chatbots should be verified, Focus writes.

In the 2024 round of testing, ChatGPT answered correctly 76.5% of the time; in 2025, this figure rose to 80%. However, once the researchers accounted for random guessing, the results looked far less impressive: adjusted for chance, ChatGPT's score was only 60%. Although this study focused specifically on ChatGPT, the researchers noted that similar experiments with other AI models yielded comparable results.
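The article does not say exactly how the guessing correction was calculated. One common approach for a two-option (true/false) task, assumed here purely for illustration, rescales raw accuracy so that coin-flip guessing (50%) scores 0 and a perfect score stays 1: adjusted = (raw - 0.5) / (1 - 0.5). Under that assumption, the 80% figure from 2025 maps to exactly 60%, which matches the reported number. The function name `chance_corrected` is hypothetical, and this is a sketch of one possible method, not the researchers' confirmed procedure.

```python
def chance_corrected(accuracy: float, chance: float = 0.5) -> float:
    """Rescale raw accuracy so pure guessing scores 0 and a perfect score stays 1.

    Assumption: a binary true/false task, so random guessing is right 50% of the time.
    """
    if not 0 <= chance < 1:
        raise ValueError("chance must be in [0, 1)")
    return (accuracy - chance) / (1 - chance)


# Raw accuracies reported in the article for each testing round
for year, raw in [(2024, 0.765), (2025, 0.80)]:
    print(year, round(chance_corrected(raw), 3))
```

Under the same assumption, the 2024 figure of 76.5% would correspond to an adjusted score of about 53%.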

The AI struggled most with false statements, correctly identifying them only 16.4% of the time. It was also notably inconsistent: asking the same question 10 times could produce different answers, the researchers say.

The authors of the study believe that artificial intelligence capable of truly "thinking" may be further off than many expect.

"Modern AI tools do not understand the world as we do: they lack a 'brain.' They simply memorize and can provide some insights, but they do not understand what they are talking about," the authors of the study say.

The study's results point to a fundamental limitation of artificial intelligence. Although AI can generate fluent answers, it often struggles to analyze complex issues, which can lead to responses that sound convincing but are actually wrong, the researchers say.

The researchers recommend verifying information generated by AI and approaching it with skepticism.