If you’ve ever asked an AI a simple question and received a confidently wrong answer (say, a suggestion to put glue on pizza), you’ve encountered what researchers call a “hallucination.” From OpenAI’s GPT-5 to Anthropic’s Claude, almost every large language model (LLM) has made such errors. Now, OpenAI claims to have identified the root cause. In a newly released paper, the company argues that these mistakes do not stem from forgetfulness or randomness. Rather, chatbots hallucinate because they have effectively been trained to bluff: according to the paper, models are not explicitly programmed to lie, but the way they are evaluated indirectly rewards guessing.
OpenAI states, “Hallucinations persist due to the way most evaluations are graded; language models are optimized to perform well on tests, and guessing when uncertain enhances test outcomes.” Think of it like an exam: a student who doesn’t know the answer might guess anyway, hoping to get lucky. Chatbots, it turns out, are doing the same thing. They are stuck in a kind of “permanent exam mode,” where staying silent counts as failure and confident guessing looks like intelligence. As the researchers note, “Humans learn the value of expressing uncertainty outside of school, through real-life experiences. In contrast, language models are primarily assessed using evaluations that penalize uncertainty.” The result is AI systems that project absolute certainty even when they are entirely wrong.
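To see why accuracy-only grading pushes models toward guessing, here is a minimal sketch in Python. It is purely illustrative (the function name and the 10 percent figure are invented for this example, not taken from OpenAI’s paper): when an answer scores 1 if right and 0 otherwise, abstaining always scores 0, so even a long-shot guess has higher expected value than admitting uncertainty.

```python
# A purely illustrative sketch: under accuracy-only grading, a correct answer
# scores 1 and anything else scores 0, so abstaining ("I don't know") can never
# beat even a low-probability guess in expectation.

def expected_score_accuracy_only(p_correct: float, abstain: bool) -> float:
    """Expected score under binary accuracy grading: 1 if right, 0 otherwise."""
    return 0.0 if abstain else p_correct

# A wild guess with a 10% chance of being right still out-scores honest abstention.
print(expected_score_accuracy_only(0.10, abstain=False))  # 0.1
print(expected_score_accuracy_only(0.10, abstain=True))   # 0.0
```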
Some companies have already tried to address this. In a blog post last month, OpenAI acknowledged that Anthropic’s Claude models behave differently: they tend to be “more aware of their uncertainty and often refrain from making inaccurate statements.” That cautious strategy sounds promising, but it has a drawback. OpenAI noted that Claude frequently declines to answer at all, which “risks limiting its utility.” In other words, it may be polite, but it is not always useful. So how do we stop AI from bluffing like an overconfident quiz-show contestant? OpenAI believes the answer lies in changing the evaluation methods rather than the models themselves. The researchers contend, “The core issue is the prevalence of evaluations that are misaligned.
The various primary evaluations need to be adjusted to cease penalizing abstentions when uncertain.” In its blog post, OpenAI elaborated: “The commonly utilized, accuracy-based evaluations require updates to discourage guessing. If the primary scoreboards continue to reward fortunate guesses, models will persist in learning to guess.” The proposed change may sound minor, but it marks a substantial shift in AI development. For years, companies have raced to make chatbots faster, sharper, and more articulate. Those qualities, however, do not guarantee trustworthiness. The harder challenge is building systems that balance knowledge with humility, a trait humans usually develop only after making mistakes in the real world.
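Below is a hypothetical sketch of what such a rescored benchmark could look like. The weights are assumptions for illustration (the half-point credit for abstaining and the one-point penalty for a wrong answer are example values, not figures from OpenAI’s paper or blog post): once wrong answers cost something and abstention earns partial credit, guessing only pays when the model is genuinely likely to be right.

```python
# A hypothetical rescoring rule (example weights, not OpenAI's actual benchmark
# change): +1 for a correct answer, -1 for a wrong one, and 0.5 for abstaining.
# Guessing is now only worthwhile when the model is likely enough to be right.

def expected_score_with_penalty(p_correct: float, abstain: bool,
                                abstain_credit: float = 0.5,
                                wrong_penalty: float = 1.0) -> float:
    """Expected score: +1 if right, -wrong_penalty if wrong, abstain_credit if abstaining."""
    if abstain:
        return abstain_credit
    return p_correct - (1.0 - p_correct) * wrong_penalty

# With these weights, guessing beats abstaining only when p_correct > 0.75.
for p in (0.10, 0.50, 0.90):
    guess = expected_score_with_penalty(p, abstain=False)
    hold = expected_score_with_penalty(p, abstain=True)
    print(f"p_correct={p:.2f}  guess={guess:+.2f}  abstain={hold:+.2f}")
```

Under these example weights, guessing out-scores abstaining only when the model’s chance of being right exceeds 75 percent, which is exactly the kind of incentive toward calibrated honesty the researchers describe.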
By reforming evaluation methods, OpenAI hopes to build models that value reliability over bravado. After all, whether it’s medical advice or financial guidance, no one wants a chatbot delivering a confident hallucination as if it were established fact. It may not be as exciting as unveiling a brand-new model, but OpenAI’s effort to curb AI bluffing could prove one of the most consequential reforms the industry has seen.