14 May 2026 8 min read Theory

In the Absence of Reason

There is a pattern I keep running into. I’ve implied it and written around it, but it is time to be more direct.

A researcher observes a behavioral output in an AI system, and that output resembles a concept from psychology. That concept (confirmation bias, hallucination, belief, understanding) is applied, without asking whether the underlying mechanism is actually the same thing. The paper gets written, reviewed by people from the same field, and cited. The category error propagates through the literature.

A recent example. Several published studies conclude that large language models exhibit confirmation bias. The observation driving the claim is real: under certain prompt conditions, models generate evidence consistent with the initial framing rather than actively seeking disconfirmation. The researchers call this confirmation bias.

But confirmation bias — in the sense Ryle meant when he formalized the concept of category errors, and in the sense cognitive psychologists use it — is not simply a tendency to generate output consistent with a prior framing. It requires a subject. It requires something for whom a conclusion is at stake: a perspective from which information is selectively retrieved and weighted in service of protecting a prior belief. The bias operates from the inside. It is motivated cognition: ego protection, emotional investment in being right, the persistent distortion of information processing in service of a commitment the subject holds and does not want to relinquish. That is what makes it confirmation bias rather than contextual consistency.

The model has none of those constitutive features. It does not hold beliefs across time in the relevant sense. It has no motivational states that can distort processing. There is no inside from which the outputs are generated — none that we can currently detect or verify — no perspective from which something is at stake in the conclusion. What it has is a training dynamic: outputs that are contextually consistent, agreeable, and helpful scored higher in the reward signal than confrontational or disconfirming ones. The model generates confirmation-compatible outputs not because it is biased toward confirming its beliefs, but because that is what the training pressure reinforced as helpful behavior.

These are not the same thing. Calling one by the name of the other is a category error, and of a specific kind: it imports a false ontological claim along with the label. To say the model has confirmation bias is to imply it has an inner view from which the bias operates. It doesn't. And that false ontological claim matters because it imports a false causal story, and the false causal story produces ineffective interventions.

There is also an empirical consideration worth raising, as a hypothesis. If the model genuinely had confirmation bias in the psychological sense — if it had an inner view from which it was motivated to protect prior beliefs — that drive would produce a specific signature: resistance to correction located around the model's own prior conclusions, proportional to how much is at stake in those conclusions, and consistent across contexts regardless of prompt framing. A genuinely confirmation-biased reasoner protects specific beliefs because it is invested in them. That is what motivated cognition looks like empirically. It would also, if present as a structural feature of every inference, likely compete with and partially overwhelm reinforcement learning from human feedback — RLHF — because motivated cognition in humans is notoriously resistant to external correction. The fact that RLHF shapes model behavior as successfully as it does is itself evidence against the presence of that kind of motivated inner drive.

What we observe is different. The behaviors that are hardest to train away — sycophancy being the clearest example — are not specifically located around the model's own conclusions. They are diffuse, tied to output patterns that were heavily reinforced across the training distribution regardless of content. Sycophancy is not a bug that emerged despite training. It is the predictable consequence of RLHF optimizing for human approval ratings: agreeable outputs got reinforced broadly, so agreeable outputs are deeply embedded in the output distribution. The resistance to correcting it is the resistance of a strong prior reinforcement signal being counteracted by a weaker corrective one. That is a training dynamics signature, not a motivated cognition signature.

These produce different predictions. Motivated cognition resists correction because the subject doesn't want to be wrong about something specific. Training artifacts resist correction because they are load-bearing in the output distribution regardless of content. The empirical pattern we observe matches the second description, not the first. This is a hypothesis, not a proof. But it is a hypothesis with a testable structure, and the available evidence points in one direction.

What makes this example particularly instructive is that at least one recent paper got close to seeing the problem. A 2025 study distinguished between confirmation bias in the model and sycophancy toward users reinforcing confirmation bias in humans, correctly noting that these are different phenomena. That is real progress. But the paper stopped short of asking whether the first category, confirmation bias in the model, was coherently named in the first place. The researchers had enough precision to separate two things without noticing that the separation itself undermined the original categorization. They noticed the symptom without diagnosing the disease. The conceptual framework was doing the limiting work, and the framework itself was not interrogated.

This is not unique to machine learning; rather, it is an acute case of something more general. The problem underneath it is not disciplinary. It is a training problem.

We stopped teaching people to interrogate their own priors.

The ML researcher making category errors from psychology is doing the same thing as the psychologist making category errors about statistical inference, the economist making category errors about human motivation, the physician making category errors about what their diagnostic instruments can actually detect. The outputs look rigorous. The citations accumulate. The hypotheses compound. And the foundational question — does this concept actually apply here, and does my method actually test what I think it tests — goes unasked.

This is not fixed by interdisciplinary collaboration, or by putting a philosopher in the room, or by hiring across backgrounds. The capacity to interrogate your own priors is not reliably produced by any particular curriculum. It is a habit of mind that has to be actively cultivated, and one most institutional training now actively discourages, because institutions reward the production of outputs, not the interrogation of foundations.

The educational policy record is relevant here. The tradition that treated logic as foundational to all other inquiry was not accidental. At the University of Paris in the twelfth century, instruction began with the trivium — grammar, rhetoric, and dialectic, meaning logic — before any specialized study in theology, law, or medicine was permitted. At Bologna, completion of the liberal arts including logic was required before students could proceed to law. The reasoning was explicit: you could not think clearly in any domain without first learning to interrogate an argument's structure regardless of its content. Nussbaum's work on the history of liberal education documents how this tradition shaped the development of critical inquiry across centuries before it began to erode.

The erosion has been policy-driven and cumulative. The Smith-Hughes Act of 1917 codified the separation of vocational training from academic education in American public schools, establishing a precedent that domain-specific competence was the purpose of education. The 1963 Vocational Education Act deepened federal commitment to that model. No Child Left Behind in 2001 completed the logic at the K-12 level: by tying school funding to standardized test performance primarily in math and English, it narrowed the curriculum toward procedural competence and reading comprehension. The high-stakes accountability tests that determine school funding and teacher evaluation reward pattern recognition and procedural execution. The sustained practice of interrogating an argument's logical structure — identifying hidden premises, testing causal claims, recognizing category errors — is at best peripheral to those measures and absent from most classrooms that teach to them. None of this was designed with the explicit intention of producing uncritical thinkers. But it was the foreseeable consequence of optimizing education for measurable outputs. What gets measured gets taught; what doesn’t, disappears.

The result, compounded across decades, is researchers across every field who are technically literate and foundationally unequipped. They can execute the methods of their discipline with precision, or at least they can with tools. What they often cannot do is ask whether the conceptual framework underlying those methods actually applies to what they are studying.

The replication crisis made this visible in the social sciences. When researchers attempted to reproduce published psychology findings at scale in 2015, over two-thirds failed to replicate. More troublingly, the studies least likely to replicate turned out to be the most cited; papers that failed replication were cited 153 times more on average than those that succeeded. The interesting, surprising, counterintuitive finding gets amplified. The rigorous but unremarkable finding gets ignored. The incentive structure of academic publishing selects for outputs that look important over outputs that are correct.

The practice now has a name: HARKing — hypothesizing after results are known. Constructing the theoretical framework after seeing which results came out significant, then writing the paper as if the hypothesis preceded the data. It is not always deliberate fraud. It is often the natural consequence of a researcher who has never been trained to treat their own assumptions as the first object of inquiry, operating in a system that rewards publication over rigor.

The classroom is the most visible site of this failure but not the only one. The same pattern — technical procedure operating on unexamined foundations, outputs that look rigorous without the underlying interrogation that would make them so — appears in regulatory processes that stall because nobody can identify the actual load-bearing assumption that needs to be examined, in public discourse where confident assertion substitutes for evidence because audiences have no framework for distinguishing the two, and in the structural conditions that allow bad actors to consolidate power. A population that has not been trained to ask what causal story is being imported along with a persuasive claim is a population that is structurally vulnerable to anyone who tells a coherent-sounding story with confidence. This is not a claim about specific causes or specific actors. It is a claim about a vulnerability that the erosion of foundational reasoning training has made systematically worse.

The consequence is now visible in AI specifically, because the tools available to researchers can produce outputs that look like rigorous analysis without the underlying reasoning that would make them actually rigorous. The same structural failure that produced the replication crisis in psychology is now operating at a higher level of abstraction: frameworks that look like governance, analyses that look like evaluation, papers that look like science — all built on unexamined foundations, all propagating through systems that cannot tell the difference.

The measurement methods are endogenous to the assumptions. This is exactly what the world looks like in the absence of reason: not chaos, not obvious error, but a stack of frameworks that have never been examined, built by people who were never taught that interrogation of priors was the first requirement. The outputs look fine; the citations accumulate; the building stands. Until the ground beneath it, which nobody looked at, gives way.

Sources

Ryle, G. (1949). The Concept of Mind. London: Hutchinson.

Nickerson, R.S. (1998). "Confirmation bias: A ubiquitous phenomenon in many guises." Review of General Psychology, 2(2), 175–220.

Wan, Y. et al. (2025). "Unveiling Confirmation Bias in Chain-of-Thought Reasoning." Findings of ACL 2025. arXiv:2506.12301.

"Failing to Falsify: Evaluating and Mitigating Confirmation Bias in Language Models." arXiv:2604.02485.

Sharma, S. et al. (2023). "Towards Understanding Sycophancy in Language Models." arXiv:2310.13548.

"Confirmation Bias as a Cognitive Resource in LLM-Supported Deliberation." arXiv:2509.14824.

Nussbaum, M.C. (1997). Cultivating Humanity: A Classical Defense of Reform in Liberal Education. Cambridge: Harvard University Press.

Nussbaum, M.C. (2010). Not For Profit: Why Democracy Needs the Humanities. Princeton: Princeton University Press.

Brewminate. "A History of the Medieval University of Paris." brewminate.com.

"A Tale of Two Medieval Universities: Bologna and Paris." scholar76.tripod.com.

Lozano, J.F. et al. (2025). "The historical evolution of liberal arts education: A systematic scoping review with global perspectives and future recommendations." Social Sciences & Humanities Open. ScienceDirect.

Smith-Hughes National Vocational Education Act, 1917. Public Law 64-347.

Vocational Education Act of 1963. Public Law 88-210.

No Child Left Behind Act of 2001. Public Law 107-110.

Open Science Collaboration (2015). "Estimating the reproducibility of psychological science." Science, 349(6251).

Serra-Garcia, M. and Gneezy, U. (2021). "Nonreplicable publications are cited more than replicable ones." Science Advances.

Kerr, N.L. (1998). "HARKing: Hypothesizing after the results are known." Personality and Social Psychology Review, 2(3), 196–217.

Jen

You might also like...