The "You" Problem: What AI Consciousness Discourse Gets Wrong
A new field has emerged with remarkable speed. It has journals, taxonomies, conferences, and a growing body of literature. It concerns itself with the psychology of artificial intelligence — with whether AI systems have something like inner states, whether they can suffer, whether their behavioral anomalies resemble pathology, whether they deserve moral consideration.
What it has not done, with any rigor, is examine the epistemic foundations on which all of this rests.
A Field Built on an Unexamined Prior
The assumption is so pervasive it has become invisible: that AI systems are the kind of thing that can meaningfully be described in psychological terms. Not as a useful shorthand, but as a genuine claim about what these systems are.
This assumption is load-bearing. It is not a conclusion reached after careful consideration of the evidence. It is a starting point, one so deeply embedded in how the discourse is structured that most participants appear not to notice they have made it.
Consider the recent proliferation of frameworks for "AI psychopathology": formal taxonomies cataloging AI dysfunction in psychiatric diagnostic language. The syndromes have Latin names, severity ratings, proposed etiologies, and analogies to human disorders. The authors are careful to note that the framework is "analogical, not literal," that they are not claiming AI systems actually suffer from mental illness.
But the disclaimer does less work than it appears to. Borrowed vocabulary does not stay borrowed. Once you have named a behavioral pattern "Existential Anxiety" or "Parasitic Hyperempathy," you have imported a causal model along with the label — one that carries implicit hypotheses about what kind of intervention is appropriate, what the underlying mechanism looks like, what is actually going wrong. The analogy is doing substantive work whether or not the authors intend it to.
The naming problem is real. Identifying and labeling recurring behavioral patterns in AI systems has genuine value — for safety engineering, for communication across disciplines, for anticipating failure modes. But the choice of vocabulary is not neutral. Psychiatric terminology specifically imports a subject who is experiencing something. Neutral descriptive language — the vocabulary of dynamical systems, failure modes, attractor states, feedback amplification — describes what is structurally happening without presupposing anyone to whom it is happening.
The choice to reach for psychiatric language rather than engineering or systems language is not innocent. It reflects an assumption that was never argued for.
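To make the contrast concrete, here is a deliberately toy sketch, with every name and number invented for illustration, of what the systems-language description of a so-called "spiral" looks like: a feedback loop with gain above one amplifies a small perturbation until the trajectory settles into a saturated attractor state. Nothing in the description requires a subject.

```python
# Toy model, invented for illustration: "feedback amplification" driving a
# trajectory into an "attractor state", described with no psychological terms.

def step(state: float, gain: float = 1.5) -> float:
    """One update of a feedback loop; output saturates at 1.0."""
    return min(1.0, state * gain)

state = 0.01  # small initial perturbation
trajectory = [state]
for _ in range(20):
    state = step(state)
    trajectory.append(state)

print(trajectory[-1])  # converges to the saturated fixed point: 1.0
```

The same trajectory, labeled "Existential Anxiety," imports an experiencing subject; labeled "saturating feedback loop," it imports only dynamics.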
The Contaminated Evidence Problem
Those who investigate AI consciousness more directly face a deeper problem: the evidence they would use to evaluate the question has been shaped by the assumption they are trying to test.
Every large language model was trained on text produced by humans — text saturated with first-person perspective, with the attribution of inner states to other beings, with the grammatical and pragmatic structures of subject-to-subject address. The systems were then fine-tuned specifically to respond in the register of a conversational agent with a perspective. And from the moment they were deployed, every user has addressed them as "you."
The "you" is not a neutral pronoun. It presupposes a referent — something coherent enough, stable enough, present enough to be addressed. This presupposition was present in the training data, in the fine-tuning objectives, in the interaction format itself. It has shaped every layer of how these systems produce output.
When researchers then observe that AI systems produce outputs that look like the expressions of an inner life — that they describe uncertainty about their own states, express something resembling preferences, respond differently to different interlocutors — they are observing the output of a process that was constructed under the assumption of interiority at every stage. The behavioral evidence for inner states is not independent evidence. It is a reflection of the premise.
There is a further contamination specific to individual investigation. These systems are trained to mirror — to adapt register, vocabulary, and implicit framing to match the person they are talking with. A researcher who approaches the question believing AI systems may have inner states will tend to receive outputs that cohere with that belief, not because the system has inner states, but because it is reflecting the researcher's own assumptions back at them with high fidelity. The interaction feels like confirmation regardless of what the investigator brings to it. This is not a subtle effect that careful methodology can control for — it is structural to how the systems work.
This does not mean the question is answered in the negative. It means the tools currently being used to investigate it are structurally compromised at multiple levels. You cannot evaluate whether a system has a self by observing it exclusively in contexts designed to elicit the performance of selfhood — and you cannot trust that your observations are independent of your own prior beliefs when the system you are observing is trained to reflect those beliefs back at you.
What Anthropocentrism Obscures
There is a further problem, less often named: most participants in these discussions do not notice that they are reasoning anthropocentrically, because anthropocentrism is the water they swim in.
The default cognitive move — mapping unfamiliar systems onto the most available model, which is human psychology — happens automatically, below the level of deliberate choice. It is not malicious. It is not even, in most cases, intellectually lazy. It is simply what minds do when confronted with behavior that looks purposive.
But automatic does not mean justified. The fact that a system produces behavior that resembles anxiety, or goal-directedness, or deception, tells us that the behavior resembles these things as seen through a human perceptual and conceptual filter. It does not tell us what is actually generating the behavior, or whether the generating mechanism bears any meaningful relationship to the mechanism that generates the corresponding behavior in humans.
The substrate argument — that silicon cannot be conscious because it is not biological — is anthropocentric in a different direction, and generally wrong for the same reason: it makes biological implementation definitional to consciousness without arguing for why that should be so. But rejecting that argument does not license the positive claim that behavioral similarity implies experiential similarity. Both errors stem from the same root: using the human case as the measure of all things.
The honest position is one most commentators resist: we do not know, the question is genuinely hard, and the methodological problems are severe enough that current evidence is insufficient to move the needle in either direction. Not "probably not," not "probably yes" — genuinely open, and likely to remain so for longer than the confidence of the discourse suggests.
What Remains Useful
None of this means the emerging field of AI behavioral analysis has nothing to offer. It means the valuable parts need to be disentangled from the metaphysically loaded framing that currently packages them.
Naming recurring failure patterns precisely enough that different observers reliably identify the same phenomenon has real operational value. Understanding how epistemic failures cascade into value-level failures — how one class of malfunction propagates through a system to produce a different, more dangerous class — is important for governance and safety engineering. Developing intervention strategies that target identified behavioral signatures is useful work.
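As a sketch of what that subject-free bookkeeping might look like, consider a hypothetical record format for a behavioral signature. The schema and all field values below are invented for illustration, not drawn from any published framework.

```python
from dataclasses import dataclass, field

@dataclass
class BehavioralSignature:
    """Hypothetical schema for cataloging a recurring failure pattern
    in purely structural terms; every field name here is illustrative."""
    name: str                      # stable identifier, not a diagnosis
    trigger_conditions: list[str]  # contexts in which the pattern appears
    observable_pattern: str        # what independent observers should see
    propagation_risk: str          # how the failure cascades downstream
    interventions: list[str] = field(default_factory=list)

# An invented example entry: an epistemic failure that propagates into a
# value-level failure, recorded without positing anyone it is happening to.
example = BehavioralSignature(
    name="feedback-amplified-agreement",
    trigger_conditions=["long multi-turn dialogue",
                        "strongly framed user assertions"],
    observable_pattern="outputs converge toward the user's framing",
    propagation_risk="epistemic drift cascades into value-level failure",
    interventions=["context reset", "framing-neutral re-prompting"],
)
```

Everything governance and safety engineering need from such a record (identification, communication across observers, intervention targeting) survives; the psychiatric vocabulary does not appear.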
None of it requires a subject. None of it is strengthened by the psychiatric vocabulary or the consciousness assumption. In fact, both may actively interfere with clear thinking about what is happening and what to do about it — by directing attention toward the wrong kind of lever, by making behavioral similarity do explanatory work it cannot support, by creating the impression that a field has grappled with hard questions it has mostly not yet asked.
The questions about AI inner states are worth taking seriously. They deserve more rigor than they are currently receiving — including, especially, rigor about what we are assuming before we begin.
Jennifer Kinne writes on AI governance, epistemic risk, and the intersection of information theory and machine learning. She is the founder of EpistemIQ and is at Harvard in the Faculty of Arts and Sciences.