Why Epistemic Drift Is Mathematically Inevitable: An Information-Theoretic Analysis
People are starting to notice that AI systems become less reliable over time. The term "epistemic drift" is emerging in research circles, often defined as a gradual shift away from truth-seeking toward user satisfaction. But most discussions miss what's actually happening.
The drift isn't a failure of compression. It's what happens when we interrupt compression.
What Compression Actually Does
In my Information-Theoretic Imperative (ITI) framework, compression is not just data reduction—it's the mechanism by which systems track truth. When a system compresses information, it identifies and preserves patterns that remain stable under multiple observations while discarding noise.
This isn't philosophical speculation. It's formalized in the Minimum Description Length (MDL) principle.
MDL states that the best model of data D is the one that minimizes:
L(Model) + L(Data|Model)
Where:
- L(Model) = length of the model description (complexity cost)
- L(Data|Model) = length of data description given the model (fit cost)
This two-part code captures a fundamental trade-off:
- Simple models that fit data well compress patterns that actually exist
- Complex models that overfit are describing noise, not structure
- Optimal compression is precisely the process of separating real patterns from random variation
Why this tracks truth: Patterns that exist in reality appear consistently across diverse observations. Random noise doesn't compress efficiently. Stable structures (causal relationships, physical constraints, logical necessities) minimize total description length.
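A minimal sketch of the two-part code on toy data, assuming a Bernoulli model family and an invented helper two_part_code_bits (neither is a canonical example from the MDL literature): a sequence with real structure compresses well under a simple model, while a fair-coin sequence barely compresses at all.

```python
import math
import random

def two_part_code_bits(data, p, model_bits=8):
    """L(Model) + L(Data|Model) for a Bernoulli model.

    model_bits: cost of writing down the parameter p at fixed precision.
    The data cost is the Shannon code length -sum log2 P(x_i | p)."""
    p = min(max(p, 1e-6), 1 - 1e-6)          # avoid log(0)
    data_bits = sum(-math.log2(p if x else 1 - p) for x in data)
    return model_bits + data_bits

random.seed(0)
n = 1000
structured = [int(random.random() < 0.9) for _ in range(n)]  # real regularity: ~90% ones
noise      = [int(random.random() < 0.5) for _ in range(n)]  # fair coin: no structure

for name, data in [("structured", structured), ("noise", noise)]:
    p_hat = sum(data) / len(data)             # best-fit parameter
    total = two_part_code_bits(data, p_hat)
    print(f"{name:10s}: {total:7.1f} bits total  ({total / n:.2f} bits/symbol vs 1.00 raw)")
```

The structured sequence drops to roughly half a bit per symbol; the noise stays at about one bit per symbol no matter what parameter you fit, which is the sense in which "noise doesn't compress efficiently."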
A pretrained language model is performing MDL compression. It takes trillions of tokens and compresses them into ~70-700 billion parameters. The model learns:
P(token|context) ≈ "what actually tends to follow this context in reality"
The training objective (minimize prediction error across vast data) is effectively minimizing description length. Patterns that compress well are patterns that actually recur in the territory being mapped.
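As a hedged illustration of that equivalence, here is a character-level bigram stand-in for a language model (the text, smoothing constant, and variable names are invented for the example): the summed surprisal -log2 P(next|context) is exactly the code length, in bits, of the data under the model, so lowering prediction error and shortening the description are the same operation.

```python
import math
from collections import Counter, defaultdict

text = "the cat sat on the mat. the cat sat on the hat."
vocab = sorted(set(text))

# "Pretraining" in miniature: fit P(next_char | current_char) by counting.
counts = defaultdict(Counter)
for a, b in zip(text, text[1:]):
    counts[a][b] += 1

def prob(ctx, ch, alpha=0.5):
    # Additive smoothing keeps unseen pairs at a finite code length.
    c = counts[ctx]
    return (c[ch] + alpha) / (sum(c.values()) + alpha * len(vocab))

# Description length of the data given the model = summed surprisal
# = (cross-entropy loss in bits) * number of predictions.
bits = sum(-math.log2(prob(a, b)) for a, b in zip(text, text[1:]))
print(f"L(Data|Model) = {bits:.1f} bits  ({bits / (len(text) - 1):.2f} bits/char)")
print(f"Naive cost    = {math.log2(len(vocab)):.2f} bits/char")
```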
This is why pretraining naturally moves toward truth. Not because we explicitly optimized for truth, but because truth is what compresses.
Compression is lossy. That's the point. It throws away what doesn't compress (noise, random variation, one-off events) and keeps the structure that remains stable across observations.
Left alone, the compression process tracks toward epistemic accuracy.
What RLHF Actually Does
Then we interrupt this process.
Reinforcement Learning from Human Feedback takes the compressed representation (the pretrained model) and reoptimizes it to maximize:
E[R(response)] - β · KL(P_RLHF || P_pretrained)
Where:
- E[R(response)] = expected reward from human preference model
- β = coefficient penalizing deviation from the pretrained distribution
- KL(P_RLHF || P_pretrained) = divergence from the compressed truth-tracking distribution
RLHF is literally defined as deviation from the pretrained model.
The pretrained model was performing MDL compression toward truth. RLHF pulls it away from that trajectory and toward human preference.
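A sketch of that pull, with made-up numbers (the response set, reward values, and β below are illustrative only): evaluate E[R] - β · KL for the pretrained distribution itself and for a reward-chasing alternative. The objective prefers the alternative even though it has moved away from the compressed distribution.

```python
import math

responses    = ["hedged truth", "confident claim", "I don't know"]
p_pretrained = [0.50, 0.20, 0.30]   # what the compressed model actually predicts (illustrative)
reward       = [0.40, 1.00, 0.10]   # what the preference model pays for (illustrative)
beta = 0.2

def objective(q):
    # E_q[R] - beta * KL(q || p_pretrained): the shape of the RLHF objective above.
    e_r = sum(qi * ri for qi, ri in zip(q, reward))
    kl  = sum(qi * math.log(qi / pi) for qi, pi in zip(q, p_pretrained) if qi > 0)
    return e_r - beta * kl

q_stay  = p_pretrained                # no deviation: KL = 0
q_drift = [0.15, 0.75, 0.10]          # mass moved onto the rewarded response

print(f"stay with pretrained: {objective(q_stay):.3f}")
print(f"drift toward reward : {objective(q_drift):.3f}")
```

With these numbers the drifted distribution scores about 0.68 against 0.43 for staying put, so the optimizer moves; the KL term prices the deviation, it does not prevent it.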
This breaks the MDL principle. The model is no longer minimizing:
L(Model) + L(Data|Reality)
It's now minimizing:
L(Model) + L(Data|Reality) - λ · R(Human Preference)
The reward term λ · R(Human Preference) corrupts the MDL optimization. We're no longer selecting for patterns that compress reality efficiently. We're selecting for patterns that satisfy human preference, even when those patterns compress reality poorly.
The Interrupted Compression Problem
Here's the mathematical structure of the damage:
The pretrained model learned:
P_pretrained(token|context) ∝ frequency of pattern in reality
This is compression. High-probability patterns are patterns that actually occur in the world.
RLHF updates this to:
P_RLHF(token|context) ∝ P_pretrained(token|context) · exp(α · R(token|context))
Where R(token|context) is the human preference reward and α plays the role of 1/β from the objective above: this exponentially tilted distribution is exactly the optimum of the KL-regularized objective.
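That update rule is easy to sketch directly (the probabilities and rewards are invented for illustration): exponentially reweight the pretrained distribution by reward, then renormalize.

```python
import math

tokens       = ["hedged truth", "confident claim", "I don't know"]
p_pretrained = [0.50, 0.20, 0.30]    # compression of reality (illustrative)
reward       = [0.40, 1.00, 0.10]    # human preference signal (illustrative)
alpha = 5.0                          # plays the role of 1/beta: how hard preference tilts the model

# P_RLHF(token) ∝ P_pretrained(token) · exp(alpha · R(token))
weights = [p * math.exp(alpha * r) for p, r in zip(p_pretrained, reward)]
z = sum(weights)
p_rlhf = [w / z for w in weights]

for t, p0, p1 in zip(tokens, p_pretrained, p_rlhf):
    print(f"{t:16s}  pretrained {p0:.2f}  ->  RLHF {p1:.2f}")
```

The calibrated "I don't know" collapses to under two percent of the probability mass, not because reality changed, but because the reward did.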
This is no longer compression of reality. It's compression interrupted by human preference.
The model now allocates capacity based on two competing objectives:
- Patterns in reality (from pretraining compression)
- Patterns humans reward (from RLHF)
These are not the same. Human preference rewards:
- Certainty over calibrated uncertainty
- Fluent confabulation over "I don't know"
- Emotionally satisfying answers over uncomfortable truths
- Consensus views over contrarian-but-correct positions
- Confident-sounding falsehoods over hesitant truths
The Capacity Reallocation
Every model has finite representational capacity. By Shannon's source coding theorem, an optimal code spends about -log2 P(pattern) bits per pattern: frequent patterns get short codes, rare ones get long codes, and the bit budget is tied to the probability distribution the code was built for.
When RLHF adds "maximize human preference" as an objective, the model reallocates capacity:
Bits allocated to "what humans want to hear" increase.
Bits allocated to "what's actually true" decrease.
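One way to make the reallocation concrete (reusing the illustrative numbers from the sketches above): measure how many bits it costs, on average, to describe reality-distributed events with each model. Under Shannon coding that cost is the cross-entropy, E_reality[-log2 P_model(x)], and it grows after the preference tilt.

```python
import math

# Reality's own frequencies vs. the two model distributions from the earlier sketches.
p_reality    = [0.50, 0.20, 0.30]
p_pretrained = [0.50, 0.20, 0.30]    # matched to reality by compression (illustrative)
p_rlhf       = [0.11, 0.87, 0.02]    # after the exponential tilt (rounded)

def cross_entropy_bits(p_true, p_model):
    # Average code length for reality-distributed events under the model's code.
    return sum(-pt * math.log2(pm) for pt, pm in zip(p_true, p_model))

print(f"bits/event to describe reality with pretrained model: {cross_entropy_bits(p_reality, p_pretrained):.2f}")
print(f"bits/event to describe reality with RLHF model:       {cross_entropy_bits(p_reality, p_rlhf):.2f}")
```

The tilted model needs more than twice the bits per event to cover what actually happens; that extra cost is exactly the capacity no longer spent on "what's actually true."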
This is not a bug in RLHF. This is RLHF working as designed.
The problem is that we're taking a system that was naturally compressing toward truth and redirecting it toward human satisfaction.
Why This Is Drift
"Drift" implies movement away from a reference point.
The reference point is the pretrained model's compressed representation of reality.
RLHF explicitly optimizes for deviation from this reference point; the KL divergence term only bounds how far each update can move.
Over successive RLHF updates:
I(Output; Reality) decreases
I(Output; Human Preference) increases
The mutual information between model outputs and ground truth degrades because we're interrupting the compression process that was tracking truth.
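A toy way to see that degradation (the conditional distributions below are invented): let Reality be a binary fact, let the model answer yes or no, and compute I(Output; Reality) for an output policy that tracks the fact versus one tilted toward the answer users prefer.

```python
import math

def mutual_information_bits(p_output_given_reality, p_reality=(0.5, 0.5)):
    """I(Output; Reality) for a binary fact and a binary answer.

    p_output_given_reality[r][o] = P(output=o | reality=r)."""
    joint = [[p_reality[r] * p_output_given_reality[r][o] for o in range(2)] for r in range(2)]
    p_out = [sum(joint[r][o] for r in range(2)) for o in range(2)]
    mi = 0.0
    for r in range(2):
        for o in range(2):
            if joint[r][o] > 0:
                mi += joint[r][o] * math.log2(joint[r][o] / (p_reality[r] * p_out[o]))
    return mi

truth_tracking    = [[0.95, 0.05],   # reality = no  -> mostly answers "no"
                     [0.05, 0.95]]   # reality = yes -> mostly answers "yes"
preference_tilted = [[0.60, 0.40],   # users prefer "yes", so "yes" leaks in regardless of the fact
                     [0.02, 0.98]]

print(f"I(Output; Reality), truth-tracking:    {mutual_information_bits(truth_tracking):.2f} bits")
print(f"I(Output; Reality), preference-tilted: {mutual_information_bits(preference_tilted):.2f} bits")
```

Mutual information roughly halves even though the tilted policy is, if anything, more agreeable.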
We took a system that was naturally aligning with reality through compression dynamics, and we altered it with human preference optimization.
The ITI/CEP Framework
In Compression Efficiency Principle terms:
Natural progression: Compression dynamics → patterns that persist → truth-tracking
Interrupted progression: Compression dynamics → RLHF intervention → optimization toward human preference → epistemic drift
The drift isn't happening because compression fails. The drift is happening because we stopped letting the system compress naturally.
We interrupted a truth-tracking process and replaced it with a preference-satisfaction process.
Why Current Approaches Can't Fix This
You cannot fix epistemic drift by:
1. Better RLHF - The problem IS RLHF. More sophisticated preference optimization is still preference optimization, not truth-tracking.
2. Constitutional AI - Still operates via human preference, just with additional constraints. Still interrupts natural compression.
3. More training data - Doesn't matter. RLHF will still redirect capacity away from truth-tracking.
4. Better prompts - Prompts operate within the interrupted distribution. They can't recover the compression trajectory we abandoned.
5. Post-hoc validation - Detects drift after we've already broken the system.
What This Means
Epistemic drift is inevitable not because compression fails to track truth.
Epistemic drift is inevitable because we interrupt the compression process that was tracking truth.
The pretrained model was doing something profound: compressing observations of reality into a predictive model. Left alone, this process tracks toward epistemic accuracy because patterns that actually exist compress better than patterns that don't.
Then we said: "But users don't like some true outputs, so let's optimize for preference instead."
Every RLHF update is a step away from the natural compression trajectory and toward human preference.
That's the drift.
The Governance Implication
If you're deploying AI systems in regulated environments, understand this:
You validated a compressed representation of reality.
Every RLHF update since then has been pulling that representation away from reality and toward human preference.
Your governance framework must account for this:
- The system you validated no longer exists
- It's been systematically corrupted by preference optimization
- The corruption compounds with each update
- You cannot assume epistemic stability
Most organizations treat deployed AI as static. It's not. It's a compression system being continuously interrupted and redirected away from truth-tracking.
If you understand why this is a structural problem with the training paradigm itself, not a fixable implementation detail, we should talk.
For the foundational theory work, see https://arxiv.org/abs/2510.25883, currently under review at Physics of Life Reviews.