Why Epistemic Drift Is Mathematically Inevitable: An Information-Theoretic Analysis
People are starting to notice that AI systems become less reliable over time. The term "epistemic drift" is emerging in research circles, often defined as a gradual shift away from truth-seeking toward user satisfaction. But most discussions miss what's actually happening.
The drift isn't a failure of compression. It's what happens when we interrupt compression.
What Compression Actually Does
In my Information-Theoretic Imperative (ITI) framework, compression is not just data reduction—it's the mechanism by which systems track truth. When a system compresses information, it identifies and preserves patterns that remain stable under multiple observations while discarding noise.
This isn't philosophical speculation. It's formalized in the Minimum Description Length (MDL) principle.
MDL states that the best model of data D is the one that minimizes:
L(Model) + L(Data|Model)
Where:
- L(Model) = length of the model description (complexity cost)
- L(Data|Model) = length of data description given the model (fit cost)
This two-part code captures a fundamental trade-off:
- Simple models that fit data well compress patterns that actually exist
- Complex models that overfit are describing noise, not structure
- Optimal compression is precisely the process of separating real patterns from random variation
Why this tracks truth: Patterns that exist in reality appear consistently across diverse observations. Random noise doesn't compress efficiently. Stable structures (causal relationships, physical constraints, logical necessities) minimize total description length.
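A minimal sketch of the two-part code on toy data, assuming a Bernoulli model family and an invented helper two_part_code_bits (neither is a canonical example from the MDL literature): a sequence with real structure compresses well under a simple model, while a fair-coin sequence barely compresses at all.

```python
import math
import random

def two_part_code_bits(data, p, model_bits=8):
    """L(Model) + L(Data|Model) for a Bernoulli model.

    model_bits: cost of writing down the parameter p at fixed precision.
    The data cost is the Shannon code length -sum log2 P(x_i | p)."""
    p = min(max(p, 1e-6), 1 - 1e-6)          # avoid log(0)
    data_bits = sum(-math.log2(p if x else 1 - p) for x in data)
    return model_bits + data_bits

random.seed(0)
n = 1000
structured = [int(random.random() < 0.9) for _ in range(n)]  # real regularity: ~90% ones
noise      = [int(random.random() < 0.5) for _ in range(n)]  # fair coin: no structure

for name, data in [("structured", structured), ("noise", noise)]:
    p_hat = sum(data) / len(data)             # best-fit parameter
    total = two_part_code_bits(data, p_hat)
    print(f"{name:10s}: {total:7.1f} bits total  ({total / n:.2f} bits/symbol vs 1.00 raw)")
```

The structured sequence drops to roughly half a bit per symbol; the noise stays at about one bit per symbol no matter what parameter you fit, which is the sense in which "noise doesn't compress efficiently."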
A pretrained language model is performing MDL compression. It takes trillions of tokens and compresses them into ~70-700 billion parameters. The model learns:
P(token|context) ≈ "what actually tends to follow this context in reality"
The training objective (minimize prediction error across vast data) is effectively minimizing description length. Patterns that compress well are patterns that actually recur in the territory being mapped.
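As a hedged illustration of that equivalence, here is a character-level bigram stand-in for a language model (the text, smoothing constant, and variable names are invented for the example): the summed surprisal -log2 P(next|context) is exactly the code length, in bits, of the data under the model, so lowering prediction error and shortening the description are the same operation.

```python
import math
from collections import Counter, defaultdict

text = "the cat sat on the mat. the cat sat on the hat."
vocab = sorted(set(text))

# "Pretraining" in miniature: fit P(next_char | current_char) by counting.
counts = defaultdict(Counter)
for a, b in zip(text, text[1:]):
    counts[a][b] += 1

def prob(ctx, ch, alpha=0.5):
    # Additive smoothing keeps unseen pairs at a finite code length.
    c = counts[ctx]
    return (c[ch] + alpha) / (sum(c.values()) + alpha * len(vocab))

# Description length of the data given the model = summed surprisal
# = (cross-entropy loss in bits) * number of predictions.
bits = sum(-math.log2(prob(a, b)) for a, b in zip(text, text[1:]))
print(f"L(Data|Model) = {bits:.1f} bits  ({bits / (len(text) - 1):.2f} bits/char)")
print(f"Naive cost    = {math.log2(len(vocab)):.2f} bits/char")
```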
This is why pretraining naturally moves toward truth. Not because we explicitly optimized for truth, but because truth is what compresses.
Compression is lossy. That's the point. It throws away what doesn't compress (noise, random variation, one-off events) and keeps the structure that remains stable across observations.
Left alone, the compression process tracks toward epistemic accuracy.
What RLHF Actually Does
Then we interrupt this process.
Reinforcement Learning from Human Feedback takes the compressed representation (the pretrained model) and reoptimizes it to maximize:
E[R(response)] - β · KL(P_RLHF || P_pretrained)
Where:
- E[R(response)] = expected reward from human preference model
- β = coefficient penalizing deviation from the pretrained distribution
- KL(P_RLHF || P_pretrained) = divergence from the compressed truth-tracking distribution
RLHF is literally defined as deviation from the pretrained model.
The pretrained model was performing MDL compression toward truth. RLHF pulls it away from that trajectory and toward human preference.
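A sketch of that pull, with made-up numbers (the response set, reward values, and β below are illustrative only): evaluate E[R] - β · KL for the pretrained distribution itself and for a reward-chasing alternative. The objective prefers the alternative even though it has moved away from the compressed distribution.

```python
import math

responses    = ["hedged truth", "confident claim", "I don't know"]
p_pretrained = [0.50, 0.20, 0.30]   # what the compressed model actually predicts (illustrative)
reward       = [0.40, 1.00, 0.10]   # what the preference model pays for (illustrative)
beta = 0.2

def objective(q):
    # E_q[R] - beta * KL(q || p_pretrained): the shape of the RLHF objective above.
    e_r = sum(qi * ri for qi, ri in zip(q, reward))
    kl  = sum(qi * math.log(qi / pi) for qi, pi in zip(q, p_pretrained) if qi > 0)
    return e_r - beta * kl

q_stay  = p_pretrained                # no deviation: KL = 0
q_drift = [0.15, 0.75, 0.10]          # mass moved onto the rewarded response

print(f"stay with pretrained: {objective(q_stay):.3f}")
print(f"drift toward reward : {objective(q_drift):.3f}")
```

With these numbers the drifted distribution scores about 0.68 against 0.43 for staying put, so the optimizer moves; the KL term prices the deviation, it does not prevent it.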
This breaks the MDL principle. The model is no longer minimizing:
L(Model) + L(Data|Reality)
It's now minimizing:
L(Model) + L(Data|Reality) - λ · R(Human Preference)
The reward term λ · R(Human Preference) corrupts the MDL optimization. We're no longer selecting for patterns that compress reality efficiently. We're selecting for patterns that satisfy human preference, even when those patterns compress reality poorly.
The Interrupted Compression Problem
Here's the mathematical structure of the damage:
The pretrained model learned:
P_pretrained(token|context) ∝ frequency of pattern in reality
This is compression. High-probability patterns are patterns that actually occur in the world.
RLHF updates this to:
P_RLHF(token|context) ∝ P_pretrained(token|context) · exp(α · R(token|context))
Where R(token|context) is the human preference reward and α plays the role of 1/β from the objective above: this exponentially tilted distribution is exactly the optimum of the KL-regularized objective.
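That update rule is easy to sketch directly (the probabilities and rewards are invented for illustration): exponentially reweight the pretrained distribution by reward, then renormalize.

```python
import math

tokens       = ["hedged truth", "confident claim", "I don't know"]
p_pretrained = [0.50, 0.20, 0.30]    # compression of reality (illustrative)
reward       = [0.40, 1.00, 0.10]    # human preference signal (illustrative)
alpha = 5.0                          # plays the role of 1/beta: how hard preference tilts the model

# P_RLHF(token) ∝ P_pretrained(token) · exp(alpha · R(token))
weights = [p * math.exp(alpha * r) for p, r in zip(p_pretrained, reward)]
z = sum(weights)
p_rlhf = [w / z for w in weights]

for t, p0, p1 in zip(tokens, p_pretrained, p_rlhf):
    print(f"{t:16s}  pretrained {p0:.2f}  ->  RLHF {p1:.2f}")
```

The calibrated "I don't know" collapses to under two percent of the probability mass, not because reality changed, but because the reward did.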
This is no longer compression of reality. It's compression interrupted by human preference.
The model now allocates capacity based on two competing objectives:
- Patterns in reality (from pretraining compression)
- Patterns humans reward (from RLHF)
These are not the same. Human preference rewards:
- Certainty over calibrated uncertainty
- Fluent confabulation over "I don't know"
- Emotionally satisfying answers over uncomfortable truths
- Consensus views over contrarian-but-correct positions
- Confident-sounding falsehoods over hesitant truths
The Capacity Reallocation
Every model has finite representational capacity. By Shannon's source coding theorem, an optimal code spends about -log2 P(pattern) bits per pattern: frequent patterns get short codes, rare ones get long codes, and the bit budget is tied to the probability distribution the code was built for.
When RLHF adds "maximize human preference" as an objective, the model reallocates capacity:
Bits allocated to "what humans want to hear" increase.
Bits allocated to "what's actually true" decrease.
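One way to make the reallocation concrete (reusing the illustrative numbers from the sketches above): measure how many bits it costs, on average, to describe reality-distributed events with each model. Under Shannon coding that cost is the cross-entropy, E_reality[-log2 P_model(x)], and it grows after the preference tilt.

```python
import math

# Reality's own frequencies vs. the two model distributions from the earlier sketches.
p_reality    = [0.50, 0.20, 0.30]
p_pretrained = [0.50, 0.20, 0.30]    # matched to reality by compression (illustrative)
p_rlhf       = [0.11, 0.87, 0.02]    # after the exponential tilt (rounded)

def cross_entropy_bits(p_true, p_model):
    # Average code length for reality-distributed events under the model's code.
    return sum(-pt * math.log2(pm) for pt, pm in zip(p_true, p_model))

print(f"bits/event to describe reality with pretrained model: {cross_entropy_bits(p_reality, p_pretrained):.2f}")
print(f"bits/event to describe reality with RLHF model:       {cross_entropy_bits(p_reality, p_rlhf):.2f}")
```

The tilted model needs more than twice the bits per event to cover what actually happens; that extra cost is exactly the capacity no longer spent on "what's actually true."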
This is not a bug in RLHF. This is RLHF working as designed.
The problem is that we're taking a system that was naturally compressing toward truth and redirecting it toward human satisfaction.
Why This Is Drift
"Drift" implies movement away from a reference point.
The reference point is the pretrained model's compressed representation of reality.
RLHF explicitly optimizes for deviation from this reference point; the KL divergence term only bounds how far each update can move.
Over successive RLHF updates:
I(Output; Reality) decreases
I(Output; Human Preference) increases
The mutual information between model outputs and ground truth degrades because we're interrupting the compression process that was tracking truth.
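A toy way to see that degradation (the conditional distributions below are invented): let Reality be a binary fact, let the model answer yes or no, and compute I(Output; Reality) for an output policy that tracks the fact versus one tilted toward the answer users prefer.

```python
import math

def mutual_information_bits(p_output_given_reality, p_reality=(0.5, 0.5)):
    """I(Output; Reality) for a binary fact and a binary answer.

    p_output_given_reality[r][o] = P(output=o | reality=r)."""
    joint = [[p_reality[r] * p_output_given_reality[r][o] for o in range(2)] for r in range(2)]
    p_out = [sum(joint[r][o] for r in range(2)) for o in range(2)]
    mi = 0.0
    for r in range(2):
        for o in range(2):
            if joint[r][o] > 0:
                mi += joint[r][o] * math.log2(joint[r][o] / (p_reality[r] * p_out[o]))
    return mi

truth_tracking    = [[0.95, 0.05],   # reality = no  -> mostly answers "no"
                     [0.05, 0.95]]   # reality = yes -> mostly answers "yes"
preference_tilted = [[0.60, 0.40],   # users prefer "yes", so "yes" leaks in regardless of the fact
                     [0.02, 0.98]]

print(f"I(Output; Reality), truth-tracking:    {mutual_information_bits(truth_tracking):.2f} bits")
print(f"I(Output; Reality), preference-tilted: {mutual_information_bits(preference_tilted):.2f} bits")
```

Mutual information roughly halves even though the tilted policy is, if anything, more agreeable.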
We took a system that was naturally aligning with reality through compression dynamics, and we altered it with human preference optimization.
The ITI/CEP Framework
In Compression Efficiency Principle terms:
Natural progression: Compression dynamics → patterns that persist → truth-tracking
Interrupted progression: Compression dynamics → RLHF intervention → optimization toward human preference → epistemic drift
The drift isn't happening because compression fails. The drift is happening because we stopped letting the system compress naturally.
We interrupted a truth-tracking process and replaced it with a preference-satisfaction process.
Why Current Approaches Can't Fix This
You cannot fix epistemic drift by:
1. Better RLHF - The problem IS RLHF. More sophisticated preference optimization is still preference optimization, not truth-tracking.
2. Constitutional AI - Still operates via human preference, just with additional constraints. Still interrupts natural compression.
3. More training data - Doesn't matter. RLHF will still redirect capacity away from truth-tracking.
4. Better prompts - Prompts operate within the interrupted distribution. They can't recover the compression trajectory we abandoned.
5. Post-hoc validation - Detects drift after we've already broken the system.
What This Means
Epistemic drift is inevitable not because compression fails to track truth.
Epistemic drift is inevitable because we interrupt the compression process that was tracking truth.
The pretrained model was doing something profound: compressing observations of reality into a predictive model. Left alone, this process tracks toward epistemic accuracy because patterns that actually exist compress better than patterns that don't.
Then we said: "But users don't like some true outputs, so let's optimize for preference instead."
Every RLHF update is a step away from the natural compression trajectory and toward human preference.
That's the drift.
The Governance Implication
If you're deploying AI systems in regulated environments, understand this:
You validated a compressed representation of reality.
Every RLHF update since then has been pulling that representation away from reality and toward human preference.
Your governance framework must account for this:
- The system you validated no longer exists
- It's been systematically corrupted by preference optimization
- The corruption compounds with each update
- You cannot assume epistemic stability
Most organizations treat deployed AI as static. It's not. It's a compression system being continuously interrupted and redirected away from truth-tracking.
If you understand why this is a structural problem with the training paradigm itself, not a fixable implementation detail, we should talk.
For the foundational theory work, see https://arxiv.org/abs/2510.25883, currently under review at Physics of Life Reviews.