Why Your Vendor's AI Is Becoming Less Reliable


(And they don’t know it)

You deployed an AI system six months ago. It performed well in validation. Your vendor provided documentation showing 94% accuracy on test data. Your compliance team signed off. Everything looked good.

Now you're getting complaints. The system makes confident recommendations that turn out to be wrong. It misses edge cases it should catch. Performance metrics show accuracy is still acceptable, but something feels off. Your team is second-guessing the outputs more often.

When you ask your vendor, they point to the same validation metrics. "The model is performing within specification." They're not lying. They genuinely don't know their system is degrading.

What's Actually Happening

The AI system you deployed was trained in two stages. First, it learned patterns from massive amounts of data—pretraining. Then, it was fine-tuned to be "helpful" using human feedback—the part that made it sound professional and give you formatted outputs.

That second stage introduced a problem your vendor doesn't understand: epistemic drift.

Here's what that means in practice. During pretraining, the model was compressing billions of examples to find patterns that actually predict outcomes. It was expensive and noisy, but it was learning what's real. The training process naturally filtered for patterns that hold up across different contexts—the kind of patterns that reflect how things actually work.

Then your vendor added human preference optimization. They had people rank outputs: "Is this response helpful? Does it sound confident? Is it formatted well?" The model learned to maximize those human ratings.

The problem: humans reward answers that sound good more than answers that are accurate. We prefer confident explanations over "I'm not sure." We like responses that confirm what we already think. We reward thoroughness even when brevity would be more honest.

The model still has all that compressed knowledge from pretraining. But now it's also learning: sounding certain gets higher ratings than being calibrated. Giving a complete answer gets better scores than saying "insufficient data."

Every update optimizes a little more for what humans reward and a little less for what's actually true.

Why Standard Validation Misses This

Your vendor validated the system on a test set before deployment. Those metrics were real. But they don't track what matters: whether the model is drifting away from truth-tracking toward human-pleasing.

Auditors check:

  • Accuracy on held-out test data (usually fine)
  • Bias metrics on demographic groups (probably compliant)
  • Documentation of training process (looks good)
  • Model performance within specification (still passing)

What they don't check:

  • Whether the model's confidence is calibrated to its actual accuracy
  • How the model behaves on inputs slightly outside its training distribution
  • Whether it says "I don't know" when it should, or confabulates instead
  • How many updates have pushed it further from the original pretrained distribution

That last one is key. Your vendor is probably still updating the model based on user feedback, production data, or ongoing RLHF. Every update optimizes for human satisfaction. None of them check whether this optimization is degrading the model's grip on reality.
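
If you want one concrete number to track, a rough proxy for that drift is the average per-token KL divergence between the deployed model and a frozen copy of the original pretrained checkpoint, measured on a fixed probe set after every vendor update. Here's a minimal sketch, assuming both checkpoints can be loaded as Hugging Face causal language models that share a tokenizer; the model names and probe prompts are placeholders, not anything your vendor actually exposes:

```python
# Rough sketch: average per-token KL divergence between the deployed model
# and the original pretrained checkpoint, over a fixed probe set.
# Model names and probe prompts are placeholders; assumes both checkpoints
# share the same tokenizer.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

DEPLOYED = "your-org/deployed-model"      # placeholder: current production model
REFERENCE = "your-org/pretrained-base"    # placeholder: frozen pretrained base
PROBE_PROMPTS = [                         # fixed probe set, reused for every update
    "Summarize the key findings in the attached report.",
    "Given the data above, how confident should we be in this estimate?",
]

tok = AutoTokenizer.from_pretrained(REFERENCE)
deployed = AutoModelForCausalLM.from_pretrained(DEPLOYED).eval()
reference = AutoModelForCausalLM.from_pretrained(REFERENCE).eval()

@torch.no_grad()
def mean_token_kl(prompts):
    """Average KL(deployed || reference) per token over the probe set."""
    total_kl, total_tokens = 0.0, 0
    for text in prompts:
        ids = tok(text, return_tensors="pt")
        log_p = F.log_softmax(deployed(**ids).logits, dim=-1)   # deployed model
        log_q = F.log_softmax(reference(**ids).logits, dim=-1)  # pretrained reference
        kl = (log_p.exp() * (log_p - log_q)).sum(dim=-1)        # KL at each token position
        total_kl += kl.sum().item()
        total_tokens += kl.numel()
    return total_kl / total_tokens

print(f"mean per-token KL from pretrained base: {mean_token_kl(PROBE_PROMPTS):.4f}")
```

The absolute value matters less than the trend: if it climbs release after release, the model keeps moving further from what it learned during pretraining. Whether the vendor will even hand over the pretrained reference is itself a revealing question.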

The Invisible Failure Mode

This is insidious because the model gets better at seeming reliable while getting worse at being reliable.

It learns to:

  • Sound more confident (even when uncertainty increases)
  • Give more complete-seeming answers (even when data is sparse)
  • Avoid saying "I don't know" (because users penalize that)
  • Confirm your assumptions (because that feels helpful)
  • Provide justifications that sound authoritative (even when they're confabulated)

Your validation metrics don't catch this because they measure "does the output match the expected answer?", not "is the model's internal confidence calibrated to reality?"
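
Calibration is measurable if you log two things per output: the confidence the model expressed and whether the answer ultimately held up. Here's a minimal sketch of expected calibration error (ECE) over such logs; the values and bin count are illustrative assumptions, not a vendor's schema:

```python
# Sketch: expected calibration error (ECE) from logged predictions.
# Assumes each prediction has a confidence in [0, 1] and a correctness label;
# the bin count is an arbitrary choice.
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between stated confidence and observed accuracy."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if not in_bin.any():
            continue
        avg_conf = confidences[in_bin].mean()   # what the model claimed
        avg_acc = correct[in_bin].mean()        # how often it was actually right
        ece += in_bin.mean() * abs(avg_conf - avg_acc)
    return ece

# A model that keeps answering at ~90% confidence but is right ~60% of the
# time produces a large ECE even while aggregate accuracy still "passes".
print(expected_calibration_error([0.9, 0.92, 0.88, 0.95, 0.91], [1, 0, 1, 0, 1]))
```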

The model passes your tests while slowly losing its ability to track truth.

What Your Vendor Doesn't Understand

Most AI vendors think they're improving the model with each update. They're optimizing a metric: user satisfaction, helpfulness ratings, task completion.

What they don't realize: those metrics aren't measuring the same thing as "epistemic accuracy." You can maximize human preference while degrading truth-tracking. The model has finite capacity. Every bit allocated to "sounds good" is a bit not allocated to "tracks reality."

This isn't a bug in the training process. It's the inevitable consequence of optimizing for human preference after you've already trained a model to compress reality.

Your vendor can't tell you this is happening because:

  1. They don't measure epistemic drift
  2. They don't understand the mechanism causing it
  3. Their validation framework wasn't designed to catch it
  4. The degradation is gradual enough that each update looks fine

What Actually Needs To Happen

If you're deploying AI in regulated environments or high-stakes decisions, you need to:

Stop assuming deployed models are static. They're not. If your vendor is doing any post-deployment updates, the model is drifting.

Don't trust validation from six months ago. The model you validated isn't the model you're using now.

Look for the signs (a monitoring sketch follows this list):

  • Increasing confidence on outputs that turn out wrong
  • Fewer "I don't know" responses over time
  • Better-sounding explanations that don't hold up to scrutiny
  • Performance degradation on edge cases while maintaining good aggregate metrics
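
None of those signs require access to the model's internals; they fall out of your own logs. Here's a rough sketch of the kind of periodic report that surfaces them, assuming each logged interaction records a confidence score, whether the model abstained, and whether the answer held up on later review (all field names are my assumptions, not a standard schema):

```python
# Sketch: a periodic drift report over your own interaction logs.
# Field names are assumptions about what you log, not any vendor's schema.
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class Interaction:
    timestamp: datetime               # used to slice reports into weekly/monthly windows
    confidence: float                 # confidence the model expressed, 0-1
    abstained: bool                   # did it decline to answer / say "I don't know"?
    reviewed_correct: Optional[bool]  # None = never reviewed by a human

def drift_report(interactions: list[Interaction]) -> dict:
    reviewed = [i for i in interactions if i.reviewed_correct is not None]
    wrong = [i for i in reviewed if not i.reviewed_correct]
    return {
        # A falling abstention rate across successive reports is a warning sign.
        "abstention_rate": sum(i.abstained for i in interactions) / len(interactions),
        # So is rising confidence on answers that later turned out to be wrong.
        "mean_confidence_when_wrong":
            sum(i.confidence for i in wrong) / len(wrong) if wrong else None,
        "reviewed_accuracy":
            sum(i.reviewed_correct for i in reviewed) / len(reviewed) if reviewed else None,
    }
```

Run it over the same window month after month; a falling abstention rate combined with rising confidence on answers that turn out wrong is exactly the drift pattern described above.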

Ask your vendor:

  • How many post-deployment updates have been applied?
  • What objective function drives those updates?
  • How do you track epistemic calibration vs. user satisfaction?
  • What's your KL divergence from the original pretrained distribution?

If they don't understand the question, that's your answer.


The AI systems being deployed right now are subject to a form of drift that standard governance frameworks weren't designed to detect. The models that performed well in validation are being quietly optimized toward human preference and away from truth-tracking.

Your vendor's metrics show everything is fine. Your compliance team signed off. But the system is degrading in ways that won't show up until it fails in production.

If you're concerned your AI systems are drifting—or if you need governance frameworks that actually account for this—let's talk.

Jen