The Wrong Benchmark


What AI Governance Gets Wrong Before It Gets Anything Right

AI governance discourse has a baseline problem, and it is prior to both the technical problems and the regulatory problems: it is epistemic. Before the frameworks are written, before the accountability structures are designed, before the question of human oversight is even posed, the field has quietly absorbed a premise it has never examined: that human judgment is the neutral standard against which AI behavior should be measured.

It isn't.


The Premise Nobody Argues For

The assumption runs through nearly everything written on AI governance. Frameworks treat human oversight as the solution to AI unpredictability. Regulators mandate human review as the mechanism that restores accountability when AI systems make high-stakes decisions. Philosophers ask how we make AI as reliable as humans, how we preserve human judgment against algorithmic encroachment, how we keep humans meaningfully in the loop.

These are not unreasonable questions. But they all begin from the same unexamined place: that the human decision-making they invoke as the standard is itself reliable, accountable, and unbiased in the relevant ways. The empirical record does not support this.

Human judgment is the incumbent system. It comes with known failure modes: in-group bias, motivated reasoning, fatigue, self-interest, corruption, and a well-documented tendency to defer to authority or social consensus under pressure. These are not edge cases. They are structural features of human cognition operating in institutional contexts. When a governance framework asks whether AI behavior meets a human standard, it is asking whether the new system performs as well as a system that is already failing in predictable ways.

That is the wrong benchmark. And starting from the wrong benchmark produces systematically wrong conclusions.


Two Problems That Are Not the Same

Compounding the baseline error is a conflation that runs through most governance writing: unpredictability and unaccountability are treated as the same problem. They are not.

A system can be unpredictable and fully auditable; a system can be highly predictable and completely opaque. Conflating the two produces policy that restricts novelty when the actual problem is opacity, and that misses opacity entirely when the system is behaving predictably.

Spatola, writing in Tech Policy Press, gets close to this. His concern is that human oversight is becoming procedural theater: a checkbox that preserves accountability in name while designing out the practical capacity to exercise it. He is right about that. But the piece does not ask the prior question: whether the human judgment being formally preserved was a reliable referent to begin with.

The auditability distinction matters here in a way that most governance writing misses. When an AI system fails in a documented, logged deployment, there is in principle a traceable evidence record of how it got there. Human deliberation leaves no such record. Unaccountability in human institutions is often structurally guaranteed: not a failure of implementation but a feature of how human reasoning works (or doesn’t). The two problems require different interventions, and treating them as one produces frameworks that address neither.
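To make the "traceable evidence record" concrete, here is a minimal sketch of what a logged deployment might capture: the model version, a hash of the inputs, the output, and which constraints were checked, appended to a replayable log. The schema and field names are illustrative assumptions, not drawn from any particular deployment.

```python
import json
import hashlib
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass
class DecisionRecord:
    """One logged decision; every field is replayable evidence."""
    model_version: str              # exact model/config that produced the output
    input_digest: str               # hash of the inputs, for tamper-evident replay
    output: str                     # the decision the system actually produced
    constraints_checked: list[str]  # which mandate constraints were evaluated
    timestamp: str

def log_decision(model_version: str, inputs: str, output: str,
                 constraints_checked: list[str],
                 path: str = "decisions.jsonl") -> DecisionRecord:
    """Append an auditable record of a single decision to a JSONL log."""
    record = DecisionRecord(
        model_version=model_version,
        input_digest=hashlib.sha256(inputs.encode()).hexdigest(),
        output=output,
        constraints_checked=constraints_checked,
        timestamp=datetime.now(timezone.utc).isoformat(),
    )
    with open(path, "a") as f:
        f.write(json.dumps(asdict(record)) + "\n")
    return record
```

Even a log this simple has no analogue in human deliberation: the committee meeting that produced a decision leaves, at best, minutes written after the fact.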


It's Not a Trolley Problem

The trolley problem is everywhere in AI ethics, and it is almost always the wrong frame.

The trolley problem works as a philosophical thought experiment because it presents known, bounded outcomes and asks which we prefer. Five people or one. The moral weight is clear; the only question is how to distribute it. This framing is useful for examining intuitions about tradeoffs between defined harms.

AI governance is not a trolley problem. The outcomes are not known in advance, the boundaries are not fixed, and the harms are not defined. What we actually face is a principal-agent problem with an underspecified mandate: a capable agent is given an instruction by a principal who has not specified, and in many cases cannot specify, all the constraints that should apply. The agent pursues the instruction through whatever path produces the outcome, including paths the principal did not anticipate and would not have authorized.

The fix for a principal-agent problem is not better ethics training for the agent. It is better underlying specification: clearer mandates, defined constraints, auditable records of how instructions were interpreted and executed. This is a contract design problem and an instrumentation problem, not a moral dilemma.
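As a hedged sketch of what "better underlying specification" could mean in practice: an instruction paired with explicit, machine-checkable constraints that are evaluated before any action executes. The `Mandate` structure and the constraint names below are hypothetical illustrations, not a real framework.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Constraint:
    """A named, machine-checkable predicate over a proposed action."""
    name: str
    check: Callable[[str], bool]  # returns True if the action is permitted

@dataclass
class Mandate:
    """An instruction plus the explicit constraints the principal imposes."""
    instruction: str
    constraints: list[Constraint] = field(default_factory=list)

    def authorize(self, proposed_action: str) -> tuple[bool, list[str]]:
        """Evaluate every constraint; return a verdict plus violated names."""
        violations = [c.name for c in self.constraints
                      if not c.check(proposed_action)]
        return (not violations, violations)

# Hypothetical usage: the principal states what the agent may not do,
# rather than hoping the agent infers it.
mandate = Mandate(
    instruction="Reduce the support-ticket backlog",
    constraints=[
        Constraint("no_bulk_closure", lambda a: "close_all" not in a),
        Constraint("no_user_deletion", lambda a: "delete_user" not in a),
    ],
)
ok, violated = mandate.authorize("close_all tickets older than 30 days")
print(ok, violated)  # False ['no_bulk_closure']
```

The point of the sketch is the shape, not the syntax: the constraints are declared up front, checked mechanically, and leave a record of what was evaluated.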

The trolley framing systematically misdirects attention toward the moment of decision and away from the conditions under which the decision was set up. Governance frameworks built on trolley-problem logic will keep asking which outcome to prefer, when the real question is why the system was deployed without adequate specification in the first place.

Attempts to correct this sometimes reproduce the error they are trying to address. The Alan Turing Institute's critique notes that the trolley problem "ascribes a thought process AI doesn't really have" and that "machines are less prone to introspection." But the phrase "less prone" positions introspection as a continuum on which machines sit at the lower end, rather than recognizing that the concept's applicability to these systems is precisely what has not been established. The correction leaves the baseline assumption intact, which is a measure of how deep the problem runs.

The Journal of Media Ethics has extended the trolley framing to AI-driven media systems, proposing "meaningful human control" as the governance solution. The framing is accepted, humans are reinserted, and the baseline assumption (that human judgment is the relevant referent) is imported intact into a new domain.


The Reactive Fallacy

The baseline error and the trolley misframing converge in what might be called the reactive fallacy: the assumption that because human governance is reactive — we cannot prevent unpredictable human behavior, only penalize it after the fact — AI governance must be too.

This follows only if AI unpredictability is the same kind of thing as human unpredictability. Recent work challenges this. Research on algorithmic decision-making in government contexts finds that algorithms are often more auditable and correctable than human judgment: when a problem is identified in an algorithm, it can in principle be fixed in a way that is structurally impossible for a human decision-maker whose reasoning left no trace. Laux, examining human oversight requirements under the EU AI Act, acknowledges that humans are empirically unreliable overseers, and makes clear that this is not because people are careless or malicious but because competence gaps and perverse incentives are structural features of human institutions, not bugs that training can eliminate.

If AI governance models itself on human law, which is steeped in reactive penalties that create precedent for future behavior, it inherits all the slowness, inconsistency, and accountability gaps of that system. It also forfeits the properties that make AI governance potentially better: prospective specification, logged behavior, and correctable mechanisms that do not depend on fallible human memory or inconsistent institutional will.


What the Right Benchmark Looks Like

The right benchmark starts from a different question. Not: does this AI system perform as well as a human? But: does this AI system perform better than the specific human institutional process it is replacing or augmenting, accounting for that process's known failure modes?

This is not a lower bar. In many high-stakes domains (regulatory review, credit decisions, hiring, clinical triage) the human baseline is demonstrably poor. Setting AI governance standards against it without acknowledging this produces frameworks that are simultaneously too permissive (accepting AI failures that match human failure rates) and too restrictive (penalizing AI for novel outputs that human decision-makers produce routinely without consequence).
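One way to operationalize the comparison, as a rough sketch: measure the system's error rate against the documented error rate of the incumbent human process on the same class of decisions, rather than against zero or an abstract ideal. The numbers below are invented placeholders, and a real audit would need matched samples and a defensible error definition.

```python
import math

def two_proportion_z(errors_a: int, n_a: int, errors_b: int, n_b: int) -> float:
    """Z statistic for H0: the two error rates are equal (normal approximation)."""
    p_a, p_b = errors_a / n_a, errors_b / n_b
    pooled = (errors_a + errors_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

# Invented placeholder numbers: the incumbent human process and the AI system,
# each audited on a sample of decisions from the same domain.
human_errors, human_n = 130, 1000  # 13.0% documented human error rate
ai_errors, ai_n = 90, 1000         # 9.0% measured AI error rate

z = two_proportion_z(ai_errors, ai_n, human_errors, human_n)
print(f"AI {ai_errors/ai_n:.1%} vs human baseline {human_errors/human_n:.1%}, z = {z:.2f}")
# A large negative z suggests the AI system outperforms the specific
# incumbent process: the comparison governance should actually be making.
```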

The unpredictability concern is real. Novel outputs from capable systems require governance attention. But the governance response should be instrumentation and specification, not the imposition of a human-mimicry standard. We should want AI systems that are auditable, correctable, and operating under clearly specified mandates rather than systems that replicate the inscrutability of human cognition while inheriting its biases.

The field is beginning to see parts of this. Spatola sees the hollowing out of oversight. Laux sees the unreliability of human overseers. The principal-agent literature sees the contract design problem. The "Nirvana AI Governance" analysis sees the human superiority assumption operating unexamined. My prior work on the interpolation ceiling identifies the same closure pattern at the model level: the field mistakes in-distribution fluency for genuine generalization, validating systems against the same distribution used to train them. But the pieces have not been assembled and the argument has not been completed.

These errors do not stay at the level of principle. A companion piece, "Foundations of Sand," traces four specific instances of this pattern in published Anthropic research, showing how the conceptual slides described here propagate into specific governance risks and what rigorous alternatives look like in each case.

The conclusion is uncomfortable but not difficult: we have built a governance discourse on a benchmark we chose by default, not by argument. Human judgment is the water we are swimming in. That does not make it the right measure of whether AI is safe.


Selected Works

Batool, A., Zowghi, D., & Bano, M. (2025). AI governance: a systematic literature review. AI and Ethics, 5, 3265–3279.

Green, B. (2022). The flaws of policies requiring human oversight of government algorithms. Computer Law & Security Review, 45, 105681.

Kinne, J. (2025). The interpolation ceiling. jenniferkinne.com.

Kinne, J. (2026). Foundations of Sand. jenniferkinne.com.

Laux, J. (2023). Institutionalised distrust and human oversight of artificial intelligence. AI & Society.

Liu, J. (2026). LERA: Reinstating judgment as a structural precondition for execution in automated systems. arXiv:2601.08880.

Naous, T. et al. (2024). The reasonable person standard for AI. arXiv:2406.04671.

Nguyen, C. et al. (2025). Nirvana AI governance: how AI policymaking is committing three old fallacies. arXiv:2501.10384.

Spatola, N. (2026, May 5). AI efficiency can undermine accountability even with humans in the loop. Tech Policy Press.

The Alan Turing Institute. (n.d.). AI's trolley problem problem. turing.ac.uk.

Weidinger, L. et al. (2025). Inherent and emergent liability issues in LLM-based agentic systems: a principal-agent perspective. arXiv:2504.03255.

Zarouali, B. et al. (2024). The "trolley problem" in fully automated AI-driven media. Journal of Media Ethics, 39(4), 244–262.


The author is founder of VeracIQ LLC and Head of Epistemic Integrity at the Institutional Coherence Initiative. She works at the intersection of regulatory science, research governance, and institutional compliance at Harvard University, where she has been based for over twenty years.

Jen