Essay 5
The Loop We Built
How human bias becomes AI bias becomes new human bias — and what to do about it.
Here is a scene that has become common enough in the AI-safety literature to be its own pattern. A senior radiologist in a teaching hospital notices something unsettling about the AI system her department has deployed to flag urgent findings on chest X-rays. The system is good. On clean test data it was better than the average resident. But after a few months in production, the radiologist notices that her own reading of borderline cases has drifted toward whatever the system says first. She looks at the image, forms an impression, sees the system’s flag, and, if the system disagrees with her, tends to revise her own impression far more often than she overrides the system’s. Sometimes she is right. Sometimes the system is. Either way, her independent judgment, the thing the hospital is paying her for, has quietly become a kind of after-hours review of whatever the model has already concluded.
She isn’t being lazy. She is being human, in a specific and well-documented way that has a name in the literature: automation bias. When a calculation is offered alongside a snap human impression, the calculation tends to win, not because it’s better, but because it feels more authoritative, requires less effort, and looks bad to override. Pilots over-trust autopilots that are quietly drifting. Drivers over-trust GPS that’s quietly wrong. Reviewers over-trust models. It is a recurring story, well understood empirically, and Article 14(4)(b) of the European Union’s 2024 Artificial Intelligence Act is a recent attempt to legislate against it.
This essay is about the larger version of that pattern. The radiologist’s story is one snapshot of a system-level loop that has now been running, in production, in dozens of consequential domains, for long enough that we can see its shape. It starts with the kinds of judgment errors we’ve been talking about for four essays. It ends with those same errors at scale, in production, with regulatory weight, in places that don’t look anything like the cognitive science laboratory where they were first documented. The shape of the loop is not subtle.
The chain, link by link
Here is the path from a human cognitive shortcut to a deployed AI system that institutionalizes it. I’ll walk it in plain terms. None of this is speculative; every link has been documented in production systems in the last decade.
Link 1: data collection. Someone has to decide what data the model will learn from. They reach for what’s available. Availability is the oldest shortcut in the bias literature, and in the context of data collection it produces something the AI fairness field calls representation bias: the data overrepresents the populations and contexts that are easiest to collect from. The early datasets that trained much of modern computer vision were dominated by Western, English-speaking, internet-using populations, not because the engineers were prejudiced, but because that was the data sitting on the shelf.
Link 2: annotation and labeling. Once the data is collected, somebody has to mark it. Doctors label scans. Annotators label toxicity. Workers in low-cost-of-labor countries label thousands of ambiguous images a day against a slim style guide. Wherever the labels are even slightly ambiguous, confirmation bias, the brain’s preference for evidence that confirms what it already believes, produces what the field calls label bias. Toxicity labels reflect the labeler’s social context. Pain ratings reflect what the labeler thinks the patient “should” be feeling. Job-applicant scores reflect whatever the labeler considers a fit (a word doing a lot of work) for the role.
Link 3: feature engineering and proxies. Models can’t see protected attributes like race directly, and in many jurisdictions they aren’t allowed to. So engineers, working from their own experience of what correlates with outcomes, often choose proxy features. Zip code, in the United States, correlates strongly with race because of the country’s history. School name correlates with class. Name correlates with gender and ethnicity. These aren’t usually chosen with discrimination in mind; they’re chosen because they have predictive power. The predictive power comes precisely from the social patterns the engineer is trying to abstract away from. The result is proxy discrimination: the model produces a demographic classifier without ever being told demographics.
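If you want to see Link 3 in miniature, here is a toy sketch in Python. Everything in it is invented: a made-up merit score, a zip-code stand-in that happens to be more common in one group, and historical hiring labels that were already tilted against that group. The model is a plain logistic regression, and it is never shown the group.

```python
# A toy illustration of proxy discrimination. All data is synthetic and
# every number is invented; the only point is the mechanism.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 20_000

group = rng.integers(0, 2, n)           # 0 or 1; never shown to the model
merit = rng.normal(0, 1, n)             # the thing we would like decisions to track
# A proxy feature (think zip code): much more common in group 1 than group 0.
zip_proxy = (rng.random(n) < 0.2 + 0.6 * group).astype(float)
# Historical hiring labels: partly merit, partly a penalty applied to group 1.
hired = (merit - 0.8 * group + rng.normal(0, 0.5, n) > 0).astype(int)

# Train only on the "neutral" columns. Group membership is not a feature.
X = np.column_stack([merit, zip_proxy])
model = LogisticRegression(max_iter=1000).fit(X, hired)

print("learned weight on the zip proxy:", round(float(model.coef_[0][1]), 2))
pred = model.predict(X)
for g in (0, 1):
    print(f"group {g} predicted-hire rate: {pred[group == g].mean():.2f}")
# Typical run: a clearly negative weight on the proxy, and a lower
# predicted-hire rate for group 1, recovered from the labels alone.
```

The thing to look at in the printout is the weight the model learns for the zip-code column. Nothing in the code names a protected attribute; the proxy carries it in anyway.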
Link 4: deployment and over-reliance. This is where the radiologist’s story lives. The model gets put in front of a person whose job is, on paper, to use it as one input among many. In practice, automation bias means the person tends to defer to the model. The model becomes the authoritative answer rather than a second opinion. Real disagreements get reframed as “the model probably knows something I don’t.” When the model is right, this is harmless or helpful. When the model is wrong, the human oversight that was supposed to catch the error has been smoothly converted into a rubber stamp.
Link 5: the new data. The decisions that get made (informed by the model, with the human nominally overseeing) become the next generation of training data. A model that flagged certain candidates as risky last year produces, this year, a dataset in which those candidates received fewer offers, which the next model learns from as “these candidates were not hired”: a fact about the world that confirms the prior bias. This is the part of the loop that earns the name. The bias from year one becomes the ground truth of year two.
And then the loop runs again.
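To make the dynamic concrete, here is the loop as a toy simulation in Python. It is a sketch, not a model of any real hiring pipeline: two groups that are equally qualified, a small initial gap in the historical labels, a model that learns each group’s hireability from last year’s decisions, and a fixed number of offers to compete for. Every number is invented.

```python
# A toy simulation of Link 5: this year's model-assisted decisions become
# next year's training labels. Synthetic throughout; only the shape matters.
import numpy as np

rng = np.random.default_rng(1)
n_per_group = 50_000

# Year 0: both groups are equally qualified in reality, but the historical
# labels carry a small bias, so the starting hire rates differ slightly.
hire_rate = {"A": 0.53, "B": 0.47}

print("cycle   rate A   rate B   gap")
for cycle in range(6):
    scores = {}
    for g in ("A", "B"):
        # The next model learns "how hireable are people like this?" from last
        # year's labels, then adds a noisy read of the individual candidate.
        scores[g] = hire_rate[g] + rng.normal(0, 0.2, n_per_group)
    # A fixed hiring budget: the top half of the pooled candidates get offers.
    cutoff = np.quantile(np.concatenate([scores["A"], scores["B"]]), 0.5)
    hire_rate = {g: float((scores[g] > cutoff).mean()) for g in ("A", "B")}
    print(f"{cycle:>5}   {hire_rate['A']:.3f}   {hire_rate['B']:.3f}"
          f"   {hire_rate['A'] - hire_rate['B']:.3f}")
# A six-point starting gap widens every cycle, because each cycle's
# decisions are the next cycle's ground truth. Raise the noise on the
# individual read and the loop damps out instead of compounding.
```

The specific values don’t matter. The direction does: because each cycle’s output becomes the next cycle’s ground truth, the gap compounds instead of washing out, which is exactly what the word loop is meant to carry.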
The canonical case
If you want one concrete example to anchor the whole pattern, the one to know is Gender Shades. In 2018, two researchers, Joy Buolamwini (then at the MIT Media Lab) and Timnit Gebru, published a paper testing the gender-classification features of the major commercial facial-analysis systems of the time on a curated dataset of faces, balanced by skin tone and gender. The result was difficult to argue with. On lighter-skinned men, the systems had error rates of at most 0.8%, essentially perfect. On darker-skinned women, error rates ranged up to 34.7%. That’s a roughly 43-fold disparity in error, on the same task, between two demographic groups.
The paper became one of the most-cited works in AI fairness, partly because the disparity is so large that no amount of statistical hedging can wave it away, and partly because the cause maps so cleanly onto Link 1 of the chain above. The systems hadn’t been designed to misclassify darker-skinned women. They had been trained and benchmarked on datasets in which lighter-skinned men overwhelmingly dominated. The accuracy gap was a downstream signature of an upstream availability heuristic: engineers reaching for the data on the shelf, not noticing whose faces weren’t in it.
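That mechanism is easy to reproduce on toy data. The sketch below is entirely synthetic and every number in it is invented; it builds a training set that is 95% one group, fits an ordinary classifier, and then evaluates on balanced data, the way the audit did.

```python
# A minimal sketch of representation bias. Synthetic data, invented numbers;
# the point is the error gap, not the task.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)

def make_group(n, informative_dim):
    """Binary labels; which feature carries the signal differs by group."""
    y = rng.integers(0, 2, n)
    X = rng.normal(0, 1, (n, 2))
    X[:, informative_dim] += 4 * y
    return X, y

# The "data on the shelf": 95% group A, 5% group B.
Xa, ya = make_group(19_000, informative_dim=0)
Xb, yb = make_group(1_000, informative_dim=1)
model = LogisticRegression(max_iter=1000).fit(np.vstack([Xa, Xb]),
                                              np.concatenate([ya, yb]))

# Evaluate on balanced held-out data, as the audit did.
for name, dim in [("group A", 0), ("group B", 1)]:
    Xt, yt = make_group(5_000, informative_dim=dim)
    print(f"{name} error rate: {(model.predict(Xt) != yt).mean():.1%}")
# Typical run: low single digits for the overrepresented group, an error
# rate many times higher for the group the training data barely contains.
```

No line of the code treats the two groups differently. The error gap comes entirely from who was in the data on the shelf.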
After Gender Shades, the affected vendors retrained their systems on more balanced data and the disparities shrank dramatically. Which is the good news. The less good news is that the same pattern shows up, in less photogenic form, across loan approval, hiring, medical risk scoring, language translation, content moderation, and a growing list of other domains. The mechanism is general. The fix is harder than just adding balanced data, because the loop runs through institutions, not just datasets.
The human-oversight paradox
The most regulated version of Link 4 in the chain is now Article 14 of the European Union’s AI Act, which entered into force in 2024 and requires that high-risk AI systems be designed so that they can be “effectively overseen by natural persons” while in use. Article 14(4)(b) singles out the failure mode by name, requiring that the people assigned to oversight be enabled “to remain aware of the possible tendency of automatically relying or over-relying on the output produced by a high-risk AI system (automation bias).” It is a piece of regulation that takes the empirical literature on automation bias seriously, which is rarer than it should be.
The interesting empirical question, the one we have only partial answers to, is whether the human oversight that regulation requires actually corrects errors in practice, or whether it adds a new kind of error on top.
Work by Ben Green and Yiling Chen on the risk-assessment tools used in U.S. pretrial detention decisions has produced something unsettling: in their experiments, the people overseeing the model not only failed to consistently override it on the cases where they should have, but the overrides they did make were themselves racially patterned. The model was biased. The humans who oversaw the model were biased too, in ways the model wasn’t. The combined system was no fairer than either component alone, and was harder to audit because responsibility was now diffused between the algorithm and the overseer. The pattern, in some of the secondary writing on this material, gets called frictionless failure: the institutional safeguard is in place, the oversight is nominally happening, and the errors flow through anyway.
This is not an argument against human oversight of AI. It is an argument that oversight is itself a process that has to be designed, not an automatic property that emerges from putting a human’s name on the decision. The radiologist at the start of this essay was overseeing the AI. She was also being slowly trained by it. Both were happening at the same time and only one was on the org chart.
A recent finding worth your attention
One last beat to add to the chain, which is the most recent and probably the most under-discussed.
A study published in late 2025 looked at how the increasingly common use of large language models as everyday work assistants (for writing, coding, research, learning) affected people’s self-perception of their abilities. The setup was clean: participants completed tasks with and without AI assistance, were measured objectively on how much the assistance actually helped, and then were asked how much they thought it had helped.
The finding: people perceived their own performance improvements from using AI as roughly one third larger than the gains they actually achieved. The effect held across competence levels, which is its most interesting feature. The classic Dunning-Kruger pattern in self-assessment, where low performers overestimate themselves while high performers underestimate themselves, was flattened by AI use. Everybody got a uniform layer of confidence inflation. The mechanism, the authors suggest, is that AI assistance produces a fluent-feeling output quickly, and fluency is the brain’s main cue for “this is a good answer,” even when the answer is in fact wrong, or when most of the actual cognitive work was done by the machine.
Add this to the loop and the picture sharpens. The user of a model now over-trusts both the model and themselves. The combined system is more confident than either component would be in isolation. The errors that result feel, from the inside, like competent decisions reached after appropriate consultation. This is not science fiction. The data is from 2025. It is happening to most of us right now, in small ways, every time we ask a model for help with something we’re not sure of.
See the loop running
Below is an interactive visualization of the loop discussed in this essay. It runs through the five nodes (human, data, model, output, action) and shows, at each cycle, how a small initial bias gets compressed into training data, deployed back to humans, and re-introduced into the next round of data. Watch how the amplification meter on the right behaves over multiple cycles. You can step through manually, or let it run.
What this whole essay is and isn’t claiming
It is not claiming that AI is uniquely biased, or that the right response is to stop building it. The biases here are not biases of the machine. They are biases of the humans the machine was trained on, the humans who labeled its data, the humans who designed its features, and the humans who deploy and act on its outputs. The machine is faithful. That is what makes it dangerous.
It is not claiming that the European AI Act, or the NIST framework, or any specific governance regime is the answer. These are early, incomplete attempts to legislate around a problem we are still in the middle of mapping. Some of them will look prescient in ten years and some will look quaint. What they have in common, and what they have right, is the sociotechnical framing: the recognition that AI bias is not a software bug to be patched but a system property of how humans and machines decide together.
It is claiming that everything in the four previous essays (the difference between clear and obscured environments, the four problems your brain is always solving, the small interventions that actually help, the calibration practices for everyday life) now has an amplifier sitting next to it. The same gut feeling that produces the planning fallacy in one person’s renovation produces representation bias in a national hiring system. The same confirmation bias that makes a manager hire the wrong candidate makes a model encode the same pattern at scale. The same automation bias that lets a pilot over-trust the autopilot now lets a billion people over-trust the same model, in slightly different forms, in parallel.
There is a habit of mind I have been circling back to across these essays, sometimes through the modern empirical literature and sometimes through a Stoic sidebar. It is the habit of looking carefully at your own impressions before acting on them, not because the impressions are bad, but because they are not the territory. The thing that has changed in the last few years is that the impressions are no longer just yours. They are increasingly synthesized for you, very quickly, very fluently, by systems that learned to make them out of data drawn from people very much like you. The habit is the same. The stakes are different.
If you’ve read all five essays, the argument is now in front of you. Most popular writing on biases stops at the second one. Most popular writing on judgment stops at the fourth. The whole picture — the environment matters more than the cognitive style; the long list is shorter than it looks; what helps is mostly structural; everyone is forecasting all the time; and the whole pattern is now in the loop with our machines — is one continuous argument, and the way to use it isn’t to memorize any of it. It’s to notice, in your own life, which of these five frames is the one operating right now, and act accordingly.
If the bias explorer or the calibration trainer or the clear-obscured sandbox is useful to you, that’s the best outcome I could ask for. If they’re not, find the equivalents that are. The point isn’t the tools. The point is keeping the question — how sure am I? — alive long enough, in your day, that it has somewhere to land.