Which humans in which loop are we on about?
And why it worries me ...
Terminology and language shifts happen, I get that, but right now vocabulary confusion abounds, and I see it tripping up build teams and investors on an almost daily basis.
The terminology shift that sticks in my head at the moment is a new second meaning I recently noticed for the phrase ‘human in the loop’.
Just put a human in the loop
‘Human in the loop’ has been around as a concept in machine learning / AI product build circles for a long time. It would typically come up in design discussions and risk mitigation workshops about the same time that folks were discussing that algorithmic decisions would only be correct ‘about 88% of the time’.
Someone would suggest that it was OK, we just had to keep a human in the loop to pick up the errors. On a reasonably savvy team, someone else would point out that humans are really not great at staying alert to errors that occur infrequently, and hopefully some combination of thoughtful UX design, graceful rollback from failure and acceptance of business risk from incorrect decisions would be found to sufficiently alleviate the problem.
But the cases where this didn’t happen, and the responsibility of catching the AI mistakes got hung entirely on the poor ole ‘human in the loop’, have worried me in a compounding way over the years and still niggle today. (If you want to dive deeper, this 2019 paper by Madeleine Elish is my go-to for illustrating the wider issues.)
Recently, teams working to create AI assistants that can undertake tasks on behalf of a user (e.g. booking travel, buying groceries) have started using the same phrase ‘human in the loop’ in what seems to me a subtly different way. Rather than book my entire European holiday sight unseen, an AI assistant would do all the option hunting and the price and schedule comparisons, then present me with the best complete itinerary for approval and probably payment authorisation.
What’s the difference that bugs me? I’m still trying to pin it down but I think it’s the frequency and the distinctness of the task. In traditional usage, the ‘human in the loop’ is in a high frequency business process loop like credit approval or defect management. This human in the loop therefore sees hundreds or even thousands of AI decisions in a working day and all the decisions are fairly similar and not personally relevant to the human. In the emerging usage, the human is assessing the draft result of a task that matters more to them as an individual and the task output is distinctive and engaging.
So what’s the problem? Fundamentally, that I think this new usage will shift slowly over time and morph into something much closer to the old usage.
If AI assistants get actually useful, they will become more ubiquitous and the ‘human in the loop’ decisions will lose their distinctness. We then head back into the problem of trying to design for attention in a boring situation.
But the phrase ‘human in the loop’ will have been baked into a new set of brains framed around an ‘interest-engaging’ decision. I have a nagging feeling of trouble ahead.
Now what?
To offer something actionably useful to offset an angsty post, my advice on how to guard yourself against this language-morphing blindness is to mindfully avoid jargon in conversation or written communication until you’re sure that you know what you mean AND that the other people in the conversation mean the same thing.
Jargon emerges as useful shorthand for a community of people who are talking about the same thing often. In that context, it’s really helpful. But when a group of people pick up jargon that inadvertently bridges a concept understanding / agreement gap, it gets really not useful, really fast.
As a concrete example, this is the reason I don’t currently use the term Agentic AI in casual conversation - there really is no usefully settled definition of the phrase that stretches beyond small groups of people working in particular labs or companies.
Yes, conversations about what you are building / buying, and how / for how much, take longer if you constantly use 10 or 15 words instead of 2 (or is that 3?), but the cost of premature brevity can be high if you build yourself into a risky situation or buy a company based on a mutual misunderstanding.


Great topic! I worry when terms like “human in the loop” are used in ways that oversimplify some really complex issues.
My sense here is that a lot of the excitement around AI agents is still more dream than reality. Look at autonomous vehicles: it’s that last 2% that prevents true independence/autonomy, which leaves room for us “imperfect” humans. For the foreseeable future, it feels more like “humans in the lead” (sorry to create new jargon!), with AI automating parts of what we do and acting as a resource to make us more effective.
I loved your point/your client’s point about the 1% error cases. Asking humans to only oversee the edge cases is bound to fail. A better frame might be: how do we design AI to partner with us in meaningful ways?
One example: the idea of a “digital me.” Super exciting in theory, but I don’t see it replacing me just yet. Rather, it’s an assistant I can tap into to access my data and make the “actual me” more effective.
I think it was Sam Altman who said something like: AGI may not feel like the sudden shift we’re all waiting for. Adoption will be shaped by culture, trust, and governance, all of which take time. And that’s the real challenge with agents: not just the tech itself, but how we govern, embed, and build trust around them.
I think the nomenclature in general is deeply concerning. The tech is moving so fast that there isn't time to standardize on names and, like you point out, there is a lack of consistency in how terms and phrases are used. I recently saw another usage for "human in the loop," in an article about gig workers cleaning training data for AI models. The data labeling and cleanup is apparently considered the "human in the loop" part of the process, while other steps are automated.
I lament that we haven't determined what the broadest term for the likes of ChatGPT, Claude, Co-pilot, Llama, etc. should be. There are so many terms thrown around, including LLM, Generative AI, Frontier Models, and Foundation Models.