In recent conversations, I’ve been encountering a lot of confusion about agents, both with non-technical leaders and with practitioners. As seems to be the constant refrain at the moment, ‘the field is moving so fast!’, and of course vocabularies will take time to settle. So the confusion is both very understandable and really getting in the way of moving sensibly, pragmatically and quickly.
So I’m trying something new today, a co-written post with Mark. Mark featured in my AI reflections and futures series earlier this year and his Q&A was one of my most-read January articles. So the rest of this post is all Mark. Hope you find it helpful.
So you’re getting into agents and think you might need an agent framework. But which to choose? And is it better at this stage just to write what you need yourself, retaining complete understanding and control at the cost of up-front development time?
In a recent consultation with a client, my recommendation was DIY: keep it lean and modular, because the then-current frameworks were really light on features and the space was evolving rapidly. After some discussion, they went down the framework route. They had limited development resource to play with and, betting that agent frameworks were a robustly solved problem, chose to focus instead on the UI. Sadly, they weren't able to achieve the consistent intelligence they were after and had to backtrack and start over.
Fast forward to today and not as much has changed as you might hope. Most agent frameworks define a workflow (RPA-style), but those workflows are rigid and most cannot handle non-trivial cases such as iteration and loop-back.
For example, let's say you're creating a text2sql engine. The generated SQL is going to be wrong in some cases. A simple solution is to take the error message and feed it back in with the original input to try again. Many frameworks still struggle with this re-entrant behaviour, yet it is far from a complex use case if you’re trying to build anything useful.
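The retry loop described above can be sketched in a few lines of Python. This is a minimal illustration rather than any framework's API: `run_with_retry` and `fake_model` are hypothetical names, the stub stands in for a real model call, and SQLite is used only so the example runs on its own.

```python
import sqlite3
from typing import Callable, List, Optional, Tuple

# Hypothetical generator signature (an assumption for this sketch):
# (question, previous_sql, previous_error) -> new SQL string.
SqlGenerator = Callable[[str, Optional[str], Optional[str]], str]


def run_with_retry(
    conn: sqlite3.Connection,
    question: str,
    generate_sql: SqlGenerator,
    max_attempts: int = 3,
) -> Tuple[str, List[tuple]]:
    """Generate SQL, run it, and feed any database error back to the generator."""
    sql: Optional[str] = None
    error: Optional[str] = None
    for _ in range(max_attempts):
        # On the first pass sql and error are None; afterwards they carry
        # the failed query and the database's error message.
        sql = generate_sql(question, sql, error)
        try:
            return sql, conn.execute(sql).fetchall()
        except sqlite3.Error as exc:
            error = str(exc)
    raise RuntimeError(f"gave up after {max_attempts} attempts; last error: {error}")


def fake_model(question: str, previous_sql: Optional[str], previous_error: Optional[str]) -> str:
    """Stand-in for an LLM call: wrong on the first try, right once it sees an error."""
    if previous_error is None:
        return "SELECT nme FROM users"  # deliberate typo in the column name
    return "SELECT name FROM users"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('Ada')")

sql, rows = run_with_retry(conn, "List all user names", fake_model)
print(sql, rows)
```

The point is the loop-back: the second call to the generator sees both the failed query and the error text. A rigid, strictly forward workflow has no natural place to put that edge.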
I would loosely define the levels of sophistication in agents as:
Level 1: Call a model with a custom system prompt ("Custom GPT")
Level 2: Draw a flowchart to create a sequence of steps and call tools where needed. (This is where MCP comes into play.)
Level 3: A more sophisticated workflow to handle more complex cases (loops, branching, etc.) - you need this pretty quickly for anything non-trivial
Level 4: Plan and Act Agents - similar to the current Deep Research Tools: use a model to create a plan given a goal/prompt, execute the plan steps (sequential or branch), synthesise the results and report back.
Level 5: Level 4 + human in the loop - request feedback from a human when required at stages in the flow
Level 6: Have multiple agents work on the problem: Coordinator, specialised workers/roles. (This is where A2A comes into play.)
Level 7: A larger network of agents, continuously running and monitoring.
Level 8: A "companion" anticipating needs and initiating any of the above as required.
That's about as far as practical at present. Most tools in the market, including stuff like Agentforce, are currently somewhere between Levels 2 and 4.
The coding agents (Cursor, Claude Code, etc.) are more advanced and occupy Levels 4 and 5, but for more narrowly defined problem spaces.
Level 6 could fairly be described as ‘under development’, and is not broadly useable for most applications or for teams starting out.
Levels 7 & 8 are largely aspirational but a path to implementation is visible.
Level 9 and above (AGI-ish and absent from my list but not at all absent from the influencer blogs) are hand-wavy speculation and probably years away from practical use. If that feels wrong to you, recall that as an industry, we've been talking about today's concept of AI agents for about three years and yet are still mostly at Level 2!
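To make Levels 4 and 5 a little more concrete, here is a rough sketch of a plan-and-act loop with an optional human checkpoint. Every name in it (`plan_and_act` and the stub functions) is hypothetical; in a real system `plan`, `act` and `synthesise` would each wrap model or tool calls, and the plan steps might branch rather than run strictly in sequence.

```python
from typing import Callable, List, Optional

# All names below are illustrative assumptions, not taken from any framework.
Planner = Callable[[str], List[str]]            # goal -> ordered plan steps
Actor = Callable[[str], str]                    # step -> result of running it
Synthesiser = Callable[[str, List[str]], str]   # goal, results -> final report
Reviewer = Callable[[str, List[str]], List[str]]  # goal, steps -> approved steps


def plan_and_act(
    goal: str,
    plan: Planner,
    act: Actor,
    synthesise: Synthesiser,
    ask_human: Optional[Reviewer] = None,
) -> str:
    """Level 4: plan, execute each step, synthesise. Level 5: add a human checkpoint."""
    steps = plan(goal)
    if ask_human is not None:
        # Human in the loop: a person can edit, reorder or drop steps
        # before anything is executed.
        steps = ask_human(goal, steps)
    results = [act(step) for step in steps]  # sequential execution only
    return synthesise(goal, results)


# Stubs standing in for model and tool calls.
def stub_plan(goal: str) -> List[str]:
    return [f"search for {goal}", f"summarise findings on {goal}"]


def stub_act(step: str) -> str:
    return f"[result of: {step}]"


def stub_synthesise(goal: str, results: List[str]) -> str:
    return f"Report on {goal}: " + " ".join(results)


def approve_first_step_only(goal: str, steps: List[str]) -> List[str]:
    return steps[:1]


report = plan_and_act("agent frameworks", stub_plan, stub_act, stub_synthesise)
reviewed = plan_and_act(
    "agent frameworks", stub_plan, stub_act, stub_synthesise,
    ask_human=approve_first_step_only,
)
print(report)
print(reviewed)
```

Even in this toy form, the difference from a Level 2 flowchart is visible: the steps are produced at run time from the goal rather than drawn up front.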
Tools like CrewAI, AutoGen and their successors have been playing in Level 6. The challenges there are scale and enabling team-based development. Agent-to-agent (A2A) should play a role here. The current tools will likely be refactored to support A2A, so expect them to go through a bit of upheaval.
If I could, I would wait until things settle and, in the meantime, start lean on the other parts of the platform you need.
These tools can rapidly become fairly costly if you use them as a hosted service, so don’t overlook that in your deliberations, particularly when you couple it with the increase in LLM calls that goes along with breaking down complex activities into smaller tasks. Cost and performance at scale become key considerations, especially if you need to offer country-specific processing.
Finally, the debate over when to use sequential agent calls vs agent networks isn't even close to settled. The topic of another blog post …