As we progress along our Agentic transformation journey, we’re building more complex workflows involving multi-agent collaboration.
This requires mapping out our existing business process workflows and translating them into modern, flexible agentic architectures.
We decompose these workflows into distinct, manageable sub-tasks, each of which is assigned to a dedicated Agent: an autonomous component designed for that specific function.
As we've gone through this process, a few interesting learnings have bubbled to the surface:
Human in the Loop (HITL)
It’s critical to implement mechanisms that provide transparency into the behaviour of the agents as they progress through multi-step workflows.
This principle is called “Human in the Loop” (HITL): it allows a human to approve, reject, or adjust the output of an Agent before the workflow moves on to the next step.
This enables real-time monitoring and course correction, and reinforces the principle that these agents are tools used in service of human subject matter experts, not fully autonomous replacements for them.
This verification and feedback mechanism, along with human oversight, is essential to building the confidence needed to deploy these workflows in production environments.
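As a rough illustration of the pattern (a framework-agnostic sketch with hypothetical names, not any particular library's API), a HITL gate is simply a checkpoint between workflow steps where a reviewer can approve, reject, or edit an Agent's output before it is handed to the next step:

```python
from dataclasses import dataclass
from typing import Callable, Literal

Decision = Literal["approve", "reject", "edit"]

@dataclass
class Review:
    decision: Decision
    revised_output: str | None = None  # populated when the reviewer edits the output

def hitl_gate(agent_output: str, review: Callable[[str], Review]) -> str | None:
    """Pause the workflow, surface the Agent's output to a human reviewer,
    and only pass the (possibly edited) result on to the next step on approval."""
    result = review(agent_output)
    if result.decision == "reject":
        return None  # stop here; the step must be re-run or abandoned
    if result.decision == "edit":
        return result.revised_output
    return agent_output

# Example: a console-based reviewer standing in for a real approval UI.
def console_reviewer(output: str) -> Review:
    print(f"Agent produced:\n{output}")
    choice = input("approve / reject / edit? ").strip()
    if choice == "edit":
        return Review("edit", revised_output=input("Revised output: "))
    return Review("approve" if choice == "approve" else "reject")

next_step_input = hitl_gate("Draft summary of the Q3 pipeline...", console_reviewer)
```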
Emerging UX/UI for Ambient Agents
“Ambient Agents” are agents that run continuously in the background, processing input, performing tasks, and producing output.
These swarms of “Agent Teams” running concurrently will unlock massive value, delivering process automation at a scale not previously possible. However, they still require a “Human in the Loop” feedback mechanism to provide transparency.
If humans needed to continuously monitor every agent under their control, we would quickly hit a scaling wall where we are bound by the limits of human attention. Instead, we need a scalable, asynchronous mechanism to enable this human feedback loop.
This has triggered experimentation around different UX approaches to satisfy these requirements.
We have been using the foundation provided by Agent Inbox, which uses the “inbox queue” metaphor that we are all familiar with.
https://github.com/langchain-ai/agent-inbox
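As a conceptual sketch of the inbox-queue pattern (illustrative only, not the Agent Inbox API; the names here are hypothetical), agents push items awaiting review onto a shared queue and keep working, while humans drain the queue on their own schedule and post decisions back:

```python
import queue
import uuid
from dataclasses import dataclass, field
from typing import Any

@dataclass
class ReviewItem:
    agent_id: str
    payload: Any  # the output awaiting human review
    item_id: str = field(default_factory=lambda: str(uuid.uuid4()))

class AgentInboxQueue:
    """Conceptual sketch: agents submit items for review asynchronously;
    humans work through the inbox when attention allows and resolve items."""

    def __init__(self) -> None:
        self._pending: queue.Queue[ReviewItem] = queue.Queue()
        self._decisions: dict[str, str] = {}

    def submit(self, item: ReviewItem) -> str:
        self._pending.put(item)  # agent side: non-blocking hand-off
        return item.item_id

    def next_item(self) -> ReviewItem:
        return self._pending.get()  # human side: pull the next item to review

    def resolve(self, item_id: str, decision: str) -> None:
        self._decisions[item_id] = decision  # decision resumes the originating workflow

    def decision_for(self, item_id: str) -> str | None:
        return self._decisions.get(item_id)

# Agents submit asynchronously; a reviewer works through the inbox later.
inbox = AgentInboxQueue()
ticket = inbox.submit(ReviewItem(agent_id="invoice-agent", payload="Pay supplier X?"))
item = inbox.next_item()
inbox.resolve(item.item_id, "approve")
assert inbox.decision_for(ticket) == "approve"
```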
A single winner hasn’t emerged, but it’s been fascinating to see different design patterns being invented.
Agent Evals
Evaluation, or “Evals”, is an essential practice for monitoring the performance of your application over time, measuring improvements, and flagging regressions. Eval methodologies for LLMs are well established, with a number of best practices and popular frameworks designed to account for the non-deterministic behaviour of these models.
Agentic workflows introduce entirely new potential points of failure and opportunities for variability, whether it be in the ReAct reasoning loop, tool calling behaviour, or multi-agent collaboration. So, a robust strategy for Evals is even more critical.
New techniques are now being developed for evaluating agentic workflows, with supporting frameworks also emerging. We are using AgentEvals, an open-source package built around the concept of Agent Trajectories: the sequence of intermediate steps an agent takes, which can be measured against a reference to confirm the agent is travelling in the right direction.
https://github.com/langchain-ai/agentevals
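To make the trajectory idea concrete, here is a toy, library-free sketch (not the AgentEvals API; the scoring rule and names are assumptions for illustration) that scores how much of a reference trajectory of tool calls an agent actually followed:

```python
from dataclasses import dataclass

@dataclass
class Step:
    tool: str
    args: dict

def trajectory_subset_score(actual: list[Step], reference: list[Step]) -> float:
    """Toy trajectory metric: the fraction of reference steps (tool name plus
    arguments) that appear somewhere in the agent's actual trajectory.
    Real trajectory evaluators also offer stricter ordered matching and
    LLM-as-judge scoring of intermediate steps."""
    if not reference:
        return 1.0
    matched = sum(
        any(step.tool == ref.tool and step.args == ref.args for step in actual)
        for ref in reference
    )
    return matched / len(reference)

reference = [Step("search_flights", {"to": "LHR"}), Step("book_flight", {"id": "BA123"})]
actual = [
    Step("search_flights", {"to": "LHR"}),
    Step("check_weather", {"city": "London"}),  # extra step, tolerated by subset matching
    Step("book_flight", {"id": "BA123"}),
]
print(trajectory_subset_score(actual, reference))  # 1.0
```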
We’re all sitting on the bleeding edge of a revolution, where techniques and platforms are being developed on the fly to solve new problems as they’re discovered. It’s a rollercoaster, but an exhilarating place to be!