ARTICLE  ·  18 MIN READ  ·  JANUARY 29, 2026

Chapter 7: Multi-Agent Collaboration

One agent hits walls. A team of specialized agents doesn't. Multi-agent collaboration lets you decompose complex problems into coordinated workstreams — each agent doing what it does best.


Why One Agent Isn’t Enough

Before You Start — Key Terms Explained

Specialization: Focusing on a narrow domain to become very good at it. A specialist cardiologist is better at heart problems than a general practitioner, even though the GP knows more topics overall. The same principle applies to AI agents.

Orchestrator: The agent that manages and coordinates other agents. It receives the high-level goal, decomposes it, assigns sub-tasks, and aggregates results. Think of it as the project manager in a team.

Sequential vs parallel: Sequential = one after another (like a queue at a counter). Parallel = simultaneously (like multiple checkout lanes open at once). Sequential is simpler to reason about; parallel is faster for independent tasks.

Why do more roles mean worse LLM performance? When you stuff too many instructions into one system prompt ("be a researcher AND a writer AND a fact-checker"), the model splits its "attention" across all goals and tends to do each poorly. Separate, focused system prompts produce better results for each role.

Every chapter up to now has been about making a single agent smarter — chain it, route it, run it in parallel, give it tools, make it reflect, make it plan. All of these improve a single agent’s performance.

But there’s a ceiling — and it’s a hard one.

The ceiling shows up in several distinct ways:

Knowledge breadth vs. depth. A single agent trying to be a domain expert in multiple fields simultaneously is like a company that puts one person in charge of engineering, legal, marketing, and finance all at once. Each role gets diluted attention, and the person’s expertise in any one domain never reaches the depth needed to excel.

System prompt dilution. When you write a system prompt that tries to make one agent do many things — “You are a researcher AND a writer AND a code reviewer AND a fact-checker” — you’re asking the model to activate multiple behavioral patterns simultaneously. These patterns often have conflicting heuristics. A researcher’s instinct is to gather more information; a writer’s instinct is to synthesize and move on. Held in tension within one agent, they produce mediocre results in both directions.

Context window saturation. A complex multi-domain task accumulates a lot of context: research findings, intermediate drafts, tool outputs, conversation history. A single agent accumulates all of this in one context window, which fills up fast. Multiple specialized agents each have their own context window focused only on their domain.

Error propagation. When a single agent makes a mistake early in a long task, that mistake often propagates through every subsequent step. In a multi-agent system, errors are isolated — a mistake by the Research Agent affects only the research phase, and the Writer Agent can still produce a good draft given the flawed research (though the final output will reflect the research quality).

The multi-agent solution addresses all four of these ceilings simultaneously: specialization deepens expertise, dedicated context windows stay focused, errors are isolated, and coordination mechanisms ensure coherent final outputs.

A single agent handling a complex research project needs to be: a domain expert, a search specialist, a statistical analyst, a fact-checker, and a polished writer — simultaneously. The more roles you pile onto one system prompt, the worse it performs at each of them.

The same problem that prompted humans to build teams applies to agents: specialization beats generalism at scale.

Multi-agent collaboration breaks a complex goal into sub-tasks, assigns each to an agent built specifically for that task, and coordinates their outputs into a unified result. The researcher does the research. The analyst runs the numbers. The writer assembles the draft. The reviewer catches errors. Each agent excels at its role. The system as a whole exceeds what any single agent could produce.


What Multi-Agent Collaboration Is

Multi-agent collaboration is a system design pattern where:

  1. A complex goal is decomposed into discrete sub-tasks
  2. Each sub-task is assigned to a specialized agent with the right tools, knowledge, or reasoning approach
  3. Agents coordinate through defined communication protocols — passing outputs, delegating tasks, sharing state
  4. A final output emerges from the coordinated work of the team

MULTI-AGENT COLLABORATION PATTERN

Complex Goal: too large for any single agent
    ↓
Orchestrator Agent: decomposes goal · delegates to specialists
    ↓
Specialist Agents, each owning a distinct role, with separate tools, knowledge, and system prompts per agent:
    Research Agent: search · retrieve
    Analysis Agent: process · compute
    Writer Agent: draft · format
    ↓
Synthesis Agent: merges all specialist outputs · resolves conflicts
    ↓
Final Result: exceeds what any single agent could produce
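
The four numbered steps above can be sketched without any framework. This is a minimal illustration only: the decomposer and specialists are plain functions standing in for LLM-backed agents, and every name in it is invented for the sketch.

```python
# Illustrative sketch only: plain functions stand in for LLM-backed agents.

def decompose(goal: str) -> list[str]:
    # 1. A real orchestrator would use an LLM here; we hard-code sub-tasks.
    return [f"research: {goal}", f"draft: {goal}", f"review: {goal}"]

# 2. Each role maps to a specialist built for that sub-task.
SPECIALISTS = {
    "research": lambda task: f"[findings for '{task}']",
    "draft":    lambda task: f"[draft covering '{task}']",
    "review":   lambda task: f"[review notes on '{task}']",
}

def run_team(goal: str) -> str:
    outputs = []
    for sub_task in decompose(goal):
        role = sub_task.split(":")[0]            # 3. route to the right specialist
        outputs.append(SPECIALISTS[role](sub_task))
    return "\n".join(outputs)                    # 4. aggregate into one result

print(run_team("AI trends 2025"))
```

The real systems later in this chapter replace the hard-coded decomposer with an orchestrator LLM and the lambdas with full agents; the shape stays the same.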

The critical ingredient is inter-agent communication: a standardized way for agents to exchange data, delegate work, and signal completion. Without it, you just have isolated agents running in parallel — not a team.


The Six Interaction Models

Agent teams can be structured in six fundamentally different ways: Single, Network, Supervisor, Supervisor-as-Tool, Hierarchical, and Custom. The choice of structure changes autonomy, fault tolerance, complexity, and scalability.

Single Agent
One agent handles everything autonomously. Simple to build and manage, but constrained by a single agent's scope and resources. No inter-agent communication overhead. Suitable only when the task is fully self-contained.
Best for: simple, self-contained tasks with no need for specialization

The Five Collaboration Patterns

Within multi-agent systems, agents interact through five fundamental patterns:

Sequential Handoff

Agent A completes its task and passes its output directly to Agent B. Output of A becomes input to B. Clean pipeline with clear dependencies.

Research → Write → Edit → Publish
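
A sequential handoff is just function composition, which the sketch below makes literal. The stages are stubs standing in for LLM-backed agents.

```python
# Sequential handoff as function composition: the output of each stage is the
# input to the next. Stage functions are stubs standing in for real agents.

def research(topic: str) -> str:
    return f"notes on {topic}"

def write(notes: str) -> str:
    return f"draft based on ({notes})"

def edit(draft: str) -> str:
    return f"polished: {draft}"

def run_pipeline(topic: str) -> str:
    result = topic
    for stage in (research, write, edit):   # A → B → C, in order
        result = stage(result)
    return result

print(run_pipeline("AI trends"))  # → polished: draft based on (notes on AI trends)
```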

Parallel Workstreams

Multiple agents work on independent sub-tasks simultaneously. Results are merged by a synthesizer. Reduces total latency for independent work.

News + Weather + Stocks → Report
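
One way to sketch this pattern uses Python's standard thread pool; the fetchers here are stubs standing in for real tool-calling agents.

```python
# Parallel workstreams via a thread pool: independent fetchers (stubs here)
# run concurrently; a synthesizer step merges their results afterwards.
from concurrent.futures import ThreadPoolExecutor

def fetch_news(q: str) -> str:    return f"news({q})"
def fetch_weather(q: str) -> str: return f"weather({q})"
def fetch_stocks(q: str) -> str:  return f"stocks({q})"

def gather_report(query: str) -> str:
    workers = [fetch_news, fetch_weather, fetch_stocks]
    with ThreadPoolExecutor() as pool:
        # map preserves worker order even though execution is concurrent
        results = list(pool.map(lambda w: w(query), workers))
    return " | ".join(results)   # synthesizer merges the independent results

print(gather_report("NYC"))  # → news(NYC) | weather(NYC) | stocks(NYC)
```

For stub functions the concurrency buys nothing, but with real network-bound fetchers the total latency drops to roughly that of the slowest worker.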

Debate & Consensus

Agents with different perspectives evaluate options and discuss. The group reaches a more informed decision than any single agent could.

Pro + Con agents → Moderator → Verdict
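
A toy sketch of the structure, with hard-coded scores standing in for actual pro and con arguments produced by LLM agents:

```python
# Toy debate: fixed scores stand in for LLM-generated arguments.

def pro_agent(option: str) -> float:
    return 0.8   # stub: strength of the argument in favor

def con_agent(option: str) -> float:
    return 0.3   # stub: strength of the argument against

def moderator(option: str) -> str:
    support, opposition = pro_agent(option), con_agent(option)
    return "approve" if support > opposition else "reject"

print(moderator("ship the feature"))  # → approve
```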

Hierarchical Delegation

A manager agent dynamically assigns sub-tasks to worker agents based on their tool access. Workers report back, manager synthesizes.

Coordinator → Specialist A / B / C → Result

Critic-Reviewer

A creator agent produces output. A critic agent evaluates it for quality, compliance, or correctness. Creator revises based on critique.

Generator → Critic → Revised Output
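
A minimal creator-critic loop, with a simple string check standing in for an LLM critic and a bounded number of rounds as a safety cap:

```python
# Creator-critic loop: the critic returns feedback until the draft passes a
# simple check; max_rounds caps the number of revision cycles.

def create(draft: str, feedback):
    return draft if feedback is None else f"{draft} [revised: {feedback}]"

def critique(draft: str):
    # A real critic would be an LLM; here we just demand the word "sources".
    return None if "sources" in draft else "add sources"

def creator_critic(draft: str, max_rounds: int = 3) -> str:
    feedback = None
    for _ in range(max_rounds):
        draft = create(draft, feedback)
        feedback = critique(draft)
        if feedback is None:
            return draft        # critic approved the draft
    return draft                # hit the cap; return best effort

print(creator_critic("AI trends post"))
```

The same shape appears later as ADK's LoopAgent, where an escalation event plays the role of the critic's approval.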

Watch a Two-Agent Team Work

A Researcher and a Writer collaborate on a blog post via sequential handoff.

[Interactive demo in the original article: a Research Agent (Senior Research Analyst) and a Writer Agent (Technical Content Writer) work toward the shared goal "Write a blog post on the top 3 AI trends in 2025", handing off from research to writing.]

Use Cases Across Domains

  1. Complex Research: Searcher agent finds sources, summarizer agent digests them, trend-spotter identifies patterns, synthesizer writes the final report.
  2. Software Development: Requirements analyst, code generator, test writer, and documentation agent collaborate — each handing off to the next in sequence.
  3. Financial Analysis: Agents fetch stock data, analyze news sentiment, run technical analysis, and generate investment recommendations in parallel.
  4. Customer Support: Front-line agent handles general queries. Specialist agents (billing, technical, escalation) are activated only when needed.
  5. Supply Chain: Agents representing suppliers, manufacturers, and distributors collaborate to optimize inventory and logistics in real time.
  6. Network Remediation: Multiple triage agents work concurrently to pinpoint failures. Each integrates with existing ML tooling while contributing to a coordinated remediation plan.


CrewAI: Researcher + Writer Team

CrewAI is designed for exactly this — defining a team of agents with distinct roles, assigning tasks, and orchestrating collaboration.

from crewai import Agent, Task, Crew, Process
from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash")

Defining the Agents

researcher = Agent(
    role      = 'Senior Research Analyst',
    goal      = 'Find and summarize the latest trends in AI.',
    backstory  = "You are an experienced research analyst with a knack for identifying key trends and synthesizing information.",
    verbose          = True,
    allow_delegation = False,
)

writer = Agent(
    role      = 'Technical Content Writer',
    goal      = 'Write a clear and engaging blog post based on research findings.',
    backstory  = "You are a skilled writer who can translate complex technical topics into accessible content.",
    verbose          = True,
    allow_delegation = False,
)

Two agents, two roles, zero overlap. The researcher knows nothing about writing; the writer knows nothing about searching. This clean separation means each agent’s system prompt is focused — no role confusion, no competing instructions.

allow_delegation=False — prevents each specialist from spawning sub-agents on their own. In a controlled pipeline, you want the delegation logic to live in the Crew definition, not in individual agents.

Defining the Tasks (with dependency)

research_task = Task(
    description      = "Research the top 3 emerging AI trends in 2024-2025. Focus on practical applications and impact.",
    expected_output  = "A detailed summary of the top 3 AI trends with key points and sources.",
    agent            = researcher,
)

writing_task = Task(
    description     = "Write a 500-word blog post based on the research findings. Engaging, accessible to a general audience.",
    expected_output = "A complete 500-word blog post about the latest AI trends.",
    agent           = writer,
    context         = [research_task],   # ← explicit dependency
)

context=[research_task] is the sequential handoff mechanism. It tells CrewAI: “the writing task cannot begin until research_task is complete, and the research output should be available to the writer as context.”

Without this, the writer would run without access to the researcher’s findings — generating generic content rather than building on real research.

Assembling the Crew

blog_creation_crew = Crew(
    agents  = [researcher, writer],
    tasks   = [research_task, writing_task],
    process = Process.sequential,
    llm     = llm,
    verbose = 2,
)

result = blog_creation_crew.kickoff()

Process.sequential ensures tasks run in the order they’re listed — researcher first, writer second. The context dependency enforces this even if the process allows parallelism.

llm at the Crew level sets a default model for all agents. Individual agents can still override with their own llm= parameter if specialization is needed — e.g., a more powerful model for the researcher, a faster one for the writer.

CrewAI Data Flow

CREWAI DATA FLOW — research output becomes writer input via context=[task]
Topic: "Top 3 AI trends 2025"
    ↓
Researcher Agent: Senior Research Analyst · finds sources · summarizes trends
    ↓
Crew Context: research output stored · context=[research_task] passes it automatically
    ↓
Writer Agent: Technical Content Writer · sees research output · writes blog post
    ↓
500-word Blog Post

Google ADK: Five Orchestration Patterns

The ADK provides explicit primitives for each collaboration structure. Here are the five most important.

1. Hierarchical: Parent-Child Agents

from google.adk.agents import LlmAgent, BaseAgent
from google.adk.agents.invocation_context import InvocationContext
from google.adk.events import Event
from typing import AsyncGenerator

class TaskExecutor(BaseAgent):
    """A specialized agent with custom, non-LLM behavior."""
    name:        str = "TaskExecutor"
    description: str = "Executes a predefined task."

    async def _run_async_impl(
        self, context: InvocationContext
    ) -> AsyncGenerator[Event, None]:
        yield Event(author=self.name, content="Task finished successfully.")

BaseAgent is the extension point for non-LLM agents. When you need deterministic logic, API calls, or custom execution — instead of LLM reasoning — subclass BaseAgent and implement _run_async_impl. It yields Event objects as output, integrating cleanly with the ADK event stream.

greeter     = LlmAgent(name="Greeter", model="gemini-2.0-flash-exp", instruction="You are a friendly greeter.")
task_doer   = TaskExecutor()

coordinator = LlmAgent(
    name        = "Coordinator",
    model       = "gemini-2.0-flash-exp",
    instruction = "When asked to greet, delegate to the Greeter. When asked to perform a task, delegate to the TaskExecutor.",
    sub_agents  = [greeter, task_doer],
)

# ADK sets parent_agent automatically
assert greeter.parent_agent   == coordinator
assert task_doer.parent_agent == coordinator

sub_agents establishes the parent-child relationship. The coordinator’s LLM reads both agents’ descriptions and decides which to delegate to. The parent_agent attribute is set automatically by the ADK framework — you don’t need to wire it manually.

ADK HIERARCHICAL AGENTS — parent delegates to children via sub_agents
Coordinator (LlmAgent): reads user query · decides which child to delegate to based on each agent's description
    ├─ Greeter (LlmAgent): activated when greeting is needed · LLM-powered · uses Gemini
    └─ TaskExecutor (BaseAgent): custom logic · non-LLM · runs deterministic code

2. LoopAgent: Iterative Workflows

from google.adk.agents import LoopAgent, LlmAgent, BaseAgent
from google.adk.agents.invocation_context import InvocationContext
from google.adk.events import Event, EventActions
from typing import AsyncGenerator

class ConditionChecker(BaseAgent):
    """Stops the loop when session state signals completion."""
    name:        str = "ConditionChecker"
    description: str = "Checks if a process is complete and signals the loop to stop."

    async def _run_async_impl(self, context: InvocationContext) -> AsyncGenerator[Event, None]:
        status  = context.session.state.get("status", "pending")
        is_done = (status == "completed")
        if is_done:
            yield Event(author=self.name, actions=EventActions(escalate=True))  # stop loop
        else:
            yield Event(author=self.name, content="Condition not met, continuing loop.")

EventActions(escalate=True) is the loop-termination signal. When ConditionChecker yields this, the LoopAgent stops regardless of remaining iterations. Without it, the loop runs to max_iterations.

context.session.state is the shared key-value store accessible to all agents within a session. ProcessingStep writes "status": "completed" there; ConditionChecker reads it. This is how agents communicate state in ADK without direct message passing.

process_step = LlmAgent(
    name        = "ProcessingStep",
    model       = "gemini-2.0-flash-exp",
    instruction = "Perform your task. If you are the final step, set session state 'status' to 'completed'.",
)

poller = LoopAgent(
    name           = "StatusPoller",
    max_iterations = 10,
    sub_agents     = [process_step, ConditionChecker()],
)

LoopAgent runs its sub_agents sequentially on each iteration. Each iteration: ProcessingStep executes → ConditionChecker evaluates → if not done, repeat. Maximum 10 iterations as a safety cap. Use this for polling, retry workflows, and iterative refinement loops.

3. SequentialAgent: Linear Pipelines

from google.adk.agents import SequentialAgent, Agent

step1 = Agent(name="Step1_Fetch",   model="gemini-2.0-flash-exp", output_key="data")
step2 = Agent(name="Step2_Process", model="gemini-2.0-flash-exp", instruction="Analyze the information in state['data'] and provide a summary.")

pipeline = SequentialAgent(name="MyPipeline", sub_agents=[step1, step2])

output_key="data" — when Step1_Fetch completes, its output is automatically stored in session.state["data"]. Step2_Process reads it from state via its instruction. This is the ADK equivalent of CrewAI’s context=[task] — explicit state-passing between sequential steps.

4. ParallelAgent: Concurrent Workers

from google.adk.agents import Agent, ParallelAgent

weather_fetcher = Agent(
    name        = "weather_fetcher",
    model       = "gemini-2.0-flash-exp",
    instruction = "Fetch the weather for the given location and return only the report.",
    output_key  = "weather_data",   # → session.state["weather_data"]
)

news_fetcher = Agent(
    name        = "news_fetcher",
    model       = "gemini-2.0-flash-exp",
    instruction = "Fetch the top news story for the given topic and return only that story.",
    output_key  = "news_data",      # → session.state["news_data"]
)

data_gatherer = ParallelAgent(
    name       = "data_gatherer",
    sub_agents = [weather_fetcher, news_fetcher],
)

ParallelAgent fires both sub-agents concurrently. Each writes to a different output_key in session state — weather_data and news_data — with no conflict since the keys are distinct. After both complete, a downstream agent (or your application code) can read both from state.

ADK PARALLEL AGENT — two workers fire simultaneously, results in shared state
ParallelAgent: fires both sub-agents at the same time, without waiting for one before starting the other
    ├─ weather_fetcher: output_key "weather_data" → written to session state
    └─ news_fetcher: output_key "news_data" → written to session state
        ↓
Session State: shared key-value store · both results land here · no conflict since the keys are different
    ↓
Synthesizer: downstream agent combines weather_data + news_data into the final response

5. Agent as a Tool

from google.adk.agents import LlmAgent
from google.adk.tools import agent_tool

def generate_image(prompt: str) -> dict:
    """Generates an image from a textual prompt. Returns image bytes."""
    mock_bytes = b"mock_image_data_for_a_cat_wearing_a_hat"
    return {"status": "success", "image_bytes": mock_bytes, "mime_type": "image/png"}

image_generator_agent = LlmAgent(
    name        = "ImageGen",
    model       = "gemini-2.0-flash",
    description = "Generates an image based on a detailed text prompt.",
    instruction = "Take the user's request and use the generate_image tool to create the image.",
    tools       = [generate_image],
)

image_tool = agent_tool.AgentTool(
    agent       = image_generator_agent,
    description = "Use this to generate an image. Input should be a descriptive prompt.",
)

artist_agent = LlmAgent(
    name        = "Artist",
    model       = "gemini-2.0-flash",
    instruction = "Invent a creative image prompt. Then use the ImageGen tool to generate it.",
    tools       = [image_tool],
)

AgentTool wraps a sub-agent and makes it callable as a tool from a parent agent’s perspective. The artist_agent sees image_tool the same way it sees generate_image — as a tool it can call with arguments. The tool’s description is what the parent’s LLM reads to decide when to invoke it.

Why “agent as a tool” instead of sub_agents? With sub_agents, the coordinator’s LLM decides routing. With AgentTool, the parent agent explicitly invokes the sub-agent like a function call — passing specific arguments and receiving a return value. More precise, less autonomous.


Framework Comparison

Feature             | CrewAI                                    | Google ADK
--------------------|-------------------------------------------|-----------------------------------------
Team definition     | Agent + Task + Crew                       | LlmAgent / BaseAgent + orchestrators
Sequential flow     | Process.sequential + context=[]           | SequentialAgent + output_key
Parallel flow       | Multiple tasks with no context dependency | ParallelAgent
Loop flow           | Custom code / while loop                  | LoopAgent + EventActions(escalate=True)
Hierarchical        | allow_delegation=True                     | sub_agents=[] on LlmAgent
Agent as tool       | Not native (use LangChain tools)          | AgentTool wrapper
State passing       | context=[task] in task definition         | output_key → session.state
Custom agent logic  | Not supported natively                    | BaseAgent._run_async_impl()

At a Glance

WHAT

A system of specialized agents, each owning a distinct role and set of tools, coordinating to achieve a complex shared goal that no single agent could handle effectively alone.

WHY

Specialization beats generalism at scale. Decomposing a complex task into focused sub-tasks — each assigned to an agent built for it — produces higher quality, more reliable, and more scalable outcomes.

RULE OF THUMB

Use when a task requires diverse expertise, multiple distinct phases, or benefits from parallel workstreams. If a single agent's system prompt is trying to do too many jobs — split it into a team.


How Multi-Agent Communication Actually Works

In both CrewAI and ADK, agents don’t “talk to each other” directly — they communicate through shared data structures managed by the framework.

CrewAI: Task context as the communication channel. When you write context=[research_task] in a writing task definition, CrewAI stores the research task’s output after it completes. When the writing task runs, CrewAI injects that stored output into the writer agent’s context. The writer doesn’t know or care that a researcher ran before it — it just receives text in its context that says “here is the research that was done.” This is called indirect communication — agents share information through a coordinator (CrewAI) rather than directly messaging each other.

ADK: Session state as the communication channel. In ADK, agents communicate through session.state — a shared dictionary that all agents in a session can read from and write to. When weather_fetcher finishes, its output is stored as session.state["weather_data"] (because output_key="weather_data" was set). When a downstream synthesis agent runs, its instruction template can include {weather_data} — ADK automatically fills this from session state.

Why indirect communication is better than direct messaging. Direct agent-to-agent messaging introduces tight coupling — agent A needs to know about agent B’s interface, location, and availability. If B changes or fails, A breaks. Indirect communication through shared state decouples them — A writes to a named key, B reads from a named key, neither knows about the other. You can swap out A or B independently without changing the other. You can add a new agent C that also reads from the same key. You can replay any agent’s behavior by pre-populating the key manually for debugging.
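
This decoupling can be mirrored with a plain dictionary playing the role of ADK's session.state; the agent names and keys below are illustrative.

```python
# Indirect communication through shared state: agents reference named keys,
# never each other. Either function can be swapped without touching the other.

state: dict[str, str] = {}

def research_agent(state: dict) -> None:
    state["research"] = "EV market growth: 40% YoY"   # writes to a named key

def writer_agent(state: dict) -> None:
    findings = state["research"]                      # reads the same key
    state["draft"] = f"Blog post citing: {findings}"

research_agent(state)
writer_agent(state)
print(state["draft"])
```

Debugging also gets easier: you can replay writer_agent in isolation by pre-populating state["research"] by hand.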

How the orchestrator decides which agent to call. In ADK’s hierarchical mode, the coordinator’s LLM reads every sub-agent’s description field and decides which one to invoke based on the user’s request. This is the same mechanism as tool routing: the sub-agent’s description is the “docstring” that the coordinator LLM reads to make its routing decision. Write sub-agent descriptions with the same precision you’d use for tool docstrings.

Common Mistakes When Building Multi-Agent Systems

Mistake 1: Over-specialization. Creating 15 highly specialized agents when 3 would do. Each agent boundary adds communication overhead (API calls, context window usage, potential errors in handoffs). Start with fewer, broader agents and specialize only when a single agent clearly fails at a specific task.

Mistake 2: No shared vocabulary. If the Research Agent produces “EV market growth: 40% YoY” and the Writer Agent expects a structured JSON report, the handoff fails. Define the exact format each agent should produce and each agent should expect. Use expected_output in CrewAI tasks or explicit instructions in ADK agents.

Mistake 3: Parallel agents with conflicting writes. In ADK’s ParallelAgent, if two sub-agents write to the same output_key, the second one overwrites the first. Always give parallel agents unique output_key values. If you forget, you’ll lose data silently — no error is raised.

Mistake 4: Not handling agent failure gracefully. If one agent in a sequential pipeline fails (API timeout, invalid output, exception), the pipeline crashes. In production, wrap each agent call in error handling and decide: should the pipeline retry, skip the failed step, or abort entirely?
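
One possible hardening sketch: wrap each stage in bounded retries, then abort with context when the attempts are exhausted. The retry-then-abort policy here is an illustrative choice, not a feature of either framework.

```python
# Illustrative retry policy: each stage gets a bounded number of attempts;
# on exhaustion the pipeline aborts with the failing stage named.
import time

def run_stage(stage, payload, retries=2, delay=0.0):
    for attempt in range(retries + 1):
        try:
            return stage(payload)
        except Exception as exc:
            if attempt == retries:   # out of attempts: abort with context
                raise RuntimeError(
                    f"{stage.__name__} failed after {retries + 1} attempts"
                ) from exc
            time.sleep(delay)        # back off before retrying

def run_pipeline(stages, payload):
    for stage in stages:
        payload = run_stage(stage, payload)
    return payload
```

Whether a failed stage should be retried, skipped, or the whole pipeline aborted depends on the task; the important part is deciding the policy explicitly instead of letting an unhandled exception decide for you.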

Mistake 5: Building multi-agent before single-agent. Don’t jump to multi-agent complexity until a single agent provably fails at the task. Multi-agent systems are harder to debug, more expensive to run, and introduce new failure modes. Always start simple and add agents only when you’ve demonstrated a clear need.

Key Takeaways

  • Specialization wins. A focused agent with a clear role outperforms a generalist agent trying to do everything. Split roles aggressively.
  • The six interaction models — Single, Network, Supervisor, Supervisor-as-Tool, Hierarchical, Custom — each trade off autonomy vs. control vs. complexity. Most production systems use Hierarchical or Supervisor.
  • CrewAI wires tasks, not agents. The context=[task] parameter is the key to sequential handoffs — it passes one agent’s output as context to the next without any manual wiring.
  • ADK provides explicit primitives: SequentialAgent for linear pipelines, ParallelAgent for concurrent work, LoopAgent for iterative workflows, sub_agents for hierarchy, AgentTool for invoking an agent like a function.
  • output_key is ADK’s state bus. Every agent in an ADK session shares session.state. Writing to a key with output_key and reading from it in another agent’s instruction is the standard inter-agent communication mechanism.
  • EventActions(escalate=True) is how a LoopAgent stops. Always provide a clear stopping condition — and max_iterations as a safety cap.
  • Failure isolation is a feature. When one agent in a multi-agent system fails, the others can continue. Compare this to a monolithic agent where one failed step can derail the entire task.

Next up — Chapter 8: Memory, where agents stop starting from scratch on every turn and begin building persistent context across conversations and sessions.



