ARTICLE  ·  15 MIN READ  ·  JANUARY 09, 2026

Chapter 2: Routing

Prompt chains are predictable. The real world isn't. Routing gives agents the ability to make decisions — picking the right tool, sub-agent, or workflow based on what's actually in front of them.


The Limitation of Chains

Before You Start — Key Terms Explained

Intent classification: Figuring out what a user actually *wants* from their message. "My package hasn't arrived" → intent is "delivery complaint". This is the first step in any routing system.

Embeddings / Vectors: A way to convert text into a list of numbers (a vector) that captures its *meaning*. Similar sentences get similar vectors. This lets us find "semantically related" content without exact keyword matching. e.g., "what's the weather?" and "is it raining?" would have similar vectors even though they share no words.

Cosine similarity: A measure of how similar two vectors are (the cosine of the angle between them). Two identical vectors = similarity of 1. Completely unrelated vectors ≈ 0. Used in embedding-based routing to find the closest-matching route.
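
For intuition, cosine similarity is short enough to write out in plain Python. The three-number vectors below are made up for illustration; real embeddings have hundreds of dimensions:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """cos(angle between a and b) = dot(a, b) / (|a| * |b|)"""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" for three sentences
weather = [0.9, 0.1, 0.2]   # "what's the weather?"
raining = [0.8, 0.2, 0.3]   # "is it raining?"
booking = [0.1, 0.9, 0.7]   # "book me a hotel"

print(round(cosine_similarity(weather, raining), 2))  # high (≈ 0.98): similar meaning
print(round(cosine_similarity(weather, booking), 2))  # low  (≈ 0.30): unrelated
```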

Docstring: A string literal at the top of a Python function (inside triple quotes) that explains what it does. In LangChain and ADK, the LLM reads the docstring to decide when to call that function — so the docstring is actually part of the interface.

In Chapter 1, we saw how breaking a task into sequential steps makes LLMs more reliable. Every step has one job. Output of step N feeds step N+1. Clean, predictable.

But here’s the problem: what if you don’t know which sequence to run until you see the input?

Imagine a customer support bot. Every incoming message is different:

  • “Where’s my order?” → check the database
  • “Your product broke after one use” → escalate to a human
  • “How do I reset my password?” → search the knowledge base
  • “blah blah blah gibberish” → ask for clarification

You can’t write a fixed chain for this. The right action depends on what the user actually said. You need the system to decide before it acts.

That’s routing.

Routing is one of the most important patterns in software engineering generally, not just AI. Any time a system receives varied inputs and needs to direct them to different handlers, a router does that job. Web servers route HTTP requests to different endpoints. Email servers route messages to different inboxes. Customer support call centers route callers to different departments.

In LLM-powered systems, routing is the pattern that transforms a single intelligent interface into a system that can handle arbitrarily diverse requests with specialist-level precision for each.

The key insight: Routing separates the decision from the action. The router decides which path to take. Each handler executes one specific action perfectly. This separation means you can upgrade any individual handler without touching the routing logic, and you can update routing rules without touching any handler. In software engineering, this is called separation of concerns — one of the most powerful principles in the field.

What Routing Is

Routing adds conditional logic to an agent’s execution. Instead of always following the same path, the agent first evaluates the input — then chooses which path to take.

ROUTING PATTERN

User Query (unclassified input arrives)
     ↓
Router (classifies intent)
     ↓
  • Database Agent — order status
  • Escalation — complaint → human
  • Knowledge Base — how-to questions
  • Clarify — unclear input
     ↓
Response — right handler, right answer

The router is the decision-maker. Everything downstream is a handler — a function, tool, sub-agent, or prompt chain that handles one specific type of request.

Note the division of labour: the router decides what to do; each handler knows how to do it. Neither knows about the other’s internals.

This separation is what makes routed systems easy to extend. Adding a new capability means: (1) write a new handler for it, and (2) add one routing rule pointing to that handler. Nothing else changes. If you instead had one massive prompt that handled everything, adding a new capability means carefully editing that prompt without breaking anything that already works — much harder.
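
In plain Python, this handler-registry shape is just a dict mapping route names to functions. A minimal sketch (the handler names are illustrative, not from any framework):

```python
def order_status(query: str) -> str:
    return f"Checking order status for: {query}"

def escalate(query: str) -> str:
    return f"Escalating to a human: {query}"

def clarify(query: str) -> str:
    return f"Could you clarify? You said: {query}"

# The routing table: route name -> handler function
HANDLERS = {"order": order_status, "complaint": escalate}

def dispatch(intent: str, query: str) -> str:
    # Unknown intents fall through to the clarification handler
    return HANDLERS.get(intent, clarify)(query)

# Adding a capability = one new handler + one new entry; nothing else changes
HANDLERS["refund"] = lambda q: f"Starting refund workflow for: {q}"

print(dispatch("order", "order #123"))
print(dispatch("refund", "order #123"))
```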

This is why routing is used everywhere in professional software systems. It’s not just about LLMs — it’s about building systems that can grow without becoming a tangled mess.

The Four Types of Routing

Not all routers work the same way. There are four distinct mechanisms — each with dramatically different trade-offs in speed, flexibility, and accuracy.

Understanding why each mechanism works the way it does requires understanding what the router is actually doing in each case:

The fundamental routing question: Given a user’s message, how do you decide which handler to invoke?

You could:

  1. Ask an LLM to think about it and tell you (LLM-based routing)
  2. Check the message for specific keywords or patterns (rule-based routing)
  3. Find which pre-defined route is most semantically similar to the message (embedding-based routing)
  4. Run the message through a small machine learning classifier (ML classifier routing)

Each approach answers the same question — “which handler?” — but through completely different mechanisms with different strengths and weaknesses.

[Chart: comparison of the four routing methods along four axes — flexibility, speed, novel-input accuracy, and cost-efficiency.]

Quick summary of each method:

LLM-based routing — You ask the LLM itself: “Read this query. Output exactly one word: booking, info, or unclear.” The most flexible approach. Handles nuanced or unusual inputs. Trade-off: one extra API call per request, which adds latency and cost.

Rule-based routing — Pure code. if "flight" in query or "hotel" in query → booking. Zero API cost, sub-millisecond speed. Falls apart the moment a user says something you didn’t explicitly anticipate.
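
A rule-based router really is just string checks. A toy sketch (the keywords are chosen for illustration):

```python
def rule_based_router(query: str) -> str:
    """Keyword routing: free and sub-millisecond, but brittle on unseen phrasing."""
    q = query.lower()
    if "flight" in q or "hotel" in q or "book" in q:
        return "booking"
    if q.startswith(("what", "how", "why", "where", "when")):
        return "info"
    return "unclear"

print(rule_based_router("Book me a flight to London."))    # booking
print(rule_based_router("What is the capital of Italy?"))  # info
print(rule_based_router("get me a room"))                  # unclear — the brittleness in action
```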

Embedding-based routing — Convert the query into a vector (a list of numbers capturing its meaning). Compare to pre-computed vectors for each route. Route to the closest match. Handles semantic variation well (“get me a room” matches “book a hotel”). Needs embedding infrastructure.
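
The mechanics can be sketched without any embedding infrastructure. Note the `embed()` below is a stand-in (a crude bag-of-words over a tiny vocabulary); a real system would call an embedding model and pre-compute the route vectors once:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def embed(text: str) -> list[float]:
    # Stand-in for a real embedding model — illustration only.
    vocab = ["book", "room", "flight", "hotel", "weather", "capital", "what", "how"]
    t = text.lower()
    return [float(w in t) for w in vocab]

# One pre-computed vector per route, from a short description of that route
ROUTES = {
    "booking": embed("book a flight or hotel room"),
    "info":    embed("what is how does capital weather"),
}

def embedding_router(query: str, threshold: float = 0.3) -> str:
    qv = embed(query)
    if not any(qv):
        return "unclear"  # nothing recognisable in the query
    name, score = max(
        ((name, cosine(qv, vec)) for name, vec in ROUTES.items()),
        key=lambda pair: pair[1],
    )
    return name if score >= threshold else "unclear"

print(embedding_router("get me a room"))        # booking
print(embedding_router("what's the weather?"))  # info
```

Note how "get me a room" routes to booking despite never saying "book" or "hotel" — that is the semantic-variation advantage in miniature.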

ML classifier routing — A small discriminative model fine-tuned on labelled examples. “Here are 500 examples of booking requests, 500 info requests…” Fast at inference, very accurate for known categories. Needs training data, and retraining every time you add a new route.

How It Works: Step by Step

Let’s walk through exactly what happens when a routing system processes a request.

[Interactive demo: pick a query and watch it flow through the router. The router LLM analyses intent, then dispatches to the Booking Agent (interacts with flight/hotel APIs), the Info Agent (searches the knowledge base), or the Clarifier (asks for more information).]

Practical Applications

Customer Service Bots

The most obvious use case. A single entry point receives all user messages. The router classifies intent and dispatches to the right sub-agent:

CUSTOMER SERVICE ROUTING

Incoming Message
     ↓
Router (classifies intent)
     ↓
  • Order DB Agent — order status
  • Refund Workflow — refund request
  • Troubleshooting Chain — technical issue
  • Human Escalation — complaint
  • Clarification Prompt — unclear intent

Without routing, you’d need separate endpoints for each query type — and users would have to know which one to use.

Document & Email Pipelines

Incoming emails get classified before any processing happens:

  • Sales lead → CRM ingestion workflow
  • Support ticket → ticketing system + priority score
  • Invoice → accounts payable extraction chain
  • Spam → discard

The router is the first step. Everything after it is specialised.

Multi-Agent Research Systems

A research system with separate agents for web search, paper summarisation, data analysis, and report writing needs a router to decide which agent gets each sub-task. The router looks at the current objective and dispatches accordingly.

AI Coding Assistants

Before passing a code snippet to any tool, the assistant routes based on language × intent:

  • Python + debug → Python linter + error explainer
  • TypeScript + explain → TypeScript-aware explainer
  • SQL + optimise → query plan analyser

Two routing dimensions at once: language detection and intent classification.

LangChain Implementation

Here’s a working routing system using LangChain and Google’s Gemini. A coordinator LLM classifies the intent, then RunnableBranch dispatches to the right handler.

Install

pip install langchain langgraph langchain-google-genai python-dotenv

The Full Code, Explained Line by Line

1. Imports

from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough, RunnableBranch

  • ChatGoogleGenerativeAI — LangChain’s wrapper around Gemini. Swap for ChatOpenAI if using OpenAI.
  • ChatPromptTemplate — builds reusable prompts with named placeholders.
  • StrOutputParser — strips the LLM’s response object down to a plain Python string.
  • RunnablePassthrough — passes input through unchanged. Used to keep the original request available downstream.
  • RunnableBranch — the routing mechanism. Takes a list of (condition, runnable) pairs. Runs the first branch whose condition is True.

2. Initialise the model

import os
from dotenv import load_dotenv
load_dotenv()  # reads GOOGLE_API_KEY from .env

llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash", temperature=0)
# temperature=0 → deterministic output
# Critical for routing: you need a consistent, single-word answer every time

3. Define the handler functions

These simulate what each specialised sub-agent would actually do:

def booking_handler(request: str) -> str:
    """Called when the router decides this is a booking request."""
    # In production: call a flight/hotel API, check availability, etc.
    print("→ DELEGATING TO: Booking Handler")
    return f"Booking confirmed for: '{request}'"

def info_handler(request: str) -> str:
    """Called when the router decides this is an information request."""
    # In production: search a knowledge base, fetch from a database, etc.
    print("→ DELEGATING TO: Info Handler")
    return f"Information retrieved for: '{request}'"

def unclear_handler(request: str) -> str:
    """Fallback for anything the router can't classify."""
    print("→ DELEGATING TO: Clarification Handler")
    return f"Could not process: '{request}'. Please clarify your request."

4. Build the router chain

This is the classification step — the LLM reads the request and outputs exactly one word:

router_prompt = ChatPromptTemplate.from_messages([
    ("system", """Analyze the user's request and output the correct category.

Rules:
- If the request involves booking flights or hotels → output: booker
- If the request is a general information question  → output: info
- If the request is unclear or doesn't fit either   → output: unclear

IMPORTANT: Output exactly ONE word. No punctuation, no explanation."""),
    ("user", "{request}")
])

# Chain: fill the prompt → call LLM → strip to plain string
# Result is a single word: "booker", "info", or "unclear"
router_chain = router_prompt | llm | StrOutputParser()

Why temperature=0 matters here: if temperature > 0, the model might output "Booker" (capitalised), "booking", or "I think this is a booking request..." — all of which break the downstream string comparison.

5. Build the routing branch

# RunnableBranch takes:
#   [(condition_fn, runnable), (condition_fn, runnable), ..., default_runnable]
# It evaluates conditions in order and runs the FIRST matching branch.
# The default (no condition) runs if nothing matches.

delegation = RunnableBranch(
    # condition: is the decision "booker"?
    # x is the dict {"decision": "booker", "request": {...}}
    (
        lambda x: x["decision"].strip().lower() == "booker",
        RunnablePassthrough.assign(
            output=lambda x: booking_handler(x["request"]["request"])
        )
    ),
    (
        lambda x: x["decision"].strip().lower() == "info",
        RunnablePassthrough.assign(
            output=lambda x: info_handler(x["request"]["request"])
        )
    ),
    # Default branch — runs if neither condition matched
    RunnablePassthrough.assign(
        output=lambda x: unclear_handler(x["request"]["request"])
    ),
)

6. Combine into a single runnable

# Step 1: run router_chain → get decision
# Step 2: keep original request alongside the decision
# Step 3: delegation branch picks the right handler
# Step 4: extract just the output string

coordinator = (
    {
        "decision": router_chain,       # ← runs classifier, stores result
        "request": RunnablePassthrough() # ← passes original input through unchanged
    }
    | delegation
    | (lambda x: x["output"])           # ← extract the handler's response string
)

The RunnablePassthrough() here is critical: without it, the delegation branch would only see the router’s decision (a single word), and couldn’t pass the original request to the handler.
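
Concretely, the dict that `delegation` receives looks like this (values shown for one example input):

```python
# coordinator.invoke({"request": "Book me a flight to London."}) produces,
# after the {"decision": ..., "request": ...} stage:
x = {
    "decision": "booker",  # output of router_chain: a single word
    "request": {"request": "Book me a flight to London."},  # the original input dict, untouched
}

# ...which is why the handlers index twice: x["request"]["request"]
print(x["request"]["request"])  # Book me a flight to London.
```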

7. Run it

# Each invoke() call: classify → route → handle → return

print(coordinator.invoke({"request": "Book me a flight to London."}))
# → DELEGATING TO: Booking Handler
# → "Booking confirmed for: 'Book me a flight to London.'"

print(coordinator.invoke({"request": "What is the capital of Italy?"}))
# → DELEGATING TO: Info Handler
# → "Information retrieved for: 'What is the capital of Italy?'"

print(coordinator.invoke({"request": "Do the thing."}))
# → DELEGATING TO: Clarification Handler
# → "Could not process: 'Do the thing.'. Please clarify your request."

What the Data Flow Looks Like

LANGCHAIN DATA FLOW — how input travels through the routing chain

User Request ("Book flight to London")
     ↓
router_chain — LLM outputs "booker", "info", or "unclear"
     ↓
RunnableBranch — checks the decision string
     ↓
booking_handler (decision == "booker") · info_handler (decision == "info") · unclear_handler (default fallback)
     ↓
Final Output — string response to user

Google ADK Implementation

The Agent Development Kit (ADK) takes a fundamentally different approach. Instead of building explicit routing logic in code, you define agents with descriptions and tools with docstrings, and let the framework figure out routing.

The Philosophy

LangChain approach:
  • You write the routing logic explicitly
  • RunnableBranch with explicit conditions
  • Full control over routing criteria
  • Better for complex conditional logic

ADK approach:
  • The framework routes via the LLM’s understanding of agent descriptions
  • A sub_agents= list — ADK Auto-Flow handles dispatch
  • More automatic, less code
  • Better for simple delegation patterns

Install

pip install google-adk google-genai python-dotenv

The Full Code, Explained

1. Define tool functions

In ADK, a tool is just a Python function with a descriptive docstring. The docstring is critical — ADK’s LLM reads it to decide when to call this tool.

from google.adk.tools import FunctionTool

def booking_handler(request: str) -> str:
    """
    Handles booking requests for flights and hotels.
    Call this tool when the user wants to book, reserve, or schedule
    a flight, hotel room, or accommodation.

    Args:
        request: The user's booking request in natural language.
    Returns:
        Confirmation that the booking action was simulated.
    """
    print("→ Booking Handler called")
    return f"Booking action simulated for: '{request}'"

def info_handler(request: str) -> str:
    """
    Handles general information and factual questions.
    Call this tool when the user asks a factual question, wants
    an explanation, or is looking for general knowledge.

    Args:
        request: The user's information request.
    Returns:
        The result of the simulated information retrieval.
    """
    print("→ Info Handler called")
    return f"Information retrieved for: '{request}'"

# Wrap functions as FunctionTool objects
booking_tool = FunctionTool(booking_handler)
info_tool    = FunctionTool(info_handler)

2. Define specialised sub-agents

Each sub-agent has a description — used by the coordinator to decide which agent to delegate to:

from google.adk.agents import Agent

booking_agent = Agent(
    name="Booker",
    model="gemini-2.0-flash",
    description="Specialist agent for all flight and hotel booking requests.",
    tools=[booking_tool]  # this agent can call booking_handler
)

info_agent = Agent(
    name="Info",
    model="gemini-2.0-flash",
    description="Specialist agent for general information and factual questions.",
    tools=[info_tool]  # this agent can call info_handler
)

3. Define the coordinator

The coordinator’s instruction tells it what its job is. The sub_agents list enables ADK’s Auto-Flow: the coordinator reads the user message and the sub-agent descriptions, then delegates automatically.

coordinator = Agent(
    name="Coordinator",
    model="gemini-2.0-flash",
    instruction="""
    You are a coordinator. Your ONLY job is to route incoming requests.
    Do NOT answer the user directly. Delegate every request to a sub-agent.

    - Flight/hotel booking requests  → delegate to Booker
    - All information/factual questions → delegate to Info
    """,
    description="Routes user requests to the appropriate specialist.",
    sub_agents=[booking_agent, info_agent]  # enables Auto-Flow routing
)

4. Run it

import asyncio, uuid
from google.adk.runners import InMemoryRunner
from google.genai import types

async def run(runner: InMemoryRunner, request: str) -> str:
    user_id    = "user_001"
    session_id = str(uuid.uuid4())

    # Create a fresh session for this request
    await runner.session_service.create_session(
        app_name=runner.app_name,
        user_id=user_id,
        session_id=session_id,
    )

    final = ""
    for event in runner.run(
        user_id=user_id,
        session_id=session_id,
        new_message=types.Content(
            role="user",
            parts=[types.Part(text=request)]
        ),
    ):
        if event.is_final_response() and event.content:
            # Extract the text from the final response event
            if event.content.parts:
                final = "".join(p.text for p in event.content.parts if p.text)
            break
    return final

async def main():
    runner = InMemoryRunner(coordinator)

    result = await run(runner, "Book me a hotel in Paris.")
    print("Result:", result)
    # → Booking Handler called
    # → "Booking action simulated for: 'Book me a hotel in Paris.'"

    result = await run(runner, "What is the tallest mountain in the world?")
    print("Result:", result)
    # → Info Handler called
    # → "Information retrieved for: '...'"

if __name__ == "__main__":
    # nest_asyncio is only needed inside notebooks, where an event loop is
    # already running; in a plain script, asyncio.run() is enough.
    asyncio.run(main())

ADK Auto-Flow vs. Explicit Routing

Here’s what ADK does automatically that you write manually in LangChain:

User message
     ↓
Coordinator LLM reads:
  - The user's message
  - The description of each sub-agent
  - Its own instruction
     ↓
Decides: "This sounds like a booking. Delegate to Booker."
     ↓
Booker agent activates
     ↓
Booker LLM reads the request + its tools
     ↓
Calls booking_handler(request=...)
     ↓
Returns result to coordinator
     ↓
Coordinator returns final response to user

The routing decision is embedded in the LLM’s reasoning — you don’t write it explicitly. This is more concise, but also harder to debug when it routes incorrectly.

LangChain vs. ADK: Which to Use

LANGCHAIN vs ADK — WHICH TO USE

How complex is your routing?

  • Simple, clear categories → use ADK Auto-Flow: less code, the LLM decides, Coordinator + sub_agents, faster to build.
  • Complex, state-dependent → use LangChain / LangGraph: explicit control flow via RunnableBranch or LangGraph conditional edges.

Side by side (LangChain + RunnableBranch vs. Google ADK Auto-Flow):

  • Routing control: you write explicit conditions vs. the LLM decides based on descriptions
  • Debuggability: easy — conditions are plain Python vs. harder — routing is inside LLM reasoning
  • Flexibility: very high vs. medium
  • Code volume: more vs. less
  • Best for: complex multi-step routing and state machines vs. simple agent delegation

Common Mistakes When Building Routers

Mistake 1: Router categories that overlap. If your router has categories for “billing questions” and “account issues,” many queries will legitimately fit both. The LLM will pick one inconsistently. Define categories that are mutually exclusive — no query should reasonably fit more than one. If you find overlap, merge the categories or narrow their definitions.

Mistake 2: Relying on the router’s raw output without normalization. The LLM might output "Booker", " booker ", "BOOKER", or even "booker." (with a period). Always normalize: .strip().lower(). For critical routing, also handle punctuation: .strip().lower().rstrip(".,!?").
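
A small helper captures this (a sketch — adapt the stripped punctuation to what your model actually emits):

```python
def normalize_decision(raw: str) -> str:
    """Collapse an LLM's routing output to a canonical lowercase token."""
    return raw.strip().lower().rstrip(".,!?")

print(normalize_decision("  Booker.\n"))  # booker
print(normalize_decision("INFO!"))        # info
```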

Mistake 3: No fallback handler. What happens when the router produces a category you didn’t expect? Without a default handler, your code raises a KeyError or routes to nothing. Always include a “catch-all” handler that either asks for clarification or provides a generic helpful response.

Mistake 4: Router system prompt is too vague. “Classify this message” is not enough guidance. Tell the router exactly what categories exist, what distinguishes them, and what edge cases should be handled. Include few-shot examples for ambiguous cases. The more specific your routing prompt, the more reliable and consistent the routing becomes.
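
What "specific" looks like in practice — a sketch of a routing prompt with few-shot examples, using this chapter's categories (the examples and wording are illustrative):

```python
# A more specific router system prompt, with few-shot examples covering
# the ambiguous cases (illustrative — adapt to your own categories).
ROUTER_SYSTEM_PROMPT = """Classify the user's message into exactly one category.

Categories:
- booker: booking, changing, or cancelling a flight or hotel
- info: factual questions and how-to requests
- unclear: anything else, including ambiguous or empty messages

Examples:
"Book me a flight to London" -> booker
"What if I wanted to change my booking?" -> booker
"What is the capital of Italy?" -> info
"Do the thing." -> unclear

Output exactly ONE word. No punctuation, no explanation."""

print(ROUTER_SYSTEM_PROMPT.splitlines()[0])
```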

Mistake 5: Testing routing with only clear-cut examples. Your router may work perfectly on “Book me a flight to London” (clearly a booking request) but fail on “What if I wanted to change my booking?” (is this info or booking?). Test with ambiguous, edge-case inputs — these are where routing systems fail in production.

Key Takeaways

What routing is. Routing adds conditional logic to an agent — instead of always following the same path, the agent evaluates the input and chooses the appropriate handler.

The four methods. LLM-based is most flexible. Rule-based is fastest. Embedding-based handles semantic variation. ML classifier is most accurate for known categories.

The mechanism. A router classifies the input → outputs a decision → the dispatcher picks the matching handler → the handler runs.

In LangChain: explicit routing with RunnableBranch + conditions. Full control. More code.

In ADK: implicit routing via sub_agents and agent descriptions. Less code. Less debuggable.

Rule of thumb: If you can write an if statement to describe the routing logic — use rules or LangChain. If the categories are fuzzy and overlap — use LLM-based or embedding routing.



