ARTICLE · 15 MIN READ · JANUARY 09, 2026
Chapter 2: Routing
Prompt chains are predictable. The real world isn't. Routing gives agents the ability to make decisions — picking the right tool, sub-agent, or workflow based on what's actually in front of them.
Key Terms
Four terms that come up throughout this chapter:
Intent classification: Figuring out what a user actually *wants* from their message. "My package hasn't arrived" → intent is "delivery complaint". This is the first step in any routing system.
Embeddings / Vectors: A way to convert text into a list of numbers (a vector) that captures its *meaning*. Similar sentences get similar vectors. This lets us find "semantically related" content without exact keyword matching. e.g., "what's the weather?" and "is it raining?" would have similar vectors even though they share no words.
Cosine similarity: A math formula that measures how similar two vectors are. Two identical vectors = similarity of 1. Completely unrelated vectors ≈ 0. Used in embedding-based routing to find the closest-matching route.
Docstring: A comment at the top of a Python function (inside triple quotes) that explains what it does. In LangChain and ADK, the LLM reads the docstring to decide when to call that function — so the docstring is actually part of the interface.
The Limitation of Chains
In Chapter 1, we saw how breaking a task into sequential steps makes LLMs more reliable. Every step has one job. Output of step N feeds step N+1. Clean, predictable.
But here’s the problem: what if you don’t know which sequence to run until you see the input?
Imagine a customer support bot. Every incoming message is different:
- “Where’s my order?” → check the database
- “Your product broke after one use” → escalate to a human
- “How do I reset my password?” → search the knowledge base
- “blah blah blah gibberish” → ask for clarification
You can’t write a fixed chain for this. The right action depends on what the user actually said. You need the system to decide before it acts.
That’s routing.
Routing is one of the most important patterns in software engineering generally, not just AI. Any time a system receives varied inputs and needs to direct them to different handlers — a router does that job. Web servers route HTTP requests to different endpoints. Email servers route messages to different inboxes. Customer support call centers route callers to different departments.
In LLM-powered systems, routing is the pattern that transforms a single intelligent interface into a system that can handle arbitrarily diverse requests with specialist-level precision for each.
The key insight: Routing separates the decision from the action. The router decides which path to take. Each handler executes one specific action perfectly. This separation means you can upgrade any individual handler without touching the routing logic, and you can update routing rules without touching any handler. In software engineering, this is called separation of concerns — one of the most powerful principles in the field.
What Routing Is
Routing adds conditional logic to an agent’s execution. Instead of always following the same path, the agent first evaluates the input — then chooses which path to take.
The router is the decision-maker. Everything downstream is a handler — a function, tool, sub-agent, or prompt chain that handles one specific type of request.
The key insight: Routing separates what to do from how to do it. The router decides. The handler executes. Neither knows about the other’s internals.
This separation is what makes routed systems easy to extend. Adding a new capability means: (1) write a new handler for it, and (2) add one routing rule pointing to that handler. Nothing else changes. If you instead had one massive prompt that handled everything, adding a new capability means carefully editing that prompt without breaking anything that already works — much harder.
This is why routing is used everywhere in professional software systems. It’s not just about LLMs — it’s about building systems that can grow without becoming a tangled mess.
The Four Types of Routing
Not all routers work the same way. There are four distinct mechanisms — each with dramatically different trade-offs in speed, flexibility, and accuracy.
Understanding why each mechanism works the way it does requires understanding what the router is actually doing in each case:
The fundamental routing question: Given a user’s message, how do you decide which handler to invoke?
You could:
- Ask an LLM to think about it and tell you (LLM-based routing)
- Check the message for specific keywords or patterns (rule-based routing)
- Find which pre-defined route is most semantically similar to the message (embedding-based routing)
- Run the message through a small machine learning classifier (ML classifier routing)
Each approach answers the same question — “which handler?” — but through completely different mechanisms with different strengths and weaknesses.
Quick summary of each method:
LLM-based routing — You ask the LLM itself: “Read this query. Output exactly one word: booking, info, or unclear.” The most flexible approach. Handles nuanced or unusual inputs. Trade-off: one extra API call per request, which adds latency and cost.
Rule-based routing — Pure code. if "flight" in query or "hotel" in query → booking. Zero API cost, sub-millisecond speed. Falls apart the moment a user says something you didn’t explicitly anticipate.
Embedding-based routing — Convert the query into a vector (a list of numbers capturing its meaning). Compare to pre-computed vectors for each route. Route to the closest match. Handles semantic variation well (“get me a room” matches “book a hotel”). Needs embedding infrastructure.
ML classifier routing — A small discriminative model fine-tuned on labelled examples. “Here are 500 examples of booking requests, 500 info requests…” Fast at inference, very accurate for known categories. Needs training data, and retraining every time you add a new route.
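To make the first two non-LLM approaches concrete, here are minimal sketches. They are illustrations, not production code: the keyword lists, route descriptions, and the 0.3 threshold are all assumptions, and embed() stands in for whatever embedding model you use.

Rule-based:

def rule_based_route(query: str) -> str:
    """Keyword rules: fast and free, but brittle for unanticipated phrasing."""
    q = query.lower()
    if any(word in q for word in ("flight", "hotel", "book", "reserve")):
        return "booking"
    if any(word in q for word in ("what", "how", "when", "where", "why")):
        return "info"
    return "unclear"   # "Get me a room in Paris" lands here: no keyword matches

Embedding-based:

import numpy as np

def cosine_similarity(a, b) -> float:
    # cos(theta) = (a . b) / (|a| * |b|); 1 means identical direction, ~0 means unrelated
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def build_embedding_router(embed, route_descriptions, threshold=0.3):
    """embed: any callable that maps a string to a vector (your embedding model)."""
    route_vectors = {name: np.asarray(embed(desc))
                     for name, desc in route_descriptions.items()}

    def route(query: str) -> str:
        q = np.asarray(embed(query))
        scores = {name: cosine_similarity(q, vec)
                  for name, vec in route_vectors.items()}
        best = max(scores, key=scores.get)
        # Below the threshold nothing matches well, so fall back to clarification
        return best if scores[best] >= threshold else "unclear"

    return route

# route = build_embedding_router(embed, {
#     "booking": "book a flight or hotel, make a reservation",
#     "info": "answer a general knowledge or factual question",
# })
# route("get me a room in Paris")  # → "booking", despite sharing no keywords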
How It Works: Step by Step
Let’s walk through exactly what happens when a routing system processes a request. Whichever of the four methods you pick, the mechanism is the same three steps: the router reads the input and produces a decision, a dispatcher maps that decision to a handler, and the handler runs and produces the response. The decision and the action never live in the same piece of code.
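Stripped of any framework, the pattern fits in a few lines. This is a minimal sketch, assuming a classify() function that implements any of the four methods above; the handler bodies are placeholders:

def classify(query: str) -> str:
    """Return "booking", "info", or anything else for unclear requests.
    Could be an LLM call, keyword rules, embeddings, or an ML classifier."""
    ...

HANDLERS = {
    "booking": lambda q: f"Booking confirmed for: {q!r}",
    "info": lambda q: f"Information retrieved for: {q!r}",
}

def handle(query: str) -> str:
    decision = classify(query)                                        # 1. decide
    handler = HANDLERS.get(decision,
                           lambda q: "Please clarify your request.")  # 2. dispatch (with a fallback)
    return handler(query)                                             # 3. act

Note how adding a capability is one new entry in HANDLERS plus one new category the classifier can emit; the routing scaffolding itself never changes.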
Practical Applications
Customer Service Bots
The most obvious use case. A single entry point receives all user messages. The router classifies intent and dispatches to the right sub-agent.
Without routing, you’d need separate endpoints for each query type — and users would have to know which one to use.
Document & Email Pipelines
Incoming emails get classified before any processing happens:
- Sales lead → CRM ingestion workflow
- Support ticket → ticketing system + priority score
- Invoice → accounts payable extraction chain
- Spam → discard
The router is the first step. Everything after it is specialised.
Multi-Agent Research Systems
A research system with separate agents for web search, paper summarisation, data analysis, and report writing needs a router to decide which agent gets each sub-task. The router looks at the current objective and dispatches accordingly.
AI Coding Assistants
Before passing a code snippet to any tool, the assistant routes based on language × intent:
- Python + debug → Python linter + error explainer
- TypeScript + explain → TypeScript-aware explainer
- SQL + optimise → query plan analyser
Two routing dimensions at once: language detection and intent classification.
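A sketch of that two-dimensional dispatch, keyed on (language, intent) pairs; the handler names here are made up for illustration:

# Two routing dimensions at once: (language, intent) -> handler name
TOOL_TABLE = {
    ("python", "debug"): "python_linter_and_error_explainer",
    ("typescript", "explain"): "typescript_aware_explainer",
    ("sql", "optimise"): "query_plan_analyser",
}

def route_snippet(language: str, intent: str) -> str:
    # Both keys come from upstream classifiers: language detection and intent classification
    return TOOL_TABLE.get((language.lower(), intent.lower()), "generic_assistant")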
LangChain Implementation
Here’s a working routing system using LangChain and Google’s Gemini. A coordinator LLM classifies the intent, then RunnableBranch dispatches to the right handler.
Install
pip install langchain langgraph langchain-google-genai python-dotenv
The Full Code, Explained Line by Line
1. Imports
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough, RunnableBranch
- ChatGoogleGenerativeAI — LangChain’s wrapper around Gemini. Swap for ChatOpenAI if using OpenAI.
- ChatPromptTemplate — builds reusable prompts with named placeholders.
- StrOutputParser — strips the LLM’s response object down to a plain Python string.
- RunnablePassthrough — passes input through unchanged. Used to keep the original request available downstream.
- RunnableBranch — the routing mechanism. Takes a list of (condition, runnable) pairs. Runs the first branch whose condition is True.
2. Initialise the model
import os
from dotenv import load_dotenv
load_dotenv() # reads GOOGLE_API_KEY from .env
llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash", temperature=0)
# temperature=0 → deterministic output
# Critical for routing: you need a consistent, single-word answer every time
3. Define the handler functions
These simulate what each specialised sub-agent would actually do:
def booking_handler(request: str) -> str:
"""Called when the router decides this is a booking request."""
# In production: call a flight/hotel API, check availability, etc.
print("→ DELEGATING TO: Booking Handler")
return f"Booking confirmed for: '{request}'"
def info_handler(request: str) -> str:
"""Called when the router decides this is an information request."""
# In production: search a knowledge base, fetch from a database, etc.
print("→ DELEGATING TO: Info Handler")
return f"Information retrieved for: '{request}'"
def unclear_handler(request: str) -> str:
"""Fallback for anything the router can't classify."""
print("→ DELEGATING TO: Clarification Handler")
return f"Could not process: '{request}'. Please clarify your request."
4. Build the router chain
This is the classification step — the LLM reads the request and outputs exactly one word:
router_prompt = ChatPromptTemplate.from_messages([
("system", """Analyze the user's request and output the correct category.
Rules:
- If the request involves booking flights or hotels → output: booker
- If the request is a general information question → output: info
- If the request is unclear or doesn't fit either → output: unclear
IMPORTANT: Output exactly ONE word. No punctuation, no explanation."""),
("user", "{request}")
])
# Chain: fill the prompt → call LLM → strip to plain string
# Result is a single word: "booker", "info", or "unclear"
router_chain = router_prompt | llm | StrOutputParser()
Why temperature=0 matters here: if temperature > 0, the model might output "Booker" (capitalised), "booking", or "I think this is a booking request..." — all of which break the downstream string comparison.
5. Build the routing branch
# RunnableBranch takes:
# [(condition_fn, runnable), (condition_fn, runnable), ..., default_runnable]
# It evaluates conditions in order and runs the FIRST matching branch.
# The default (no condition) runs if nothing matches.
delegation = RunnableBranch(
# condition: is the decision "booker"?
# x is the dict {"decision": "booker", "request": {...}}
(
lambda x: x["decision"].strip().lower() == "booker",
RunnablePassthrough.assign(
output=lambda x: booking_handler(x["request"]["request"])
)
),
(
lambda x: x["decision"].strip().lower() == "info",
RunnablePassthrough.assign(
output=lambda x: info_handler(x["request"]["request"])
)
),
# Default branch — runs if neither condition matched
RunnablePassthrough.assign(
output=lambda x: unclear_handler(x["request"]["request"])
),
)
6. Combine into a single runnable
# Step 1: run router_chain → get decision
# Step 2: keep original request alongside the decision
# Step 3: delegation branch picks the right handler
# Step 4: extract just the output string
coordinator = (
{
"decision": router_chain, # ← runs classifier, stores result
"request": RunnablePassthrough() # ← passes original input through unchanged
}
| delegation
| (lambda x: x["output"]) # ← extract the handler's response string
)
The RunnablePassthrough() here is critical: without it, the delegation branch would only see the router’s decision (a single word), and couldn’t pass the original request to the handler.
7. Run it
# Each invoke() call: classify → route → handle → return
print(coordinator.invoke({"request": "Book me a flight to London."}))
# → DELEGATING TO: Booking Handler
# → "Booking confirmed for: 'Book me a flight to London.'"
print(coordinator.invoke({"request": "What is the capital of Italy?"}))
# → DELEGATING TO: Info Handler
# → "Information retrieved for: 'What is the capital of Italy?'"
print(coordinator.invoke({"request": "Do the thing."}))
# → DELEGATING TO: Clarification Handler
# → "Could not process: 'Do the thing.'. Please clarify your request."
What the Data Flow Looks Like
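Tracing one request through the LangChain coordinator above makes the intermediate shapes explicit (shown for the booking example):

{"request": "Book me a flight to London."}
↓
router_chain runs the classifier → "booker"
↓
The dict handed to delegation:
{"decision": "booker", "request": {"request": "Book me a flight to London."}}
↓
First branch condition matches ("booker")
↓
booking_handler("Book me a flight to London.") runs
↓
{"decision": "booker", "request": {...}, "output": "Booking confirmed for: '...'"}
↓
The final lambda extracts x["output"] → the string returned to the caller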
Google ADK Implementation
The Agent Development Kit (ADK) takes a fundamentally different approach. Instead of building explicit routing logic in code, you define agents with descriptions and tools with docstrings, and let the framework figure out routing.
The Philosophy
| LangChain approach | ADK approach |
|---|---|
| You write the routing logic explicitly | The framework routes via the LLM’s understanding of agent descriptions |
| RunnableBranch with explicit conditions | sub_agents list — ADK Auto-Flow handles dispatch |
| Full control over routing criteria | More automatic, less code |
| Better for complex conditional logic | Better for simple delegation patterns |
Install
pip install google-adk google-generativeai python-dotenv
The Full Code, Explained
1. Define tool functions
In ADK, a tool is just a Python function with a descriptive docstring. The docstring is critical — ADK’s LLM reads it to decide when to call this tool.
from google.adk.tools import FunctionTool
def booking_handler(request: str) -> str:
"""
Handles booking requests for flights and hotels.
Call this tool when the user wants to book, reserve, or schedule
a flight, hotel room, or accommodation.
Args:
request: The user's booking request in natural language.
Returns:
Confirmation that the booking action was simulated.
"""
print("→ Booking Handler called")
return f"Booking action simulated for: '{request}'"
def info_handler(request: str) -> str:
"""
Handles general information and factual questions.
Call this tool when the user asks a factual question, wants
an explanation, or is looking for general knowledge.
Args:
request: The user's information request.
Returns:
The result of the simulated information retrieval.
"""
print("→ Info Handler called")
return f"Information retrieved for: '{request}'"
# Wrap functions as FunctionTool objects
booking_tool = FunctionTool(booking_handler)
info_tool = FunctionTool(info_handler)
2. Define specialised sub-agents
Each sub-agent has a description — used by the coordinator to decide which agent to delegate to:
from google.adk.agents import Agent
booking_agent = Agent(
name="Booker",
model="gemini-2.0-flash",
description="Specialist agent for all flight and hotel booking requests.",
tools=[booking_tool] # this agent can call booking_handler
)
info_agent = Agent(
name="Info",
model="gemini-2.0-flash",
description="Specialist agent for general information and factual questions.",
tools=[info_tool] # this agent can call info_handler
)
3. Define the coordinator
The coordinator’s instruction tells it what its job is. The sub_agents list enables ADK’s Auto-Flow: the coordinator reads the user message and the sub-agent descriptions, then delegates automatically.
coordinator = Agent(
name="Coordinator",
model="gemini-2.0-flash",
instruction="""
You are a coordinator. Your ONLY job is to route incoming requests.
Do NOT answer the user directly. Delegate every request to a sub-agent.
- Flight/hotel booking requests → delegate to Booker
- All information/factual questions → delegate to Info
""",
description="Routes user requests to the appropriate specialist.",
sub_agents=[booking_agent, info_agent] # enables Auto-Flow routing
)
4. Run it
import asyncio, uuid
from google.adk.runners import InMemoryRunner
from google.genai import types
async def run(runner: InMemoryRunner, request: str) -> str:
user_id = "user_001"
session_id = str(uuid.uuid4())
# Create a fresh session for this request
await runner.session_service.create_session(
app_name=runner.app_name,
user_id=user_id,
session_id=session_id,
)
final = ""
for event in runner.run(
user_id=user_id,
session_id=session_id,
new_message=types.Content(
role="user",
parts=[types.Part(text=request)]
),
):
if event.is_final_response() and event.content:
# Extract the text from the final response event
if event.content.parts:
final = "".join(p.text for p in event.content.parts if p.text)
break
return final
async def main():
runner = InMemoryRunner(coordinator)
result = await run(runner, "Book me a hotel in Paris.")
print("Result:", result)
# → Booking Handler called
# → "Booking action simulated for: 'Book me a hotel in Paris.'"
result = await run(runner, "What is the tallest mountain in the world?")
print("Result:", result)
# → Info Handler called
# → "Information retrieved for: '...'"
if __name__ == "__main__":
import nest_asyncio; nest_asyncio.apply()
asyncio.run(main())
ADK Auto-Flow vs. Explicit Routing
Here’s what ADK does automatically that you write manually in LangChain:
User message
↓
Coordinator LLM reads:
- The user's message
- The description of each sub-agent
- Its own instruction
↓
Decides: "This sounds like a booking. Delegate to Booker."
↓
Booker agent activates
↓
Booker LLM reads the request + its tools
↓
Calls booking_handler(request=...)
↓
Returns result to coordinator
↓
Coordinator returns final response to user
The routing decision is embedded in the LLM’s reasoning — you don’t write it explicitly. This is more concise, but also harder to debug when it routes incorrectly.
LangChain vs. ADK: Which to Use
| | LangChain + RunnableBranch | Google ADK Auto-Flow |
|---|---|---|
| Routing control | You write explicit conditions | LLM decides based on descriptions |
| Debuggability | Easy — conditions are plain Python | Harder — routing is inside LLM reasoning |
| Flexibility | Very high | Medium |
| Code volume | More | Less |
| Best for | Complex multi-step routing, state machines | Simple agent delegation |
Common Mistakes When Building Routers
Mistake 1: Router categories that overlap. If your router has categories for “billing questions” and “account issues,” many queries will legitimately fit both. The LLM will pick one inconsistently. Define categories that are mutually exclusive — no query should reasonably fit more than one. If you find overlap, merge the categories or narrow their definitions.
Mistake 2: Relying on the router’s raw output without normalization. The LLM might output "Booker", " booker ", "BOOKER", or even "booker." (with a period). Always normalize: .strip().lower(). For critical routing, also handle punctuation: .strip().lower().rstrip(".,!?").
Mistake 3: No fallback handler. What happens when the router produces a category you didn’t expect? Without a default handler, your code raises a KeyError or routes to nothing. Always include a “catch-all” handler that either asks for clarification or provides a generic helpful response.
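One way to address Mistakes 2 and 3 together, reusing the handlers from the LangChain example above: normalise the router’s raw output, then dispatch through a dict with an explicit default:

def normalise_decision(raw: str) -> str:
    # "Booker", " booker ", "BOOKER", and "booker." all become "booker"
    return raw.strip().lower().rstrip(".,!?")

HANDLERS = {
    "booker": booking_handler,
    "info": info_handler,
}

def dispatch(raw_decision: str, request: str) -> str:
    decision = normalise_decision(raw_decision)
    # .get() with a default means an unexpected category can never raise a KeyError
    handler = HANDLERS.get(decision, unclear_handler)
    return handler(request)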
Mistake 4: Router system prompt is too vague. “Classify this message” is not enough guidance. Tell the router exactly what categories exist, what distinguishes them, and what edge cases should be handled. Include few-shot examples for ambiguous cases. The more specific your routing prompt, the more reliable and consistent the routing becomes.
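For instance, a tighter version of the routing prompt from the LangChain section might spell out the category boundaries and include a few examples for ambiguous cases (the exact wording and examples are just a suggestion):

router_prompt = ChatPromptTemplate.from_messages([
    ("system", """Classify the user's request into exactly one category.

Categories:
- booker: the user wants to book, change, or cancel a flight, hotel, or accommodation
- info: the user asks a factual or general-knowledge question
- unclear: anything else, including requests with no identifiable goal

Examples:
"Book me a flight to London." → booker
"What if I wanted to change my booking?" → booker (still about a booking)
"What is the capital of Italy?" → info
"Do the thing." → unclear

Output exactly ONE word: booker, info, or unclear. No punctuation, no explanation."""),
    ("user", "{request}")
])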
Mistake 5: Testing routing with only clear-cut examples. Your router may work perfectly on “Book me a flight to London” (clearly a booking request) but fail on “What if I wanted to change my booking?” (is this info or booking?). Test with ambiguous, edge-case inputs — these are where routing systems fail in production.
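A cheap way to catch these failures early is to run the router chain from the LangChain example on a hand-written list of awkward inputs and inspect the decisions before wiring up any handlers (these test strings are illustrative):

edge_cases = [
    "What if I wanted to change my booking?",   # booking or info?
    "Do you have rooms available next week?",   # booking or info?
    "Tell me about flights to London.",         # information about flights, not a booking
    "asdf qwerty",                              # should fall through to unclear
]

for query in edge_cases:
    decision = router_chain.invoke({"request": query})
    print(f"{query!r:48} -> {decision.strip().lower()}")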
Key Takeaways
What routing is. Routing adds conditional logic to an agent — instead of always following the same path, the agent evaluates the input and chooses the appropriate handler.
The four methods. LLM-based is most flexible. Rule-based is fastest. Embedding-based handles semantic variation. ML classifier is most accurate for known categories.
The mechanism. A router classifies the input → outputs a decision → the dispatcher picks the matching handler → the handler runs.
In LangChain: explicit routing with RunnableBranch + conditions. Full control. More code.
In ADK: implicit routing via sub_agents and agent descriptions. Less code. Less debuggable.
Rule of thumb: if you can write an if statement to describe the routing logic, use rules or LangChain. If the categories are fuzzy and overlap, use LLM-based or embedding routing.
References
- LangGraph Documentation — langchain.com
- Google ADK Documentation — google.github.io/adk-docs
- LangChain RunnableBranch — python.langchain.com