ARTICLE · 15 MIN READ · JANUARY 09, 2026
Chapter 2: Routing
Prompt chains are predictable. The real world isn't. Routing gives agents the ability to make decisions — picking the right tool, sub-agent, or workflow based on what's actually in front of them.
Key Terms
Four terms that come up throughout this chapter:
Intent classification: Figuring out what a user actually *wants* from their message. "My package hasn't arrived" → intent is "delivery complaint". This is the first step in any routing system.
Embeddings / Vectors: A way to convert text into a list of numbers (a vector) that captures its *meaning*. Similar sentences get similar vectors. This lets us find "semantically related" content without exact keyword matching. e.g., "what's the weather?" and "is it raining?" would have similar vectors even though they share no words.
Cosine similarity: A math formula that measures how similar two vectors are. Two identical vectors = similarity of 1. Completely unrelated vectors ≈ 0. Used in embedding-based routing to find the closest-matching route.
Docstring: A comment at the top of a Python function (inside triple quotes) that explains what it does. In LangChain and ADK, the LLM reads the docstring to decide when to call that function — so the docstring is actually part of the interface.
The Limitation of Chains
In Chapter 1, we saw how breaking a task into sequential steps makes LLMs more reliable. Every step has one job. Output of step N feeds step N+1. Clean, predictable.
But here’s the problem: what if you don’t know which sequence to run until you see the input?
Imagine a customer support bot. Every incoming message is different:
- “Where’s my order?” → check the database
- “Your product broke after one use” → escalate to a human
- “How do I reset my password?” → search the knowledge base
- “blah blah blah gibberish” → ask for clarification
You can’t write a fixed chain for this. The right action depends on what the user actually said. You need the system to decide before it acts.
That’s routing.
Routing is one of the most important patterns in software engineering generally, not just AI. Any time a system receives varied inputs and needs to direct them to different handlers — a router does that job. Web servers route HTTP requests to different endpoints. Email servers route messages to different inboxes. Customer support call centers route callers to different departments.
In LLM-powered systems, routing is the pattern that transforms a single intelligent interface into a system that can handle arbitrarily diverse requests with specialist-level precision for each.
The key insight: Routing separates the decision from the action. The router decides which path to take. Each handler executes one specific action perfectly. This separation means you can upgrade any individual handler without touching the routing logic, and you can update routing rules without touching any handler. In software engineering, this is called separation of concerns — one of the most powerful principles in the field.
What Routing Is
Routing adds conditional logic to an agent’s execution. Instead of always following the same path, the agent first evaluates the input — then chooses which path to take.
The router is the decision-maker. Everything downstream is a handler — a function, tool, sub-agent, or prompt chain that handles one specific type of request.
The key insight: Routing separates what to do from how to do it. The router decides. The handler executes. Neither knows about the other’s internals.
This separation is what makes routed systems easy to extend. Adding a new capability means: (1) write a new handler for it, and (2) add one routing rule pointing to that handler. Nothing else changes. If you instead had one massive prompt that handled everything, adding a new capability means carefully editing that prompt without breaking anything that already works — much harder.
This is why routing is used everywhere in professional software systems. It’s not just about LLMs — it’s about building systems that can grow without becoming a tangled mess.
The Four Types of Routing
Not all routers work the same way. There are four distinct mechanisms — each with dramatically different trade-offs in speed, flexibility, and accuracy.
Understanding why each mechanism works the way it does requires understanding what the router is actually doing in each case:
The fundamental routing question: Given a user’s message, how do you decide which handler to invoke?
You could:
- Ask an LLM to think about it and tell you (LLM-based routing)
- Check the message for specific keywords or patterns (rule-based routing)
- Find which pre-defined route is most semantically similar to the message (embedding-based routing)
- Run the message through a small machine learning classifier (ML classifier routing)
Each approach answers the same question — “which handler?” — but through completely different mechanisms with different strengths and weaknesses.
Quick summary of each method:
LLM-based routing — You ask the LLM itself: “Read this query. Output exactly one word: booking, info, or unclear.” The most flexible approach. Handles nuanced or unusual inputs. Trade-off: one extra API call per request, which adds latency and cost.
Rule-based routing — Pure code. if "flight" in query or "hotel" in query → booking. Zero API cost, sub-millisecond speed. Falls apart the moment a user says something you didn’t explicitly anticipate.
Embedding-based routing — Convert the query into a vector (a list of numbers capturing its meaning). Compare to pre-computed vectors for each route. Route to the closest match. Handles semantic variation well (“get me a room” matches “book a hotel”). Needs embedding infrastructure.
ML classifier routing — A small discriminative model fine-tuned on labelled examples. “Here are 500 examples of booking requests, 500 info requests…” Fast at inference, very accurate for known categories. Needs training data, and retraining every time you add a new route.
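To make the first two non-LLM approaches concrete, here are minimal sketches. They are illustrations, not production code: the keyword lists, route descriptions, and the 0.3 threshold are all assumptions, and embed() stands in for whatever embedding model you use.

Rule-based:

def rule_based_route(query: str) -> str:
    """Keyword rules: fast and free, but brittle for unanticipated phrasing."""
    q = query.lower()
    if any(word in q for word in ("flight", "hotel", "book", "reserve")):
        return "booking"
    if any(word in q for word in ("what", "how", "when", "where", "why")):
        return "info"
    return "unclear"   # "Get me a room in Paris" lands here: no keyword matches

Embedding-based:

import numpy as np

def cosine_similarity(a, b) -> float:
    # cos(theta) = (a . b) / (|a| * |b|); 1 means identical direction, ~0 means unrelated
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def build_embedding_router(embed, route_descriptions, threshold=0.3):
    """embed: any callable that maps a string to a vector (your embedding model)."""
    route_vectors = {name: np.asarray(embed(desc))
                     for name, desc in route_descriptions.items()}

    def route(query: str) -> str:
        q = np.asarray(embed(query))
        scores = {name: cosine_similarity(q, vec)
                  for name, vec in route_vectors.items()}
        best = max(scores, key=scores.get)
        # Below the threshold nothing matches well, so fall back to clarification
        return best if scores[best] >= threshold else "unclear"

    return route

# route = build_embedding_router(embed, {
#     "booking": "book a flight or hotel, make a reservation",
#     "info": "answer a general knowledge or factual question",
# })
# route("get me a room in Paris")  # → "booking", despite sharing no keywords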
How It Works: Step by Step
Let’s walk through exactly what happens when a routing system processes a request. Whichever of the four methods you pick, the mechanism is the same three steps: the router reads the input and produces a decision, a dispatcher maps that decision to a handler, and the handler runs and produces the response. The decision and the action never live in the same piece of code.
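Stripped of any framework, the pattern fits in a few lines. This is a minimal sketch, assuming a classify() function that implements any of the four methods above; the handler bodies are placeholders:

def classify(query: str) -> str:
    """Return "booking", "info", or anything else for unclear requests.
    Could be an LLM call, keyword rules, embeddings, or an ML classifier."""
    ...

HANDLERS = {
    "booking": lambda q: f"Booking confirmed for: {q!r}",
    "info": lambda q: f"Information retrieved for: {q!r}",
}

def handle(query: str) -> str:
    decision = classify(query)                                        # 1. decide
    handler = HANDLERS.get(decision,
                           lambda q: "Please clarify your request.")  # 2. dispatch (with a fallback)
    return handler(query)                                             # 3. act

Note how adding a capability is one new entry in HANDLERS plus one new category the classifier can emit; the routing scaffolding itself never changes.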
Practical Applications
Customer Service Bots
The most obvious use case. A single entry point receives all user messages. The router classifies intent and dispatches to the right sub-agent.
Without routing, you’d need separate endpoints for each query type — and users would have to know which one to use.
Document & Email Pipelines
Incoming emails get classified before any processing happens:
- Sales lead → CRM ingestion workflow
- Support ticket → ticketing system + priority score
- Invoice → accounts payable extraction chain
- Spam → discard
The router is the first step. Everything after it is specialised.
Multi-Agent Research Systems
A research system with separate agents for web search, paper summarisation, data analysis, and report writing needs a router to decide which agent gets each sub-task. The router looks at the current objective and dispatches accordingly.
AI Coding Assistants
Before passing a code snippet to any tool, the assistant routes based on language × intent:
- Python + debug → Python linter + error explainer
- TypeScript + explain → TypeScript-aware explainer
- SQL + optimise → query plan analyser
Two routing dimensions at once: language detection and intent classification.
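A sketch of that two-dimensional dispatch, keyed on (language, intent) pairs; the handler names here are made up for illustration:

# Two routing dimensions at once: (language, intent) -> handler name
TOOL_TABLE = {
    ("python", "debug"): "python_linter_and_error_explainer",
    ("typescript", "explain"): "typescript_aware_explainer",
    ("sql", "optimise"): "query_plan_analyser",
}

def route_snippet(language: str, intent: str) -> str:
    # Both keys come from upstream classifiers: language detection and intent classification
    return TOOL_TABLE.get((language.lower(), intent.lower()), "generic_assistant")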
LangChain Implementation
Here’s a working routing system using LangChain and Google’s Gemini. A coordinator LLM classifies the intent, then RunnableBranch dispatches to the right handler.
Install
pip install langchain langgraph langchain-google-genai python-dotenv
The Full Code, Explained Line by Line
1. Imports
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough, RunnableBranch
- ChatGoogleGenerativeAI — LangChain’s wrapper around Gemini. Swap for ChatOpenAI if using OpenAI.
- ChatPromptTemplate — builds reusable prompts with named placeholders.
- StrOutputParser — strips the LLM’s response object down to a plain Python string.
- RunnablePassthrough — passes input through unchanged. Used to keep the original request available downstream.
- RunnableBranch — the routing mechanism. Takes a list of (condition, runnable) pairs. Runs the first branch whose condition is True.
2. Initialise the model
import os
from dotenv import load_dotenv
load_dotenv() # reads GOOGLE_API_KEY from .env
llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash", temperature=0)
# temperature=0 → deterministic output
# Critical for routing: you need a consistent, single-word answer every time
3. Define the handler functions
These simulate what each specialised sub-agent would actually do:
def booking_handler(request: str) -> str:
"""Called when the router decides this is a booking request."""
# In production: call a flight/hotel API, check availability, etc.
print("→ DELEGATING TO: Booking Handler")
return f"Booking confirmed for: '{request}'"
def info_handler(request: str) -> str:
"""Called when the router decides this is an information request."""
# In production: search a knowledge base, fetch from a database, etc.
print("→ DELEGATING TO: Info Handler")
return f"Information retrieved for: '{request}'"
def unclear_handler(request: str) -> str:
"""Fallback for anything the router can't classify."""
print("→ DELEGATING TO: Clarification Handler")
return f"Could not process: '{request}'. Please clarify your request."
4. Build the router chain
This is the classification step — the LLM reads the request and outputs exactly one word:
router_prompt = ChatPromptTemplate.from_messages([
("system", """Analyze the user's request and output the correct category.
Rules:
- If the request involves booking flights or hotels → output: booker
- If the request is a general information question → output: info
- If the request is unclear or doesn't fit either → output: unclear
IMPORTANT: Output exactly ONE word. No punctuation, no explanation."""),
("user", "{request}")
])
# Chain: fill the prompt → call LLM → strip to plain string
# Result is a single word: "booker", "info", or "unclear"
router_chain = router_prompt | llm | StrOutputParser()
Why temperature=0 matters here: if temperature > 0, the model might output "Booker" (capitalised), "booking", or "I think this is a booking request..." — all of which break the downstream string comparison.
5. Build the routing branch
# RunnableBranch takes:
# [(condition_fn, runnable), (condition_fn, runnable), ..., default_runnable]
# It evaluates conditions in order and runs the FIRST matching branch.
# The default (no condition) runs if nothing matches.
delegation = RunnableBranch(
# condition: is the decision "booker"?
# x is the dict {"decision": "booker", "request": {...}}
(
lambda x: x["decision"].strip().lower() == "booker",
RunnablePassthrough.assign(
output=lambda x: booking_handler(x["request"]["request"])
)
),
(
lambda x: x["decision"].strip().lower() == "info",
RunnablePassthrough.assign(
output=lambda x: info_handler(x["request"]["request"])
)
),
# Default branch — runs if neither condition matched
RunnablePassthrough.assign(
output=lambda x: unclear_handler(x["request"]["request"])
),
)
6. Combine into a single runnable
# Step 1: run router_chain → get decision
# Step 2: keep original request alongside the decision
# Step 3: delegation branch picks the right handler
# Step 4: extract just the output string
coordinator = (
{
"decision": router_chain, # ← runs classifier, stores result
"request": RunnablePassthrough() # ← passes original input through unchanged
}
| delegation
| (lambda x: x["output"]) # ← extract the handler's response string
)
The RunnablePassthrough() here is critical: without it, the delegation branch would only see the router’s decision (a single word), and couldn’t pass the original request to the handler.
7. Run it
# Each invoke() call: classify → route → handle → return
print(coordinator.invoke({"request": "Book me a flight to London."}))
# → DELEGATING TO: Booking Handler
# → "Booking confirmed for: 'Book me a flight to London.'"
print(coordinator.invoke({"request": "What is the capital of Italy?"}))
# → DELEGATING TO: Info Handler
# → "Information retrieved for: 'What is the capital of Italy?'"
print(coordinator.invoke({"request": "Do the thing."}))
# → DELEGATING TO: Clarification Handler
# → "Could not process: 'Do the thing.'. Please clarify your request."
What the Data Flow Looks Like
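Tracing one request through the LangChain coordinator above makes the intermediate shapes explicit (shown for the booking example):

{"request": "Book me a flight to London."}
↓
router_chain runs the classifier → "booker"
↓
The dict handed to delegation:
{"decision": "booker", "request": {"request": "Book me a flight to London."}}
↓
First branch condition matches ("booker")
↓
booking_handler("Book me a flight to London.") runs
↓
{"decision": "booker", "request": {...}, "output": "Booking confirmed for: '...'"}
↓
The final lambda extracts x["output"] → the string returned to the caller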
Google ADK Implementation
The Agent Development Kit (ADK) takes a fundamentally different approach. Instead of building explicit routing logic in code, you define agents with descriptions and tools with docstrings, and let the framework figure out routing.
The Philosophy
| LangChain approach | ADK approach |
|---|---|
| You write the routing logic explicitly | The framework routes via the LLM’s understanding of agent descriptions |
| RunnableBranch with explicit conditions | sub_agents list — ADK Auto-Flow handles dispatch |
| Full control over routing criteria | More automatic, less code |
| Better for complex conditional logic | Better for simple delegation patterns |
Install
pip install google-adk google-generativeai python-dotenv
The Full Code, Explained
1. Define tool functions
In ADK, a tool is just a Python function with a descriptive docstring. The docstring is critical — ADK’s LLM reads it to decide when to call this tool.
from google.adk.tools import FunctionTool
def booking_handler(request: str) -> str:
"""
Handles booking requests for flights and hotels.
Call this tool when the user wants to book, reserve, or schedule
a flight, hotel room, or accommodation.
Args:
request: The user's booking request in natural language.
Returns:
Confirmation that the booking action was simulated.
"""
print("→ Booking Handler called")
return f"Booking action simulated for: '{request}'"
def info_handler(request: str) -> str:
"""
Handles general information and factual questions.
Call this tool when the user asks a factual question, wants
an explanation, or is looking for general knowledge.
Args:
request: The user's information request.
Returns:
The result of the simulated information retrieval.
"""
print("→ Info Handler called")
return f"Information retrieved for: '{request}'"
# Wrap functions as FunctionTool objects
booking_tool = FunctionTool(booking_handler)
info_tool = FunctionTool(info_handler)
2. Define specialised sub-agents
Each sub-agent has a description — used by the coordinator to decide which agent to delegate to:
from google.adk.agents import Agent
booking_agent = Agent(
name="Booker",
model="gemini-2.0-flash",
description="Specialist agent for all flight and hotel booking requests.",
tools=[booking_tool] # this agent can call booking_handler
)
info_agent = Agent(
name="Info",
model="gemini-2.0-flash",
description="Specialist agent for general information and factual questions.",
tools=[info_tool] # this agent can call info_handler
)
3. Define the coordinator
The coordinator’s instruction tells it what its job is. The sub_agents list enables ADK’s Auto-Flow: the coordinator reads the user message and the sub-agent descriptions, then delegates automatically.
coordinator = Agent(
name="Coordinator",
model="gemini-2.0-flash",
instruction="""
You are a coordinator. Your ONLY job is to route incoming requests.
Do NOT answer the user directly. Delegate every request to a sub-agent.
- Flight/hotel booking requests → delegate to Booker
- All information/factual questions → delegate to Info
""",
description="Routes user requests to the appropriate specialist.",
sub_agents=[booking_agent, info_agent] # enables Auto-Flow routing
)
4. Run it
import asyncio, uuid
from google.adk.runners import InMemoryRunner
from google.genai import types
async def run(runner: InMemoryRunner, request: str) -> str:
user_id = "user_001"
session_id = str(uuid.uuid4())
# Create a fresh session for this request
await runner.session_service.create_session(
app_name=runner.app_name,
user_id=user_id,
session_id=session_id,
)
final = ""
for event in runner.run(
user_id=user_id,
session_id=session_id,
new_message=types.Content(
role="user",
parts=[types.Part(text=request)]
),
):
if event.is_final_response() and event.content:
# Extract the text from the final response event
if event.content.parts:
final = "".join(p.text for p in event.content.parts if p.text)
break
return final
async def main():
runner = InMemoryRunner(coordinator)
result = await run(runner, "Book me a hotel in Paris.")
print("Result:", result)
# → Booking Handler called
# → "Booking action simulated for: 'Book me a hotel in Paris.'"
result = await run(runner, "What is the tallest mountain in the world?")
print("Result:", result)
# → Info Handler called
# → "Information retrieved for: '...'"
if __name__ == "__main__":
import nest_asyncio; nest_asyncio.apply()
asyncio.run(main())
ADK Auto-Flow vs. Explicit Routing
Here’s what ADK does automatically that you write manually in LangChain:
User message
↓
Coordinator LLM reads:
- The user's message
- The description of each sub-agent
- Its own instruction
↓
Decides: "This sounds like a booking. Delegate to Booker."
↓
Booker agent activates
↓
Booker LLM reads the request + its tools
↓
Calls booking_handler(request=...)
↓
Returns result to coordinator
↓
Coordinator returns final response to user
The routing decision is embedded in the LLM’s reasoning — you don’t write it explicitly. This is more concise, but also harder to debug when it routes incorrectly.
LangChain vs. ADK: Which to Use
| | LangChain + RunnableBranch | Google ADK Auto-Flow |
|---|---|---|
| Routing control | You write explicit conditions | LLM decides based on descriptions |
| Debuggability | Easy — conditions are plain Python | Harder — routing is inside LLM reasoning |
| Flexibility | Very high | Medium |
| Code volume | More | Less |
| Best for | Complex multi-step routing, state machines | Simple agent delegation |
Common Mistakes When Building Routers
Mistake 1: Router categories that overlap. If your router has categories for “billing questions” and “account issues,” many queries will legitimately fit both. The LLM will pick one inconsistently. Define categories that are mutually exclusive — no query should reasonably fit more than one. If you find overlap, merge the categories or narrow their definitions.
Mistake 2: Relying on the router’s raw output without normalization. The LLM might output "Booker", " booker ", "BOOKER", or even "booker." (with a period). Always normalize: .strip().lower(). For critical routing, also handle punctuation: .strip().lower().rstrip(".,!?").
Mistake 3: No fallback handler. What happens when the router produces a category you didn’t expect? Without a default handler, your code raises a KeyError or routes to nothing. Always include a “catch-all” handler that either asks for clarification or provides a generic helpful response.
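One way to address Mistakes 2 and 3 together, reusing the handlers from the LangChain example above: normalise the router’s raw output, then dispatch through a dict with an explicit default:

def normalise_decision(raw: str) -> str:
    # "Booker", " booker ", "BOOKER", and "booker." all become "booker"
    return raw.strip().lower().rstrip(".,!?")

HANDLERS = {
    "booker": booking_handler,
    "info": info_handler,
}

def dispatch(raw_decision: str, request: str) -> str:
    decision = normalise_decision(raw_decision)
    # .get() with a default means an unexpected category can never raise a KeyError
    handler = HANDLERS.get(decision, unclear_handler)
    return handler(request)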
Mistake 4: Router system prompt is too vague. “Classify this message” is not enough guidance. Tell the router exactly what categories exist, what distinguishes them, and what edge cases should be handled. Include few-shot examples for ambiguous cases. The more specific your routing prompt, the more reliable and consistent the routing becomes.
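For instance, a tighter version of the routing prompt from the LangChain section might spell out the category boundaries and include a few examples for ambiguous cases (the exact wording and examples are just a suggestion):

router_prompt = ChatPromptTemplate.from_messages([
    ("system", """Classify the user's request into exactly one category.

Categories:
- booker: the user wants to book, change, or cancel a flight, hotel, or accommodation
- info: the user asks a factual or general-knowledge question
- unclear: anything else, including requests with no identifiable goal

Examples:
"Book me a flight to London." → booker
"What if I wanted to change my booking?" → booker (still about a booking)
"What is the capital of Italy?" → info
"Do the thing." → unclear

Output exactly ONE word: booker, info, or unclear. No punctuation, no explanation."""),
    ("user", "{request}")
])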
Mistake 5: Testing routing with only clear-cut examples. Your router may work perfectly on “Book me a flight to London” (clearly a booking request) but fail on “What if I wanted to change my booking?” (is this info or booking?). Test with ambiguous, edge-case inputs — these are where routing systems fail in production.
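A cheap way to catch these failures early is to run the router chain from the LangChain example on a hand-written list of awkward inputs and inspect the decisions before wiring up any handlers (these test strings are illustrative):

edge_cases = [
    "What if I wanted to change my booking?",   # booking or info?
    "Do you have rooms available next week?",   # booking or info?
    "Tell me about flights to London.",         # information about flights, not a booking
    "asdf qwerty",                              # should fall through to unclear
]

for query in edge_cases:
    decision = router_chain.invoke({"request": query})
    print(f"{query!r:48} -> {decision.strip().lower()}")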
Key Takeaways
What routing is. Routing adds conditional logic to an agent — instead of always following the same path, the agent evaluates the input and chooses the appropriate handler.
The four methods. LLM-based is most flexible. Rule-based is fastest. Embedding-based handles semantic variation. ML classifier is most accurate for known categories.
The mechanism. A router classifies the input → outputs a decision → the dispatcher picks the matching handler → the handler runs.
In LangChain: explicit routing with RunnableBranch + conditions. Full control. More code.
In ADK: implicit routing via sub_agents and agent descriptions. Less code. Less debuggable.
Rule of thumb: if you can write an if statement to describe the routing logic, use rules or LangChain. If the categories are fuzzy and overlap, use LLM-based or embedding routing.
References
- LangGraph Documentation — langchain.com
- Google ADK Documentation — google.github.io/adk-docs
- LangChain RunnableBranch — python.langchain.com