ARTICLE · 19 MIN READ · JANUARY 21, 2026
Chapter 5: Tool Use (Function Calling)
LLMs are frozen in time and disconnected from the world. Tool use breaks those walls — letting agents call APIs, run code, query databases, and trigger real actions.
The Wall Every LLM Hits
Function calling: A specific feature of modern LLMs where instead of generating prose, the model outputs a structured request to call a specific function. The model says "call get_weather with city=London" as structured data (JSON), not free text.
JSON (JavaScript Object Notation): A simple text format for structured data. Example: {"name": "get_weather", "city": "London"}. It's language-agnostic and used everywhere APIs send data. LLMs output JSON for function calls because it's machine-parseable.
Python decorator (@): A function that wraps another function to add behavior. @langchain_tool before a function definition tells Python: "convert this function into a LangChain tool object." The @ is syntactic sugar — it's equivalent to writing my_func = langchain_tool(my_func).
Type hints: Optional annotations in Python that say what type a variable should be: def get_weather(city: str) -> dict:. The LangChain @tool decorator reads these hints to automatically build the tool's parameter schema.
Schema: A formal description of what data should look like. A tool's schema describes what parameters it accepts and their types. The LLM reads this schema to know how to call the tool correctly.
Every pattern in this series — chaining, routing, parallelization, reflection — operates entirely inside the model’s head. Input goes in, text comes out. The model reasons over what it already knows.
That’s a wall.
The model’s knowledge is frozen at its training cutoff. It can’t check today’s stock price. It can’t query your company’s database. It can’t send an email. It can’t run your Python code. For all its sophistication, without external connections it’s an encyclopedia — impressively complete, but static.
Tool use is what breaks the wall. Instead of generating a text answer, the model generates a function call — a structured request to execute an external piece of code. The framework runs the function, returns the result, and the model incorporates it into its response.
Why can’t the LLM just “do” things directly? This is a natural question. The LLM is a neural network — mathematically, it is a very large function that maps input tokens to probabilities over the next token. It cannot make HTTP requests. It cannot write to a database. It cannot execute code. It cannot control external systems. All it can do is produce text. Everything else requires the surrounding code — the framework, the agent executor, your application — to do the actual work.
Function calling works through a clean separation of responsibilities:
- The LLM’s role: Decide whether to call a tool and what arguments to pass, expressed as structured JSON text.
- The framework’s role: Parse that JSON, actually execute the function, and return the result as additional context.
- The loop: After getting the tool result, the LLM reads it and decides whether it has enough information to answer or whether to call another tool.
This separation gives you something critically important: observability and control. You can see exactly what the LLM wants to do (the JSON function call) before it happens. You can validate it, log it, rate-limit it, or reject it entirely. The LLM never has direct access to your systems — it just generates text describing what it would like to do, and your code decides whether to allow it.
What the JSON actually looks like. When the LLM decides it needs a tool, its output looks like:
{"name": "get_weather", "arguments": {"city": "London", "units": "celsius"}}
This is unambiguous and machine-parseable. There’s no interpretation needed — name tells you which function to call, arguments tells you exactly what to pass. Your code parses this JSON, calls get_weather(city="London", units="celsius"), gets back {"temperature": 15, "condition": "cloudy"}, and passes that result back to the LLM as additional context for generating its final response.
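To make the division of labor concrete, here is a minimal framework-free sketch of what the surrounding code does with that JSON. The get_weather stub and TOOLS registry are illustrative:

import json

# Hypothetical stub: a real implementation would call a weather API.
def get_weather(city: str, units: str = "celsius") -> dict:
    return {"temperature": 15, "condition": "cloudy"}

# Registry mapping tool names to actual Python functions.
TOOLS = {"get_weather": get_weather}

# The LLM's raw output: structured JSON text, not prose.
llm_output = '{"name": "get_weather", "arguments": {"city": "London", "units": "celsius"}}'

call = json.loads(llm_output)        # 1. parse the model's request
func = TOOLS[call["name"]]           # 2. look up the real function (reject unknown names here)
result = func(**call["arguments"])   # 3. execute it; the model never runs code itself
tool_message = json.dumps(result)    # 4. serialize the result back into the conversation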
This is how an LLM becomes an agent that can sense, reason, and act.
How Function Calling Works: The 6 Steps
get_weather.{"name":"get_weather","args":{"city":"London"}}The critical insight is in step 3: the LLM does not run the tool — it describes which tool to run and with what arguments, as structured JSON. The framework does the actual execution. This separation keeps the model stateless and the tool execution safe and auditable.
The Three Categories of Tools
Information Retrieval
Pull live data the model doesn't have. APIs, databases, search engines, knowledge bases.
Action Execution
Trigger real-world effects. Send messages, update records, control systems.
Code Execution
Run code in a sandboxed interpreter. Get deterministic results for calculations, data processing.
The JSON the model emits is its actual output — not prose, but a structured object your code can parse. The framework reads "name" to know which function to call, and "arguments" to know what to pass it.
Six Use Cases
Real-Time Data
Weather, stock prices, sports scores, news — anything the model's training data can't contain.
Tool: external REST API
Database Queries
Query company-specific data — orders, inventory, customer records — that will never be in training data.
Tool: SQL / NoSQL query
Calculations
Complex math, statistics, currency conversions — offload to deterministic code, not probabilistic generation.
Tool: code interpreter
Sending Communications
Email, Slack, SMS — the agent generates the content, the tool actually sends it.
Tool: messaging API
Code Execution
Run user-provided code in a sandbox, debug errors, analyze outputs — beyond what the model can simulate.
Tool: sandboxed interpreter
System Control
Smart home, IoT, browser automation — the agent instructs, the tool acts in the physical or digital world.
Tool: device / browser API
LangChain: @tool + AgentExecutor
LangChain makes tool definition as simple as a Python decorator. The framework handles everything else: tool registration, LLM binding, execution, and response formatting.
import os
import asyncio
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.tools import tool as langchain_tool
from langchain.agents import create_tool_calling_agent, AgentExecutor
Why langchain_core.tools.tool? The @tool decorator inspects the function and converts it into a StructuredTool object that contains the function name, the parameter schema (extracted from type hints), and the description (extracted from the docstring). The LLM reads all three to decide when and how to call it.
Defining a Tool
@langchain_tool
def search_information(query: str) -> str:
"""
Provides factual information on a given topic.
Use this tool to answer questions like 'capital of France'
or 'weather in London'.
"""
simulated_results = {
"weather in london": "Cloudy, 15°C.",
"capital of france": "The capital of France is Paris.",
"population of earth": "Approximately 8 billion people.",
}
return simulated_results.get(
query.lower(),
f"No specific data for '{query}'."
)
tools = [search_information]
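You can check what the decorator extracted: the resulting StructuredTool exposes the name, description, and argument schema as attributes (the exact schema dict varies by LangChain version):

print(search_information.name)         # "search_information"
print(search_information.description)  # the docstring text above
print(search_information.args)         # {"query": {..., "type": "string"}}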
The docstring is critical. When the LLM receives the tool definition, it sees:
name:"search_information"— what to put in"name"of the JSON calldescription: the full docstring — the LLM reads this to decide when to use the toolparameters:{"query": {"type": "string"}}— extracted from type hintsA vague docstring → wrong tool usage. A precise docstring → accurate routing. The docstring IS the tool’s interface contract with the LLM.
Building the Agent
llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash", temperature=0)
agent_prompt = ChatPromptTemplate.from_messages([
("system", "You are a helpful assistant."),
("human", "{input}"),
("placeholder", "{agent_scratchpad}"), # ← the agent's working memory
])
agent = create_tool_calling_agent(llm, tools, agent_prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
What is {agent_scratchpad}? This placeholder is where the agent writes its intermediate steps — the tool calls it made, the results it received, and its reasoning about what to do next. Without it, the agent has no place to "think" between steps. It's the equivalent of scratch paper for multi-step reasoning.
AgentExecutor is the loop. It repeatedly calls the agent until either:
- The agent produces a final answer (no more tool calls needed)
- A max iteration limit is hit
Each iteration: agent runs → decides tool → executor calls tool → result fed back → agent runs again.
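To observe that loop programmatically rather than through verbose=True, AgentExecutor can expose each iteration. A sketch using its return_intermediate_steps option:

agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    return_intermediate_steps=True,  # expose each (tool call, result) pair
    max_iterations=5,                # hard cap on the loop
)
result = agent_executor.invoke({"input": "What's the weather like in London?"})
for action, observation in result["intermediate_steps"]:
    print(action.tool, action.tool_input, "->", observation)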
The Execution Loop Visualized
[Diagram: the agent selects get_weather based on a docstring match]
Running Queries
async def run_agent(query: str):
response = await agent_executor.ainvoke({"input": query})
print(response["output"])
async def main():
await asyncio.gather(
run_agent("What is the capital of France?"),
run_agent("What's the weather like in London?"),
)
asyncio.run(main())
Why ainvoke and gather? ainvoke is the async version of invoke. Using asyncio.gather fires multiple agent queries concurrently — each agent call is I/O-bound (waiting on the LLM API), so they run in parallel without blocking each other. This is the parallelization pattern applied to tool-using agents.
CrewAI: @tool + Role-Based Agents
CrewAI takes a different philosophy: agents are defined by role, goal, and backstory — almost like hiring a specialist. Tools are assigned to an agent’s persona rather than to a chain.
from crewai import Agent, Task, Crew
from crewai.tools import tool
Defining the Tool
@tool("Stock Price Lookup Tool")
def get_stock_price(ticker: str) -> float:
"""
Fetches the latest simulated stock price for a given ticker.
Returns the price as a float. Raises ValueError if not found.
"""
prices = {"AAPL": 178.15, "GOOGL": 1750.30, "MSFT": 425.50}
price = prices.get(ticker.upper())
if price is None:
raise ValueError(f"Ticker '{ticker}' not found.")
return price
Why raise ValueError instead of returning a string? Returning a string like "Not found" makes the agent think it has a valid answer — it might confidently report a price of "Not found". Raising an exception forces the agent to handle the failure explicitly, leading to better error acknowledgment in the final response.
The Agent, Task, and Crew
financial_analyst = Agent(
role = 'Senior Financial Analyst',
goal = 'Analyze stock data and report key prices accurately.',
backstory = "Experienced analyst adept at using data sources to find stock information.",
verbose = True,
tools = [get_stock_price],
allow_delegation = False,
)
Why role, goal, backstory? CrewAI injects these into the agent's system prompt. The role frames what the agent IS ("Senior Financial Analyst"). The goal tells it what it's trying to achieve. The backstory adds professional context that influences how it reasons and writes. Together they create a coherent persona — not just a generic assistant.
allow_delegation=False: prevents the agent from spawning sub-agents. Keep it False for simple single-agent tasks.
task = Task(
description = (
"Find the simulated stock price for Apple (ticker: AAPL). "
"Use the Stock Price Lookup Tool. "
"If the ticker is not found, report that clearly."
),
expected_output = "A single sentence stating the AAPL price. e.g. 'The simulated stock price for AAPL is $178.15.'",
agent = financial_analyst,
)
crew = Crew(
agents = [financial_analyst],
tasks = [task],
verbose = True,
)
result = crew.kickoff()
expected_output: This is a key CrewAI feature. It tells the agent exactly what format the output should take. The agent evaluates its own output against this expectation — essentially a built-in reflection step.
crew.kickoff(): Starts the execution. CrewAI handles the agent loop internally, including tool invocation, result parsing, and task completion detection.
[Diagram: CrewAI Agent-Task-Crew relationship]
Google ADK: Built-in Tools + Code Execution
The ADK provides pre-built tools that require zero custom code. You just import them and pass them to an agent.
Google Search (Zero Configuration)
from google.adk.agents import Agent
from google.adk.tools import google_search
root_agent = Agent(
name = "basic_search_agent",
model = "gemini-2.0-flash-exp",
description = "Answers questions by searching the internet.",
instruction = "Answer questions by searching the web.",
tools = [google_search], # one line — full search capability
)
google_search is a pre-built ADK tool — it connects to the Google Search API automatically and handles authentication, pagination, and result formatting. No API key configuration, no schema definition, no docstring to write. You get production-quality web search in one import.
import asyncio
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService
from google.genai import types

APP_NAME = "basic_search_app"  # placeholder identifiers for the session
USER_ID = "user_1"
SESSION_ID = "session_1"

async def call_agent(query: str):
    session_service = InMemorySessionService()
    session = await session_service.create_session(
        app_name=APP_NAME, user_id=USER_ID, session_id=SESSION_ID
    )
    runner = Runner(agent=root_agent, app_name=APP_NAME, session_service=session_service)
    content = types.Content(role='user', parts=[types.Part(text=query)])
    for event in runner.run(user_id=USER_ID, session_id=SESSION_ID, new_message=content):
        if event.is_final_response():
            print(event.content.parts[0].text)

asyncio.run(call_agent("What's the latest AI news?"))
InMemorySessionService: stores conversation state in RAM. Every run() call has a session_id — this is how the agent maintains multi-turn context. Sessions are covered in depth in Chapter 8.
event.is_final_response(): the ADK streams events as the agent works. Most events are intermediate (tool calls, reasoning steps). Only one event marks the final answer. Always filter for this before reading the output.
Code Execution (Built-in Sandboxed Interpreter)
from google.adk.agents import LlmAgent
from google.adk.code_executors import BuiltInCodeExecutor
code_agent = LlmAgent(
name = "calculator_agent",
model = "gemini-2.0-flash",
code_executor = BuiltInCodeExecutor(), # sandboxed Python
instruction = """You are a calculator agent.
When given a math problem, write Python code to solve it.
Return only the final numerical result as plain text.""",
)
BuiltInCodeExecutor: gives the agent a sandboxed Python interpreter. The agent writes Python code, the executor runs it securely, and the result is returned. This is how the agent handles:
- Exact arithmetic (no rounding errors from the LLM)
- Data manipulation (sorting, filtering, aggregation)
- Problems that require deterministic computation
The LLM is good at reasoning about code. BuiltInCodeExecutor makes it actually run code. The combination is far more powerful than either alone.
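To run it, here is a minimal sketch reusing the Runner and session pattern from the search example above (the app, user, and session IDs are placeholders):

import asyncio
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService
from google.genai import types

async def ask_calculator(query: str):
    session_service = InMemorySessionService()
    await session_service.create_session(
        app_name="calc_app", user_id="u1", session_id="s1"  # placeholder IDs
    )
    runner = Runner(agent=code_agent, app_name="calc_app", session_service=session_service)
    content = types.Content(role="user", parts=[types.Part(text=query)])
    for event in runner.run(user_id="u1", session_id="s1", new_message=content):
        if event.is_final_response():
            print(event.content.parts[0].text)  # just the final number, per the instruction

asyncio.run(ask_calculator("What is 123.45 * 678.9?"))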
[Diagram: ADK tool execution flow]
Framework Comparison
| | LangChain | CrewAI | Google ADK |
|---|---|---|---|
| Tool definition | @tool decorator | @tool("Name") decorator | FunctionTool or built-ins |
| Built-in tools | Community integrations | Relies on LangChain tools | google_search, BuiltInCodeExecutor, VSearchAgent |
| Agent model | Chain / graph | Role + goal + backstory | LlmAgent + orchestrators |
| Best for | Composable pipelines | Role-based multi-agent teams | Google Cloud / enterprise |
| Async | Full (ainvoke, astream) | Partial | run_async streaming |
At a Glance
LLMs can't call APIs, query databases, or run code on their own. Tool use lets them generate a structured JSON request — which the framework executes — bridging language reasoning with real-world action.
Training data is frozen. Real-world tasks require live data, exact computation, and system interaction. Tool use is the mechanism that makes agents actually useful — not just fluent.
Use tool calling whenever the agent needs to break out of its training data: real-time info, private data, exact math, code execution, or triggering real-world actions.
How Tool Calling Works Under the Hood
When an LLM produces a function call, what exactly happens inside LangChain or ADK? Let’s trace through the complete execution of a single tool-using agent from start to finish.
Step 1: Tool definition sent to the model. When you create AgentExecutor(agent=agent, tools=[search_information]), LangChain serializes your tool’s metadata — its name, description (from the docstring), and parameter schema (from type hints) — into a format the LLM can read. For OpenAI-compatible models, this becomes a tools JSON array in the API request. For Gemini, it becomes a function_declarations array. The LLM sees these tool definitions before it sees your user message.
Step 2: LLM decides whether to use a tool. The LLM reads the user’s message and the tool definitions simultaneously. If the message is “What’s the capital of France?” and the tool definition says “Use this tool to find answers to factual questions,” the LLM reasons: “This is a factual question, I should use the tool.” This reasoning happens inside the model — you don’t write code for it. The model’s training on millions of examples of “when to use tools” informs this decision.
Step 3: LLM generates the function call. If the LLM decides to use a tool, instead of generating prose, it outputs a structured function call: {"name": "search_information", "arguments": {"query": "capital of france"}}. This is still text output from the model — just formatted as JSON instead of prose.
Step 4: AgentExecutor intercepts and executes. LangChain’s AgentExecutor parses the model’s output. If it sees a function call (not prose), it: (1) looks up the function name in its registered tools, (2) calls the actual Python function with the provided arguments, (3) captures the return value.
Step 5: Result injected back as context. The tool’s return value is formatted as a “Tool Result” message and added to the conversation history. The full conversation — original message + tool call + tool result — is sent back to the LLM. Now the LLM has the factual information it needed and can formulate a final answer.
Step 6: Loop terminates. The loop ends when the LLM produces a final text response (not a function call), or when max_iterations is reached.
This is called the ReAct pattern (Reason + Act) — the model alternates between reasoning about what to do and acting (calling a tool) until it has enough information to answer.
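Laid out as conversation messages (OpenAI-style roles; exact field names vary by provider), one pass through that loop looks like:

conversation = [
    {"role": "user", "content": "What is the capital of France?"},
    # Step 3: the model emits a tool call instead of prose
    {"role": "assistant", "tool_calls": [
        {"name": "search_information", "arguments": {"query": "capital of france"}}
    ]},
    # Step 5: the framework appends the tool's result as new context
    {"role": "tool", "content": "The capital of France is Paris."},
    # Step 6: with the result in context, the model answers in prose, ending the loop
    {"role": "assistant", "content": "The capital of France is Paris."},
]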
Why the Docstring Is the Most Important Part of Your Tool
The docstring is how the LLM decides whether to call your tool. The LLM reads the docstring and matches it against the user’s request. A vague docstring means unpredictable tool selection. A precise docstring means reliable tool selection.
Compare these two docstrings for the same function:
Vague (bad): def get_stock_price(ticker: str) -> float: "Gets stock price."
Precise (good):
def get_stock_price(ticker: str) -> float:
"""
Fetches the current market price for a publicly traded stock.
Use this when the user asks for the current price, value, or quote
of a stock, share, or publicly traded company.
Args:
ticker: The stock ticker symbol (e.g., 'AAPL' for Apple, 'GOOGL' for Alphabet).
Must be uppercase.
Returns:
The current price in USD as a float.
Raises:
ValueError: If the ticker symbol is not found.
"""
The second docstring tells the LLM: when to use this tool, what the parameter means and its exact format requirements, what it returns, and what can go wrong. The LLM uses all of this to make better decisions.
Common Mistakes When Building Tool-Using Agents
Mistake 1: Vague docstrings. As explained above, the docstring is the interface between your tool and the LLM’s decision-making. “Gets weather” is useless. “Returns current weather conditions for a specified city, including temperature in Celsius, precipitation, and wind speed. Use this for any question about current weather, forecast, or climate conditions in a specific location” is what the LLM needs.
Mistake 2: Tools that return raw API responses. Never return a raw API JSON blob from your tool. The LLM has to parse it and reason about it. Instead, extract the relevant fields and return a clean, readable summary: “Temperature: 15°C, Conditions: Cloudy, Wind: 12 km/h” rather than 200 lines of JSON.
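A sketch of the pattern, where the hypothetical weather_api client stands in for any real API and the tool flattens its payload into the few fields the model needs:

from langchain_core.tools import tool as langchain_tool

class _FakeWeatherAPI:  # stand-in for a real API client
    def fetch(self, city: str) -> dict:
        return {"current": {"temp_c": 15, "condition": {"text": "Cloudy"}, "wind_kph": 12}}

weather_api = _FakeWeatherAPI()

@langchain_tool
def get_weather(city: str) -> str:
    """Returns current weather conditions (temperature, conditions, wind) for a city."""
    raw = weather_api.fetch(city)  # large nested dict in a real system
    # Hand the LLM a compact, readable summary instead of the raw JSON blob.
    return (
        f"Temperature: {raw['current']['temp_c']}°C, "
        f"Conditions: {raw['current']['condition']['text']}, "
        f"Wind: {raw['current']['wind_kph']} km/h"
    )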
Mistake 3: No error handling in tool functions. If your tool throws an uncaught exception, the agent executor crashes. Always catch exceptions and return an error description string: return f"Error: Could not find ticker '{ticker}'. Please use a valid stock ticker symbol." This allows the LLM to understand what went wrong and either retry with different arguments or tell the user.
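A sketch of that pattern, with a simple dict standing in for a real price feed:

from langchain_core.tools import tool as langchain_tool

PRICES = {"AAPL": 178.15, "GOOGL": 1750.30}  # stand-in for a real data source

@langchain_tool
def get_stock_price(ticker: str) -> str:
    """Fetches the current price for a stock ticker symbol, e.g. 'AAPL'."""
    try:
        return f"{ticker.upper()}: ${PRICES[ticker.upper()]:.2f}"
    except KeyError:
        # A description the LLM can act on, instead of an uncaught crash.
        return f"Error: Could not find ticker '{ticker}'. Please use a valid stock ticker symbol."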
Mistake 4: Too many tools. LLMs perform worse when given many tools — it’s harder to decide which one to use when there are 50 options. Keep tool sets focused. If you need many tools, consider hierarchical routing: a primary router that selects a specialist sub-agent, each of which has a small focused tool set.
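A rough sketch of that hierarchy, reusing tools from earlier in this chapter (the topic router itself, which could be a keyword classifier or a cheap LLM call, is left out):

# Illustrative: each specialist gets a small, focused tool set.
SPECIALIST_TOOLS = {
    "research": [search_information],  # defined earlier in this chapter
    "finance": [get_stock_price],      # the error-handling version above
}

def build_specialist(topic: str) -> AgentExecutor:
    """Second hop: build an agent that sees only its own few tools."""
    tools = SPECIALIST_TOOLS[topic]
    agent = create_tool_calling_agent(llm, tools, agent_prompt)
    return AgentExecutor(agent=agent, tools=tools)

# First hop: the router picks the topic, then the focused specialist handles the query.
answer = build_specialist("finance").invoke({"input": "What is AAPL trading at?"})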
Mistake 5: Tools that have side effects the user doesn’t expect. “Send email” and “Process payment” are irrevocable actions. Never make these tools directly callable by the agent without a confirmation step. Either require human approval before executing, or make the tool first return a preview: “I will send this email to john@example.com with the subject ‘Meeting tomorrow’ and body ‘…’. Confirm? (yes/no)”.
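A minimal sketch of the preview-then-confirm pattern; draft_email, send_confirmed_email, and the _send stub are all illustrative:

from langchain_core.tools import tool as langchain_tool

PENDING = {}  # drafts held until a human approves; illustrative in-memory store

def _send(draft: dict):
    # Stand-in for a real mail client (SMTP, an email API, etc.).
    print(f"[sent] to={draft['to']} subject={draft['subject']}")

@langchain_tool
def draft_email(to: str, subject: str, body: str) -> str:
    """Prepares an email for user review. Never sends anything."""
    PENDING["draft"] = {"to": to, "subject": subject, "body": body}
    return (f"DRAFT ready. To: {to}, Subject: '{subject}', Body: '{body}'. "
            "Ask the user to confirm before sending.")

@langchain_tool
def send_confirmed_email(confirmation: str) -> str:
    """Sends the pending draft, but only after the user explicitly confirmed."""
    if confirmation.strip().lower() != "yes" or "draft" not in PENDING:
        return "Not sent: no confirmed draft."
    _send(PENDING.pop("draft"))
    return "Email sent."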
Key Takeaways
- The LLM doesn’t run the tool. It generates a structured JSON object specifying which tool to call and with what arguments. The framework executes it. This separation is what makes tool use safe and auditable.
- The docstring is the interface. In LangChain and CrewAI, the function’s docstring is what the LLM reads to decide when to use a tool. A clear, specific docstring = accurate tool routing.
- ADK built-in tools are production-ready. google_search, BuiltInCodeExecutor, and VSearchAgent require zero configuration. For custom tools, all three frameworks use decorator-based definitions.
- AgentExecutor (LangChain) is an execution loop — it keeps calling the agent until no more tool calls are needed. The {agent_scratchpad} placeholder is the agent's scratch pad for intermediate thoughts.
- CrewAI's role/goal/backstory creates an agent persona that influences reasoning style and output formatting — not just a system prompt, but a professional identity.
- Code execution is qualitatively different from text generation — it gives agents deterministic, exact answers for math, data manipulation, and computation. The LLM reasons about what code to write; the executor runs it.
Next up — Chapter 6: Planning, where agents stop reacting to individual inputs and start building structured multi-step plans to achieve complex goals.