ARTICLE  ·  19 MIN READ  ·  JANUARY 21, 2026

Chapter 5: Tool Use (Function Calling)

LLMs are frozen in time and disconnected from the world. Tool use breaks those walls — letting agents call APIs, run code, query databases, and trigger real actions.


The Wall Every LLM Hits

Before You Start — Key Terms Explained

Function calling: A specific feature of modern LLMs where instead of generating prose, the model outputs a structured request to call a specific function. The model says "call get_weather with city=London" as structured data (JSON), not free text.

JSON (JavaScript Object Notation): A simple text format for structured data. Example: {"name": "get_weather", "city": "London"}. It's language-agnostic and used everywhere APIs send data. LLMs output JSON for function calls because it's machine-parseable.

Python decorator (@): A function that wraps another function to add behavior. @langchain_tool before a function definition tells Python: "convert this function into a LangChain tool object." The @ is syntactic sugar — it's equivalent to writing my_func = langchain_tool(my_func).

Type hints: Optional annotations in Python that say what type a variable should be: def get_weather(city: str) -> dict:. The LangChain @tool decorator reads these hints to automatically build the tool's parameter schema.

Schema: A formal description of what data should look like. A tool's schema describes what parameters it accepts and their types. The LLM reads this schema to know how to call the tool correctly.
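For instance, the schema for a hypothetical get_weather tool might look like this (field names follow the JSON Schema convention most providers use; the exact envelope varies by API):

```python
import json

# A representative tool schema. Illustrative only: the wrapper object
# differs between OpenAI, Gemini, and Anthropic APIs.
get_weather_schema = {
    "name": "get_weather",
    "description": "Returns current weather conditions for a city.",
    "parameters": {
        "type": "object",
        "properties": {
            "city":  {"type": "string", "description": "City name, e.g. 'London'"},
            "units": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["city"],
    },
}

print(json.dumps(get_weather_schema, indent=2))
```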

Every pattern in this series — chaining, routing, parallelization, reflection — operates entirely inside the model’s head. Input goes in, text comes out. The model reasons over what it already knows.

That’s a wall.

The model’s knowledge is frozen at its training cutoff. It can’t check today’s stock price. It can’t query your company’s database. It can’t send an email. It can’t run your Python code. For all its sophistication, without external connections it’s an encyclopedia — impressively complete, but static.

Tool use is what breaks the wall. Instead of generating a text answer, the model generates a function call — a structured request to execute an external piece of code. The framework runs the function, returns the result, and the model incorporates it into its response.

Why can’t the LLM just “do” things directly? This is a natural question. The LLM is a neural network — mathematically, it is a very large function that maps input tokens to probabilities over the next token. It cannot make HTTP requests. It cannot write to a database. It cannot execute code. It cannot control external systems. All it can do is produce text. Everything else requires the surrounding code — the framework, the agent executor, your application — to do the actual work.

Function calling works through a clean separation of responsibilities:

  • The LLM’s role: Decide whether to call a tool and what arguments to pass, expressed as structured JSON text.
  • The framework’s role: Parse that JSON, actually execute the function, and return the result as additional context.
  • The loop: After getting the tool result, the LLM reads it and decides whether it has enough information to answer or whether to call another tool.

This separation gives you something critically important: observability and control. You can see exactly what the LLM wants to do (the JSON function call) before it happens. You can validate it, log it, rate-limit it, or reject it entirely. The LLM never has direct access to your systems — it just generates text describing what it would like to do, and your code decides whether to allow it.

What the JSON actually looks like. When the LLM decides it needs a tool, its output looks like:

{"name": "get_weather", "arguments": {"city": "London", "units": "celsius"}}

This is unambiguous and machine-parseable. There’s no interpretation needed — name tells you which function to call, arguments tells you exactly what to pass. Your code parses this JSON, calls get_weather(city="London", units="celsius"), gets back {"temperature": 15, "condition": "cloudy"}, and passes that result back to the LLM as additional context for generating its final response.
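A minimal sketch of the framework's side of that handshake, with a stubbed tool (all names here are illustrative):

```python
import json

def get_weather(city: str, units: str = "celsius") -> dict:
    # Stub standing in for a real weather API call.
    return {"temperature": 15, "condition": "cloudy"}

# The framework keeps a registry mapping tool names to actual functions.
TOOL_REGISTRY = {"get_weather": get_weather}

def dispatch(llm_output: str) -> dict:
    """Parse the model's JSON tool call and execute the matching function."""
    call = json.loads(llm_output)
    fn = TOOL_REGISTRY[call["name"]]   # look up which function to run
    return fn(**call["arguments"])     # call it with the model's arguments

result = dispatch('{"name": "get_weather", "arguments": {"city": "London", "units": "celsius"}}')
print(result)  # {'temperature': 15, 'condition': 'cloudy'}
```

The key point: the model only produced the string; `dispatch` (your code) decided to honor it, which is exactly where validation, logging, or rejection would go.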

This is how an LLM becomes an agent that can sense, reason, and act.


How Function Calling Works: The 6 Steps

TOOL CALL LIFECYCLE
01 · User Query
"What's the weather in London?"
02 · LLM Decides
Reads query + tool definitions. Picks get_weather.
03 · JSON Generated
{"name":"get_weather","args":{"city":"London"}}
04 · Tool Executes
Framework calls actual weather API. Returns JSON result.
05 · LLM Processes
Receives tool result, formulates response.
06 · Final Response
"It's 15°C and cloudy in London right now."

The critical insight is in step 3: the LLM does not run the tool — it describes which tool to run and with what arguments, as structured JSON. The framework does the actual execution. This separation keeps the model stateless and the tool execution safe and auditable.


The Three Categories of Tools

🔍 Information Retrieval
Pull live data the model doesn't have: APIs, databases, search engines, knowledge bases.
Examples: Weather API · Stock prices · Google Search · Company DB

⚡ Action Execution
Trigger real-world effects: send messages, update records, control systems.
Examples: Send email · Post Slack msg · Smart home · Payment API

💻 Code Execution
Run code in a sandboxed interpreter. Get deterministic results for calculations, data processing.
Examples: Python sandbox · SQL runner · Math eval · Data analysis


The JSON in step 3 is the LLM’s actual output — not prose, but a structured object your code can parse. The framework reads "name" to know which function to call, and "arguments" to know what to pass it.


Six Use Cases

01 · Real-Time Data
Weather, stock prices, sports scores, news — anything the model's training data can't contain.
Tool: external REST API

02 · Database Queries
Query company-specific data — orders, inventory, customer records — that will never be in training data.
Tool: SQL / NoSQL query

03 · Calculations
Complex math, statistics, currency conversions — offload to deterministic code, not probabilistic generation.
Tool: code interpreter

04 · Sending Communications
Email, Slack, SMS — the agent generates the content, the tool actually sends it.
Tool: messaging API

05 · Code Execution
Run user-provided code in a sandbox, debug errors, analyze outputs — beyond what the model can simulate.
Tool: sandboxed interpreter

06 · System Control
Smart home, IoT, browser automation — the agent instructs, the tool acts in the physical or digital world.
Tool: device / browser API

LangChain: @tool + AgentExecutor

LangChain makes tool definition as simple as a Python decorator. The framework handles everything else: tool registration, LLM binding, execution, and response formatting.

import os
import asyncio
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.tools import tool as langchain_tool
from langchain.agents import create_tool_calling_agent, AgentExecutor

Why langchain_core.tools.tool? The @tool decorator inspects the function and converts it into a StructuredTool object that contains the function name, the parameters schema (extracted from type hints), and the description (extracted from the docstring). The LLM reads all three to decide when and how to call it.

Defining a Tool

@langchain_tool
def search_information(query: str) -> str:
    """
    Provides factual information on a given topic.
    Use this tool to answer questions like 'capital of France'
    or 'weather in London'.
    """
    simulated_results = {
        "weather in london":   "Cloudy, 15°C.",
        "capital of france":   "The capital of France is Paris.",
        "population of earth": "Approximately 8 billion people.",
    }
    return simulated_results.get(
        query.lower(),
        f"No specific data for '{query}'."
    )

tools = [search_information]

The docstring is critical. When the LLM receives the tool definition, it sees:

  • name: "search_information" — what to put in "name" of the JSON call
  • description: the full docstring — the LLM reads this to decide when to use the tool
  • parameters: {"query": {"type": "string"}} — extracted from type hints

A vague docstring → wrong tool usage. A precise docstring → accurate routing. The docstring IS the tool’s interface contract with the LLM.
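A framework-free sketch of the idea: name from __name__, description from the docstring, parameter types from the annotations. LangChain's actual StructuredTool does more, but this is the core extraction:

```python
import inspect

def search_information(query: str) -> str:
    """Provides factual information on a given topic."""
    return "..."

def extract_tool_metadata(fn):
    """Mimic what a @tool decorator reads off a plain Python function."""
    sig = inspect.signature(fn)
    params = {
        name: {"type": p.annotation.__name__}   # type hint -> schema type
        for name, p in sig.parameters.items()
    }
    return {
        "name": fn.__name__,                    # becomes "name" in the JSON call
        "description": inspect.getdoc(fn),      # what the LLM reads to decide
        "parameters": params,                   # what arguments it must supply
    }

meta = extract_tool_metadata(search_information)
print(meta["name"])        # search_information
print(meta["parameters"])  # {'query': {'type': 'str'}}
```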

Building the Agent

llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash", temperature=0)

agent_prompt = ChatPromptTemplate.from_messages([
    ("system",      "You are a helpful assistant."),
    ("human",       "{input}"),
    ("placeholder", "{agent_scratchpad}"),   # ← the agent's working memory
])

agent = create_tool_calling_agent(llm, tools, agent_prompt)
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

What is {agent_scratchpad}? This placeholder is where the agent writes its intermediate steps — the tool calls it made, the results it received, and its reasoning about what to do next. Without it, the agent has no place to “think” between steps. It’s the equivalent of scratch paper for multi-step reasoning.

AgentExecutor is the loop. It repeatedly calls the agent until either:

  1. The agent produces a final answer (no more tool calls needed)
  2. A max iteration limit is hit

Each iteration: agent runs → decides tool → executor calls tool → result fed back → agent runs again.

The Execution Loop Visualized

TOOL CALL LIFECYCLE
01 · User Query
"What's the weather in London?"
02 · LLM Receives Query + Tool Definitions
Model reads available tools and their descriptions
03 · LLM Decides: Tool Needed
Selects get_weather based on docstring match
04 · JSON Call Generated
{"name":"get_weather","args":{"city":"London"}}
05 · Framework Executes Tool
Calls actual API · returns structured result
06 · LLM Formulates Response
"It's 15°C and cloudy in London right now."

Running Queries

async def run_agent(query: str):
    response = await agent_executor.ainvoke({"input": query})
    print(response["output"])

async def main():
    await asyncio.gather(
        run_agent("What is the capital of France?"),
        run_agent("What's the weather like in London?"),
    )

asyncio.run(main())

Why ainvoke and gather? ainvoke is the async version of invoke. Using asyncio.gather fires multiple agent queries concurrently — each agent call is I/O-bound (waiting for the LLM API), so they run in parallel without blocking each other. This is the parallelization pattern applied to tool-using agents.
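A self-contained illustration of the effect, with asyncio.sleep standing in for the LLM API wait: two 0.2-second "agent calls" finish in roughly 0.2 seconds total, not 0.4.

```python
import asyncio
import time

async def fake_agent_call(query: str) -> str:
    await asyncio.sleep(0.2)   # stands in for waiting on the LLM API
    return f"answer to: {query}"

async def main():
    start = time.perf_counter()
    results = await asyncio.gather(
        fake_agent_call("capital of France?"),
        fake_agent_call("weather in London?"),
    )
    return results, time.perf_counter() - start

results, elapsed = asyncio.run(main())
print(results)
print(f"elapsed: {elapsed:.2f}s")   # roughly 0.2s, not 0.4s: the calls overlapped
```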


CrewAI: @tool + Role-Based Agents

CrewAI takes a different philosophy: agents are defined by role, goal, and backstory — almost like hiring a specialist. Tools are assigned to an agent’s persona rather than to a chain.

from crewai import Agent, Task, Crew
from crewai.tools import tool

Defining the Tool

@tool("Stock Price Lookup Tool")
def get_stock_price(ticker: str) -> float:
    """
    Fetches the latest simulated stock price for a given ticker.
    Returns the price as a float. Raises ValueError if not found.
    """
    prices = {"AAPL": 178.15, "GOOGL": 1750.30, "MSFT": 425.50}
    price = prices.get(ticker.upper())
    if price is None:
        raise ValueError(f"Ticker '{ticker}' not found.")
    return price

Why raise ValueError instead of returning a string? Returning a string like "Not found" makes the agent think it has a valid answer — it might confidently report a price of “Not found”. Raising an exception forces the agent to handle the failure explicitly, leading to better error acknowledgment in the final response.

The Agent, Task, and Crew

financial_analyst = Agent(
    role      = 'Senior Financial Analyst',
    goal      = 'Analyze stock data and report key prices accurately.',
    backstory = "Experienced analyst adept at using data sources to find stock information.",
    verbose   = True,
    tools     = [get_stock_price],
    allow_delegation = False,
)

Why role, goal, backstory? CrewAI injects these into the agent’s system prompt. The role frames what the agent IS (“Senior Financial Analyst”). The goal tells it what it’s trying to achieve. The backstory adds professional context that influences how it reasons and writes. Together they create a coherent persona — not just a generic assistant.

allow_delegation=False: prevents the agent from spawning sub-agents. Keep it False for simple single-agent tasks.

task = Task(
    description = (
        "Find the simulated stock price for Apple (ticker: AAPL). "
        "Use the Stock Price Lookup Tool. "
        "If the ticker is not found, report that clearly."
    ),
    expected_output = "A single sentence stating the AAPL price. e.g. 'The simulated stock price for AAPL is $178.15.'",
    agent = financial_analyst,
)

crew = Crew(
    agents = [financial_analyst],
    tasks  = [task],
    verbose = True,
)

result = crew.kickoff()

expected_output: This is a key CrewAI feature. It tells the agent exactly what format the output should take. The agent evaluates its own output against this expectation — essentially a built-in reflection step.

crew.kickoff(): Starts the execution. CrewAI handles the agent loop internally, including tool invocation, result parsing, and task completion detection.

CrewAI Agent-Task-Crew Relationship

CREWAI AGENT-TASK-CREW — how the three pieces fit together

User Request: "What is AAPL's stock price?"
  ↓
Crew: orchestrates agents and tasks · calls kickoff()
  ↓
Task (Find AAPL Price): description + expected_output · assigned to the Financial Analyst agent
  ↓
Agent (Senior Financial Analyst): role + goal + backstory define the persona · has access to get_stock_price tool
  ↓
get_stock_price tool: agent calls it with ticker="AAPL" · tool returns 178.15 · agent formats the response
  ↓
Final Result: "The simulated stock price for AAPL is $178.15"

Google ADK: Built-in Tools + Code Execution

The ADK provides pre-built tools that require zero custom code. You just import them and pass them to an agent.

Google Search (Zero Configuration)

from google.adk.agents import Agent
from google.adk.tools import google_search

root_agent = Agent(
    name        = "basic_search_agent",
    model       = "gemini-2.0-flash-exp",
    description = "Answers questions by searching the internet.",
    instruction = "Answer questions by searching the web.",
    tools       = [google_search],   # one line — full search capability
)

google_search is a pre-built ADK tool — it connects to the Google Search API automatically, handles authentication, pagination, and result formatting. No API key configuration, no schema definition, no docstring to write. You get production-quality web search in one import.

import asyncio
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService
from google.genai import types

APP_NAME, USER_ID, SESSION_ID = "search_app", "user_1", "session_1"

async def call_agent(query: str):
    session_service = InMemorySessionService()
    session = await session_service.create_session(
        app_name=APP_NAME, user_id=USER_ID, session_id=SESSION_ID
    )
    runner  = Runner(agent=root_agent, app_name=APP_NAME, session_service=session_service)
    content = types.Content(role='user', parts=[types.Part(text=query)])

    for event in runner.run(user_id=USER_ID, session_id=SESSION_ID, new_message=content):
        if event.is_final_response():
            print(event.content.parts[0].text)

asyncio.run(call_agent("What's the latest AI news?"))

InMemorySessionService: stores conversation state in RAM. Every run() call has a session_id — this is how the agent maintains multi-turn context. Sessions are covered in depth in Chapter 8.

event.is_final_response(): the ADK streams events as the agent works. Most events are intermediate (tool calls, reasoning steps). Only one event marks the final answer. Always filter for this before reading the output.

Code Execution (Built-in Sandboxed Interpreter)

from google.adk.agents import LlmAgent
from google.adk.code_executors import BuiltInCodeExecutor

code_agent = LlmAgent(
    name          = "calculator_agent",
    model         = "gemini-2.0-flash",
    code_executor = BuiltInCodeExecutor(),    # sandboxed Python
    instruction   = """You are a calculator agent.
When given a math problem, write Python code to solve it.
Return only the final numerical result as plain text.""",
)

BuiltInCodeExecutor: gives the agent a sandboxed Python interpreter. The agent writes Python code, the executor runs it securely, and the result is returned. This is how the agent handles:

  • Exact arithmetic (no rounding errors from the LLM)
  • Data manipulation (sorting, filtering, aggregation)
  • Problems that require deterministic computation

The LLM is good at reasoning about code. BuiltInCodeExecutor makes it actually run code. The combination is far more powerful than either alone.
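The code the agent writes for the sandbox is ordinary Python, and the point is that its results are exact rather than generated token by token:

```python
import math
from decimal import Decimal, getcontext

# Exact integer arithmetic: no chance of the plausible-but-wrong digits
# an LLM can produce when generating large numbers as text.
print(math.factorial(10))   # 3628800

# Exact decimal arithmetic where floats would drift:
getcontext().prec = 30
print(Decimal("0.1") + Decimal("0.2"))   # 0.3, exactly
print(0.1 + 0.2)                         # 0.30000000000000004 (float drift)
```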

ADK Tool Execution Flow

ADK CODE EXECUTION — agent writes code, sandbox runs it, agent reports the answer

User Query: "What is 10 factorial?"
  ↓
Runner: manages the session · passes the query to the agent
  ↓
LlmAgent (Calculator): reasons "I should write Python code to calculate this precisely"
  ↓
BuiltInCodeExecutor: receives Python code from the agent · runs it in a safe sandbox · cannot harm your system
  ↓
Python Sandbox: executes import math; math.factorial(10) → returns 3628800
  ↓
Final Response: "10! = 3,628,800" — exact answer, no guessing

Framework Comparison

                    LangChain                  CrewAI                          Google ADK
  Tool definition   @tool decorator            @tool("Name") decorator         FunctionTool or built-ins
  Built-in tools    Community integrations     Relies on LangChain tools       google_search, BuiltInCodeExecutor, VSearchAgent
  Agent model       Chain / graph              Role + goal + backstory         LlmAgent + orchestrators
  Best for          Composable pipelines       Role-based multi-agent teams    Google Cloud / enterprise
  Async             Full (ainvoke, astream)    Partial                         run_async streaming

At a Glance

WHAT

LLMs can't call APIs, query databases, or run code on their own. Tool use lets them generate a structured JSON request — which the framework executes — bridging language reasoning with real-world action.

WHY

Training data is frozen. Real-world tasks require live data, exact computation, and system interaction. Tool use is the mechanism that makes agents actually useful — not just fluent.

RULE OF THUMB

Use tool calling whenever the agent needs to break out of its training data: real-time info, private data, exact math, code execution, or triggering real-world actions.


How Tool Calling Works Under the Hood

When an LLM produces a function call, what exactly happens inside LangChain or ADK? Let’s trace through the complete execution of a single tool-using agent from start to finish.

Step 1: Tool definition sent to the model. When you create AgentExecutor(agent=agent, tools=[search_information]), LangChain serializes your tool’s metadata — its name, description (from the docstring), and parameter schema (from type hints) — into a format the LLM can read. For OpenAI-compatible models, this becomes a tools JSON array in the API request. For Gemini, it becomes a function_declarations array. The LLM sees these tool definitions before it sees your user message.

Step 2: LLM decides whether to use a tool. The LLM reads the user’s message and the tool definitions simultaneously. If the message is “What’s the capital of France?” and the tool definition says “Use this tool to find answers to factual questions,” the LLM reasons: “This is a factual question, I should use the tool.” This reasoning happens inside the model — you don’t write code for it. The model’s training on millions of examples of “when to use tools” informs this decision.

Step 3: LLM generates the function call. If the LLM decides to use a tool, instead of generating prose, it outputs a structured function call: {"name": "search_information", "arguments": {"query": "capital of france"}}. This is still text output from the model — just formatted as JSON instead of prose.

Step 4: AgentExecutor intercepts and executes. LangChain’s AgentExecutor parses the model’s output. If it sees a function call (not prose), it: (1) looks up the function name in its registered tools, (2) calls the actual Python function with the provided arguments, (3) captures the return value.

Step 5: Result injected back as context. The tool’s return value is formatted as a “Tool Result” message and added to the conversation history. The full conversation — original message + tool call + tool result — is sent back to the LLM. Now the LLM has the factual information it needed and can formulate a final answer.

Step 6: Loop terminates. The loop ends when the LLM produces a final text response (not a function call), or when max_iterations is reached.

This is called the ReAct pattern (Reason + Act) — the model alternates between reasoning about what to do and acting (calling a tool) until it has enough information to answer.
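The whole six-step trace collapses into a small loop. A framework-free sketch, with a scripted fake_llm standing in for the real model (names and message shapes are illustrative, not any framework's API):

```python
import json

def search_information(query: str) -> str:
    # Stub tool with one known fact.
    return {"capital of france": "Paris"}.get(query.lower(), "no data")

TOOLS = {"search_information": search_information}

def fake_llm(messages):
    """Stand-in for a real model: first asks for a tool, then answers."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool_call": {"name": "search_information",
                              "arguments": {"query": "capital of france"}}}
    return {"text": "The capital of France is Paris."}

def agent_loop(user_msg: str, max_iterations: int = 5) -> str:
    messages = [{"role": "user", "content": user_msg}]
    for _ in range(max_iterations):
        out = fake_llm(messages)
        if "tool_call" not in out:           # final text answer: loop terminates
            return out["text"]
        call = out["tool_call"]              # Reason step produced an Act request
        result = TOOLS[call["name"]](**call["arguments"])
        messages.append({"role": "tool", "content": json.dumps(result)})
    return "Max iterations reached."

print(agent_loop("What is the capital of France?"))
# The capital of France is Paris.
```

Swap fake_llm for a real chat-completion call and TOOLS for your registered tools, and this is essentially what AgentExecutor runs for you.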

Why the Docstring Is the Most Important Part of Your Tool

The docstring is how the LLM decides whether to call your tool. The LLM reads the docstring and matches it against the user’s request. A vague docstring means unpredictable tool selection. A precise docstring means reliable tool selection.

Compare these two docstrings for the same function:

Vague (bad):

def get_stock_price(ticker: str) -> float:
    """Gets stock price."""

Precise (good):

def get_stock_price(ticker: str) -> float:
    """
    Fetches the current market price for a publicly traded stock.
    Use this when the user asks for the current price, value, or quote
    of a stock, share, or publicly traded company.
    Args:
        ticker: The stock ticker symbol (e.g., 'AAPL' for Apple, 'GOOGL' for Alphabet).
                Must be uppercase.
    Returns:
        The current price in USD as a float.
    Raises:
        ValueError: If the ticker symbol is not found.
    """

The second docstring tells the LLM: when to use this tool, what the parameter means and its exact format requirements, what it returns, and what can go wrong. The LLM uses all of this to make better decisions.

Common Mistakes When Building Tool-Using Agents

Mistake 1: Vague docstrings. As explained above, the docstring is the interface between your tool and the LLM’s decision-making. “Gets weather” is useless. “Returns current weather conditions for a specified city, including temperature in Celsius, precipitation, and wind speed. Use this for any question about current weather, forecast, or climate conditions in a specific location” is what the LLM needs.

Mistake 2: Tools that return raw API responses. Never return a raw API JSON blob from your tool. The LLM has to parse it and reason about it. Instead, extract the relevant fields and return a clean, readable summary: “Temperature: 15°C, Conditions: Cloudy, Wind: 12 km/h” rather than 200 lines of JSON.
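A sketch of the difference (the raw payload shape here is invented for illustration):

```python
def summarize_weather(raw: dict) -> str:
    """Extract only the fields the LLM needs, as readable text."""
    current = raw["current"]
    return (f"Temperature: {current['temp_c']}°C, "
            f"Conditions: {current['condition']}, "
            f"Wind: {current['wind_kph']} km/h")

# Pretend this came back from a weather API; real payloads are far larger.
raw_response = {
    "location": {"name": "London", "lat": 51.5, "lon": -0.1, "tz": "Europe/London"},
    "current": {"temp_c": 15, "condition": "Cloudy", "wind_kph": 12,
                "pressure_mb": 1012, "humidity": 77, "uv": 2},
}

print(summarize_weather(raw_response))
# Temperature: 15°C, Conditions: Cloudy, Wind: 12 km/h
```

The tool returns one readable sentence instead of the whole payload, which also saves tokens on every loop iteration.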

Mistake 3: No error handling in tool functions. If your tool throws an uncaught exception, the agent executor crashes. Always catch exceptions and return an error description string: return f"Error: Could not find ticker '{ticker}'. Please use a valid stock ticker symbol." This allows the LLM to understand what went wrong and either retry with different arguments or tell the user.
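A sketch of that pattern, reusing the stock-price example with the lookup failure caught and described:

```python
def get_stock_price(ticker: str) -> str:
    """Tool that returns a readable error string instead of crashing the loop."""
    prices = {"AAPL": 178.15, "GOOGL": 1750.30}
    try:
        price = prices[ticker.upper()]
    except KeyError:
        # The LLM can read this, apologize, or retry with a corrected ticker.
        return (f"Error: Could not find ticker '{ticker}'. "
                "Please use a valid stock ticker symbol.")
    return f"{ticker.upper()} is trading at ${price:.2f}."

print(get_stock_price("AAPL"))   # AAPL is trading at $178.15.
print(get_stock_price("XYZ"))    # Error: Could not find ticker 'XYZ'. ...
```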

Mistake 4: Too many tools. LLMs perform worse when given many tools — it’s harder to decide which one to use when there are 50 options. Keep tool sets focused. If you need many tools, consider hierarchical routing: a primary router that selects a specialist sub-agent, each of which has a small focused tool set.

Mistake 5: Tools that have side effects the user doesn’t expect. “Send email” and “Process payment” are irrevocable actions. Never make these tools directly callable by the agent without a confirmation step. Either require human approval before executing, or make the tool first return a preview: “I will send this email to john@example.com with the subject ‘Meeting tomorrow’ and body ‘…’. Confirm? (yes/no)”.
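One way to sketch the preview-then-confirm pattern: the agent gets two tools, and the irreversible one only works after the staging step (names illustrative; the actual send is stubbed):

```python
PENDING = {}

def preview_email(to: str, subject: str, body: str) -> str:
    """First call: stage the email and return a preview for the user to approve."""
    PENDING["email"] = {"to": to, "subject": subject, "body": body}
    return (f"I will send this email to {to} with subject '{subject}'. "
            "Confirm? (yes/no)")

def send_email_confirmed() -> str:
    """Second call, only after explicit user approval: actually send."""
    draft = PENDING.pop("email", None)
    if draft is None:
        return "Error: nothing staged; call preview_email first."
    # real_smtp_send(draft)  <- the irreversible action lives here, behind the gate
    return f"Email sent to {draft['to']}."

print(preview_email("john@example.com", "Meeting tomorrow", "See you at 10."))
print(send_email_confirmed())
```

Your application code, not the agent, decides when the user's "yes" has actually arrived and the confirmed tool may be invoked.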

Key Takeaways

  • The LLM doesn’t run the tool. It generates a structured JSON object specifying which tool to call and with what arguments. The framework executes it. This separation is what makes tool use safe and auditable.
  • The docstring is the interface. In LangChain and CrewAI, the function’s docstring is what the LLM reads to decide when to use a tool. A clear, specific docstring = accurate tool routing.
  • ADK built-in tools are production-ready. google_search, BuiltInCodeExecutor, and VSearchAgent require zero configuration. For custom tools, all three frameworks use decorator-based definitions.
  • AgentExecutor (LangChain) is an execution loop — it keeps calling the agent until no more tool calls are needed. The {agent_scratchpad} placeholder is the agent’s scratch pad for intermediate thoughts.
  • CrewAI’s role/goal/backstory creates an agent persona that influences reasoning style and output formatting — not just a system prompt, but a professional identity.
  • Code execution is qualitatively different from text generation — it gives agents deterministic, exact answers for math, data manipulation, and computation. The LLM reasons about what code to write; the executor runs it.

Next up — Chapter 6: Planning, where agents stop reacting to individual inputs and start building structured multi-step plans to achieve complex goals.



