AI Agents and Tool Calling
Build autonomous AI agents that reason, plan, and act using tools. Master the ReAct pattern, agent architectures, and orchestration frameworks to create production-ready agent systems.
1. What is an AI Agent?
Definition: LLM + Tools + Memory + Planning
An AI Agent is an autonomous system built on top of a Large Language Model (LLM) that can perceive its environment, make decisions, take actions using external tools, maintain memory of past interactions, and plan multi-step strategies to accomplish goals. Unlike a simple chatbot that only generates text responses, an agent can actually do things in the real world.
The four pillars of an AI Agent are:
- LLM (Brain): The reasoning engine that understands natural language, generates plans, and makes decisions. Models like GPT-4o, Claude 4, and Gemini 2.0 serve as the "brain" of the agent.
- Tools (Hands): External functions, APIs, and services the agent can invoke to interact with the world -- web search, code execution, database queries, file operations, API calls, etc.
- Memory (Experience): The ability to store and recall information from past interactions, both within a conversation (short-term) and across sessions (long-term). This enables learning and personalization.
- Planning (Strategy): The capacity to break down complex tasks into subtasks, create execution plans, adapt when things go wrong, and iterate toward a goal.
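The four pillars can be sketched as a minimal agent skeleton. This is an illustrative structure, not any particular framework's API; the `Agent` class, its fields, and the stub LLM are all invented for the example:

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Agent:
    """Hypothetical skeleton: each field maps to one of the four pillars."""
    llm: Callable[[str], str]                                  # Brain: text in, text out
    tools: dict[str, Callable] = field(default_factory=dict)   # Hands: callable tools
    memory: list[str] = field(default_factory=list)            # Experience: past interactions
    plan: list[str] = field(default_factory=list)              # Strategy: current subtasks

    def act(self, goal: str) -> str:
        self.memory.append(f"goal: {goal}")
        # A real agent would ask the LLM to produce the plan; here it is fixed.
        self.plan = ["think", "use_tool", "answer"]
        return self.llm(goal)

# Usage with a stub LLM standing in for a real model call
agent = Agent(llm=lambda g: f"working on: {g}", tools={"search": lambda q: q})
print(agent.act("book a flight"))  # -> working on: book a flight
```

The point of the sketch is the separation of concerns: the LLM only produces text, while tools, memory, and the plan live in ordinary application code around it.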
LLM vs Chatbot vs Agent -- Key Differences
Understanding the spectrum from raw LLMs to full agents is crucial:
| Feature | Raw LLM | Chatbot | AI Agent |
|---|---|---|---|
| Input/Output | Text in, text out | Conversational text | Goals in, actions + results out |
| Memory | None (stateless) | Conversation history | Short-term + long-term + episodic |
| Tools | None | Limited/none | Multiple external tools & APIs |
| Planning | None | Basic flow control | Multi-step planning & replanning |
| Autonomy | Zero | Low | High (can act independently) |
| Example | GPT-4 API call | ChatGPT, Claude.ai | Devin, Claude Computer Use, Operator |
Agent Architectures Overview
There are several fundamental approaches to building agents, each with different trade-offs:
- Simple Reflex Agents: Respond directly to the current input with predefined tool mappings. No memory or planning. Fast but limited. Example: a chatbot that always searches when it detects a question.
- Model-Based Agents: Maintain an internal model of the world state. Can reason about what has changed and what to do next. Example: an agent tracking inventory state across actions.
- Goal-Based Agents: Given a goal, they plan a sequence of actions to achieve it. Can reason about future states. Example: a coding agent planning steps to build a feature.
- Utility-Based Agents: Optimize for a utility function, choosing the "best" action among alternatives. Example: an agent choosing between speed and quality in task completion.
- Learning Agents: Improve their performance over time based on feedback. Example: an agent that refines its approach based on user ratings.
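To make the simplest architecture concrete, here is a minimal sketch of a simple reflex agent: a fixed trigger-to-tool mapping with no memory and no planning. The keywords and tool stubs are invented for illustration:

```python
# Simple reflex agent: condition -> action rules, evaluated on the current input only.
REFLEX_RULES = {
    "weather": lambda q: f"[weather tool] forecast for: {q}",
    "price":   lambda q: f"[pricing tool] looking up: {q}",
}

def reflex_agent(query: str) -> str:
    for keyword, tool in REFLEX_RULES.items():
        if keyword in query.lower():
            return tool(query)        # react immediately to the first trigger
    return "No matching rule."        # no fallback reasoning, no state carried over

print(reflex_agent("What's the weather in Oslo?"))
```

The limitation is visible in the code: there is nothing between input and action, so the agent cannot recover when a rule fires wrongly. Every richer architecture above adds state (model-based), lookahead (goal-based), or scoring (utility-based) around this core.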
The Evolution: Prompting → Chains → Agents → Multi-Agent Systems
The AI engineering field has evolved through distinct phases:
Phase 1: Prompting (2022-2023)
Single LLM calls with carefully crafted prompts. Zero-shot, few-shot, chain-of-thought. Simple but limited -- everything had to fit in one prompt and one response.
Phase 2: Chains (2023)
Sequential LLM calls where the output of one becomes the input of the next. LangChain popularized this. Example: summarize a document, then translate the summary, then format it. Deterministic pipelines with LLMs.
Phase 3: Agents (2023-2024)
LLMs that can choose which tools to use and when. The LLM decides the control flow dynamically. ReAct pattern, function calling, autonomous reasoning loops. Non-deterministic -- the LLM decides what to do next.
Phase 4: Multi-Agent Systems (2024-2026)
Multiple specialized agents collaborating on complex tasks. Each agent has different expertise and tools. Frameworks like CrewAI, AutoGen, LangGraph, and OpenAI Swarm enable this. Standards like MCP (Model Context Protocol) and A2A (Agent2Agent) enable interoperability.
Real-World Agent Examples (2025-2026)
Devin (Cognition Labs): An AI software engineer agent that can plan, write code, debug, deploy, and iterate on entire projects. Uses a full development environment with terminal, browser, and editor. Demonstrates long-horizon task completion -- can work on tasks for hours autonomously.
Claude Computer Use (Anthropic): Claude can see your screen (via screenshots), move the mouse, click, type, and interact with any desktop application. Effectively turns Claude into an agent that can use any software a human can use. This represents a general-purpose agent interface.
OpenAI Operator: An agent that can browse the web, fill out forms, make purchases, and complete multi-step tasks on websites. Uses a built-in browser with visual understanding to navigate the web like a human.
Google Gemini with Extensions: Gemini agents that can interact with Google Workspace (Gmail, Docs, Calendar), Maps, YouTube, and other Google services natively. Deep integration with Google's ecosystem.
Microsoft Copilot Agents: Agents built into Microsoft 365 that can automate workflows across Teams, Outlook, Excel, and other Microsoft products. Can be customized with Copilot Studio.
2. Tool Calling / Function Calling
How Tool Calling Works
Tool calling (also called function calling) is the mechanism by which an LLM requests the execution of external functions. Here is the core flow:
- Tool Definition: You define available tools with their names, descriptions, and parameter schemas (typically JSON Schema). These definitions are sent with the API request.
- User Query: The user asks a question or gives a task that might require tool use.
- LLM Decision: The model analyzes the query and decides whether to use a tool. If yes, it generates a structured tool call (function name + arguments) instead of a text response.
- Tool Execution: Your code receives the tool call, executes the actual function, and gets the result.
- Result Integration: You send the tool result back to the LLM, which uses it to generate the final response to the user.
The key insight is that the LLM never actually executes the tools itself. It generates structured JSON requesting a tool call, and your application code handles the execution. This is a critical security boundary.
User: "What's the weather in Tokyo?"
1. Your app sends to LLM:
- System prompt
- User message: "What's the weather in Tokyo?"
- Available tools: [get_weather(location: string)]
2. LLM responds with (NOT text, but structured output):
{
"tool_calls": [{
"id": "call_abc123",
"function": {
"name": "get_weather",
"arguments": '{"location": "Tokyo, Japan"}'
}
}]
}
3. Your app executes: get_weather("Tokyo, Japan") => {"temp": 15, "condition": "cloudy"}
4. Your app sends tool result back to LLM:
- role: "tool", content: '{"temp": 15, "condition": "cloudy"}'
5. LLM generates final response:
"The current weather in Tokyo is 15°C with cloudy skies."
OpenAI Function Calling API
OpenAI introduced function calling in June 2023 and has since evolved it into a robust tool-calling system. The API uses JSON Schema to define tool parameters, and the model outputs structured JSON for tool calls.
import openai
import json
client = openai.OpenAI()
# Step 1: Define the tools with JSON Schema
tools = [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get the current weather for a given location. Use this when the user asks about weather conditions.",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city and state/country, e.g., 'San Francisco, CA' or 'London, UK'"
},
"unit": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "Temperature unit preference"
}
},
"required": ["location"]
}
}
},
{
"type": "function",
"function": {
"name": "search_database",
"description": "Search for products in the company database by name, category, or price range.",
"parameters": {
"type": "object",
"properties": {
"query": {
"type": "string",
"description": "Search query string"
},
"category": {
"type": "string",
"enum": ["electronics", "clothing", "food", "books"],
"description": "Product category to filter by"
},
"max_price": {
"type": "number",
"description": "Maximum price filter"
}
},
"required": ["query"]
}
}
},
{
"type": "function",
"function": {
"name": "calculate",
"description": "Perform mathematical calculations. Supports basic arithmetic, percentages, and unit conversions.",
"parameters": {
"type": "object",
"properties": {
"expression": {
"type": "string",
"description": "The mathematical expression to evaluate, e.g., '(15 * 23) + 17'"
}
},
"required": ["expression"]
}
}
}
]
# Step 2: Define the actual tool implementations
def get_weather(location: str, unit: str = "celsius") -> dict:
"""Simulate a weather API call."""
# In production, this would call a real weather API like OpenWeatherMap
weather_data = {
"San Francisco, CA": {"temp": 18, "condition": "foggy", "humidity": 75},
"London, UK": {"temp": 12, "condition": "rainy", "humidity": 85},
"Tokyo, Japan": {"temp": 22, "condition": "sunny", "humidity": 60},
}
data = weather_data.get(location, {"temp": 20, "condition": "unknown", "humidity": 50})
if unit == "fahrenheit":
data["temp"] = data["temp"] * 9/5 + 32
data["unit"] = unit
data["location"] = location
return data
def search_database(query: str, category: str | None = None, max_price: float | None = None) -> dict:
"""Simulate a database search."""
products = [
{"name": "Wireless Headphones", "category": "electronics", "price": 79.99},
{"name": "Python Programming Book", "category": "books", "price": 39.99},
{"name": "Smart Watch", "category": "electronics", "price": 249.99},
{"name": "Organic Green Tea", "category": "food", "price": 12.99},
]
results = [p for p in products if query.lower() in p["name"].lower()]
if category:
results = [p for p in results if p["category"] == category]
if max_price is not None:  # 'if max_price:' would wrongly skip a 0 filter
results = [p for p in results if p["price"] <= max_price]
return {"results": results, "total": len(results)}
def calculate(expression: str) -> dict:
"""Safely evaluate a math expression."""
try:
# WARNING: In production, use a proper math parser, not eval()
# Libraries like 'numexpr' or 'sympy' are safer alternatives
allowed_chars = set("0123456789+-*/().% ")
if all(c in allowed_chars for c in expression):
result = eval(expression)
return {"expression": expression, "result": result}
else:
return {"error": "Invalid characters in expression"}
except Exception as e:
return {"error": str(e)}
# Map function names to implementations
tool_functions = {
"get_weather": get_weather,
"search_database": search_database,
"calculate": calculate,
}
# Step 3: The main agent loop
def run_agent(user_message: str):
"""Run the agent with tool calling support."""
messages = [
{
"role": "system",
"content": "You are a helpful assistant with access to weather data, "
"a product database, and a calculator. Use tools when needed "
"to provide accurate information."
},
{"role": "user", "content": user_message}
]
print(f"\n{'='*60}")
print(f"User: {user_message}")
print(f"{'='*60}")
# Allow multiple rounds of tool calling
max_iterations = 10
for iteration in range(max_iterations):
response = client.chat.completions.create(
model="gpt-4o",
messages=messages,
tools=tools,
tool_choice="auto" # Let the model decide
)
assistant_message = response.choices[0].message
# If no tool calls, we have the final answer
if not assistant_message.tool_calls:
print(f"\nAssistant: {assistant_message.content}")
return assistant_message.content
# Process each tool call
messages.append(assistant_message) # Add assistant's tool call message
for tool_call in assistant_message.tool_calls:
function_name = tool_call.function.name
arguments = json.loads(tool_call.function.arguments)
print(f"\n[Tool Call] {function_name}({arguments})")
# Execute the tool
if function_name in tool_functions:
result = tool_functions[function_name](**arguments)
else:
result = {"error": f"Unknown function: {function_name}"}
print(f"[Tool Result] {json.dumps(result, indent=2)}")
# Add tool result to messages
messages.append({
"role": "tool",
"tool_call_id": tool_call.id,
"content": json.dumps(result)
})
return "Max iterations reached"
# Step 4: Test with various queries
if __name__ == "__main__":
# Single tool call
run_agent("What's the weather like in Tokyo?")
# Multiple tool calls (the model may call multiple tools in parallel)
run_agent("Compare the weather in San Francisco and London")
# Tool + reasoning
run_agent("Find electronics under $100 and calculate 15% tax on the total")
# No tool needed
run_agent("What is an AI agent?")
Anthropic Tool Use API
Anthropic's Claude models support tool use with a slightly different API structure but the same core concept. Claude's tool use is particularly strong at multi-step reasoning about when and how to use tools.
import anthropic
import json
client = anthropic.Anthropic()
# Define tools for Claude
tools = [
{
"name": "get_stock_price",
"description": "Get the current stock price and basic info for a given ticker symbol. "
"Returns price, change, volume, and market cap.",
"input_schema": {
"type": "object",
"properties": {
"ticker": {
"type": "string",
"description": "Stock ticker symbol, e.g., 'AAPL', 'GOOGL', 'MSFT'"
}
},
"required": ["ticker"]
}
},
{
"name": "search_news",
"description": "Search for recent news articles about a topic. Returns headlines and summaries.",
"input_schema": {
"type": "object",
"properties": {
"query": {
"type": "string",
"description": "Search query for news articles"
},
"num_results": {
"type": "integer",
"description": "Number of results to return (default 5, max 10)"
}
},
"required": ["query"]
}
},
{
"name": "create_chart",
"description": "Create a chart or visualization from data. Returns a URL to the generated image.",
"input_schema": {
"type": "object",
"properties": {
"chart_type": {
"type": "string",
"enum": ["line", "bar", "pie", "scatter"],
"description": "Type of chart to create"
},
"title": {
"type": "string",
"description": "Chart title"
},
"data": {
"type": "object",
"description": "Chart data with 'labels' and 'values' arrays"
}
},
"required": ["chart_type", "title", "data"]
}
}
]
# Tool implementations
def get_stock_price(ticker: str) -> dict:
"""Simulate stock price lookup."""
stocks = {
"AAPL": {"price": 198.50, "change": +2.3, "volume": "45M", "market_cap": "3.1T"},
"GOOGL": {"price": 178.20, "change": -1.1, "volume": "22M", "market_cap": "2.2T"},
"MSFT": {"price": 445.80, "change": +5.2, "volume": "28M", "market_cap": "3.3T"},
"NVDA": {"price": 890.40, "change": +15.6, "volume": "55M", "market_cap": "2.2T"},
}
data = stocks.get(ticker.upper())
if data:
return {"ticker": ticker.upper(), **data}
return {"error": f"Ticker {ticker} not found"}
def search_news(query: str, num_results: int = 5) -> dict:
"""Simulate news search."""
return {
"results": [
{"headline": f"Latest developments in {query}", "source": "TechNews", "date": "2026-03-07"},
{"headline": f"{query}: What analysts are saying", "source": "Financial Times", "date": "2026-03-06"},
{"headline": f"Breaking: Major update on {query}", "source": "Reuters", "date": "2026-03-05"},
][:num_results]
}
def create_chart(chart_type: str, title: str, data: dict) -> dict:
"""Simulate chart creation."""
return {"url": f"https://charts.example.com/{chart_type}_{hash(title) % 10000}.png", "title": title}
tool_functions = {
"get_stock_price": get_stock_price,
"search_news": search_news,
"create_chart": create_chart,
}
def run_claude_agent(user_message: str):
"""Run an agent powered by Claude with tool use."""
messages = [{"role": "user", "content": user_message}]
print(f"\nUser: {user_message}")
print("-" * 50)
max_iterations = 10
for _ in range(max_iterations):
response = client.messages.create(
model="claude-sonnet-4-20250514",
max_tokens=4096,
system="You are a financial analyst assistant with access to stock data, "
"news search, and charting tools. Provide thorough analysis.",
tools=tools,
messages=messages
)
# Check if we need to process tool calls
if response.stop_reason == "tool_use":
# Add Claude's response (which contains both text and tool_use blocks)
messages.append({"role": "assistant", "content": response.content})
# Process all tool use blocks
tool_results = []
for block in response.content:
if block.type == "tool_use":
tool_name = block.name
tool_input = block.input
tool_use_id = block.id
print(f"[Tool Call] {tool_name}({json.dumps(tool_input)})")
# Execute the tool
result = tool_functions[tool_name](**tool_input)
print(f"[Tool Result] {json.dumps(result, indent=2)}")
tool_results.append({
"type": "tool_result",
"tool_use_id": tool_use_id,
"content": json.dumps(result)
})
elif block.type == "text" and block.text:
print(f"[Thinking] {block.text}")
messages.append({"role": "user", "content": tool_results})
else:
# Final response - extract text
final_text = ""
for block in response.content:
if hasattr(block, "text"):
final_text += block.text
print(f"\nAssistant: {final_text}")
return final_text
return "Max iterations reached"
# Test the agent
if __name__ == "__main__":
run_claude_agent("Compare AAPL and MSFT stocks and search for recent news about both")
run_claude_agent("Get NVDA stock price and create a bar chart comparing it with AAPL and GOOGL")
How Models Are Trained for Tool Calling
Tool calling is not a magical feature -- models are specifically trained to do it:
- Data Collection: Training data is created by having humans demonstrate correct tool usage. Given a user query and available tools, annotators write the correct tool call with proper arguments.
- Special Tokens: Models learn special tokens or output formats that signal a tool call -- for example, outputting <tool_call> or a specific JSON structure.
- Fine-Tuning: The base model is fine-tuned on this tool-use data so it learns when to call tools, which tool to pick, and how to format arguments correctly.
- RLHF: The model is further refined with reinforcement learning from human feedback to improve tool selection accuracy and argument quality.
- Schema Understanding: Models are trained to read JSON Schema definitions and map user intent to the correct parameters. This is why good tool descriptions are crucial.
Key Insight: The quality of your tool descriptions and parameter schemas directly affects how well the model uses them. Vague descriptions lead to incorrect tool usage. Always write clear, specific tool descriptions with examples.
Tool Definition Best Practices
# BAD: Vague description, no parameter details
bad_tool = {
"name": "search",
"description": "Search for stuff",
"parameters": {
"type": "object",
"properties": {
"q": {"type": "string"}
}
}
}
# GOOD: Clear description, detailed parameters, examples
good_tool = {
"name": "search_knowledge_base",
"description": (
"Search the company knowledge base for articles, FAQs, and documentation. "
"Use this when the user asks questions about company products, policies, "
"or procedures. Returns the top matching articles with relevance scores. "
"Example queries: 'return policy', 'shipping times', 'product warranty'"
),
"parameters": {
"type": "object",
"properties": {
"query": {
"type": "string",
"description": "Natural language search query. Be specific -- "
"'laptop warranty policy' is better than 'warranty'"
},
"category": {
"type": "string",
"enum": ["products", "policies", "troubleshooting", "billing"],
"description": "Optional category filter to narrow results"
},
"max_results": {
"type": "integer",
"description": "Maximum number of results (1-20, default 5)",
"minimum": 1,
"maximum": 20,
"default": 5
}
},
"required": ["query"]
}
}
# BEST PRACTICE: Include negative examples of when NOT to use the tool
best_tool = {
"name": "execute_sql_query",
"description": (
"Execute a read-only SQL query against the orders database. "
"Use this to look up specific order information, check order status, "
"or generate reports. "
"ONLY use for order-related queries. "
"DO NOT use for: user account info (use get_user_profile instead), "
"product info (use search_products instead), "
"or any write/update operations (use update_order instead). "
"The database has tables: orders, order_items, shipping."
),
"parameters": {
"type": "object",
"properties": {
"query": {
"type": "string",
"description": "SQL SELECT query. Must be read-only (SELECT only, no INSERT/UPDATE/DELETE)."
}
},
"required": ["query"]
}
}
3. ReAct Pattern (Reasoning + Acting)
The ReAct Paper Explained
ReAct (Reasoning + Acting) was introduced by Yao et al. in 2022 and is one of the most influential agent patterns. The core idea is simple but powerful: alternate between thinking (reasoning) and doing (acting).
Before ReAct, there were two separate approaches:
- Chain-of-Thought (CoT): The model reasons step-by-step but cannot take actions or get external information. It may hallucinate facts because it is purely reasoning from internal knowledge.
- Action-only agents: The model takes actions (tool calls) but does not explicitly reason about what to do. It may take incorrect actions because it does not think before acting.
ReAct combines both: the model explicitly states its reasoning (Thought), decides on an action (Action), observes the result (Observation), and then reasons again. This creates a powerful feedback loop.
The Thought → Action → Observation Loop
Question: "What is the population of the capital of France?"
Thought 1: I need to find the capital of France, then look up its population.
The capital of France is Paris. Let me search for the current population.
Action 1: search("Paris population 2026")
Observation 1: Paris has a population of approximately 2.1 million in the city proper
and 12.3 million in the metropolitan area as of 2026.
Thought 2: I now have the information. The capital of France is Paris, and its
population is about 2.1 million (city proper) or 12.3 million (metro).
I can provide a complete answer.
Action 2: finish("The capital of France is Paris. Its population is approximately
2.1 million in the city proper and 12.3 million in the metropolitan area.")
Why ReAct Improves Over Simple Chain-of-Thought
- Grounded Reasoning: By taking actions and observing results, the model grounds its reasoning in real data rather than potentially hallucinated facts.
- Dynamic Planning: The model can adjust its plan based on observations. If a search returns unexpected results, it can change course.
- Transparency: The explicit Thought steps make the agent's reasoning visible and debuggable.
- Error Recovery: When an action fails, the model can reason about why and try a different approach.
PRACTICAL: Implement ReAct Pattern from Scratch
import openai
import json
import re
from typing import Callable
client = openai.OpenAI()
class ReActAgent:
"""
A ReAct (Reasoning + Acting) agent that alternates between
thinking and acting to solve complex tasks.
"""
def __init__(self, tools: dict[str, Callable], model: str = "gpt-4o"):
self.tools = tools # name -> function mapping
self.model = model
self.max_iterations = 10
self.history = []
def _build_system_prompt(self) -> str:
"""Build the system prompt with tool descriptions."""
tool_descriptions = []
for name, func in self.tools.items():
doc = func.__doc__ or "No description available"
tool_descriptions.append(f" - {name}: {doc.strip()}")
tools_text = "\n".join(tool_descriptions)
return f"""You are a ReAct agent that solves problems by alternating between
Thinking and Acting.
Available tools:
{tools_text}
You MUST follow this EXACT format for every step:
Thought: [Your reasoning about what to do next]
Action: [tool_name(arg1, arg2, ...)]
OR when you have the final answer:
Thought: [Your reasoning about why you have enough information]
Action: finish(your final answer here)
RULES:
1. Always start with a Thought before any Action.
2. Only use ONE Action per step.
3. Wait for the Observation before your next Thought.
4. Use finish() when you have enough information to answer.
5. Be concise but thorough in your reasoning.
6. If a tool call fails, reason about why and try a different approach.
"""
def _parse_action(self, text: str) -> tuple[str | None, list[str]]:
"""Parse the action from the model's response."""
# Match Action: tool_name(args)
action_match = re.search(r'Action:\s*(\w+)\((.*)\)', text, re.DOTALL)
if action_match:
tool_name = action_match.group(1)
args_str = action_match.group(2) or ""
# Naive comma split -- good enough for this demo's short, simple arguments
args = [a.strip().strip('"').strip("'") for a in args_str.split(",") if a.strip()] if args_str else []
return tool_name, args
return None, []
def run(self, question: str) -> str:
"""Run the ReAct loop to answer a question."""
print(f"\n{'='*70}")
print(f"Question: {question}")
print(f"{'='*70}")
messages = [
{"role": "system", "content": self._build_system_prompt()},
{"role": "user", "content": f"Question: {question}"}
]
for iteration in range(self.max_iterations):
# Get model's Thought + Action
response = client.chat.completions.create(
model=self.model,
messages=messages,
temperature=0.0,
max_tokens=1000
)
assistant_text = response.choices[0].message.content
print(f"\n{assistant_text}")
# Parse the action
tool_name, args = self._parse_action(assistant_text)
if tool_name is None:
print("[ERROR] Could not parse action. Asking model to retry.")
messages.append({"role": "assistant", "content": assistant_text})
messages.append({
"role": "user",
"content": "Please follow the exact format: Thought: ... Action: tool_name(args)"
})
continue
# Check for finish action
if tool_name == "finish":
answer = args[0] if args else "No answer provided"
print(f"\n{'='*70}")
print(f"Final Answer: {answer}")
print(f"{'='*70}")
return answer
# Execute the tool
if tool_name in self.tools:
try:
result = self.tools[tool_name](*args)
observation = f"Observation: {result}"
except Exception as e:
observation = f"Observation: ERROR - {str(e)}"
else:
observation = f"Observation: ERROR - Unknown tool '{tool_name}'. Available: {list(self.tools.keys())}"
print(observation)
# Add to message history
messages.append({"role": "assistant", "content": assistant_text})
messages.append({"role": "user", "content": observation})
return "Max iterations reached without finding an answer."
# Define tools for the ReAct agent
def search(query: str) -> str:
"""Search the web for information about a topic. Returns relevant text snippets."""
# Simulated search results
results = {
"paris population": "Paris has 2.1 million people in city proper, 12.3M in metro area (2026).",
"eiffel tower height": "The Eiffel Tower is 330 meters (1,083 feet) tall including antennas.",
"python creator": "Python was created by Guido van Rossum, first released in 1991.",
"largest ocean": "The Pacific Ocean is the largest, covering 165.25 million km2.",
}
for key, value in results.items():
if key in query.lower():
return value
return f"Search results for '{query}': No specific results found. Try a more specific query."
def calculator(expression: str) -> str:
"""Evaluate a basic arithmetic expression. Supports +, -, *, /, **, and parentheses."""
try:
# As above, use a real math parser instead of eval() in production
allowed = set("0123456789+-*/(). ")
if all(c in allowed for c in expression):
return str(eval(expression))
return "Error: Only basic arithmetic is supported"
except Exception as e:
return f"Calculation error: {e}"
def lookup(entity: str) -> str:
"""Look up facts about a specific entity (person, place, thing) in our knowledge base."""
kb = {
"paris": "Paris is the capital of France, located on the Seine river. Known for the Eiffel Tower, Louvre Museum.",
"python": "Python is a high-level programming language. Used in AI, web dev, data science.",
"openai": "OpenAI is an AI research company. Created GPT-4, ChatGPT, DALL-E, and Sora.",
}
return kb.get(entity.lower(), f"No knowledge base entry for '{entity}'")
# Create and run the agent
if __name__ == "__main__":
agent = ReActAgent(
tools={
"search": search,
"calculator": calculator,
"lookup": lookup,
}
)
# Test with various questions
agent.run("What is the height of the Eiffel Tower in feet? Double that number.")
agent.run("Who created Python and what is it used for?")
agent.run("What is the population of the capital of France divided by 1000?")
4. Agent Patterns
Pattern 1: Prompt Chaining
Sequential LLM calls where the output of one step becomes the input to the next. The simplest agent pattern -- fully deterministic, like a pipeline. Best for tasks with clear, fixed stages.
import openai
client = openai.OpenAI()
def llm_call(prompt: str, system: str = "You are a helpful assistant.") -> str:
"""Make a single LLM call."""
response = client.chat.completions.create(
model="gpt-4o",
messages=[
{"role": "system", "content": system},
{"role": "user", "content": prompt}
],
temperature=0.7
)
return response.choices[0].message.content
def prompt_chain_content_pipeline(topic: str) -> dict:
"""
Content creation pipeline using prompt chaining:
Research -> Outline -> Draft -> Edit -> Final
"""
results = {}
# Step 1: Research and gather key points
print("Step 1: Researching...")
research = llm_call(
f"Research the topic '{topic}'. List 5-7 key points with brief explanations.",
system="You are a research analyst. Provide factual, well-organized research."
)
results["research"] = research
print(f" Research complete: {len(research)} chars")
# Step 2: Create an outline based on research
print("Step 2: Creating outline...")
outline = llm_call(
f"Based on this research, create a detailed blog post outline with sections and subsections:\n\n{research}",
system="You are a content strategist. Create clear, logical outlines."
)
results["outline"] = outline
print(f" Outline complete: {len(outline)} chars")
# Step 3: Write the draft based on the outline
print("Step 3: Writing draft...")
draft = llm_call(
f"Write a complete blog post based on this outline. Make it engaging and informative (800-1200 words):\n\n{outline}",
system="You are a skilled blog writer. Write clear, engaging content."
)
results["draft"] = draft
print(f" Draft complete: {len(draft)} chars")
# Step 4: Edit and improve the draft
print("Step 4: Editing...")
edited = llm_call(
f"Edit this blog post for clarity, grammar, flow, and engagement. Fix any issues and improve the writing:\n\n{draft}",
system="You are a professional editor. Improve writing while maintaining the author's voice."
)
results["final"] = edited
print(f" Editing complete: {len(edited)} chars")
return results
# Usage
result = prompt_chain_content_pipeline("The Future of AI Agents in 2026")
print("\n" + "="*60)
print("FINAL ARTICLE:")
print("="*60)
print(result["final"])
Pattern 2: Routing
Use an LLM to classify the input and route it to the appropriate specialized handler. This creates an intelligent dispatcher that directs work to the right expert.
import openai
import json
client = openai.OpenAI()
def classify_query(query: str) -> str:
"""Use LLM to classify the user's query into a category."""
response = client.chat.completions.create(
model="gpt-4o-mini", # Use a cheaper model for classification
messages=[
{
"role": "system",
"content": """Classify the user's query into exactly one category:
- "billing": Questions about payments, invoices, charges, refunds, pricing
- "technical": Questions about product features, bugs, how-to, setup
- "sales": Questions about plans, upgrades, enterprise, demos
- "general": Greetings, feedback, anything else
Respond with ONLY the category name in lowercase, nothing else."""
},
{"role": "user", "content": query}
],
temperature=0.0
)
return response.choices[0].message.content.strip().lower()
def handle_billing(query: str) -> str:
"""Specialized billing handler."""
return client.chat.completions.create(
model="gpt-4o",
messages=[
{
"role": "system",
"content": "You are a billing specialist. Help with payment issues, "
"refunds, and invoice questions. Be precise about amounts and dates. "
"If you need to process a refund, always confirm the amount first."
},
{"role": "user", "content": query}
]
).choices[0].message.content
def handle_technical(query: str) -> str:
"""Specialized technical support handler."""
return client.chat.completions.create(
model="gpt-4o",
messages=[
{
"role": "system",
"content": "You are a senior technical support engineer. Help users with "
"setup, configuration, debugging, and feature questions. Provide "
"step-by-step instructions when applicable. Include code examples."
},
{"role": "user", "content": query}
]
).choices[0].message.content
def handle_sales(query: str) -> str:
"""Specialized sales handler."""
return client.chat.completions.create(
model="gpt-4o",
messages=[
{
"role": "system",
"content": "You are a knowledgeable sales representative. Help with plan "
"comparisons, pricing, enterprise solutions, and schedule demos. "
"Be helpful but not pushy."
},
{"role": "user", "content": query}
]
).choices[0].message.content
def handle_general(query: str) -> str:
"""General purpose handler."""
return client.chat.completions.create(
model="gpt-4o-mini",
messages=[
{"role": "system", "content": "You are a friendly customer support agent. Help with general inquiries."},
{"role": "user", "content": query}
]
).choices[0].message.content
# Router mapping
HANDLERS = {
"billing": handle_billing,
"technical": handle_technical,
"sales": handle_sales,
"general": handle_general,
}
def route_query(query: str) -> str:
"""Route a query to the appropriate specialist."""
category = classify_query(query)
print(f"[Router] Query classified as: {category}")
handler = HANDLERS.get(category, handle_general)
response = handler(query)
return response
# Test
queries = [
"I was charged twice for my subscription last month",
"How do I set up the API integration with my Python app?",
"What's the difference between Pro and Enterprise plans?",
"Hello! I just wanted to say I love your product!"
]
for q in queries:
print(f"\nQuery: {q}")
print(f"Response: {route_query(q)[:200]}...")
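Even at temperature 0, classifiers occasionally return extra text ("Category: billing.") instead of the bare label. A defensive normalization step, run on the classifier's raw output before the `HANDLERS.get(...)` lookup, keeps the router from silently falling through to the general handler (the category set mirrors the classifier prompt above):

```python
VALID_CATEGORIES = {"billing", "technical", "sales", "general"}

def normalize_category(raw: str) -> str:
    """Extract a known category label from a possibly noisy LLM response."""
    cleaned = raw.strip().lower().rstrip(".")
    if cleaned in VALID_CATEGORIES:
        return cleaned
    # Fall back to substring matching, then to the safe default
    for category in VALID_CATEGORIES:
        if category in cleaned:
            return category
    return "general"
```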
Pattern 3: Parallelization
Run multiple LLM calls concurrently and aggregate the results. Useful for tasks that can be decomposed into independent subtasks.
import openai
import asyncio
client = openai.AsyncOpenAI() # Use async client
async def llm_call_async(prompt: str, system: str = "You are a helpful assistant.") -> str:
"""Make an async LLM call."""
response = await client.chat.completions.create(
model="gpt-4o",
messages=[
{"role": "system", "content": system},
{"role": "user", "content": prompt}
]
)
return response.choices[0].message.content
async def parallel_analysis(text: str) -> dict:
"""
Analyze a text from multiple perspectives in parallel.
Each analysis runs concurrently for faster results.
"""
# Define all analysis tasks
tasks = {
"sentiment": llm_call_async(
f"Analyze the sentiment of this text. Provide: overall sentiment (positive/negative/neutral), "
f"confidence score (0-1), and key emotional indicators.\n\nText: {text}",
system="You are a sentiment analysis expert. Be precise and analytical."
),
"key_topics": llm_call_async(
f"Extract the top 5 key topics/themes from this text. For each, provide "
f"the topic name and a brief explanation.\n\nText: {text}",
system="You are a topic extraction specialist."
),
"summary": llm_call_async(
f"Write a concise 2-3 sentence summary of this text.\n\nText: {text}",
system="You are an expert summarizer. Be concise yet comprehensive."
),
"action_items": llm_call_async(
f"Extract any action items, recommendations, or next steps from this text. "
f"If none, say 'No action items found'.\n\nText: {text}",
system="You are a project manager. Identify concrete action items."
),
"questions": llm_call_async(
f"Generate 3 follow-up questions that would deepen understanding of this text.\n\nText: {text}",
system="You are a critical thinking coach."
),
}
# Run all tasks concurrently
    print(f"Running {len(tasks)} analysis tasks in parallel...")
keys = list(tasks.keys())
results_list = await asyncio.gather(*tasks.values())
results = dict(zip(keys, results_list))
# Final aggregation step (sequential, uses parallel results)
aggregation_prompt = f"""Based on these analyses of a text, provide a final comprehensive report:
Sentiment Analysis: {results['sentiment']}
Key Topics: {results['key_topics']}
Summary: {results['summary']}
Action Items: {results['action_items']}
Follow-up Questions: {results['questions']}
Create a structured executive report combining all insights."""
    final_report = (await client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "You are an executive report writer."},
            {"role": "user", "content": aggregation_prompt}
        ]
    )).choices[0].message.content
results["final_report"] = final_report
return results
# Usage
async def main():
text = """
Our Q4 2025 results show a 23% increase in AI platform revenue, driven primarily
by enterprise adoption of our agent framework. Customer satisfaction scores improved
from 4.2 to 4.6 out of 5. However, infrastructure costs rose 15% due to increased
GPU demand. We need to optimize our inference pipeline and explore cost-reduction
strategies. The team recommends investing in model distillation and caching.
Three new enterprise clients are in the pipeline for Q1 2026.
"""
results = await parallel_analysis(text)
for key, value in results.items():
print(f"\n{'='*50}")
print(f" {key.upper()}")
print(f"{'='*50}")
print(value[:300])
asyncio.run(main())
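One caveat: by default, `asyncio.gather` propagates the first exception, so a single failed analysis discards all five results. Passing `return_exceptions=True` lets the aggregation step degrade gracefully. A self-contained sketch of the idea (the `fetch_analysis` coroutine is a stand-in for `llm_call_async`):

```python
import asyncio

async def fetch_analysis(name: str, fail: bool = False) -> str:
    # Stand-in for llm_call_async; a real version would await the API client
    if fail:
        raise RuntimeError(f"{name} analysis failed")
    return f"{name}: ok"

async def gather_resilient(tasks: dict) -> dict:
    # return_exceptions=True returns exceptions as values instead of raising,
    # so one failed call does not discard the successful ones
    results = await asyncio.gather(*tasks.values(), return_exceptions=True)
    return {
        key: f"[error] {r}" if isinstance(r, Exception) else r
        for key, r in zip(tasks.keys(), results)
    }

resilient_results = asyncio.run(gather_resilient({
    "sentiment": fetch_analysis("sentiment"),
    "summary": fetch_analysis("summary", fail=True),
}))
```

The aggregation prompt can then note which analyses errored instead of the whole pipeline failing.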
Pattern 4: Orchestrator-Worker
One LLM (the orchestrator) breaks down a complex task, delegates subtasks to worker LLMs, and assembles the final result. This mirrors how a project manager coordinates a team.
import openai
import json
client = openai.OpenAI()
def orchestrator_worker(task: str) -> str:
"""
Orchestrator-Worker pattern: One LLM plans and delegates,
workers execute, orchestrator assembles the result.
"""
# Step 1: Orchestrator decomposes the task
print("[Orchestrator] Analyzing task and creating plan...")
plan_response = client.chat.completions.create(
model="gpt-4o",
messages=[
{
"role": "system",
"content": """You are a task orchestrator. Break down the given task into
2-5 independent subtasks that can be done in parallel by worker agents.
Return a JSON object with a "subtasks" key containing an array of subtasks, each with:
- "id": subtask number
- "description": what the worker should do
- "worker_type": one of "researcher", "writer", "analyst", "coder"
Return ONLY valid JSON, no other text."""
},
{"role": "user", "content": f"Task: {task}"}
],
temperature=0.0,
response_format={"type": "json_object"}
)
plan = json.loads(plan_response.choices[0].message.content)
subtasks = plan.get("subtasks", plan.get("tasks", []))
print(f"[Orchestrator] Created {len(subtasks)} subtasks")
# Step 2: Workers execute subtasks
worker_results = {}
worker_prompts = {
"researcher": "You are a thorough researcher. Provide detailed, factual information.",
"writer": "You are a skilled writer. Create clear, engaging content.",
"analyst": "You are a data analyst. Provide structured analysis with insights.",
"coder": "You are an expert programmer. Write clean, well-documented code.",
}
for subtask in subtasks:
subtask_id = subtask["id"]
description = subtask["description"]
worker_type = subtask.get("worker_type", "researcher")
print(f"[Worker-{worker_type}] Executing subtask {subtask_id}: {description[:60]}...")
worker_response = client.chat.completions.create(
model="gpt-4o-mini", # Workers can use cheaper models
messages=[
{"role": "system", "content": worker_prompts.get(worker_type, worker_prompts["researcher"])},
{"role": "user", "content": description}
]
)
worker_results[subtask_id] = {
"subtask": description,
"result": worker_response.choices[0].message.content
}
# Step 3: Orchestrator assembles final result
print("[Orchestrator] Assembling final result...")
assembly_context = "\n\n".join([
f"=== Subtask {sid}: {data['subtask']} ===\n{data['result']}"
for sid, data in worker_results.items()
])
final_response = client.chat.completions.create(
model="gpt-4o",
messages=[
{
"role": "system",
"content": "You are a skilled editor and assembler. Combine the worker "
"outputs into a cohesive, well-structured final deliverable. "
"Resolve any conflicts, remove redundancy, and ensure quality."
},
{
"role": "user",
"content": f"Original task: {task}\n\nWorker outputs:\n{assembly_context}\n\n"
"Please assemble these into a final, polished deliverable."
}
]
)
return final_response.choices[0].message.content
# Usage
result = orchestrator_worker(
"Create a comprehensive comparison of the top 3 AI agent frameworks "
"(LangGraph, CrewAI, AutoGen) covering architecture, use cases, "
"pros/cons, and a code example for each."
)
print("\n" + "="*60)
print(result)
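The orchestrator prompt asks for subtasks that "can be done in parallel", yet the worker loop above executes them sequentially. Since the synchronous OpenAI client is generally safe to call from multiple threads, a thread pool is a simple upgrade. This sketch uses a stand-in `run_worker` in place of the real API call:

```python
from concurrent.futures import ThreadPoolExecutor

def run_worker(subtask: dict) -> tuple[int, str]:
    # Stand-in for the client.chat.completions.create call in the worker loop
    return subtask["id"], f"result for: {subtask['description']}"

def run_workers_parallel(subtasks: list[dict], max_workers: int = 4) -> dict[int, str]:
    # Each subtask is independent, so workers can run concurrently;
    # keying by subtask id keeps results unambiguous regardless of finish order
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(run_worker, subtasks))

parallel_results = run_workers_parallel([
    {"id": 1, "description": "research LangGraph"},
    {"id": 2, "description": "research CrewAI"},
])
```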
Pattern 5: Evaluator-Optimizer
An iterative refinement pattern where one LLM generates content and another evaluates it, providing feedback for improvement. The generator keeps iterating until the evaluator is satisfied.
import openai
import json
client = openai.OpenAI()
def evaluator_optimizer(task: str, max_iterations: int = 3) -> str:
"""
Evaluator-Optimizer pattern: Generate, evaluate, improve, repeat.
"""
    current_output = None
    feedback = ""
for iteration in range(max_iterations):
print(f"\n--- Iteration {iteration + 1} ---")
# GENERATOR: Create or improve the output
if current_output is None:
gen_prompt = f"Complete this task to the best of your ability:\n\n{task}"
else:
gen_prompt = (
f"Original task: {task}\n\n"
f"Your previous output:\n{current_output}\n\n"
f"Evaluator feedback:\n{feedback}\n\n"
f"Please improve your output based on the feedback. "
f"Address every point the evaluator raised."
)
gen_response = client.chat.completions.create(
model="gpt-4o",
messages=[
{
"role": "system",
"content": "You are an expert at completing tasks with high quality. "
"When given feedback, you carefully address every point."
},
{"role": "user", "content": gen_prompt}
]
)
current_output = gen_response.choices[0].message.content
print(f"[Generator] Output: {current_output[:150]}...")
# EVALUATOR: Assess the quality
eval_response = client.chat.completions.create(
model="gpt-4o",
messages=[
{
"role": "system",
"content": """You are a strict quality evaluator. Assess the output against the task.
Return JSON with:
- "score": 1-10 (10 = perfect)
- "passes": boolean (true if score >= 8)
- "strengths": list of what's good
- "improvements": list of specific improvements needed
- "feedback": detailed feedback paragraph
Be critical but fair. Only give high scores for truly excellent work."""
},
{
"role": "user",
"content": f"Task: {task}\n\nOutput to evaluate:\n{current_output}"
}
],
response_format={"type": "json_object"}
)
evaluation = json.loads(eval_response.choices[0].message.content)
score = evaluation.get("score", 0)
passes = evaluation.get("passes", False)
feedback = evaluation.get("feedback", "")
print(f"[Evaluator] Score: {score}/10 | Passes: {passes}")
print(f"[Evaluator] Feedback: {feedback[:200]}...")
if passes:
print(f"\n[SUCCESS] Output passed evaluation after {iteration + 1} iteration(s)!")
return current_output
    print(f"\n[DONE] Max iterations reached. Returning last output (score: {score}/10)")
return current_output
# Usage
result = evaluator_optimizer(
"Write a Python function that implements binary search with full error handling, "
"type hints, comprehensive docstring, and edge case handling. "
"Include 5 test cases using pytest."
)
print("\n" + "="*60)
print(result)
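When the loop exhausts `max_iterations`, the function returns the *last* draft, which is not necessarily the best one: a revision can regress. Keeping a scored history and returning the top draft is a cheap safeguard:

```python
def best_of(history: list[tuple[int, str]]) -> str:
    """Return the highest-scoring draft from (score, output) pairs."""
    return max(history, key=lambda pair: pair[0])[1]

# Inside the loop you would append (score, current_output) after each evaluation
best_draft = best_of([(6, "draft v1"), (8, "draft v2"), (7, "draft v3")])
```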
5. Agent Orchestration Frameworks
Framework Landscape (March 2026)
The agent framework ecosystem has matured significantly. Here is a comparison of the major frameworks:
| Framework | By | Best For | Key Feature |
|---|---|---|---|
| LangGraph | LangChain | Stateful agent workflows | Graph-based state machines |
| CrewAI | CrewAI Inc | Multi-agent collaboration | Role-based agent teams |
| AutoGen | Microsoft | Conversational agents | Agent conversations |
| Swarm | OpenAI | Lightweight handoffs | Simple agent transfers |
| Pydantic AI | Pydantic | Type-safe agents | Structured outputs |
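The "handoff" idea popularized by Swarm is simple enough to express without any framework. The sketch below is an illustrative plain-Python version of the concept -- the `Agent` class and routing rule here are hypothetical, not Swarm's actual API:

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Agent:
    name: str
    instructions: str
    # Returns another Agent to transfer control to, or None to keep the query
    handoff: Optional[Callable[[str], Optional["Agent"]]] = None

sales = Agent("sales", "Answer plan and pricing questions.")
triage = Agent(
    "triage",
    "Route the user to the right specialist.",
    handoff=lambda q: sales if "pricing" in q.lower() else None,
)

def resolve(agent: Agent, query: str) -> Agent:
    # Follow handoffs until an agent accepts the query
    while agent.handoff is not None:
        nxt = agent.handoff(query)
        if nxt is None:
            break
        agent = nxt
    return agent
```

In the real frameworks, the handoff decision is made by the LLM (e.g., via a tool call) rather than a hardcoded rule, but the control-transfer mechanics are the same.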
PRACTICAL: Customer Support Agent with LangGraph
LangGraph allows you to build agents as state machines (graphs) where nodes are processing steps and edges define the flow. It provides built-in support for state persistence, human-in-the-loop, and streaming.
# pip install langgraph langchain-openai
from typing import Annotated, TypedDict, Literal
from langgraph.graph import StateGraph, START, END
from langgraph.graph.message import add_messages
from langgraph.prebuilt import ToolNode
from langgraph.checkpoint.memory import MemorySaver
from langchain_openai import ChatOpenAI
from langchain_core.tools import tool
from langchain_core.messages import HumanMessage, SystemMessage
# Step 1: Define the state
class SupportState(TypedDict):
"""State that flows through the graph."""
messages: Annotated[list, add_messages] # Conversation history
category: str # Classified category of the query
sentiment: str # Customer sentiment
needs_human: bool # Whether to escalate to human
# Step 2: Define tools the agent can use
@tool
def search_knowledge_base(query: str) -> str:
"""Search the knowledge base for articles matching the query."""
kb = {
"refund": "Refund Policy: Full refund within 30 days. Partial refund within 60 days. Contact support for exceptions.",
"shipping": "Standard shipping: 5-7 days. Express: 2-3 days. International: 10-14 days.",
"password": "To reset password: Go to Settings > Security > Reset Password. Or use 'Forgot Password' on login page.",
"pricing": "Basic: $9/mo, Pro: $29/mo, Enterprise: Custom. All plans include 14-day free trial.",
}
for key, value in kb.items():
if key in query.lower():
return value
return "No matching article found. Consider searching with different keywords."
@tool
def lookup_order(order_id: str) -> str:
"""Look up order details by order ID."""
orders = {
"ORD-001": {"status": "shipped", "tracking": "1Z999AA10123456784", "eta": "March 10, 2026"},
"ORD-002": {"status": "processing", "tracking": None, "eta": "March 12, 2026"},
"ORD-003": {"status": "delivered", "tracking": "1Z999AA10123456785", "delivered_date": "March 5, 2026"},
}
order = orders.get(order_id)
if order:
return f"Order {order_id}: Status={order['status']}, ETA={order.get('eta', 'N/A')}, Tracking={order.get('tracking', 'N/A')}"
return f"Order {order_id} not found. Please check the order ID."
@tool
def process_refund(order_id: str, reason: str) -> str:
"""Process a refund for an order. Requires human approval for amounts over $100."""
return f"Refund request created for {order_id}. Reason: {reason}. Status: PENDING_APPROVAL. A support agent will review within 24 hours."
tools = [search_knowledge_base, lookup_order, process_refund]
# Step 3: Define the LLM
llm = ChatOpenAI(model="gpt-4o", temperature=0)
llm_with_tools = llm.bind_tools(tools)
SYSTEM_PROMPT = """You are a helpful customer support agent for TechCo.
You have access to tools for searching the knowledge base, looking up orders, and processing refunds.
Guidelines:
- Always be empathetic and professional
- Search the knowledge base before answering policy questions
- Ask for order IDs when customers ask about orders
- For refunds, always confirm the details before processing
- If the customer is very upset or the issue is complex, recommend escalation to a human agent
"""
# Step 4: Define graph nodes
def classify_query(state: SupportState) -> SupportState:
"""Classify the customer query and detect sentiment."""
last_message = state["messages"][-1].content
classification = llm.invoke([
SystemMessage(content="Classify this customer query. Return ONLY a JSON with 'category' (billing/technical/order/general) and 'sentiment' (positive/neutral/negative/angry)."),
HumanMessage(content=last_message)
])
import json
try:
        # Strip optional markdown fences before parsing
        raw = classification.content.strip().removeprefix("```json").removeprefix("```").removesuffix("```").strip()
        result = json.loads(raw)
category = result.get("category", "general")
sentiment = result.get("sentiment", "neutral")
except json.JSONDecodeError:
category = "general"
sentiment = "neutral"
return {
**state,
"category": category,
"sentiment": sentiment,
"needs_human": sentiment == "angry"
}
def agent_node(state: SupportState) -> SupportState:
"""The main agent node that uses tools to help the customer."""
messages = [SystemMessage(content=SYSTEM_PROMPT)] + state["messages"]
response = llm_with_tools.invoke(messages)
return {"messages": [response]}
def should_continue(state: SupportState) -> Literal["tools", "check_escalation"]:
"""Decide if we need to call tools or check for escalation."""
last_message = state["messages"][-1]
if hasattr(last_message, "tool_calls") and last_message.tool_calls:
return "tools"
return "check_escalation"
def check_escalation(state: SupportState) -> Literal["agent", "__end__"]:
"""Check if we need to escalate to a human."""
if state.get("needs_human"):
return END
return END # In a real app, could loop back to agent for follow-up
# Step 5: Build the graph
def build_support_graph():
graph = StateGraph(SupportState)
# Add nodes
graph.add_node("classify", classify_query)
graph.add_node("agent", agent_node)
graph.add_node("tools", ToolNode(tools))
graph.add_node("check_escalation", lambda state: state)
# Define edges
graph.add_edge(START, "classify")
graph.add_edge("classify", "agent")
graph.add_conditional_edges("agent", should_continue)
graph.add_edge("tools", "agent") # After tools, go back to agent
graph.add_conditional_edges("check_escalation", check_escalation)
# Add memory for conversation persistence
memory = MemorySaver()
return graph.compile(checkpointer=memory)
# Step 6: Run the agent
app = build_support_graph()
# Conversation with thread_id for persistence
config = {"configurable": {"thread_id": "customer-123"}}
# First message
result = app.invoke(
{"messages": [HumanMessage(content="Hi, I need to check on my order ORD-001")]},
config=config
)
print("Agent:", result["messages"][-1].content)
# Follow-up (agent remembers the context)
result = app.invoke(
{"messages": [HumanMessage(content="When will it arrive?")]},
config=config
)
print("Agent:", result["messages"][-1].content)
6. Building a Complete Customer Support Agent
PRACTICAL: Multi-Tool Customer Support Agent
This is a comprehensive project that brings together everything from this week. We will build a full customer support agent that can understand queries, search a knowledge base, look up orders, process refunds with human approval, escalate issues, and maintain conversation history.
"""
Complete Multi-Tool Customer Support Agent
==========================================
Features:
1. Query understanding and classification
2. Knowledge base search (RAG-like)
3. Order status lookup via API
4. Refund processing with human approval
5. Complex issue escalation
6. Persistent conversation history
"""
import openai
import json
import sqlite3
from datetime import datetime
from typing import Optional
client = openai.OpenAI()
# =============================================================================
# Database Layer - Conversation History & Order Data
# =============================================================================
class Database:
"""SQLite database for conversation history and order data."""
def __init__(self, db_path: str = ":memory:"):
self.conn = sqlite3.connect(db_path)
self.conn.row_factory = sqlite3.Row
self._setup_tables()
self._seed_data()
def _setup_tables(self):
self.conn.executescript("""
CREATE TABLE IF NOT EXISTS conversations (
id INTEGER PRIMARY KEY AUTOINCREMENT,
session_id TEXT NOT NULL,
role TEXT NOT NULL,
content TEXT NOT NULL,
timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS orders (
order_id TEXT PRIMARY KEY,
customer_email TEXT,
status TEXT,
total REAL,
items TEXT,
shipping_address TEXT,
tracking_number TEXT,
created_at DATETIME,
updated_at DATETIME
);
CREATE TABLE IF NOT EXISTS refund_requests (
id INTEGER PRIMARY KEY AUTOINCREMENT,
order_id TEXT,
amount REAL,
reason TEXT,
status TEXT DEFAULT 'pending',
requires_human_approval BOOLEAN DEFAULT 0,
created_at DATETIME DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS knowledge_base (
id INTEGER PRIMARY KEY AUTOINCREMENT,
category TEXT,
title TEXT,
content TEXT,
keywords TEXT
);
""")
def _seed_data(self):
"""Seed with sample data."""
# Orders
orders = [
("ORD-1001", "alice@example.com", "shipped", 129.99,
"Wireless Headphones x1, USB-C Cable x2", "123 Main St, SF, CA",
"1Z999AA10123456784", "2026-02-28", "2026-03-05"),
("ORD-1002", "bob@example.com", "processing", 49.99,
"Python Programming Book x1", "456 Oak Ave, NYC, NY",
None, "2026-03-06", "2026-03-06"),
("ORD-1003", "alice@example.com", "delivered", 299.99,
"Smart Watch Pro x1", "123 Main St, SF, CA",
"1Z999AA10123456785", "2026-02-15", "2026-03-01"),
("ORD-1004", "carol@example.com", "cancelled", 79.99,
"Bluetooth Speaker x1", "789 Pine Rd, LA, CA",
None, "2026-03-01", "2026-03-03"),
]
self.conn.executemany(
"INSERT OR REPLACE INTO orders VALUES (?,?,?,?,?,?,?,?,?)", orders
)
# Knowledge base
articles = [
("shipping", "Shipping Policy",
"Standard shipping takes 5-7 business days. Express shipping takes 2-3 business days "
"and costs $12.99 extra. International shipping takes 10-14 business days. "
"Free shipping on orders over $50. All orders include tracking.",
"shipping, delivery, tracking, express, international"),
("refunds", "Refund Policy",
"Full refund available within 30 days of delivery for unused items in original packaging. "
"Partial refund (80%) available within 31-60 days. Refunds over $100 require manager approval. "
"Refunds are processed within 5-10 business days to the original payment method.",
"refund, return, money back, cancel"),
("account", "Account Management",
"To reset your password, go to Settings > Security > Change Password. "
"To update email, go to Settings > Profile. To delete account, contact support. "
"Two-factor authentication is available under Settings > Security > 2FA.",
"password, account, login, email, security, 2FA"),
("pricing", "Pricing Plans",
"Basic Plan: $9/month - 1 user, 10GB storage. "
"Pro Plan: $29/month - 5 users, 100GB storage, priority support. "
"Enterprise: Custom pricing - unlimited users, unlimited storage, dedicated support. "
"All plans include a 14-day free trial. Annual billing saves 20%.",
"pricing, plan, cost, subscription, upgrade, enterprise"),
("technical", "Troubleshooting Guide",
"App crashes: Clear cache (Settings > Storage > Clear Cache) and restart. "
"Sync issues: Check internet connection, then force sync (pull down to refresh). "
"Login problems: Clear cookies, try incognito mode, or reset password. "
"If issues persist, collect logs (Settings > Help > Export Logs) and contact support.",
"crash, bug, error, sync, slow, not working, broken"),
]
self.conn.executemany(
"INSERT OR REPLACE INTO knowledge_base (category, title, content, keywords) VALUES (?,?,?,?)",
articles
)
self.conn.commit()
def get_order(self, order_id: str) -> Optional[dict]:
row = self.conn.execute("SELECT * FROM orders WHERE order_id = ?", (order_id,)).fetchone()
return dict(row) if row else None
def search_kb(self, query: str) -> list[dict]:
"""Simple keyword search on knowledge base."""
words = query.lower().split()
results = []
for row in self.conn.execute("SELECT * FROM knowledge_base").fetchall():
article = dict(row)
keywords = article["keywords"].lower()
content = article["content"].lower()
score = sum(1 for w in words if w in keywords or w in content)
if score > 0:
article["relevance_score"] = score
results.append(article)
return sorted(results, key=lambda x: x["relevance_score"], reverse=True)[:3]
def create_refund(self, order_id: str, amount: float, reason: str) -> dict:
requires_approval = amount > 100
cursor = self.conn.execute(
"INSERT INTO refund_requests (order_id, amount, reason, status, requires_human_approval) "
"VALUES (?, ?, ?, ?, ?)",
(order_id, amount, reason, "pending_approval" if requires_approval else "approved", requires_approval)
)
self.conn.commit()
return {
"refund_id": cursor.lastrowid,
"order_id": order_id,
"amount": amount,
"status": "pending_approval" if requires_approval else "approved",
"requires_human_approval": requires_approval,
"message": (
f"Refund of ${amount:.2f} requires manager approval (over $100). "
"You will be notified within 24 hours."
if requires_approval else
f"Refund of ${amount:.2f} approved. It will appear on your statement in 5-10 business days."
)
}
def save_message(self, session_id: str, role: str, content: str):
self.conn.execute(
"INSERT INTO conversations (session_id, role, content) VALUES (?, ?, ?)",
(session_id, role, content)
)
self.conn.commit()
def get_history(self, session_id: str, limit: int = 20) -> list[dict]:
rows = self.conn.execute(
"SELECT role, content FROM conversations WHERE session_id = ? ORDER BY timestamp DESC LIMIT ?",
(session_id, limit)
).fetchall()
return [{"role": r["role"], "content": r["content"]} for r in reversed(rows)]
# =============================================================================
# Agent Tools
# =============================================================================
db = Database()
TOOLS = [
{
"type": "function",
"function": {
"name": "search_knowledge_base",
"description": "Search the company knowledge base for articles about policies, "
"troubleshooting, pricing, shipping, refunds, and account management. "
"Use this BEFORE answering any policy or how-to questions.",
"parameters": {
"type": "object",
"properties": {
"query": {
"type": "string",
"description": "Search query describing what the customer needs help with"
}
},
"required": ["query"]
}
}
},
{
"type": "function",
"function": {
"name": "lookup_order",
"description": "Look up order details by order ID. Returns status, items, shipping info, "
"and tracking number. Order IDs look like ORD-XXXX.",
"parameters": {
"type": "object",
"properties": {
"order_id": {
"type": "string",
"description": "The order ID, e.g., ORD-1001"
}
},
"required": ["order_id"]
}
}
},
{
"type": "function",
"function": {
"name": "process_refund",
"description": "Process a refund for a specific order. Requires the order ID, refund amount, "
"and reason. Refunds over $100 require human manager approval. "
"Always confirm details with the customer before calling this.",
"parameters": {
"type": "object",
"properties": {
"order_id": {
"type": "string",
"description": "The order ID to refund"
},
"amount": {
"type": "number",
"description": "The refund amount in USD"
},
"reason": {
"type": "string",
"description": "Reason for the refund"
}
},
"required": ["order_id", "amount", "reason"]
}
}
},
{
"type": "function",
"function": {
"name": "escalate_to_human",
"description": "Escalate the conversation to a human support agent. Use this when: "
"1) The customer is very frustrated/angry, "
"2) The issue is too complex for automated support, "
"3) The customer explicitly asks for a human, "
"4) You cannot resolve the issue with available tools.",
"parameters": {
"type": "object",
"properties": {
"reason": {
"type": "string",
"description": "Why this needs human attention"
},
"priority": {
"type": "string",
"enum": ["low", "medium", "high", "urgent"],
"description": "Priority level for the escalation"
},
"summary": {
"type": "string",
"description": "Brief summary of the issue and what has been tried"
}
},
"required": ["reason", "priority", "summary"]
}
}
}
]
def execute_tool(name: str, arguments: dict) -> str:
"""Execute a tool and return the result as a string."""
if name == "search_knowledge_base":
results = db.search_kb(arguments["query"])
if results:
return json.dumps([
{"title": r["title"], "content": r["content"], "relevance": r["relevance_score"]}
for r in results
], indent=2)
return "No articles found matching your query."
elif name == "lookup_order":
order = db.get_order(arguments["order_id"])
if order:
return json.dumps(order, indent=2)
return f"Order {arguments['order_id']} not found. Please verify the order ID."
elif name == "process_refund":
result = db.create_refund(
arguments["order_id"],
arguments["amount"],
arguments["reason"]
)
return json.dumps(result, indent=2)
elif name == "escalate_to_human":
return json.dumps({
"status": "escalated",
"ticket_id": f"ESC-{datetime.now().strftime('%Y%m%d%H%M%S')}",
"message": f"Your issue has been escalated to our support team with {arguments['priority']} priority. "
f"A human agent will reach out within "
f"{'1 hour' if arguments['priority'] == 'urgent' else '4 hours' if arguments['priority'] == 'high' else '24 hours'}.",
"reason": arguments["reason"]
}, indent=2)
return f"Unknown tool: {name}"
# =============================================================================
# The Agent
# =============================================================================
SYSTEM_PROMPT = """You are a friendly, empathetic, and efficient customer support agent for TechCo.
Your personality:
- Warm and professional
- Patient with frustrated customers
- Proactive in offering solutions
- Clear and concise in explanations
Your capabilities:
1. Search the knowledge base for policy info, troubleshooting, and FAQs
2. Look up order status and shipping information
3. Process refunds (with human approval for amounts over $100)
4. Escalate complex issues to human agents
Guidelines:
- ALWAYS search the knowledge base before answering policy questions
- ALWAYS confirm details before processing refunds
- If unsure, search the KB first, don't guess
- Use the customer's name if they provide it
- Acknowledge frustration before offering solutions
- For multi-step issues, explain each step clearly
- Offer follow-up help at the end of each interaction
"""
class CustomerSupportAgent:
"""A complete customer support agent with tools and memory."""
def __init__(self, session_id: str):
self.session_id = session_id
self.model = "gpt-4o"
self.max_tool_iterations = 5
def chat(self, user_message: str) -> str:
"""Process a user message and return the agent's response."""
# Save user message to history
db.save_message(self.session_id, "user", user_message)
# Build messages from history
history = db.get_history(self.session_id)
messages = [{"role": "system", "content": SYSTEM_PROMPT}]
for msg in history:
messages.append({"role": msg["role"], "content": msg["content"]})
# Agent loop with tool calling
for _ in range(self.max_tool_iterations):
response = client.chat.completions.create(
model=self.model,
messages=messages,
tools=TOOLS,
tool_choice="auto",
temperature=0.3 # Low temperature for consistent support
)
assistant_message = response.choices[0].message
# If no tool calls, we have the final response
if not assistant_message.tool_calls:
final_response = assistant_message.content
db.save_message(self.session_id, "assistant", final_response)
return final_response
# Process tool calls
messages.append(assistant_message)
for tool_call in assistant_message.tool_calls:
func_name = tool_call.function.name
func_args = json.loads(tool_call.function.arguments)
print(f" [Tool] {func_name}({json.dumps(func_args)})")
result = execute_tool(func_name, func_args)
print(f" [Result] {result[:200]}...")
messages.append({
"role": "tool",
"tool_call_id": tool_call.id,
"content": result
})
return "I apologize, but I'm having trouble processing your request. Let me connect you with a human agent."
# =============================================================================
# Demo: Run a support conversation
# =============================================================================
def demo():
    """Demonstrate the customer support agent."""
    agent = CustomerSupportAgent(session_id="demo-session-001")

    conversations = [
        "Hi, I ordered some headphones and they haven't arrived yet. My order is ORD-1001.",
        "When exactly will it arrive? Can you give me the tracking number?",
        "Actually, I changed my mind. I'd like a refund for this order.",
        "Yes, please process the refund. The headphones aren't what I expected.",
        "Also, what's your shipping policy for international orders?",
        "One more thing - I can't log into my account. I forgot my password.",
        "Thanks for all your help!",
    ]

    for msg in conversations:
        print(f"\n{'='*60}")
        print(f"Customer: {msg}")
        print(f"{'='*60}")
        response = agent.chat(msg)
        print(f"\nAgent: {response}")


if __name__ == "__main__":
    demo()
Agent Best Practices and Common Pitfalls
Best Practices
- Start simple: Begin with direct tool calling before reaching for frameworks. Many use cases don't need LangGraph or CrewAI -- a simple loop with OpenAI function calling is often enough.
- Excellent tool descriptions: Nothing improves agent quality more than well-written tool descriptions. Include what the tool does, when to use it, when NOT to use it, and example parameter values.
- Limit tool count: Don't give agents too many tools. 5-10 well-defined tools work much better than 50 vague ones. If you have many tools, consider a routing layer that selects relevant tools per query.
- Always log: Log every tool call, result, and LLM response. Agent debugging is hard without full traces. Use structured logging.
- Set iteration limits: Always set a maximum number of iterations for agent loops. Without limits, agents can loop forever (and rack up API costs).
- Validate tool inputs: Never trust the LLM's tool arguments blindly. Validate and sanitize all inputs, especially for tools that modify data (SQL queries, file writes, API calls).
- Human-in-the-loop: For any destructive or irreversible action (refunds, deletions, sends), require human confirmation.
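Two of the practices above -- validating tool inputs and human-in-the-loop gating -- can be combined in a small guard in front of tool execution. This is a minimal sketch: the tool names, the hand-rolled schemas, and the `confirm` callback are illustrative assumptions, not part of the support agent code above.

```python
# Sketch: validate LLM-proposed tool calls against a schema, and gate
# destructive tools behind a human confirmation callback. Tool names and
# schemas below are illustrative, not from the support agent above.

DESTRUCTIVE_TOOLS = {"process_refund"}  # irreversible actions need confirmation

TOOL_SCHEMAS = {
    "get_order_status": {"required": {"order_id"}, "allowed": {"order_id"}},
    "process_refund": {
        "required": {"order_id", "reason"},
        "allowed": {"order_id", "reason", "amount"},
    },
}

def validate_tool_call(name, args):
    """Return (ok, error). Never trust LLM-produced names or arguments."""
    schema = TOOL_SCHEMAS.get(name)
    if schema is None:
        return False, f"Unknown tool: {name!r}"  # hallucinated tool name
    keys = set(args)
    missing = schema["required"] - keys
    if missing:
        return False, f"Missing required arguments: {sorted(missing)}"
    extra = keys - schema["allowed"]
    if extra:
        return False, f"Unexpected arguments: {sorted(extra)}"
    return True, ""

def run_tool(name, args, confirm=lambda name, args: False):
    """Execute a tool call only if it validates and, when destructive, is confirmed."""
    ok, err = validate_tool_call(name, args)
    if not ok:
        return {"error": err}  # feed the error back to the model, don't crash
    if name in DESTRUCTIVE_TOOLS and not confirm(name, args):
        return {"error": "Action requires human confirmation; not executed."}
    return {"status": "executed", "tool": name}  # real execution would go here
```

Returning validation errors as tool results, rather than raising, gives the model a chance to correct itself on the next iteration of the loop.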
Common Pitfalls
- Over-engineering: Using a multi-agent framework for a simple tool-calling use case. Match complexity to the problem.
- Infinite loops: Agents that keep calling tools without making progress. Add loop detection and graceful exits.
- Cost explosion: Agents making many LLM calls with large contexts. Monitor costs carefully and set budgets per conversation.
- Hallucinated tool calls: Agents inventing tool names or parameters that don't exist. Validate against your schema.
- No error handling: Tools fail in production. Always handle tool errors gracefully and give the agent a chance to recover.
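The infinite-loop pitfall is often easy to catch: count how many times the agent has requested the same tool with the same arguments, and bail out once a threshold is hit. A minimal sketch, where the `LoopGuard` class, its threshold, and the JSON key format are assumptions:

```python
import json
from collections import Counter

class LoopGuard:
    """Stop an agent loop that repeats the same tool call without progress."""

    def __init__(self, max_repeats=2):
        self.max_repeats = max_repeats
        self.seen = Counter()

    def should_stop(self, tool_name, args):
        # Canonicalize arguments so {"a": 1, "b": 2} and {"b": 2, "a": 1} match
        key = (tool_name, json.dumps(args, sort_keys=True))
        self.seen[key] += 1
        return self.seen[key] > self.max_repeats
```

Inside an agent loop, call `guard.should_stop(func_name, func_args)` before executing each tool and return a graceful fallback message when it fires.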
Week 9 Summary
Key Takeaways
- AI Agents = LLM + Tools + Memory + Planning. They can take actions, not just generate text.
- Tool calling is the mechanism by which LLMs request external function execution. Both OpenAI and Anthropic support it natively.
- The ReAct pattern (Reasoning + Acting) creates a powerful feedback loop: Think, Act, Observe, Repeat.
- Five core agent patterns: Prompt Chaining, Routing, Parallelization, Orchestrator-Worker, and Evaluator-Optimizer. Choose based on your use case complexity.
- Frameworks like LangGraph, CrewAI, and AutoGen provide structure for complex agent systems, but start simple and add complexity as needed.
- In production, always implement logging, iteration limits, input validation, cost monitoring, and human-in-the-loop for destructive actions.
Next Week Preview
In Week 10, we will dive into MCP (Model Context Protocol), context engineering, and multi-agent systems. You will learn how to build MCP servers and clients, manage context effectively, and coordinate multiple specialized agents.