How to Build an AI Agent from Scratch in Python
Build a working AI agent from scratch in Python without frameworks. Covers the core agent loop, tool definitions, memory, error handling, and when to reach for LangChain or CrewAI instead.
An AI agent is not some mystical autonomous entity. It is an LLM in a loop with access to tools. That's it. The model receives a task, decides which tool to call, gets the result, and decides what to do next. Repeat until the task is done.
Here's the thing -- most tutorials overcomplicate this. They start with frameworks, abstractions, and architecture diagrams. But the core pattern is around 50 lines of Python. Once you understand that, frameworks make a lot more sense.
Let's build one from scratch.
The Core Loop
Every AI agent follows the same pattern:
1. Observe — receive input (user message + tool results)
2. Think — the LLM decides what to do next
3. Act — call a tool or return a final answer
4. Repeat — feed the tool result back and go to step 1
That's the entire architecture. Everything else -- memory, planning, multi-agent coordination -- is built on top of this loop.
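Stripped of all API details, the loop can be sketched in a few lines. (The `model` and `call_tool` callables below are stand-ins for whatever LLM client and tool dispatcher you wire in later, not real APIs.)

```python
def agent_loop(task, model, call_tool, max_steps=10):
    """Minimal observe-think-act loop: the model either requests a tool or answers."""
    history = [task]                     # observations accumulate here
    for _ in range(max_steps):
        decision = model(history)        # think: model picks the next action
        if decision["type"] == "final":  # act: done, return the answer
            return decision["answer"]
        result = call_tool(decision["tool"], decision["args"])  # act: run the tool
        history.append(result)           # observe: feed the result back, repeat
    return "Step limit reached"
```

Everything that follows is this skeleton with a real model plugged in.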
Setting Up
We'll use the Anthropic Python SDK, but the pattern works identically with OpenAI.
pip install anthropic
export ANTHROPIC_API_KEY="sk-ant-..."
Defining Tools
Tools are just Python functions with a specific JSON schema so the model knows what they do and what arguments they accept:
import json
import httpx
from datetime import datetime

# The actual tool functions
def get_weather(city: str) -> str:
    """Fetch current weather for a city."""
    response = httpx.get(
        "https://api.weatherapi.com/v1/current.json",
        params={"key": "YOUR_API_KEY", "q": city},
    )
    data = response.json()
    current = data["current"]
    return f"{current['temp_f']}°F, {current['condition']['text']}"

def calculate(expression: str) -> str:
    """Evaluate a math expression safely."""
    allowed = set("0123456789+-*/.() ")
    if not all(c in allowed for c in expression):
        return "Error: only numeric math expressions allowed"
    try:
        return str(eval(expression))
    except Exception as e:
        return f"Error: {e}"

def search_files(query: str) -> str:
    """Search local files for a query string."""
    import subprocess
    result = subprocess.run(
        ["grep", "-r", "-l", query, "./docs"],
        capture_output=True, text=True, timeout=5,
    )
    files = result.stdout.strip().split("\n")
    return f"Found in: {', '.join(files)}" if files[0] else "No matches found"

# Tool registry -- maps names to functions
TOOLS = {
    "get_weather": get_weather,
    "calculate": calculate,
    "search_files": search_files,
}
# Tool schemas for the API (this is what the model reads)
TOOL_SCHEMAS = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a city. Use this when the user asks about weather conditions.",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name, e.g. 'Tokyo' or 'New York'"}
            },
            "required": ["city"],
        },
    },
    {
        "name": "calculate",
        "description": "Evaluate a mathematical expression. Use Python math syntax.",
        "input_schema": {
            "type": "object",
            "properties": {
                "expression": {"type": "string", "description": "Math expression, e.g. '15.5 * 0.2'"}
            },
            "required": ["expression"],
        },
    },
    {
        "name": "search_files",
        "description": "Search local documentation files for a query string. Returns matching file names.",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Search term to find in files"}
            },
            "required": ["query"],
        },
    },
]
The descriptions matter more than you'd think. The model uses them to decide which tool to call. Vague descriptions lead to wrong tool selections.
The Agent Loop
Here's the complete agent. This is the core of the whole thing:
import anthropic

client = anthropic.Anthropic()

def run_agent(user_message: str, system_prompt: str = "You are a helpful assistant."):
    messages = [{"role": "user", "content": user_message}]

    while True:
        # Step 1: Send messages to the model with available tools
        response = client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=4096,
            system=system_prompt,
            tools=TOOL_SCHEMAS,
            messages=messages,
        )

        # Step 2: Check if the model wants to use a tool
        if response.stop_reason == "tool_use":
            # Collect the assistant's response (may include text + tool calls)
            assistant_content = response.content
            messages.append({"role": "assistant", "content": assistant_content})

            # Execute each tool call
            tool_results = []
            for block in assistant_content:
                if block.type == "tool_use":
                    tool_name = block.name
                    tool_input = block.input
                    print(f"  [Tool] {tool_name}({json.dumps(tool_input)})")

                    # Run the actual function
                    if tool_name in TOOLS:
                        result = TOOLS[tool_name](**tool_input)
                    else:
                        result = f"Error: unknown tool '{tool_name}'"
                    print(f"  [Result] {result}")

                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": result,
                    })

            # Feed results back to the model
            messages.append({"role": "user", "content": tool_results})
        else:
            # Step 3: Model is done -- extract the final text response
            final_text = ""
            for block in response.content:
                if hasattr(block, "text"):
                    final_text += block.text
            return final_text

# Try it
answer = run_agent("What's the weather in Tokyo, and what's a 20% tip on $67.50?")
print(answer)
Run this and you'll see the agent call get_weather for Tokyo, then calculate for the tip, then compose a natural language answer from both results. Two tool calls, one coherent response.
Adding Memory
The agent above is stateless -- each call starts fresh. To maintain conversation history across turns, wrap it with a simple memory layer:
class Agent:
    def __init__(self, system_prompt: str, tools: list, tool_map: dict):
        self.client = anthropic.Anthropic()
        self.system_prompt = system_prompt
        self.tools = tools
        self.tool_map = tool_map
        self.conversation_history = []

    def chat(self, user_message: str) -> str:
        self.conversation_history.append(
            {"role": "user", "content": user_message}
        )
        while True:
            response = self.client.messages.create(
                model="claude-sonnet-4-20250514",
                max_tokens=4096,
                system=self.system_prompt,
                tools=self.tools,
                messages=self.conversation_history,
            )
            if response.stop_reason == "tool_use":
                self.conversation_history.append(
                    {"role": "assistant", "content": response.content}
                )
                tool_results = []
                for block in response.content:
                    if block.type == "tool_use":
                        fn = self.tool_map.get(block.name)
                        result = fn(**block.input) if fn else "Unknown tool"
                        tool_results.append({
                            "type": "tool_result",
                            "tool_use_id": block.id,
                            "content": result,
                        })
                self.conversation_history.append(
                    {"role": "user", "content": tool_results}
                )
            else:
                text = "".join(
                    b.text for b in response.content if hasattr(b, "text")
                )
                self.conversation_history.append(
                    {"role": "assistant", "content": text}
                )
                return text

agent = Agent(
    system_prompt="You are a helpful assistant with access to tools.",
    tools=TOOL_SCHEMAS,
    tool_map=TOOLS,
)

print(agent.chat("What's the weather in London?"))
print(agent.chat("How about New York?"))  # Remembers we're talking about weather
print(agent.chat("Compare the two."))     # Knows which two cities we mean
The conversation history grows with each turn. For production, you'll want to trim older messages once you approach the model's context window limit.
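One simple trimming strategy is to drop the oldest messages (keeping the original task) once the history gets too big. The sketch below estimates size in characters rather than real tokens, and it ignores one important constraint: with tool-using conversations you must drop `tool_use` and `tool_result` messages as a pair, or the API will reject the history.

```python
def trim_history(history: list[dict], max_chars: int = 40000) -> list[dict]:
    """Drop the oldest messages until the (rough) size fits.

    Always keeps history[0], the original task, so the agent doesn't
    forget what it was asked to do.
    """
    def size(msgs):
        return sum(len(str(m["content"])) for m in msgs)

    trimmed = list(history)
    while len(trimmed) > 2 and size(trimmed) > max_chars:
        del trimmed[1]  # keep trimmed[0] and the most recent context
    return trimmed
```

Call it on `self.conversation_history` before each API request. A character budget of roughly 4x your token budget is a common rule of thumb for English text.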
Error Handling
Agents in the wild hit errors constantly -- API timeouts, bad tool inputs, hallucinated tool names. You need to handle all of it:
import time

MAX_ITERATIONS = 15
MAX_EXECUTION_TIME = 120  # seconds

def run_agent_safe(user_message: str) -> str:
    messages = [{"role": "user", "content": user_message}]
    iterations = 0
    start_time = time.time()

    while True:
        iterations += 1
        elapsed = time.time() - start_time

        if iterations > MAX_ITERATIONS:
            return "I hit my iteration limit. Here's what I found so far..."
        if elapsed > MAX_EXECUTION_TIME:
            return "This is taking too long. Let me summarize what I have..."

        try:
            response = client.messages.create(
                model="claude-sonnet-4-20250514",
                max_tokens=4096,
                tools=TOOL_SCHEMAS,
                messages=messages,
            )
        except anthropic.RateLimitError:
            time.sleep(2)
            continue
        except anthropic.APIError as e:
            return f"API error: {e}. Please try again."

        if response.stop_reason == "tool_use":
            messages.append({"role": "assistant", "content": response.content})
            tool_results = []
            for block in response.content:
                if block.type == "tool_use":
                    try:
                        fn = TOOLS.get(block.name)
                        if fn is None:
                            result = f"Tool '{block.name}' does not exist."
                        else:
                            result = fn(**block.input)
                    except Exception as e:
                        result = f"Tool error: {type(e).__name__}: {e}"
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": result,
                    })
            messages.append({"role": "user", "content": tool_results})
        else:
            return "".join(
                b.text for b in response.content if hasattr(b, "text")
            )
The key insight: when a tool fails, feed the error message back to the model. It will usually adapt -- try different parameters, pick a different tool, or explain to the user why the task couldn't be completed.
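The fixed `time.sleep(2)` on rate limits works, but jittered exponential backoff is more robust under sustained load. A generic retry helper might look like this (a sketch: in real code you would catch only transient errors such as `anthropic.RateLimitError`, not bare `Exception`):

```python
import random
import time

def with_backoff(fn, retries=5, base=1.0, cap=30.0):
    """Call fn(), retrying on exception with jittered exponential backoff."""
    for attempt in range(retries):
        try:
            return fn()
        except Exception:
            if attempt == retries - 1:
                raise  # out of retries, surface the error
            # Full jitter: sleep a random fraction of the doubling delay
            delay = min(cap, base * 2 ** attempt) * random.random()
            time.sleep(delay)
```

You would then wrap the API call as `with_backoff(lambda: client.messages.create(...))` instead of handling the retry inline.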
When to Use a Framework
Now that you've built an agent from scratch, here's when to reach for frameworks:
Roll your own when:
- You need fewer than 5 tools
- You want full control over the loop
- You're embedding it in an existing app
- You don't want the dependency overhead

Reach for LangChain (or similar) when:
- You need RAG (retrieval-augmented generation)
- You want pre-built tool integrations (Google Search, SQL, etc.)
- You need complex chains with branching logic

Reach for CrewAI (or another multi-agent framework) when:
- You need multiple specialized agents collaborating
- You have a pipeline (research then analyze then write)
- Role-based task delegation fits your use case
Common Pitfalls
Infinite loops. The model decides to keep calling tools forever. Always set a max_iterations cap. I've seen agents burn through $20 in API calls chasing a rabbit hole.
Token limits. Each iteration appends to the conversation. After 10+ tool calls, you can hit the context window. Truncate or summarize older messages.
Tool hallucination. The model invents tool names or parameters that don't exist. Validate tool names against your registry. Return clear error messages so the model can self-correct.
Over-engineering. In reality, most tasks need 2-3 tools and 1-3 iterations. Don't build a 15-tool, multi-agent system when a simple loop will do. Start minimal and add complexity only when you have evidence it's needed.
The pattern is simple: LLM + tools + loop. Everything else is details. Start with the 50-line version, get it working for your use case, and evolve from there.
For more tutorials on building practical AI applications, check out CodeUp.