March 27, 2026 · 8 min read

Claude API Tutorial: Complete Guide to Building with Anthropic's SDK

A complete tutorial on the Claude API covering messages, streaming, tool use, vision, extended thinking, and cost optimization. Includes runnable Python code for every feature.

Tags: claude, anthropic, api, python, ai, tutorial

The Claude API is one of the most capable LLM APIs available, and it's genuinely pleasant to work with. The SDK is clean, the documentation is good, and the model is excellent at coding and analysis tasks. But there are nuances -- tool use, streaming, extended thinking, cost management -- that aren't obvious from the docs alone.

This is the guide I wish I had when I started. Every feature, with real code, honest takes on what works well and what to watch out for.

Getting Started

Sign up at console.anthropic.com, create an API key, and install the SDK:

pip install anthropic
export ANTHROPIC_API_KEY="sk-ant-..."

Basic Message Creation

The Messages API is the core interface:

import anthropic

client = anthropic.Anthropic()

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Explain Python's GIL in three sentences."}
    ],
)

print(message.content[0].text)
print(f"Tokens used: {message.usage.input_tokens} in, {message.usage.output_tokens} out")

The response object gives you the generated text, token counts, stop reason, and model info. Always check message.stop_reason -- it tells you whether the model finished naturally (end_turn), hit the token limit (max_tokens), or wants to use a tool (tool_use).
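To make that branching concrete, here's a small sketch. The `describe_stop` helper and the `SimpleNamespace` stand-in are mine, not part of the SDK; the `stop_reason` values are the ones the Messages API returns:

```python
from types import SimpleNamespace

def describe_stop(message):
    """Map the Messages API stop_reason onto an action for your app."""
    if message.stop_reason == "end_turn":
        return "complete"
    if message.stop_reason == "max_tokens":
        return "truncated"  # consider retrying with a larger max_tokens
    if message.stop_reason == "tool_use":
        return "needs_tools"
    return "other"  # e.g. stop_sequence

# Quick check with a stand-in object instead of a real API response
print(describe_stop(SimpleNamespace(stop_reason="max_tokens")))  # → truncated
```

In real code you'd pass the object returned by client.messages.create() straight into a check like this.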

System Prompts

System prompts set the model's behavior. They go in a separate parameter, not in the messages array:

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system="You are a senior backend engineer. Be concise and opinionated. When there are multiple approaches, recommend the best one and explain why.",
    messages=[
        {"role": "user", "content": "Should I use Redis or Memcached for session storage?"}
    ],
)

Good system prompts make a dramatic difference. Be specific about the role, tone, format, and constraints you want.

Multi-Turn Conversations

Pass the full conversation history in the messages array:

messages = [
    {"role": "user", "content": "I'm building a REST API in Python. Should I use FastAPI or Flask?"},
    {"role": "assistant", "content": "For a new project in 2026, FastAPI is the better choice..."},
    {"role": "user", "content": "What about Django REST Framework?"},
]

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=messages,
)

Each API call is stateless. The model only knows what you send in messages. For a chatbot, you maintain the conversation history on your side and send the full thing each time. This means you control exactly what context the model sees.
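The bookkeeping is just list appends. A minimal sketch (the `add_exchange` helper is mine, purely illustrative):

```python
def add_exchange(history, user_text, assistant_text):
    """Record one completed round so the next API call can see it."""
    history.append({"role": "user", "content": user_text})
    history.append({"role": "assistant", "content": assistant_text})
    return history

history = []
add_exchange(history, "FastAPI or Flask?", "FastAPI, because...")

# Next turn: append the new user message and send the whole list
history.append({"role": "user", "content": "What about Django REST Framework?"})
# response = client.messages.create(model="claude-sonnet-4-20250514",
#                                   max_tokens=1024, messages=history)
print(len(history))  # → 3
```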

Streaming

For any user-facing application, you want streaming. Tokens arrive as they're generated instead of waiting for the full response:

with client.messages.stream(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Write a Python async web scraper."}],
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)

    # Get the final message object once the stream is exhausted,
    # while the context manager is still open
    response = stream.get_final_message()

print(f"\nTokens: {response.usage.input_tokens} + {response.usage.output_tokens}")

Streaming doesn't change the output quality or cost. It just changes the delivery. Use it everywhere you can.

Tool Use (Function Calling)

This is where Claude gets really useful. You define tools, Claude decides when to call them, and you execute the functions:

import json

# Define tools
tools = [
    {
        "name": "get_stock_price",
        "description": "Get the current stock price for a ticker symbol. Use this when the user asks about stock prices or market data.",
        "input_schema": {
            "type": "object",
            "properties": {
                "ticker": {
                    "type": "string",
                    "description": "Stock ticker symbol, e.g. 'AAPL', 'GOOGL'",
                },
            },
            "required": ["ticker"],
        },
    },
    {
        "name": "calculate",
        "description": "Perform mathematical calculations. Use Python syntax.",
        "input_schema": {
            "type": "object",
            "properties": {
                "expression": {
                    "type": "string",
                    "description": "Math expression to evaluate",
                },
            },
            "required": ["expression"],
        },
    },
]

# First API call -- model decides to use a tool
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "What's Apple's stock price, and what would 100 shares cost?"}],
)

# Check if the model wants to use tools
if response.stop_reason == "tool_use":
    tool_blocks = [b for b in response.content if b.type == "tool_use"]
    tool_results = []

    for tool_call in tool_blocks:
        print(f"Tool: {tool_call.name}, Input: {tool_call.input}")

        # Execute the tool (your real implementation)
        if tool_call.name == "get_stock_price":
            result = json.dumps({"ticker": tool_call.input["ticker"], "price": 198.50})
        elif tool_call.name == "calculate":
            # eval is fine for a demo; use a safe expression evaluator in production
            result = str(eval(tool_call.input["expression"]))

        tool_results.append({
            "type": "tool_result",
            "tool_use_id": tool_call.id,
            "content": result,
        })

    # Send all results back in one user message -- every tool_use block
    # must have a matching tool_result in the very next user turn
    follow_up = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        tools=tools,
        messages=[
            {"role": "user", "content": "What's Apple's stock price, and what would 100 shares cost?"},
            {"role": "assistant", "content": response.content},
            {"role": "user", "content": tool_results},
        ],
    )
    print(follow_up.content[0].text)

The pattern is always the same: send tools, check for tool_use stop reason, execute the tools, send results back. For an agent loop that handles multiple rounds, see my article on building AI agents from scratch.
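That cycle generalizes to a loop. Here's a sketch with the model call injected as a function so it's easy to test; `call_model` would wrap `client.messages.create(..., tools=tools)`, and `execute_tool` dispatches to your implementations. The function names and `max_rounds` cap are mine:

```python
def agent_loop(call_model, execute_tool, user_message, max_rounds=5):
    """Repeat the send-tools / run-tools / send-results cycle until done."""
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_rounds):
        response = call_model(messages)
        if response.stop_reason != "tool_use":
            return response  # model answered without needing more tools
        messages.append({"role": "assistant", "content": response.content})
        # Every tool_use block needs a matching tool_result in the next user message
        messages.append({
            "role": "user",
            "content": [
                {
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": execute_tool(block.name, block.input),
                }
                for block in response.content
                if block.type == "tool_use"
            ],
        })
    raise RuntimeError("agent did not finish within max_rounds")
```

The `max_rounds` cap matters: without it, a confused model can ping-pong tool calls forever on your dime.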

Vision

Claude can analyze images. Send them as base64 or URLs:

import base64

# From a file
with open("screenshot.png", "rb") as f:
    image_data = base64.standard_b64encode(f.read()).decode("utf-8")

message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data,
                    },
                },
                {
                    "type": "text",
                    "text": "What errors do you see in this screenshot? List them with line numbers.",
                },
            ],
        }
    ],
)
print(message.content[0].text)

Vision is excellent for analyzing screenshots, diagrams, charts, handwritten notes, and UI mockups. It's surprisingly good at reading code from screenshots, which makes it useful for debugging visual issues.

Extended Thinking

For complex reasoning tasks, extended thinking lets Claude "think out loud" before answering:

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000,  # How many tokens for thinking
    },
    messages=[
        {"role": "user", "content": "Analyze this algorithm's time complexity and suggest optimizations:\n\ndef find_pairs(arr, target):\n    result = []\n    for i in range(len(arr)):\n        for j in range(i+1, len(arr)):\n            if arr[i] + arr[j] == target:\n                result.append((arr[i], arr[j]))\n    return result"}
    ],
)

# The response contains thinking blocks and text blocks
for block in response.content:
    if block.type == "thinking":
        print(f"[Thinking]\n{block.thinking}\n")
    elif block.type == "text":
        print(f"[Answer]\n{block.text}")

Extended thinking costs extra tokens but noticeably improves results on math, logic, code analysis, and multi-step reasoning. Set budget_tokens to control how much thinking the model does. For simple questions, skip it.

Error Handling and Retries

Production code needs to handle failures gracefully:

import time
from anthropic import (
    APIError,
    RateLimitError,
    APITimeoutError,
    BadRequestError,
)

def call_claude(messages, max_retries=3):
    for attempt in range(max_retries):
        try:
            return client.messages.create(
                model="claude-sonnet-4-20250514",
                max_tokens=1024,
                messages=messages,
            )
        except RateLimitError:
            wait = 2 ** attempt  # Exponential backoff: 1s, 2s, 4s
            print(f"Rate limited. Waiting {wait}s...")
            time.sleep(wait)
        except APITimeoutError:
            print(f"Timeout on attempt {attempt + 1}")
            time.sleep(1)
        except BadRequestError:
            # Don't retry bad requests -- the input is wrong
            raise
        except APIError as e:
            print(f"API error: {e}")
            if attempt == max_retries - 1:
                raise
            time.sleep(1)

    raise Exception("Max retries exceeded")

The SDK also has built-in retry support:

# Built-in retries with exponential backoff
client = anthropic.Anthropic(max_retries=3, timeout=30.0)

Cost Optimization

Claude pricing is per-token, and it adds up fast in production. Here's what I've learned:

# Check costs before they surprise you
def estimate_cost(input_tokens, output_tokens, model="claude-sonnet-4-20250514"):
    pricing = {
        "claude-sonnet-4-20250514": {"input": 3.0, "output": 15.0},  # per million tokens
        "claude-haiku-4-20250514": {"input": 0.80, "output": 4.0},
    }
    rates = pricing[model]
    cost = (input_tokens * rates["input"] + output_tokens * rates["output"]) / 1_000_000
    return f"${cost:.4f}"

# After each call
msg = client.messages.create(...)
print(estimate_cost(msg.usage.input_tokens, msg.usage.output_tokens))

Cost reduction strategies:
  1. Use Haiku for simple tasks. Routing, classification, extraction -- Haiku handles these at a fraction of the cost.
  2. Trim conversation history. Don't send 50 turns when the last 10 provide enough context.
  3. Cache system prompts. Anthropic's prompt caching can reduce costs for repeated system prompts.
  4. Be specific in prompts. "List 3 bullet points" costs less than "explain in detail."
  5. Batch requests. The Message Batches API gives a 50% discount for non-urgent requests.
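Strategy 2 is the easiest to automate. A sketch of a trimming helper (the function and its cutoff are mine; the one API constraint it respects is that the messages array must start with a user turn):

```python
def trim_history(messages, keep_turns=10):
    """Keep only the most recent messages, ensuring the list starts with a user turn."""
    if len(messages) <= keep_turns:
        return messages
    trimmed = messages[-keep_turns:]
    # Drop a leading assistant message left over from cutting mid-exchange
    while trimmed and trimmed[0]["role"] != "user":
        trimmed.pop(0)
    return trimmed

history = [{"role": "user" if i % 2 == 0 else "assistant", "content": f"turn {i}"}
           for i in range(30)]
print(len(trim_history(history, keep_turns=10)))  # → 10
```

For longer memories, summarize the dropped turns into the system prompt instead of deleting them outright.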

# Prompt caching for repeated system prompts
message = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": "Your very long system prompt goes here...",
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "Your question"}],
)
# Subsequent calls with the same system prompt use cached tokens at reduced cost

Claude vs OpenAI: Honest Comparison

Having used both extensively:

Claude is better at: long-form analysis, following complex instructions, coding tasks, maintaining consistency over long conversations, honesty about uncertainty.

OpenAI is better at: ecosystem breadth (more third-party integrations), fine-tuning options, image generation (DALL-E), real-time voice (GPT-4o).

They're comparable at: general chat, summarization, translation, creative writing.

The reality is both are excellent, and the gap narrows with every release. Pick based on your specific use case, not brand loyalty. And design your code so you can switch -- the abstractions aren't that different.
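One way to keep switching cheap is a thin wrapper so app code depends on one method rather than a vendor SDK's response shape. A sketch (the class and method names are mine):

```python
class AnthropicChat:
    """Minimal provider wrapper: app code calls complete(), not the SDK directly."""

    def __init__(self, client, model="claude-sonnet-4-20250514"):
        self.client = client
        self.model = model

    def complete(self, system, messages, max_tokens=1024):
        response = self.client.messages.create(
            model=self.model,
            max_tokens=max_tokens,
            system=system,
            messages=messages,
        )
        return response.content[0].text

# A class for another provider with the same complete() signature slots in unchanged.
```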

Available Models

Model         | Best For                   | Context Window
Claude Opus   | Complex analysis, research | 200K
Claude Sonnet | General purpose, coding    | 200K
Claude Haiku  | Fast tasks, classification | 200K

All models support tool use, vision, and streaming. Sonnet hits the sweet spot of capability vs cost for most applications.
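A simple way to act on that in code is to route by task type and default to Sonnet. The task labels and routing rule here are illustrative; the model IDs are the ones used earlier in this article:

```python
HAIKU = "claude-haiku-4-20250514"
SONNET = "claude-sonnet-4-20250514"

def pick_model(task_type):
    """Send cheap, structured tasks to Haiku; everything else to Sonnet."""
    if task_type in {"classification", "extraction", "routing"}:
        return HAIKU
    return SONNET

print(pick_model("classification"))  # → claude-haiku-4-20250514
```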

The Claude API is straightforward once you understand the patterns. Start with basic messages, add tools when you need external data, add streaming for user-facing apps, and add extended thinking for complex reasoning. Build incrementally.

For more tutorials on building with AI APIs, check out CodeUp.
