Gemini API Tutorial: Build with Google's AI Model in Python
Complete guide to using the Gemini API in Python. Covers text generation, multimodal inputs, streaming, chat history, function calling, grounding, structured output, and a practical receipt analyzer project.
Google's Gemini models are genuinely good. Not "good for a Google product" -- actually competitive with the best models available. And the free tier is generous enough to build real projects without spending anything.
Let's go from zero to building a practical multimodal application. No hand-waving, just code that works.
Setup
Get an API key from Google AI Studio. Free, takes 30 seconds.
pip install google-genai
from google import genai
client = genai.Client(api_key="YOUR_API_KEY")
Basic Text Generation
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Explain how DNS works in 3 sentences."
)
print(response.text)
System Instructions and Multimodal Input
System instructions set the model's behavior. Multimodal is where Gemini genuinely shines -- it handles images natively, not as an afterthought.
import pathlib
image_bytes = pathlib.Path("architecture-diagram.png").read_bytes()
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=[
        genai.types.Part.from_bytes(data=image_bytes, mime_type="image/png"),
        "Describe this architecture diagram. What are the main components?"
    ],
    config=genai.types.GenerateContentConfig(
        system_instruction="You are a pragmatic senior developer. Keep responses concise."
    )
)
print(response.text)
Streaming and Chat
For long responses, streaming gives you tokens as they're generated: use generate_content_stream() and iterate over chunks. Essential for any user-facing app.
Chat with History
chat = client.chats.create(model="gemini-2.5-flash")
response = chat.send_message("I'm getting a TypeError in my Python code")
print(response.text)
response = chat.send_message("It says 'unsupported operand type'")
print(response.text)
response = chat.send_message("Here's the code: result = '5' + 3")
print(response.text) # Model remembers the full conversation
Function Calling
Function calling lets Gemini invoke your Python functions for real-time data or actions:
def get_weather(city: str) -> dict:
    """Get current weather for a city."""
    weather_data = {
        "New York": {"temp": 22, "condition": "Partly cloudy"},
        "Tokyo": {"temp": 28, "condition": "Sunny"},
    }
    return weather_data.get(city, {"temp": 0, "condition": "Unknown"})
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="What's the weather in Tokyo?",
    config=genai.types.GenerateContentConfig(tools=[get_weather])
)
print(response.text)
The model decides when to call your functions, extracts arguments from natural language, and incorporates results into its response.
Grounding with Google Search
Connect Gemini to real-time search results for current information:
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="What were the major tech layoffs this week?",
    config=genai.types.GenerateContentConfig(
        tools=[genai.types.Tool(google_search=genai.types.GoogleSearch())]
    )
)
print(response.text)
Structured Output (JSON Mode)
When you need data in a specific format, not free-form text:
from pydantic import BaseModel
class MovieReview(BaseModel):
    title: str
    rating: float
    pros: list[str]
    cons: list[str]
    recommended: bool
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Review the movie 'Inception'",
    config=genai.types.GenerateContentConfig(
        response_mime_type="application/json",
        response_schema=MovieReview
    )
)
import json
review = json.loads(response.text)
print(f"Rating: {review['rating']}/10 -- Recommended: {review['recommended']}")
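Recent `google-genai` versions also expose `response.parsed`, which hands you the validated Pydantic instance directly instead of making you round-trip through `json.loads`. What that does under the hood, shown with sample data (the JSON string here is illustrative, not real model output):

```python
from pydantic import BaseModel

class MovieReview(BaseModel):
    title: str
    rating: float
    pros: list[str]
    cons: list[str]
    recommended: bool

# Equivalent to what response.parsed gives you when response_schema is set:
raw = '{"title": "Inception", "rating": 8.8, "pros": ["visuals"], "cons": ["dense"], "recommended": true}'
review = MovieReview.model_validate_json(raw)
print(review.rating, review.recommended)  # 8.8 True
```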
Gemini 2.5 Pro vs Flash
| Feature | Pro | Flash |
|---|---|---|
| Speed | Slower | Very fast |
| Cost | ~$1.25/M input | ~$0.15/M input |
| Best for | Complex reasoning, detailed analysis | Chat, classification, quick tasks |
| Context | 1M tokens | 1M tokens |
Pricing comparison (approximate, USD per million tokens):

| Model | Input/1M tokens | Output/1M tokens |
|---|---|---|
| Gemini 2.5 Flash | $0.15 | $0.60 |
| Gemini 2.5 Pro | $1.25 | $10.00 |
| GPT-4o | $2.50 | $10.00 |
| Claude Sonnet 4 | $3.00 | $15.00 |
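With per-token prices in hand, a rough per-request cost estimate is one line of arithmetic. A sketch using the Gemini prices from the table above (the helper name is illustrative):

```python
PRICES = {  # USD per 1M tokens: (input, output)
    "gemini-2.5-flash": (0.15, 0.60),
    "gemini-2.5-pro": (1.25, 10.00),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Rough USD cost for one request at published list prices."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# e.g. 10k input + 1k output tokens on Flash
print(estimate_cost("gemini-2.5-flash", 10_000, 1_000))  # 0.0021
```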
Practical Project: Receipt Analyzer
Let's build something real -- a multimodal analyzer that extracts structured data from receipt photos:
from pydantic import BaseModel
class LineItem(BaseModel):
    description: str
    quantity: int
    unit_price: float
    total: float

class Receipt(BaseModel):
    store_name: str
    date: str
    items: list[LineItem]
    subtotal: float
    tax: float
    total: float

def analyze_receipt(image_path: str) -> Receipt:
    image_bytes = pathlib.Path(image_path).read_bytes()
    response = client.models.generate_content(
        model="gemini-2.5-flash",
        contents=[
            genai.types.Part.from_bytes(data=image_bytes, mime_type="image/jpeg"),
            "Extract all information from this receipt including every line item."
        ],
        config=genai.types.GenerateContentConfig(
            response_mime_type="application/json",
            response_schema=Receipt
        )
    )
    return Receipt(**json.loads(response.text))
receipt = analyze_receipt("grocery_receipt.jpg")
print(f"Store: {receipt.store_name}")
for item in receipt.items:
    print(f"  {item.description}: {item.quantity}x ${item.unit_price}")
print(f"Total: ${receipt.total}")
This combines multimodal input, structured output, and practical utility. You could extend this into an expense tracker or accounting tool.
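One extension worth doing immediately: receipt photos are noisy, and extracted numbers don't always add up. A small sanity check (the helper and tolerance value are illustrative) catches the most common failure before bad data reaches your expense tracker:

```python
def validate_receipt(subtotal: float, tax: float, total: float,
                     tol: float = 0.02) -> bool:
    """Flag receipts where subtotal + tax doesn't match the printed total."""
    return abs((subtotal + tax) - total) <= tol

print(validate_receipt(10.00, 0.80, 10.80))  # True
print(validate_receipt(10.00, 0.80, 12.00))  # False -- likely misread
```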
Common Gotchas
Rate limits. Free tier: 15 requests/min for Flash, 2 for Pro. Production apps need a paid plan.

Token counting. Images consume tokens too -- a typical photo uses ~258 tokens.

Nondeterminism. The same prompt can produce different outputs. Set temperature=0 for more consistency.
Error handling. Always wrap API calls in try/except. With the google-genai SDK used here, failures surface as genai.errors.APIError -- check the status code to tell rate limits (429) from bad requests (400).
The Gemini API is straightforward, the models are capable, and the pricing is competitive. Worth evaluating alongside OpenAI and Anthropic for any AI-powered feature.
For more AI tutorials and practical coding guides, check out CodeUp.