Gemini API Tutorial: Build with Google's AI Model in Python
Complete guide to using the Gemini API in Python. Covers text generation, multimodal inputs, streaming, chat history, function calling, grounding, structured output, and a practical receipt analyzer project.
Google's Gemini models are genuinely good. Not "good for a Google product" -- actually competitive with the best models available. And the free tier is generous enough to build real projects without spending anything.
Let's go from zero to building a practical multimodal application. No hand-waving, just code that works.
Setup
Get an API key from Google AI Studio. Free, takes 30 seconds.
pip install google-genai
from google import genai
client = genai.Client(api_key="YOUR_API_KEY")
Basic Text Generation
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Explain how DNS works in 3 sentences."
)
print(response.text)
System Instructions and Multimodal Input
System instructions set the model's behavior. Multimodal is where Gemini genuinely shines -- it handles images natively, not as an afterthought.
import pathlib
image_bytes = pathlib.Path("architecture-diagram.png").read_bytes()
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=[
        genai.types.Part.from_bytes(data=image_bytes, mime_type="image/png"),
        "Describe this architecture diagram. What are the main components?"
    ],
    config=genai.types.GenerateContentConfig(
        system_instruction="You are a pragmatic senior developer. Keep responses concise."
    )
)
print(response.text)
Streaming and Chat
For long responses, streaming gives you tokens as they're generated: use generate_content_stream() and iterate over chunks. Essential for any user-facing app.
Chat with History
chat = client.chats.create(model="gemini-2.5-flash")
response = chat.send_message("I'm getting a TypeError in my Python code")
print(response.text)
response = chat.send_message("It says 'unsupported operand type'")
print(response.text)
response = chat.send_message("Here's the code: result = '5' + 3")
print(response.text) # Model remembers the full conversation
Function Calling
Function calling lets Gemini invoke your Python functions for real-time data or actions:
def get_weather(city: str) -> dict:
    """Get current weather for a city."""
    weather_data = {
        "New York": {"temp": 22, "condition": "Partly cloudy"},
        "Tokyo": {"temp": 28, "condition": "Sunny"},
    }
    return weather_data.get(city, {"temp": 0, "condition": "Unknown"})
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="What's the weather in Tokyo?",
    config=genai.types.GenerateContentConfig(tools=[get_weather])
)
print(response.text)
The model decides when to call your functions, extracts arguments from natural language, and incorporates results into its response.
Grounding with Google Search
Connect Gemini to real-time search results for current information:
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="What were the major tech layoffs this week?",
    config=genai.types.GenerateContentConfig(
        tools=[genai.types.Tool(google_search=genai.types.GoogleSearch())]
    )
)
print(response.text)
Structured Output (JSON Mode)
When you need data in a specific format, not free-form text:
from pydantic import BaseModel
class MovieReview(BaseModel):
    title: str
    rating: float
    pros: list[str]
    cons: list[str]
    recommended: bool
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Review the movie 'Inception'",
    config=genai.types.GenerateContentConfig(
        response_mime_type="application/json",
        response_schema=MovieReview
    )
)
import json
review = json.loads(response.text)
print(f"Rating: {review['rating']}/10 -- Recommended: {review['recommended']}")
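Recent `google-genai` versions also expose `response.parsed`, which hands you the validated Pydantic instance directly instead of making you round-trip through `json.loads`. What that does under the hood, shown with sample data (the JSON string here is illustrative, not real model output):

```python
from pydantic import BaseModel

class MovieReview(BaseModel):
    title: str
    rating: float
    pros: list[str]
    cons: list[str]
    recommended: bool

# Equivalent to what response.parsed gives you when response_schema is set:
raw = '{"title": "Inception", "rating": 8.8, "pros": ["visuals"], "cons": ["dense"], "recommended": true}'
review = MovieReview.model_validate_json(raw)
print(review.rating, review.recommended)  # 8.8 True
```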
Gemini 2.5 Pro vs Flash
| Feature | Pro | Flash |
|---|---|---|
| Speed | Slower | Very fast |
| Cost | ~$1.25/M input | ~$0.15/M input |
| Best for | Complex reasoning, detailed analysis | Chat, classification, quick tasks |
| Context | 1M tokens | 1M tokens |
Pricing comparison (approximate, USD per million tokens):

| Model | Input/1M tokens | Output/1M tokens |
|---|---|---|
| Gemini 2.5 Flash | $0.15 | $0.60 |
| Gemini 2.5 Pro | $1.25 | $10.00 |
| GPT-4o | $2.50 | $10.00 |
| Claude Sonnet 4 | $3.00 | $15.00 |
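With per-token prices in hand, a rough per-request cost estimate is one line of arithmetic. A sketch using the Gemini prices from the table above (the helper name is illustrative):

```python
PRICES = {  # USD per 1M tokens: (input, output)
    "gemini-2.5-flash": (0.15, 0.60),
    "gemini-2.5-pro": (1.25, 10.00),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Rough USD cost for one request at published list prices."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# e.g. 10k input + 1k output tokens on Flash
print(estimate_cost("gemini-2.5-flash", 10_000, 1_000))  # 0.0021
```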
Practical Project: Receipt Analyzer
Let's build something real -- a multimodal analyzer that extracts structured data from receipt photos:
from pydantic import BaseModel
class LineItem(BaseModel):
    description: str
    quantity: int
    unit_price: float
    total: float

class Receipt(BaseModel):
    store_name: str
    date: str
    items: list[LineItem]
    subtotal: float
    tax: float
    total: float

def analyze_receipt(image_path: str) -> Receipt:
    image_bytes = pathlib.Path(image_path).read_bytes()
    response = client.models.generate_content(
        model="gemini-2.5-flash",
        contents=[
            genai.types.Part.from_bytes(data=image_bytes, mime_type="image/jpeg"),
            "Extract all information from this receipt including every line item."
        ],
        config=genai.types.GenerateContentConfig(
            response_mime_type="application/json",
            response_schema=Receipt
        )
    )
    return Receipt(**json.loads(response.text))
receipt = analyze_receipt("grocery_receipt.jpg")
print(f"Store: {receipt.store_name}")
for item in receipt.items:
    print(f"  {item.description}: {item.quantity}x ${item.unit_price}")
print(f"Total: ${receipt.total}")
This combines multimodal input, structured output, and practical utility. You could extend this into an expense tracker or accounting tool.
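One extension worth doing immediately: receipt photos are noisy, and extracted numbers don't always add up. A small sanity check (the helper and tolerance value are illustrative) catches the most common failure before bad data reaches your expense tracker:

```python
def validate_receipt(subtotal: float, tax: float, total: float,
                     tol: float = 0.02) -> bool:
    """Flag receipts where subtotal + tax doesn't match the printed total."""
    return abs((subtotal + tax) - total) <= tol

print(validate_receipt(10.00, 0.80, 10.80))  # True
print(validate_receipt(10.00, 0.80, 12.00))  # False -- likely misread
```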
Common Gotchas
Rate limits. Free tier: 15 requests/min for Flash, 2 for Pro. Production apps need a paid plan.

Token counting. Images consume tokens too -- a typical photo uses ~258 tokens.

Nondeterminism. The same prompt can produce different outputs. Set temperature=0 for more consistency.
Error handling. Always wrap API calls in try/except. With the google-genai SDK used here, failures surface as genai.errors.APIError -- check the status code to tell rate limits (429) from bad requests (400).
The Gemini API is straightforward, the models are capable, and the pricing is competitive. Worth evaluating alongside OpenAI and Anthropic for any AI-powered feature.
For more AI tutorials and practical coding guides, check out CodeUp.