March 27, 2026 · 9 min read

How to Use the OpenAI API: Build an AI-Powered App from Scratch

Learn the OpenAI API from setup to production. Chat completions, streaming, function calling, and building a real AI app with Python.

openai api ai python tutorial

You've used ChatGPT. You've seen the demos. Now you want to put AI into your own app. The good news: the OpenAI API is surprisingly straightforward once you understand the core concepts. The bad news: the documentation is sprawling and changes constantly.

This guide cuts through the noise. By the end, you'll have built both a CLI chatbot and a web app with AI capabilities, and you'll understand exactly how the billing works so you don't wake up to a surprise invoice.

Getting Your API Key

First, head to platform.openai.com. Create an account (API billing is separate from any ChatGPT subscription), add a payment method, and generate an API key from the API Keys section.

Store it safely. Treat it like a password.

# Set as environment variable (don't hardcode it in your source code)
export OPENAI_API_KEY="sk-proj-abc123..."

Create a .env file for local development:

OPENAI_API_KEY=sk-proj-abc123...

And immediately add .env to your .gitignore. Leaked API keys get abused within minutes -- bots actively scan GitHub for them.

Install the Python SDK

pip install openai python-dotenv

The official Python library handles authentication, retries, and type safety. You could use raw HTTP requests, but there's no reason to.

from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()
client = OpenAI() # Automatically reads OPENAI_API_KEY from env

Your First Chat Completion

The chat completions endpoint is what powers ChatGPT. You send a list of messages, you get a response back.

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful coding assistant."},
        {"role": "user", "content": "Explain Python decorators in 3 sentences."}
    ]
)

print(response.choices[0].message.content)

The messages array is the core concept. Every API call sends the full conversation history. There's no server-side memory -- you manage context yourself.

Three roles matter:

  • system -- Sets the AI's behavior. "You are a pirate" makes it talk like a pirate. "You are a JSON API" makes it return structured data. This is powerful.
  • user -- The human's message.
  • assistant -- The AI's previous responses. You include these to maintain conversation context.

The Messages Array: How Conversations Work

Here's what a multi-turn conversation looks like:

messages = [
    {"role": "system", "content": "You are a Python tutor. Be concise."},
    {"role": "user", "content": "What's a list comprehension?"},
    {"role": "assistant", "content": "A list comprehension is a concise way to create lists. Instead of a for loop, you write [expression for item in iterable]. Example: [x**2 for x in range(5)] gives [0, 1, 4, 9, 16]."},
    {"role": "user", "content": "Can I add conditions?"}
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages
)

The AI sees the full conversation and knows the user is asking about conditions in list comprehensions, not conditions in general. You're responsible for appending each new message and response to the array.
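In practice you'll want to wrap that bookkeeping in a small helper. Here's a minimal sketch; the Conversation class and its method names are illustrative, not part of the OpenAI SDK:

```python
class Conversation:
    """Tiny history manager (illustrative, not part of the SDK)."""

    def __init__(self, system_prompt):
        self.messages = [{"role": "system", "content": system_prompt}]

    def add_user(self, text):
        self.messages.append({"role": "user", "content": text})

    def add_assistant(self, text):
        self.messages.append({"role": "assistant", "content": text})

convo = Conversation("You are a Python tutor. Be concise.")
convo.add_user("What's a list comprehension?")
convo.add_assistant("A concise way to build lists: [x**2 for x in range(5)].")
convo.add_user("Can I add conditions?")
print(len(convo.messages))  # 4
```

You'd pass convo.messages as the messages argument of each API call and feed every reply back through add_assistant.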

Streaming Responses

Nobody wants to wait 10 seconds staring at a blank screen. Streaming delivers tokens as they're generated:

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user", "content": "Write a short poem about debugging."}
    ],
    stream=True
)

for chunk in stream:
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)

print()  # Newline at the end

Each chunk contains a small piece of the response. The delta.content field is None for some chunks (the first carries only the role, and the last just signals completion), so you need the if content check. Streaming doesn't change the cost -- you're billed the same either way.

Function Calling: Give the AI Superpowers

Function calling is where things get interesting. You define functions that the AI can "call" by returning structured arguments, and then you execute them yourself.

import json

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "City and state, e.g., San Francisco, CA"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature unit"
                    }
                },
                "required": ["location"]
            }
        }
    }
]

messages = [{"role": "user", "content": "What's the weather in Tokyo?"}]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=tools
)

message = response.choices[0].message

if message.tool_calls:
    call = message.tool_calls[0]
    args = json.loads(call.function.arguments)
    print(f"AI wants to call: {call.function.name}({args})")
    # You'd call your actual weather API here

The AI doesn't execute functions. It decides which function to call and what arguments to pass. You execute the function, then send the result back:

# After getting the function result, send it back
messages.append(message)  # The assistant's message with tool_calls
messages.append({
    "role": "tool",
    "tool_call_id": call.id,
    "content": json.dumps({"temperature": 22, "condition": "Partly cloudy"})
})

# Get the final response
final = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    tools=tools
)
print(final.choices[0].message.content)
# "The current weather in Tokyo is 22C and partly cloudy."

This pattern lets you connect AI to databases, APIs, file systems -- anything.
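Once you expose more than one tool, a dispatch table keeps the execution step tidy. A sketch under that assumption; the tool implementations here (get_weather, get_time) are hypothetical stand-ins for your real integrations:

```python
import json

# Hypothetical tool implementations -- replace with real API calls
def get_weather(location, unit="celsius"):
    return {"temperature": 22, "condition": "Partly cloudy"}

def get_time(location):
    return {"time": "14:30"}

TOOL_REGISTRY = {"get_weather": get_weather, "get_time": get_time}

def execute_tool_calls(tool_calls, messages):
    """Run each requested tool and append a 'tool' message with its result."""
    for call in tool_calls:
        fn = TOOL_REGISTRY[call.function.name]
        args = json.loads(call.function.arguments)  # arguments arrive as a JSON string
        result = fn(**args)
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": json.dumps(result),
        })
    return messages
```

After appending the assistant's tool_calls message and the tool results, you make one more create() call to get the final answer, exactly as in the weather example above.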

Building a CLI Chatbot

Let's put it together into something real:

from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()
client = OpenAI()

def chat():
    messages = [
        {"role": "system", "content": "You are a helpful coding assistant. Keep answers concise. Use code examples when relevant."}
    ]

    print("AI Coding Assistant (type 'quit' to exit)")
    print("-" * 45)

    while True:
        user_input = input("\nYou: ").strip()

        if user_input.lower() in ("quit", "exit", "q"):
            print("Goodbye!")
            break

        if not user_input:
            continue

        messages.append({"role": "user", "content": user_input})

        try:
            stream = client.chat.completions.create(
                model="gpt-4o",
                messages=messages,
                stream=True,
                max_tokens=1024
            )

            print("\nAssistant: ", end="")
            full_response = ""

            for chunk in stream:
                content = chunk.choices[0].delta.content
                if content:
                    print(content, end="", flush=True)
                    full_response += content

            print()
            messages.append({"role": "assistant", "content": full_response})

        except Exception as e:
            print(f"\nError: {e}")
            messages.pop()  # Remove the failed user message

if __name__ == "__main__":
    chat()

Run it with python chatbot.py. You have a working AI assistant in about 50 lines.

Building a Web App with AI

Here's a FastAPI backend that serves as an AI chat API:

from fastapi import FastAPI, HTTPException
from fastapi.responses import StreamingResponse
from pydantic import BaseModel
from openai import OpenAI
import json

app = FastAPI()
client = OpenAI()

class ChatRequest(BaseModel):
    messages: list[dict]
    model: str = "gpt-4o"
    stream: bool = False

@app.post("/chat")
async def chat(request: ChatRequest):
    try:
        if request.stream:
            return StreamingResponse(
                generate_stream(request),
                media_type="text/event-stream"
            )

        response = client.chat.completions.create(
            model=request.model,
            messages=request.messages,
            max_tokens=2048
        )

        return {
            "content": response.choices[0].message.content,
            "usage": {
                "prompt_tokens": response.usage.prompt_tokens,
                "completion_tokens": response.usage.completion_tokens,
                "total_tokens": response.usage.total_tokens
            }
        }

    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

async def generate_stream(request: ChatRequest):
    stream = client.chat.completions.create(
        model=request.model,
        messages=request.messages,
        stream=True,
        max_tokens=2048
    )

    for chunk in stream:
        content = chunk.choices[0].delta.content
        if content:
            yield f"data: {json.dumps({'content': content})}\n\n"

    yield "data: [DONE]\n\n"

And a simple frontend to consume it:

async function sendMessage(messages) {
  const response = await fetch('/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ messages, stream: true })
  });

  const reader = response.body.getReader();
  const decoder = new TextDecoder();

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    // { stream: true } handles multi-byte characters split across chunks
    const text = decoder.decode(value, { stream: true });
    const lines = text.split('\n').filter(line => line.startsWith('data: '));

    for (const line of lines) {
      const data = line.slice(6);
      if (data === '[DONE]') return;

      const parsed = JSON.parse(data);
      appendToChat(parsed.content);
    }
  }
}

Cost Management

This is where people get burned. OpenAI charges per token (roughly 4 characters = 1 token).
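Since billing is per token, a back-of-the-envelope estimator is handy before you ship anything. This is a rough sketch that leans on the 4-characters-per-token approximation and GPT-4o's prices; real billing uses the exact counts in response.usage:

```python
def estimate_cost(prompt_chars, completion_chars,
                  input_per_m=2.50, output_per_m=10.00):
    """Rough cost estimate via the ~4 chars/token heuristic.
    Prices are USD per 1M tokens (GPT-4o, early 2026)."""
    prompt_tokens = prompt_chars / 4
    completion_tokens = completion_chars / 4
    return (prompt_tokens * input_per_m
            + completion_tokens * output_per_m) / 1_000_000

# A 4,000-character prompt with a 4,000-character response:
cost = estimate_cost(4000, 4000)
print(f"${cost:.4f}")  # $0.0125
```

The heuristic is only that -- code and non-English text tokenize differently -- but it's good enough to spot "this loop will cost $500" before it does.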

GPT-4o pricing (as of early 2026):


  • Input: ~$2.50 per 1M tokens
  • Output: ~$10 per 1M tokens

Practical tips to control costs:

# 1. Set max_tokens to limit response length
response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    max_tokens=500  # Caps the response
)

# 2. Track usage
usage = response.usage
print(f"Prompt: {usage.prompt_tokens} tokens")
print(f"Response: {usage.completion_tokens} tokens")
print(f"Total: {usage.total_tokens} tokens")

# 3. Use cheaper models for simple tasks
# gpt-4o-mini is ~10x cheaper and fine for classification, extraction, simple Q&A
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages
)

# 4. Trim conversation history
# Don't send 100 messages when the last 10 provide enough context
def trim_messages(messages, max_messages=20):
    system = [m for m in messages if m["role"] == "system"]
    recent = messages[-max_messages:]
    return system + [m for m in recent if m["role"] != "system"]
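To see the trimming in action (the helper is repeated here so the snippet runs standalone): with 30 user/assistant turns, the system message survives and the tail is capped at 20 messages.

```python
def trim_messages(messages, max_messages=20):
    system = [m for m in messages if m["role"] == "system"]
    recent = messages[-max_messages:]
    return system + [m for m in recent if m["role"] != "system"]

history = [{"role": "system", "content": "Be concise."}]
for i in range(30):
    history.append({"role": "user", "content": f"question {i}"})
    history.append({"role": "assistant", "content": f"answer {i}"})

trimmed = trim_messages(history)
print(len(trimmed))            # 21: system message + last 20 turns
print(trimmed[0]["role"])      # system
print(trimmed[-1]["content"])  # answer 29
```

For long-running chats, a fancier variant summarizes the dropped messages into a single system note instead of discarding them.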

Set a monthly spending limit in your OpenAI dashboard. Seriously, do this before anything else.

Rate Limits and Error Handling

OpenAI enforces rate limits based on your account tier. New accounts start with lower limits.

from openai import OpenAI, RateLimitError, APIStatusError
import time

client = OpenAI()

def robust_completion(messages, retries=3):
    for attempt in range(retries):
        try:
            return client.chat.completions.create(
                model="gpt-4o",
                messages=messages
            )
        except RateLimitError:
            wait = 2 ** attempt  # Exponential backoff: 1s, 2s, 4s
            print(f"Rate limited. Waiting {wait}s...")
            time.sleep(wait)
        except APIStatusError as e:
            if e.status_code >= 500:  # Server error, retry
                time.sleep(1)
                continue
            raise  # Client error, don't retry

    raise Exception("Max retries exceeded")
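One refinement worth knowing: plain exponential backoff can make many rate-limited clients retry in lockstep. Adding random jitter spreads them out. A small standalone sketch (the helper name is mine, not from the SDK):

```python
import random

def backoff_delay(attempt, base=1.0, cap=30.0):
    """Exponential backoff with full jitter: a random delay in
    [0, min(cap, base * 2**attempt)] so concurrent clients desynchronize."""
    return random.uniform(0, min(cap, base * 2 ** attempt))

for attempt in range(4):
    print(f"attempt {attempt}: waiting {backoff_delay(attempt):.2f}s")
```

You'd call time.sleep(backoff_delay(attempt)) in place of the fixed waits above. Note the official SDK also retries some failures automatically, so layer your own loop only where you need custom behavior.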

Common errors you'll hit:


  • 401 -- Invalid API key. Check your environment variable.
  • 429 -- Rate limited. Back off and retry.
  • 400 -- Bad request. Usually your messages array is malformed.
  • 500/503 -- OpenAI is having issues. Retry with backoff.


Common Mistakes

  • Sending the entire conversation every time. The messages array grows with each turn. After 50 exchanges, you're sending a lot of tokens. Summarize or truncate old messages.
  • Not using system messages. The system message dramatically affects response quality. "You are a senior Python developer. Give concise, production-ready code." produces much better output than no system message at all.
  • Hardcoding the API key. Use environment variables. Always. Even for "quick tests."
  • Ignoring the finish_reason. Check response.choices[0].finish_reason. If it's "length", the response was cut off by max_tokens. If it's "stop", it completed naturally.
  • Using GPT-4o for everything. GPT-4o-mini handles classification, extraction, and simple generation just fine at a fraction of the cost. Use the most capable model only when you need it.
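The finish_reason check mentioned above fits in a one-line helper. A sketch using a stubbed object that mirrors the response shape (real responses come from the API, not SimpleNamespace):

```python
from types import SimpleNamespace

def was_truncated(response):
    """True if the model stopped because it hit max_tokens."""
    return response.choices[0].finish_reason == "length"

# Stub mirroring the response shape, just for illustration:
stub = SimpleNamespace(choices=[SimpleNamespace(finish_reason="length")])
print(was_truncated(stub))  # True
```

If this returns True in production, either raise max_tokens or ask the model to continue in a follow-up turn.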

What's Next

You now know the core OpenAI API: chat completions, streaming, function calling, and cost management. From here, explore embeddings for semantic search, the Assistants API for stateful conversations, and vision capabilities for image analysis.

The real skill isn't calling the API -- it's designing the system around it. Good prompts, smart context management, and graceful error handling separate toy demos from production apps.

Practice building AI-powered tools at CodeUp -- the best way to learn is to ship something real.
