March 26, 2026 · 12 min read

OpenAI vs Claude vs Gemini — AI API Comparison for Developers

A practical comparison of OpenAI, Claude, and Gemini APIs for developers — pricing, context windows, code generation quality, function calling, vision, speed, and rate limits with code examples.


If you're integrating an LLM into a product, you have three serious options: OpenAI's GPT models, Anthropic's Claude, and Google's Gemini. Each has real strengths, real weaknesses, and pricing structures that change every few months. The marketing from all three companies is predictably useless for making an engineering decision.

This is a practical comparison based on building with all three APIs. Pricing, context windows, code quality, function calling, latency, rate limits — the things that actually matter when you're writing the integration code.

Quick Comparison Table

| Feature | OpenAI (GPT-4.1) | Claude (Opus 4) | Gemini (2.5 Pro) |
| --- | --- | --- | --- |
| Max context | 1M tokens | 200K (1M for Opus 4) | 1M tokens |
| Input price (per 1M tokens) | $2.00 | $15.00 | $1.25 - $2.50 |
| Output price (per 1M tokens) | $8.00 | $75.00 | $10.00 |
| Cached input price | $0.50 | $1.50 | $0.315 |
| Vision | Yes | Yes | Yes (+ video) |
| Function calling | Yes | Yes (tool_use) | Yes |
| Streaming | Yes | Yes | Yes |
| Batch API | Yes | Yes (Message Batches) | No |
| Fine-tuning | Yes | No | Yes |
| Time to first token | ~300ms | ~400ms | ~250ms |
| Rate limit (free tier) | 500 RPM | 50 RPM | 1500 RPM |
Prices as of March 2026. These change frequently — check the pricing pages directly.

Calling Each API — Code Examples

OpenAI

import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

// Basic completion
const response = await openai.chat.completions.create({
  model: "gpt-4.1",
  messages: [
    { role: "system", content: "You are a senior TypeScript developer." },
    { role: "user", content: "Write a debounce function with TypeScript generics." },
  ],
  temperature: 0.2,
  max_tokens: 1024,
});

console.log(response.choices[0].message.content);

// Streaming
const stream = await openai.chat.completions.create({
  model: "gpt-4.1",
  messages: [{ role: "user", content: "Explain React Server Components." }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || "");
}

Claude (Anthropic)

import Anthropic from "@anthropic-ai/sdk";

const anthropic = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY,
});

// Basic completion
const response = await anthropic.messages.create({
  model: "claude-opus-4-20250514",
  max_tokens: 1024,
  system: "You are a senior TypeScript developer.",
  messages: [
    { role: "user", content: "Write a debounce function with TypeScript generics." },
  ],
});

// Claude returns content blocks, not a plain string
const text = response.content
  .filter((block) => block.type === "text")
  .map((block) => block.text)
  .join("");

console.log(text);

// Streaming
const stream = anthropic.messages.stream({
  model: "claude-opus-4-20250514",
  max_tokens: 1024,
  messages: [{ role: "user", content: "Explain React Server Components." }],
});

for await (const event of stream) {
  if (event.type === "content_block_delta" && event.delta.type === "text_delta") {
    process.stdout.write(event.delta.text);
  }
}

Gemini

import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

// Basic completion
const response = await ai.models.generateContent({
  model: "gemini-2.5-pro",
  contents: "Write a debounce function with TypeScript generics.",
  config: {
    systemInstruction: "You are a senior TypeScript developer.",
    temperature: 0.2,
    maxOutputTokens: 1024,
  },
});

console.log(response.text);

// Streaming
const streamResponse = await ai.models.generateContentStream({
  model: "gemini-2.5-pro",
  contents: "Explain React Server Components.",
});

for await (const chunk of streamResponse) {
  process.stdout.write(chunk.text ?? "");
}

Function Calling / Tool Use

All three support structured function calling. The implementations differ enough that you'll need model-specific code.

OpenAI Function Calling

const response = await openai.chat.completions.create({
  model: "gpt-4.1",
  messages: [
    { role: "user", content: "What's the weather in Tokyo and London?" },
  ],
  tools: [
    {
      type: "function",
      function: {
        name: "get_weather",
        description: "Get current weather for a city",
        parameters: {
          type: "object",
          properties: {
            city: { type: "string", description: "City name" },
            units: { type: "string", enum: ["celsius", "fahrenheit"] },
          },
          required: ["city"],
        },
      },
    },
  ],
  tool_choice: "auto",
});

// OpenAI may return multiple tool calls in one response
const toolCalls = response.choices[0].message.tool_calls;
for (const call of toolCalls ?? []) {
  const args = JSON.parse(call.function.arguments);
  console.log(`Call ${call.function.name}(${JSON.stringify(args)})`);
}

Claude Tool Use

const response = await anthropic.messages.create({
  model: "claude-opus-4-20250514",
  max_tokens: 1024,
  tools: [
    {
      name: "get_weather",
      description: "Get current weather for a city",
      input_schema: {
        type: "object",
        properties: {
          city: { type: "string", description: "City name" },
          units: { type: "string", enum: ["celsius", "fahrenheit"] },
        },
        required: ["city"],
      },
    },
  ],
  messages: [
    { role: "user", content: "What's the weather in Tokyo and London?" },
  ],
});

// Claude returns tool_use content blocks
for (const block of response.content) {
  if (block.type === "tool_use") {
    console.log(`Call ${block.name}(${JSON.stringify(block.input)})`);
    // block.id is used to send results back
  }
}

Gemini Function Calling

const response = await ai.models.generateContent({
  model: "gemini-2.5-pro",
  contents: "What's the weather in Tokyo and London?",
  config: {
    tools: [
      {
        functionDeclarations: [
          {
            name: "get_weather",
            description: "Get current weather for a city",
            parameters: {
              type: "object",
              properties: {
                city: { type: "string", description: "City name" },
                units: { type: "string", enum: ["celsius", "fahrenheit"] },
              },
              required: ["city"],
            },
          },
        ],
      },
    ],
  },
});

// Gemini returns function calls in parts
for (const part of response.candidates?.[0]?.content?.parts ?? []) {
  if (part.functionCall) {
    console.log(`Call ${part.functionCall.name}(${JSON.stringify(part.functionCall.args)})`);
  }
}

The API shapes are different, but the capability is equivalent. OpenAI's is the most mature. Claude's tool_use blocks integrate naturally with its content block architecture. Gemini's function declarations are verbose but work reliably.

Code Generation Quality

This matters if you're building a coding assistant, autocomplete, or any developer tool. After testing hundreds of prompts across various projects on codeup.dev, here's what holds up:

OpenAI GPT-4.1: Consistently solid at generating boilerplate and standard patterns. Follows instructions well. Sometimes produces code that looks correct but has subtle logical bugs — always review carefully. Excellent at refactoring existing code.

Claude Opus 4: Strongest at understanding complex codebases and multi-file changes. Tends to produce more idiomatic code that a senior dev would actually write. Better at explaining trade-offs in code decisions. Occasionally over-engineers solutions when a simple approach would work.

Gemini 2.5 Pro: Surprisingly good at long-context code tasks. If you feed it an entire repo (thanks to the 1M token window), it maintains coherence better than you'd expect. Weaker on niche framework-specific code compared to the other two. Strongest when you give it lots of context.

For pure coding tasks, the ranking shifts depending on what you're doing:

| Task | Best Model | Notes |
| --- | --- | --- |
| Boilerplate generation | GPT-4.1 | Fast, follows templates well |
| Complex refactoring | Claude Opus 4 | Best at understanding intent |
| Whole-repo analysis | Gemini 2.5 Pro | 1M context advantage |
| Bug fixing | Claude Opus 4 | Strong at reasoning about edge cases |
| API integration code | GPT-4.1 | Largest training set of API examples |
| Algorithm implementation | Tie | All three are competent |
| CSS/UI generation | GPT-4.1 | Slightly more consistent styling |

Vision Capabilities

All three handle image input. Gemini also handles video, which the others don't.

// OpenAI — image input
const response = await openai.chat.completions.create({
  model: "gpt-4.1",
  messages: [
    {
      role: "user",
      content: [
        { type: "text", text: "What UI framework is this screenshot using?" },
        {
          type: "image_url",
          image_url: { url: "data:image/png;base64,..." },
        },
      ],
    },
  ],
});

// Claude — image input
const response = await anthropic.messages.create({
  model: "claude-opus-4-20250514",
  max_tokens: 1024,
  messages: [
    {
      role: "user",
      content: [
        {
          type: "image",
          source: { type: "base64", media_type: "image/png", data: "..." },
        },
        { type: "text", text: "What UI framework is this screenshot using?" },
      ],
    },
  ],
});

// Gemini — image input
const response = await ai.models.generateContent({
  model: "gemini-2.5-pro",
  contents: [
    {
      parts: [
        { inlineData: { mimeType: "image/png", data: "..." } },
        { text: "What UI framework is this screenshot using?" },
      ],
    },
  ],
});

For screenshot-to-code tasks, all three work. Claude is marginally better at understanding complex UI layouts. Gemini's video understanding opens unique use cases (analyzing screen recordings, tutorial videos).

Pricing Deep Dive

Cost calculation for a real scenario — a customer support chatbot processing 10,000 conversations/day, averaging 2,000 input tokens and 500 output tokens per conversation:

Daily token usage: 20M input + 5M output

| Provider | Model | Daily Cost | Monthly Cost |
| --- | --- | --- | --- |
| OpenAI | GPT-4.1 | $80 | $2,400 |
| Anthropic | Claude Sonnet 4 | $90 | $2,700 |
| Anthropic | Claude Opus 4 | $675 | $20,250 |
| Google | Gemini 2.5 Pro | $75 | $2,250 |
| OpenAI | GPT-4.1 mini | $12 | $360 |
| Google | Gemini 2.5 Flash | $6 | $180 |
The cost gap between flagship and mid-tier models is massive. For most production use cases, the mid-tier models (GPT-4.1 mini, Claude Sonnet, Gemini Flash) deliver 90% of the capability at 10-20% of the cost. Reserve the flagship models for tasks where quality genuinely matters more than cost — code review, complex analysis, creative work.

Caching matters. If your prompts share a long system prompt or repeated context, all three providers offer prompt caching at steep discounts (50-75% off input pricing). Architect your prompts with caching in mind.
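The per-model numbers above come straight from the price table: tokens divided by one million, times the per-million rate, summed over input and output. A small helper (illustrative, not from any SDK) makes the arithmetic explicit:

```typescript
// Illustrative helper: cost = (tokens / 1M) * price-per-1M, input + output.
function dailyCostUSD(
  inputTokens: number,
  outputTokens: number,
  inputPricePerM: number,
  outputPricePerM: number,
): number {
  return (
    (inputTokens / 1_000_000) * inputPricePerM +
    (outputTokens / 1_000_000) * outputPricePerM
  );
}

// GPT-4.1: 20M input at $2.00/1M + 5M output at $8.00/1M = $40 + $40
console.log(dailyCostUSD(20_000_000, 5_000_000, 2.0, 8.0)); // 80
// Gemini 2.5 Pro: 20M at $1.25/1M + 5M at $10.00/1M = $25 + $50
console.log(dailyCostUSD(20_000_000, 5_000_000, 1.25, 10.0)); // 75
```

Run your own traffic estimates through this before committing to a flagship model; the monthly column is just the daily figure times 30.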

Rate Limits and Reliability

| Provider | Free Tier RPM | Tier 1 RPM | Tier 5 RPM |
| --- | --- | --- | --- |
| OpenAI | 500 | 3,500 | 10,000 |
| Anthropic | 50 | 1,000 | 4,000 |
| Google | 1,500 | 2,000 | |
Anthropic's rate limits are the most restrictive, especially on the free tier. If you're building a high-throughput application, budget for a higher tier from day one.

In terms of reliability, all three have occasional outages. OpenAI has the most users and the most visible outages. Anthropic tends to be more stable but has had capacity issues during peak demand. Google's infrastructure means Gemini rarely goes fully down, but latency spikes happen.
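Whichever provider you pick, plan for 429s and transient 5xx responses. A minimal retry wrapper with exponential backoff and jitter might look like this — a sketch, not tied to any one SDK; it assumes thrown errors may carry a numeric `status` field (OpenAI's and Anthropic's SDK error classes do), so adapt the check to your client:

```typescript
// Minimal retry with exponential backoff and jitter.
// Assumes errors may expose a numeric `status` (HTTP code) — an assumption;
// check your SDK's error shape before relying on it.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 5,
  baseDelayMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      const status = (err as { status?: number }).status;
      // Only retry rate limits (429) and server errors (5xx)
      const retryable = status === 429 || (status !== undefined && status >= 500);
      if (!retryable) throw err;
      // 500ms, 1s, 2s, ... scaled by random jitter to avoid thundering herds
      const delay = baseDelayMs * 2 ** attempt * (0.5 + Math.random() / 2);
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  throw lastError;
}

// Usage: const res = await withRetry(() => openai.chat.completions.create({ ... }));
```

The jitter matters: if every client backs off on the same schedule, they all retry at once and hit the rate limit again.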

Structured Output

Need the model to return valid JSON every time? All three support it, with different approaches.

// OpenAI — response_format with JSON schema
const response = await openai.chat.completions.create({
  model: "gpt-4.1",
  messages: [{ role: "user", content: "List 3 programming languages with pros and cons" }],
  response_format: {
    type: "json_schema",
    json_schema: {
      name: "languages",
      schema: {
        type: "object",
        properties: {
          languages: {
            type: "array",
            items: {
              type: "object",
              properties: {
                name: { type: "string" },
                pros: { type: "array", items: { type: "string" } },
                cons: { type: "array", items: { type: "string" } },
              },
              required: ["name", "pros", "cons"],
            },
          },
        },
        required: ["languages"],
      },
    },
  },
});

// Claude — tool_use trick for structured output
// Claude doesn't have a dedicated JSON mode, but you can use a tool
// and extract the structured input
const response = await anthropic.messages.create({
  model: "claude-opus-4-20250514",
  max_tokens: 1024,
  tools: [
    {
      name: "output_languages",
      description: "Output the list of languages with pros and cons",
      input_schema: {
        type: "object",
        properties: {
          languages: {
            type: "array",
            items: {
              type: "object",
              properties: {
                name: { type: "string" },
                pros: { type: "array", items: { type: "string" } },
                cons: { type: "array", items: { type: "string" } },
              },
              required: ["name", "pros", "cons"],
            },
          },
        },
        required: ["languages"],
      },
    },
  ],
  tool_choice: { type: "tool", name: "output_languages" },
  messages: [
    { role: "user", content: "List 3 programming languages with pros and cons" },
  ],
});

// Gemini — responseSchema
const response = await ai.models.generateContent({
  model: "gemini-2.5-pro",
  contents: "List 3 programming languages with pros and cons",
  config: {
    responseMimeType: "application/json",
    responseSchema: {
      type: "object",
      properties: {
        languages: {
          type: "array",
          items: {
            type: "object",
            properties: {
              name: { type: "string" },
              pros: { type: "array", items: { type: "string" } },
              cons: { type: "array", items: { type: "string" } },
            },
            required: ["name", "pros", "cons"],
          },
        },
      },
      required: ["languages"],
    },
  },
});

OpenAI's JSON schema mode is the most reliable — it guarantees valid JSON matching your schema. Gemini's responseSchema works similarly. Claude requires the tool_use workaround, which works consistently but feels like a hack.

Which One Should You Pick?

Use OpenAI if:
  • You need the broadest ecosystem (most tutorials, most libraries, most community support)
  • Fine-tuning is important for your use case
  • You want the most predictable API behavior (they've had the longest to iron out edge cases)
  • You're building on Azure (Azure OpenAI gives you dedicated capacity)
Use Claude if:
  • Code quality is your primary concern (strongest at complex reasoning and coding)
  • You need long, coherent multi-turn conversations
  • Safety and alignment matter for your product (Anthropic leads here)
  • You're building developer tools or code assistants
Use Gemini if:
  • Cost is a primary concern (cheapest for high-volume applications)
  • You need the largest context window at the lowest price
  • Video understanding is part of your use case
  • You're already in the Google Cloud ecosystem
  • You want the most generous free tier for prototyping
The honest answer: for most production applications, start with the cheapest model that meets your quality bar, implement provider-agnostic abstractions in your code, and swap models as pricing and capabilities shift. The AI API landscape changes every quarter. Building your application around a single provider's API shape is a mistake you'll regret.
// Build a simple abstraction layer
interface LLMProvider {
  complete(params: {
    model: string;
    system: string;
    messages: { role: string; content: string }[];
    maxTokens: number;
  }): Promise<string>;
}

// Implement for each provider, swap via config
// This is what we do for all LLM-powered features on codeup.dev

Lock in the abstraction. Let the providers compete on price and quality. Switch when it makes sense.
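To make the pattern concrete, here's a self-contained sketch: the interface repeated from above, a stub provider, and a feature function that only ever sees the interface. All names besides `LLMProvider` are illustrative; in production each adapter would wrap one of the SDK calls shown earlier (openai.chat.completions.create, anthropic.messages.create, or ai.models.generateContent).

```typescript
// Repeating the interface from above so this example stands alone
interface LLMProvider {
  complete(params: {
    model: string;
    system: string;
    messages: { role: string; content: string }[];
    maxTokens: number;
  }): Promise<string>;
}

// A stub provider — swap in a real adapter that wraps an SDK call.
// (Illustrative only; it just echoes the last user message.)
class MockProvider implements LLMProvider {
  async complete(params: {
    model: string;
    system: string;
    messages: { role: string; content: string }[];
    maxTokens: number;
  }): Promise<string> {
    const last = params.messages[params.messages.length - 1];
    return `echo: ${last.content}`;
  }
}

// Feature code depends only on LLMProvider, so switching vendors is a
// config change, not a rewrite.
async function summarize(provider: LLMProvider, text: string): Promise<string> {
  return provider.complete({
    model: "mid-tier-model", // resolved per provider via config
    system: "Summarize the user's text in one sentence.",
    messages: [{ role: "user", content: text }],
    maxTokens: 256,
  });
}
```

A side benefit: the stub makes your LLM-dependent features testable without burning tokens or hitting rate limits.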
