AI API Cost Calculator

Compare exact API costs across 20+ AI models β€” GPT-4o, Claude 3.5, Gemini 2.0, DeepSeek, Grok, Llama and more. Enter tokens or words, see costs instantly.

πŸ’‘ Prices updated June 2026  β€’  All processing in your browser  β€’  No signup needed
Input (prompt) tokens
Output (completion) tokens
Number of API calls
Period
All Models β€” Cost Comparison
Model Input price Output price Input cost Output cost Total / call Total (Γ—calls)

How to Use the AI API Cost Calculator

  1. Enter your token counts β€” Input the number of input tokens (your prompt) and output tokens (the AI's response). Switch to Words or Characters if you prefer β€” the calculator converts automatically (1 word β‰ˆ 1.3 tokens, 1 token β‰ˆ 4 characters).
  2. Set your API call volume β€” Enter how many API calls you make per day or month. The "Per 1,000 calls" default shows your cost at scale, which is usually what matters for budgeting.
  3. Filter by provider β€” Click OpenAI, Anthropic, Google, or DeepSeek to focus the comparison. Click "All Models" to see the full picture.
  4. Read the results β€” The table shows input cost, output cost, total per call, and total for your full call volume. The cheapest model gets a βœ… badge. Sort by price or name using the dropdown.
  5. Use the summary cards β€” See the cheapest option, most expensive, average cost, and potential savings at a glance before diving into the table.
πŸ’‘ Key insight: Output tokens are typically 2–5x more expensive than input tokens across all providers. This means a prompt that generates a long response costs significantly more than one that generates a short answer. Optimizing your output length (using "be concise" in your system prompt) can cut costs by 40–60%.

Understanding AI API Pricing in 2026

πŸͺ™ What Is a Token?
A token is roughly 4 characters or ΒΎ of a word in English. "Hello world" = 2 tokens. A typical paragraph = 75–100 tokens. A full page of text β‰ˆ 500 tokens. All AI models price their APIs per 1 million tokens (MTok) β€” so a $3/MTok model costs $0.000003 per token, or $0.003 for 1,000 tokens.
πŸ“Š Input vs Output Pricing
Every model charges separately for input (your prompt) and output (the AI response). Output is always more expensive β€” typically 3–5x the input price. This is because generating tokens requires more compute than reading them. GPT-4o charges $2.50/MTok input but $10/MTok output β€” a 4x difference.
πŸ† Cheapest Models 2026
DeepSeek V3 at $0.27/MTok input is among the cheapest frontier models. Gemini 2.0 Flash at $0.10/MTok input is Google's budget option. Llama 3.3 (self-hosted via Groq/Together) can be near-free at scale. GPT-4o Mini at $0.15/MTok is OpenAI's budget tier. For simple tasks, these often match GPT-4o quality at 10x lower cost.
πŸ’° How to Reduce API Costs
1) Shorten system prompts β€” every character costs money on every call. 2) Use prompt caching (Claude, GPT-4o) β€” cached input tokens cost 90% less. 3) Choose the right model tier β€” GPT-4o Mini vs GPT-4o for simple tasks. 4) Compress context β€” summarize conversation history instead of passing full history. 5) Set max_tokens limits to control output length.
πŸ”„ Prompt Caching
Claude and GPT-4o both support prompt caching β€” if you send the same system prompt repeatedly, cached tokens cost up to 90% less. Anthropic charges $0.30/MTok for cached Claude reads vs $3.00/MTok uncached. For apps with consistent system prompts, caching alone can cut costs by 50–80%.
πŸ“ˆ Cost at Scale
100 users Γ— 10 messages/day Γ— 500 tokens/message = 500,000 tokens/day = 15M tokens/month. At GPT-4o ($2.50 input + $10 output): ~$56/month assuming 60% output. At DeepSeek V3 ($0.27 input + $1.10 output): ~$7/month β€” 87% cheaper. At this scale, model choice matters enormously for unit economics.

AI Model Pricing Comparison β€” Full Breakdown

OpenAI GPT Models

OpenAI remains the most-used AI API in 2026. GPT-4o is the flagship model at $2.50/MTok input and $10/MTok output β€” suitable for complex reasoning, vision, and code generation. GPT-4o Mini at $0.15/$0.60 is the budget tier, ideal for classification, summarization, and simple Q&A. The o3 reasoning model at $10/$40 per MTok is reserved for complex multi-step reasoning tasks where accuracy matters more than cost. OpenAI's Batch API offers 50% discounts for non-realtime workloads.

Anthropic Claude Models

Claude 3.5 Sonnet ($3/$15 per MTok) is widely considered the best model for coding and writing in 2026. Claude 3.5 Haiku ($0.80/$4) is the budget option with strong performance on structured tasks. Claude 3 Opus ($15/$75) is the premium reasoning model. Anthropic's prompt caching cuts cached input costs to $0.30/MTok for Sonnet β€” a major advantage for applications with fixed system prompts. Claude has a 200K token context window across all models.

Google Gemini Models

Gemini 2.0 Flash ($0.10/$0.40 per MTok) is Google's most cost-effective frontier model and one of the cheapest available in 2026. Gemini 2.0 Pro ($1.25/$5) is the premium tier. Gemini has a 1M token context window β€” the largest of any major provider β€” making it ideal for document analysis and long-context tasks. Google offers a free tier with rate limits for development and testing.

DeepSeek Models

DeepSeek V3 at $0.27/$1.10 per MTok caused a major market disruption in early 2025 by matching GPT-4 quality at a fraction of the cost. DeepSeek R1 ($0.55/$2.19) is the reasoning model. Both are open-source and can be self-hosted for near-zero marginal cost at scale. For cost-sensitive production workloads where data privacy allows, DeepSeek often delivers the best price-performance ratio available.

πŸ”— Related tools: Use our AI Token Counter to count tokens in your prompts before calculating costs. See our AI Model Database for full specs including context windows, capabilities, and benchmarks.

Frequently Asked Questions

Which AI API is cheapest in 2026?
DeepSeek V3 and Gemini 2.0 Flash are among the cheapest frontier models in 2026 β€” both under $0.30/MTok for input tokens. For self-hosting, Llama 3.3 via providers like Groq or Together AI can cost even less at scale. GPT-4o Mini ($0.15/MTok) is the most affordable OpenAI option. The right choice depends on your quality requirements and data privacy needs.
How do I calculate AI API costs for my app?
Estimate your average prompt length (input tokens) + average response length (output tokens), then multiply by your expected daily API calls. Use this calculator to compare costs across models. A typical chatbot message is 150–300 input tokens and 100–250 output tokens. At 1,000 calls/day with GPT-4o, that's roughly $1–3/day or $30–90/month.
How many words is 1,000 tokens?
Approximately 750 words in English, or roughly 1.5 pages of standard text. The exact ratio depends on the language and content type β€” code tends to use more tokens per word than plain prose. Technical content with special characters can use up to 2x more tokens per word than simple English text.
What is prompt caching and how much does it save?
Prompt caching lets you reuse repeated portions of your prompt (like a system prompt) at a heavily discounted rate. Anthropic charges $0.30/MTok for cached Claude 3.5 Sonnet reads vs $3.00/MTok uncached β€” a 90% discount. OpenAI offers similar caching for GPT-4o. If your application sends the same system prompt on every call, caching can reduce your total costs by 40–80% depending on the ratio of system prompt to user message length.
Is Claude or GPT-4o cheaper?
For standard usage, GPT-4o ($2.50/$10 per MTok) is slightly cheaper than Claude 3.5 Sonnet ($3/$15 per MTok). However, with prompt caching, Claude can become significantly cheaper if you have a long system prompt. Claude 3.5 Haiku ($0.80/$4) is the most affordable Anthropic option and often competes with GPT-4o Mini in price.
Can I use this calculator for my monthly API budget?
Yes β€” set the "Number of API calls" to your expected monthly volume and select "Per 30,000 calls/month" (or enter your actual number). The "Total (Γ—calls)" column shows your projected monthly cost per model. This helps you budget before choosing a provider and can reveal whether switching models would save meaningful money at your scale.