LLM Pricing

Discover the best LLM API models for your budget with our free comparison tool. Quick, up-to-date pricing from top providers at your fingertips!

Last updated: November 7, 2024
| Source | Model | Context | Input ($/1M tokens) | Output ($/1M tokens) | Updated |
|---|---|---|---|---|---|
| OpenAI | gpt-4-32k | 32K | $60 | $120 | March 16, 2024 |
| OpenAI | gpt-4 | 8K | $30 | $60 | March 16, 2024 |
| OpenAI | gpt-4o | 128K | $5 | $15 | May 16, 2024 |
| OpenAI | gpt-4o-2024-08-06 | 128K | $2.5 | $10 | August 16, 2024 |
| OpenAI | gpt-4o-mini | 128K | $0.15 | $0.6 | July 19, 2024 |
| OpenAI | gpt-4o-mini-2024-07-18 | 128K | $0.15 | $0.6 | July 19, 2024 |
| OpenAI | o1-preview | 128K | $15 | $60 | September 12, 2024 |
| OpenAI | o1-preview-2024-09-12 | 128K | $15 | $60 | September 12, 2024 |
| OpenAI | o1-mini | 128K | $3 | $12 | September 12, 2024 |
| OpenAI | o1-mini-2024-09-12 | 128K | $3 | $12 | September 12, 2024 |
| OpenAI | gpt-4-turbo-2024-04-09 | 128K | $10 | $30 | April 11, 2024 |
| OpenAI | gpt-4-0125-preview | 128K | $10 | $30 | March 16, 2024 |
| OpenAI | gpt-4-1106-preview | 128K | $10 | $30 | March 16, 2024 |
| OpenAI | gpt-4-vision-preview | 128K | $10 | $30 | March 16, 2024 |
| OpenAI | gpt-3.5-turbo-0125 | 16K | $0.5 | $1.5 | March 16, 2024 |
| OpenAI | gpt-3.5-turbo-instruct | 4K | $1.5 | $2 | March 16, 2024 |
| OpenAI | gpt-3.5-turbo-1106 | 4K | $1 | $2 | March 16, 2024 |
| OpenAI | gpt-3.5-turbo-0613 | 4K | $1.5 | $2 | March 16, 2024 |
| OpenAI | gpt-3.5-turbo-16k-0613 | 16K | $3 | $4 | March 16, 2024 |
| OpenAI | gpt-3.5-turbo-0301 | 4K | $1.5 | $2 | March 16, 2024 |
| Azure | gpt-4-32k | 32K | $60 | $120 | March 16, 2024 |
| Azure | gpt-4 | 8K | $30 | $60 | March 16, 2024 |
| Azure | gpt-4-turbo | 128K | $10 | $30 | March 16, 2024 |
| Azure | gpt-4-turbo-vision | 128K | $10 | $30 | March 16, 2024 |
| Azure | gpt-3.5-turbo-0125 | 16K | $0.5 | $1.5 | March 16, 2024 |
| Azure | gpt-3.5-turbo-instruct | 4K | $1.5 | $2 | March 16, 2024 |
| Anthropic | claude-3.5-sonnet | 200K | $3 | $15 | June 26, 2024 |
| Anthropic | claude-3-opus | 200K | $15 | $75 | March 16, 2024 |
| Anthropic | claude-3-sonnet | 200K | $3 | $15 | March 16, 2024 |
| Anthropic | claude-3-haiku | 200K | $0.25 | $1.25 | March 16, 2024 |
| Anthropic | claude-2.1 | 200K | $8 | $24 | March 16, 2024 |
| Anthropic | claude-2.0 | 100K | $8 | $24 | March 16, 2024 |
| Anthropic | claude-instant-1.2 | 100K | $0.8 | $2.4 | March 16, 2024 |
| AWS | jurassic-2-ultra | 32K | $18.8 | $18.8 | March 16, 2024 |
| AWS | jurassic-2-mid | 32K | $12.5 | $12.5 | March 16, 2024 |
| AWS | titan-text-lite | 32K | $0.3 | $0.4 | March 16, 2024 |
| AWS | titan-text-express | 32K | $0.8 | $1.6 | March 16, 2024 |
| AWS | claude-instant | 32K | $0.8 | $2.4 | March 16, 2024 |
| AWS | claude-2.0/2.1 | 32K | $8 | $24 | March 16, 2024 |
| AWS | claude-3-sonnet | 32K | $3 | $15 | March 16, 2024 |
| AWS | claude-3-haiku | 32K | $0.25 | $1.25 | March 16, 2024 |
| AWS | command | 32K | $1.5 | $2 | March 16, 2024 |
| AWS | command-light | 32K | $0.3 | $0.6 | March 16, 2024 |
| AWS | llama-2-chat-13B | 32K | $0.75 | $1 | March 16, 2024 |
| AWS | llama-2-chat-70B | 32K | $1.95 | $2.56 | March 16, 2024 |
| AWS | mistral-7b | 32K | $0.15 | $0.2 | March 16, 2024 |
| AWS | mistral-8x7b | 32K | $0.45 | $0.7 | March 16, 2024 |
| Google | gemini-1.0-pro | 32K | $0.5 | $1.5 | September 16, 2024 |
| Google | gemini-1.5-pro | 128K | $1.25 | $5 | October 4, 2024 |
| Google | gemini-1.5-pro | 2M | $2.5 | $10 | October 4, 2024 |
| Google | gemini-1.5-flash | 128K | $0.08 | $0.3 | August 11, 2024 |
| Google | gemini-1.5-flash | 1M | $0.15 | $0.6 | October 11, 2024 |
| Google | gemini-1.5-flash-8B | 128K | $0.04 | $0.15 | October 11, 2024 |
| Google | gemini-1.5-flash-8B | 1M | $0.08 | $0.3 | October 11, 2024 |
| Google | palm-2-for-chat | 8K | $0.25 | $0.5 | March 16, 2024 |
| Google | palm-2-for-chat-32k | 32K | $0.25 | $0.5 | March 16, 2024 |
| Google | palm-2-for-text | 8K | $2.5 | $7.5 | March 16, 2024 |
| Google | palm-2-for-text-32k | 32K | $2.5 | $5 | March 16, 2024 |
| Mistral | mistral-large | 32K | $8 | $24 | March 16, 2024 |
| Mistral | mistral-medium | 32K | $2.7 | $8.1 | March 16, 2024 |
| Mistral | mistral-small | 32K | $2 | $6 | March 16, 2024 |
| Mistral | mixtral-8x7b | 32K | $0.7 | $0.7 | March 16, 2024 |
| Mistral | mixtral-8x22b | 64K | $2 | $6 | April 19, 2024 |
| Mistral | mistral-7b | 32K | $0.25 | $0.25 | March 16, 2024 |
| Cohere | command-r-plus | 128K | $3 | $15 | April 9, 2024 |
| Cohere | command-r | 4K | $0.5 | $1.5 | March 16, 2024 |
| Cohere | command-light | 4K | $0.3 | $0.6 | March 16, 2024 |
| Cohere | command-light-fine-tuned | 4K | $0.3 | $0.6 | March 16, 2024 |
| Groq | llama-2-70b | 4K | $0.7 | $0.8 | March 16, 2024 |
| Groq | llama-2-7b | 2K | $0.1 | $0.1 | March 16, 2024 |
| Groq | mixtral-8x7b | 32K | $0.27 | $0.27 | March 16, 2024 |
| Groq | gemma-7b | 8K | $0.1 | $0.1 | March 16, 2024 |
| Databricks | DBRX | 32K | $2.25 | $6.75 | April 1, 2024 |
| Databricks | llama-2-70b | 4K | $2 | $6 | April 1, 2024 |
| Databricks | mixtral-8x7b | 32K | $1.5 | $1.5 | April 1, 2024 |
| Databricks | mpt-30b | 32K | $1 | $1 | April 1, 2024 |
| Databricks | mpt-30b | 8K | $1 | $1 | April 1, 2024 |
| Databricks | llama-2-13b | 4K | $0.95 | $0.95 | April 1, 2024 |
| Databricks | mpt-7b | 8K | $0.5 | $0.5 | April 1, 2024 |
| Databricks | mpt-7b | 512 | $0.5 | $0.5 | April 1, 2024 |
| Cloudflare | llama-2-7b-chat-fp16 | 2K | $0.56 | $6.66 | April 19, 2024 |
| Cloudflare | llama-2-7b-chat-int8 | 2K | $0.16 | $0.24 | April 19, 2024 |
| Cloudflare | mistral-7b-instruct | 32K | $0.11 | $0.19 | April 19, 2024 |

LLM Pricing: Quick Overview

Hey there! Let's dive into the fascinating world of AI and the different flavors of Large Language Models (LLMs) offered by the big players like OpenAI, Anthropic, Google, Cohere, and Meta. If you're thinking about incorporating these brainy bots into your projects, getting a handle on their pricing is pretty essential. So, let's break it down, shall we?

The Lowdown on Tokens

First off, the pricing for these AI wonders usually revolves around something called "tokens." Imagine a token as a tiny slice of a word. To put it in perspective, 1,000 tokens are roughly equivalent to about 750 words. For example, the sentence "This paragraph is 5 tokens" counts as 5 tokens itself.

A handy rule of thumb is that in English, a token is about four characters long, which works out to roughly three-quarters of a word. If you're working with languages other than English, like Japanese, the math changes a bit.
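These rules of thumb are easy to turn into code. Here is a minimal sketch; the function names are ours, and the numbers are estimates only. For exact counts, use the provider's tokenizer (e.g. OpenAI's tiktoken library).

```python
def estimate_tokens(text: str) -> int:
    """Rough token count using the ~4 characters per token rule of thumb.

    Only an approximation for English text; a real tokenizer gives
    exact counts, and other languages tokenize less efficiently.
    """
    return max(1, round(len(text) / 4))


def estimate_words(num_tokens: int) -> float:
    """Approximate word count: 1,000 tokens is roughly 750 words."""
    return num_tokens * 0.75
```

So a 4,000-character English passage works out to roughly 1,000 tokens, or about 750 words.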

What's the Deal with Context Length?

When we talk about LLMs, especially those from OpenAI, you'll often hear about "context length." This is a key concept because it affects how well the model performs, what it can do, and, yep, how much it costs.

So, What Exactly is Context Length?

Think of context length as the model's short-term memory for the task at hand. It's the amount of info (or number of tokens) the model can juggle at any given moment. Say a model has a context length of 8,000 tokens; it means it can consider up to 8,000 tokens from what you feed it in one go.

Why Should You Care About Context Length?

  • Task Complexity: Bigger context lengths let the model tackle more complex stuff, like summarizing a long read or digging into detailed documents.
  • Smooth Conversations: For chatbots, a longer context means the model can remember more of the chat, leading to replies that make more sense and are more on point.
  • Price Tag: Generally, the longer the context length, the pricier the model because it needs more computing oomph.
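To make the "short-term memory" idea concrete, here is a hedged sketch of how a chatbot might keep its history inside a model's context window. It reuses the rough 4-characters-per-token estimate from above, and the 500-token reply reserve is an arbitrary illustrative choice, not any provider's requirement.

```python
def fits_context(messages, context_length, reserve_for_reply=500):
    """True if the conversation, plus room for a reply, fits the window.

    Token counts use the rough 4-characters-per-token estimate; a real
    application should count with the provider's tokenizer.
    """
    used = sum(len(m) // 4 for m in messages)
    return used + reserve_for_reply <= context_length


def trim_history(messages, context_length, reserve_for_reply=500):
    """Drop the oldest turns until the conversation fits the window."""
    trimmed = list(messages)
    while trimmed and not fits_context(trimmed, context_length, reserve_for_reply):
        trimmed.pop(0)  # the model "forgets" the oldest turn first
    return trimmed
```

Dropping the oldest turns first is the simplest policy; fancier applications summarize old turns instead of discarding them.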

Different Models for Different Needs

The big names in AI have cooked up a variety of models, each with its own strengths and price points, and they all charge per token; prices are typically quoted per 1,000 or per 1,000,000 tokens (the table above lists prices per million tokens).
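Given per-million-token prices like those in the table above, the cost of a single call is simple arithmetic. A small sketch (the function name and example workload are ours; the $5/$15 rates match the gpt-4o row):

```python
def request_cost(input_tokens, output_tokens,
                 input_price_per_m, output_price_per_m):
    """Cost of one API call in USD, given prices per 1M tokens."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


# Example: a 2,000-token prompt and a 500-token reply on gpt-4o
# at $5 per 1M input tokens and $15 per 1M output tokens:
cost = request_cost(2_000, 500, 5.0, 15.0)  # 0.0175 dollars
```

Note that output tokens are usually several times more expensive than input tokens, so verbose models cost more than their input price alone suggests.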

  • OpenAI GPT-4: This one's a bit of a know-it-all, great at following complex instructions and solving tough problems. It's pricier and not the fastest kid on the block. The newer GPT-4 Turbo, though, costs a third as much for input tokens and can handle a whopping 128K tokens at once! Also, you can access it through Microsoft's Azure OpenAI Service.

  • OpenAI GPT-3.5 Turbo: Optimized for chit-chat, making it a go-to for chatbots and conversational interfaces. It's speedy and won't break the bank. Available through Microsoft's Azure OpenAI Service too.

  • Anthropic's Claude 3: Known for its impressive 200K-token context length, making it a champ at summarizing or handling Q&As on hefty documents. The trade-off? It's on the slower and pricier side.

  • Llama 2: Meta's gift to the world, Llama 2 is an open-source model that's pretty much on par with GPT-3.5 Turbo in performance and can even give GPT-4 a run for its money in English text summarization—at 30x less cost! The catch? It's English-only.

  • Gemini: Google's latest, split into Gemini Ultra, Gemini Pro, and Gemini Nano, announced on December 6, 2023. Gemini Ultra is eyeing the throne currently held by OpenAI's GPT-4, while Gemini Pro is more akin to GPT-3.5 in terms of performance.

  • PaLM 2: An older model from Google that shines in multilingual, reasoning, and coding tasks. Trained on texts in over 100 languages, it's a whiz at navigating complex language nuances and boasts impressive logic and coding skills.

  • Mistral: A newcomer on the scene, Mistral AI has released some nifty open-source models that are both fast and affordable. Mistral 7B and Mistral 8x7B (Mixtral) are standout options, offering performance comparable to GPT-3.5 Turbo at 2.5x less cost. Mistral Large, though private, is showing promise in reasoning tasks across several languages.

  • DBRX: A general-purpose LLM created by Databricks. Across a range of standard benchmarks, DBRX sets a new state of the art for established open LLMs, giving the open community and enterprises building their own LLMs capabilities that were previously limited to closed model APIs; by Databricks' measurements it surpasses GPT-3.5 and is competitive with Gemini 1.0 Pro. It's also an especially capable code model, beating specialized models like CodeLLaMA-70B on programming benchmarks, in addition to its strength as a generalist.

And there you have it—a whirlwind tour of the LLM pricing landscape. Whether you're building the next great app or just dabbling in AI, there's a model out there that fits the bill. Happy coding!