LLM Pricing

Discover the best LLM API models for your budget with our free comparison tool. Quick, up-to-date pricing from top providers at your fingertips!

Last updated: November 7, 2024
| Source | Model | Context | Input ($/1M tokens) | Output ($/1M tokens) | Updated |
|---|---|---|---|---|---|
| OpenAI | gpt-4-32k | 32K | $60 | $120 | March 16, 2024 |
| OpenAI | gpt-4 | 8K | $30 | $60 | March 16, 2024 |
| OpenAI | gpt-4o | 128K | $5 | $15 | May 16, 2024 |
| OpenAI | gpt-4o-2024-08-06 | 128K | $2.5 | $10 | August 16, 2024 |
| OpenAI | gpt-4o-mini | 128K | $0.15 | $0.6 | July 19, 2024 |
| OpenAI | gpt-4o-mini-2024-07-18 | 128K | $0.15 | $0.6 | July 19, 2024 |
| OpenAI | o1-preview | 128K | $15 | $60 | September 12, 2024 |
| OpenAI | o1-preview-2024-09-12 | 128K | $15 | $60 | September 12, 2024 |
| OpenAI | o1-mini | 128K | $3 | $12 | September 12, 2024 |
| OpenAI | o1-mini-2024-09-12 | 128K | $3 | $12 | September 12, 2024 |
| OpenAI | gpt-4-turbo-2024-04-09 | 128K | $10 | $30 | April 11, 2024 |
| OpenAI | gpt-4-0125-preview | 128K | $10 | $30 | March 16, 2024 |
| OpenAI | gpt-4-1106-preview | 128K | $10 | $30 | March 16, 2024 |
| OpenAI | gpt-4-vision-preview | 128K | $10 | $30 | March 16, 2024 |
| OpenAI | gpt-3.5-turbo-0125 | 16K | $0.5 | $1.5 | March 16, 2024 |
| OpenAI | gpt-3.5-turbo-instruct | 4K | $1.5 | $2 | March 16, 2024 |
| OpenAI | gpt-3.5-turbo-1106 | 4K | $1 | $2 | March 16, 2024 |
| OpenAI | gpt-3.5-turbo-0613 | 4K | $1.5 | $2 | March 16, 2024 |
| OpenAI | gpt-3.5-turbo-16k-0613 | 16K | $3 | $4 | March 16, 2024 |
| OpenAI | gpt-3.5-turbo-0301 | 4K | $1.5 | $2 | March 16, 2024 |
| Azure | gpt-4-32k | 32K | $60 | $120 | March 16, 2024 |
| Azure | gpt-4 | 8K | $30 | $60 | March 16, 2024 |
| Azure | gpt-4-turbo | 128K | $10 | $30 | March 16, 2024 |
| Azure | gpt-4-turbo-vision | 128K | $10 | $30 | March 16, 2024 |
| Azure | gpt-3.5-turbo-0125 | 16K | $0.5 | $1.5 | March 16, 2024 |
| Azure | gpt-3.5-turbo-instruct | 4K | $1.5 | $2 | March 16, 2024 |
| Anthropic | claude-3.5-sonnet | 200K | $3 | $15 | June 26, 2024 |
| Anthropic | claude-3-opus | 200K | $15 | $75 | March 16, 2024 |
| Anthropic | claude-3-sonnet | 200K | $3 | $15 | March 16, 2024 |
| Anthropic | claude-3-haiku | 200K | $0.25 | $1.25 | March 16, 2024 |
| Anthropic | claude-2.1 | 200K | $8 | $24 | March 16, 2024 |
| Anthropic | claude-2.0 | 100K | $8 | $24 | March 16, 2024 |
| Anthropic | claude-instant-1.2 | 100K | $0.8 | $2.4 | March 16, 2024 |
| AWS | jurassic-2-ultra | 32K | $18.8 | $18.8 | March 16, 2024 |
| AWS | jurassic-2-mid | 32K | $12.5 | $12.5 | March 16, 2024 |
| AWS | titan-text-lite | 32K | $0.3 | $0.4 | March 16, 2024 |
| AWS | titan-text-express | 32K | $0.8 | $1.6 | March 16, 2024 |
| AWS | claude-instant | 32K | $0.8 | $2.4 | March 16, 2024 |
| AWS | claude-2.0/2.1 | 32K | $8 | $24 | March 16, 2024 |
| AWS | claude-3-sonnet | 32K | $3 | $15 | March 16, 2024 |
| AWS | claude-3-haiku | 32K | $0.25 | $1.25 | March 16, 2024 |
| AWS | command | 32K | $1.5 | $2 | March 16, 2024 |
| AWS | command-light | 32K | $0.3 | $0.6 | March 16, 2024 |
| AWS | llama-2-chat-13B | 32K | $0.75 | $1 | March 16, 2024 |
| AWS | llama-2-chat-70B | 32K | $1.95 | $2.56 | March 16, 2024 |
| AWS | mistral-7b | 32K | $0.15 | $0.2 | March 16, 2024 |
| AWS | mistral-8x7b | 32K | $0.45 | $0.7 | March 16, 2024 |
| Google | gemini-1.0-pro | 32K | $0.5 | $1.5 | September 16, 2024 |
| Google | gemini-1.5-pro | 128K | $1.25 | $5 | October 4, 2024 |
| Google | gemini-1.5-pro | 2M | $2.5 | $10 | October 4, 2024 |
| Google | gemini-1.5-flash | 128K | $0.08 | $0.3 | August 11, 2024 |
| Google | gemini-1.5-flash | 1M | $0.15 | $0.6 | October 11, 2024 |
| Google | gemini-1.5-flash-8B | 128K | $0.04 | $0.15 | October 11, 2024 |
| Google | gemini-1.5-flash-8B | 1M | $0.08 | $0.3 | October 11, 2024 |
| Google | palm-2-for-chat | 8K | $0.25 | $0.5 | March 16, 2024 |
| Google | palm-2-for-chat-32k | 32K | $0.25 | $0.5 | March 16, 2024 |
| Google | palm-2-for-text | 8K | $2.5 | $7.5 | March 16, 2024 |
| Google | palm-2-for-text-32k | 32K | $2.5 | $5 | March 16, 2024 |
| Mistral | mistral-large | 32K | $8 | $24 | March 16, 2024 |
| Mistral | mistral-medium | 32K | $2.7 | $8.1 | March 16, 2024 |
| Mistral | mistral-small | 32K | $2 | $6 | March 16, 2024 |
| Mistral | mixtral-8x7b | 32K | $0.7 | $0.7 | March 16, 2024 |
| Mistral | mixtral-8x22b | 64K | $2 | $6 | April 19, 2024 |
| Mistral | mistral-7b | 32K | $0.25 | $0.25 | March 16, 2024 |
| Cohere | command-r-plus | 128K | $3 | $15 | April 9, 2024 |
| Cohere | command-r | 4K | $0.5 | $1.5 | March 16, 2024 |
| Cohere | command-light | 4K | $0.3 | $0.6 | March 16, 2024 |
| Cohere | command-light-fine-tuned | 4K | $0.3 | $0.6 | March 16, 2024 |
| Groq | llama-2-70b | 4K | $0.7 | $0.8 | March 16, 2024 |
| Groq | llama-2-7b | 2K | $0.1 | $0.1 | March 16, 2024 |
| Groq | mixtral-8x7b | 32K | $0.27 | $0.27 | March 16, 2024 |
| Groq | gemma-7b | 8K | $0.1 | $0.1 | March 16, 2024 |
| Databricks | DBRX | 32K | $2.25 | $6.75 | April 1, 2024 |
| Databricks | llama-2-70b | 4K | $2 | $6 | April 1, 2024 |
| Databricks | mixtral-8x7b | 32K | $1.5 | $1.5 | April 1, 2024 |
| Databricks | mpt-30b | 32K | $1 | $1 | April 1, 2024 |
| Databricks | mpt-30b | 8K | $1 | $1 | April 1, 2024 |
| Databricks | llama-2-13b | 4K | $0.95 | $0.95 | April 1, 2024 |
| Databricks | mpt-7b | 8K | $0.5 | $0.5 | April 1, 2024 |
| Databricks | mpt-7b | 512 | $0.5 | $0.5 | April 1, 2024 |
| Cloudflare | llama-2-7b-chat-fp16 | 2K | $0.56 | $6.66 | April 19, 2024 |
| Cloudflare | llama-2-7b-chat-int8 | 2K | $0.16 | $0.24 | April 19, 2024 |
| Cloudflare | mistral-7b-instruct | 32K | $0.11 | $0.19 | April 19, 2024 |

LLM Pricing: Quick Overview

Hey there! Let's dive into the fascinating world of AI and the different flavors of Large Language Models (LLMs) offered by the big players like OpenAI, Anthropic, Google, Cohere, and Meta. If you're thinking about incorporating these brainy bots into your projects, getting a handle on their pricing is pretty essential. So, let's break it down, shall we?

The Lowdown on Tokens

First off, the pricing for these AI wonders usually revolves around something called "tokens." Imagine a token as a tiny slice of a word. To put it in perspective, 1,000 tokens are roughly equivalent to about 750 words. For example, the sentence "This paragraph is 5 tokens" counts as 5 tokens itself.

A handy rule of thumb is that in English, a token is about four characters long, which works out to roughly three-quarters of a word. If you're working with languages other than English, like Japanese, the math changes a bit.
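These rules of thumb are easy to turn into code. Here is a minimal sketch; the function names are ours, and the numbers are estimates only. For exact counts, use the provider's tokenizer (e.g. OpenAI's tiktoken library).

```python
def estimate_tokens(text: str) -> int:
    """Rough token count using the ~4 characters per token rule of thumb.

    Only an approximation for English text; a real tokenizer gives
    exact counts, and other languages tokenize less efficiently.
    """
    return max(1, round(len(text) / 4))


def estimate_words(num_tokens: int) -> float:
    """Approximate word count: 1,000 tokens is roughly 750 words."""
    return num_tokens * 0.75
```

So a 4,000-character English passage works out to roughly 1,000 tokens, or about 750 words.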

What's the Deal with Context Length?

When we talk about LLMs, especially those from OpenAI, you'll often hear about "context length." This is a key concept because it affects how well the model performs, what it can do, and, yep, how much it costs.

So, What Exactly is Context Length?

Think of context length as the model's short-term memory for the task at hand. It's the amount of info (or number of tokens) the model can juggle at any given moment. Say a model has a context length of 8,000 tokens; it means it can consider up to 8,000 tokens from what you feed it in one go.

Why Should You Care About Context Length?

  • Task Complexity: Bigger context lengths let the model tackle more complex stuff, like summarizing a long read or digging into detailed documents.
  • Smooth Conversations: For chatbots, a longer context means the model can remember more of the chat, leading to replies that make more sense and are more on point.
  • Price Tag: Generally, the longer the context length, the pricier the model because it needs more computing oomph.
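To make the "short-term memory" idea concrete, here is a hedged sketch of how a chatbot might keep its history inside a model's context window. It reuses the rough 4-characters-per-token estimate from above, and the 500-token reply reserve is an arbitrary illustrative choice, not any provider's requirement.

```python
def fits_context(messages, context_length, reserve_for_reply=500):
    """True if the conversation, plus room for a reply, fits the window.

    Token counts use the rough 4-characters-per-token estimate; a real
    application should count with the provider's tokenizer.
    """
    used = sum(len(m) // 4 for m in messages)
    return used + reserve_for_reply <= context_length


def trim_history(messages, context_length, reserve_for_reply=500):
    """Drop the oldest turns until the conversation fits the window."""
    trimmed = list(messages)
    while trimmed and not fits_context(trimmed, context_length, reserve_for_reply):
        trimmed.pop(0)  # the model "forgets" the oldest turn first
    return trimmed
```

Dropping the oldest turns first is the simplest policy; fancier applications summarize old turns instead of discarding them.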

Different Models for Different Needs

The big names in AI have cooked up a variety of models, each with its own strengths and price points, and they all charge per token; prices are typically quoted per 1,000 or per 1,000,000 tokens (the table above lists prices per million tokens).
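Given per-million-token prices like those in the table above, the cost of a single call is simple arithmetic. A small sketch (the function name and example workload are ours; the $5/$15 rates match the gpt-4o row):

```python
def request_cost(input_tokens, output_tokens,
                 input_price_per_m, output_price_per_m):
    """Cost of one API call in USD, given prices per 1M tokens."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


# Example: a 2,000-token prompt and a 500-token reply on gpt-4o
# at $5 per 1M input tokens and $15 per 1M output tokens:
cost = request_cost(2_000, 500, 5.0, 15.0)  # 0.0175 dollars
```

Note that output tokens are usually several times more expensive than input tokens, so verbose models cost more than their input price alone suggests.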

  • OpenAI GPT-4: This one's a bit of a know-it-all, great at following complex instructions and solving tough problems. It's pricier and not the fastest kid on the block. The newer GPT-4 Turbo, though, costs a third as much for input tokens and can handle a whopping 128K tokens at once! Also, you can access it through Microsoft's Azure OpenAI Service.

  • OpenAI GPT-3.5 Turbo: Optimized for chit-chat, making it a go-to for chatbots and conversational interfaces. It's speedy and won't break the bank. Available through Microsoft's Azure OpenAI Service too.

  • Anthropic's Claude 3: Known for its impressive 200K-token context length, making it a champ at summarizing or handling Q&As on hefty documents. The trade-off? It's on the slower and pricier side.

  • Llama 2: Meta's gift to the world, Llama 2 is an open-source model that's pretty much on par with GPT-3.5 Turbo in performance and can even give GPT-4 a run for its money in English text summarization—at 30x less cost! The catch? It's English-only.

  • Gemini: Google's latest, split into Gemini Ultra, Gemini Pro, and Gemini Nano, announced on December 6, 2023. Gemini Ultra is eyeing the throne currently held by OpenAI's GPT-4, while Gemini Pro is more akin to GPT-3.5 in terms of performance.

  • PaLM 2: An older model from Google that shines in multilingual, reasoning, and coding tasks. Trained on texts in over 100 languages, it's a whiz at navigating complex language nuances and boasts impressive logic and coding skills.

  • Mistral: A newcomer on the scene, Mistral AI has released some nifty open-source models that are both fast and affordable. Mistral 7B and Mistral 8x7B (Mixtral) are standout options, offering performance comparable to GPT-3.5 Turbo at 2.5x less cost. Mistral Large, though private, is showing promise in reasoning tasks across several languages.

  • DBRX: A general-purpose LLM created by Databricks. Across a range of standard benchmarks, DBRX sets a new state of the art for established open LLMs, giving the open community and enterprises building their own LLMs capabilities that were previously limited to closed model APIs; by Databricks' measurements it surpasses GPT-3.5 and is competitive with Gemini 1.0 Pro. It's also an especially capable code model, beating specialized models like CodeLLaMA-70B on programming benchmarks, in addition to its strength as a generalist.

And there you have it—a whirlwind tour of the LLM pricing landscape. Whether you're building the next great app or just dabbling in AI, there's a model out there that fits the bill. Happy coding!