* This blog post is a summary of this video.

Understanding OpenAI Tokens and Pricing for API Usage

Author: DevTalk with FK
Time: 2024-01-29 20:55:00

Table of Contents

  • What Are Tokens in OpenAI Models?
  • Tokenizer Tool Example
  • Tokens vs. Words and Characters
  • Token Limits Per OpenAI Model
  • How Tokens Are Used to Calculate OpenAI Pricing
  • Pricing Comparison of Different OpenAI Models
  • GPT-3.5 Turbo Pricing for This Course
  • Conclusion
  • FAQ

What Are Tokens in OpenAI Models?

OpenAI models process natural language in terms of tokens. Tokens are small chunks of words and phrases that allow the models to analyze the context and meaning behind the text. Rather than operating on whole words or individual characters, the models work on tokens, which carry more useful semantic value for them to interpret.

You can think of a token as roughly three-quarters of a word on average; the OpenAI documentation describes tokens this way. For example, the word 'language' might be broken down into two tokens, 'lang' and 'uage', which retain the core meaning while allowing more efficient processing.
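The three-quarters-of-a-word rule can be turned into a quick back-of-the-envelope estimator. This is only a heuristic sketch (a real tokenizer such as OpenAI's tiktoken library gives exact counts, and estimates can differ from the tokenizer's actual output), and `estimate_tokens` is a name chosen here for illustration:

```python
# Rough token estimator based on the rule of thumb that one token is
# about three-quarters of an English word (~1.33 tokens per word).
# Heuristic only; use a real tokenizer for exact billing-grade counts.

def estimate_tokens(text: str) -> int:
    """Estimate the token count of `text` using the ~0.75 words-per-token rule."""
    words = len(text.split())
    return round(words / 0.75)

print(estimate_tokens("The quick brown fox jumped over the lazy dogs"))  # 12
```

Note that the real tokenizer counts this same phrase as 9 tokens (see the example below), which shows how rough the heuristic is; it is still handy for quick cost ballparking before a request.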

When you make an API request to an OpenAI model, both your input prompt and the text of the response are made up of tokens. The total token count for this 'round trip' is what gets counted for billing purposes, so both what you submit and what you get back add to your usage.

Tokenizer Tool Example

OpenAI provides a handy tokenizer tool that lets you see how text gets broken down. For example, inputting the phrase 'The quick brown fox jumped over the lazy dogs' results in 45 characters and 9 tokens; in this case each word is essentially its own token. A longer phrase better demonstrates how some words get split into multiple tokens: inputting 'OpenAI generative models are not just limited to natural language, you can also generate images and transcriptions' splits some words, like 'OpenAI' and 'language', across multiple tokens. So a token retains meaning but is not always an exact word.

Tokens vs. Words and Characters

As the examples above demonstrate, tokens carry more semantic meaning than individual characters but do not always correspond exactly to whole words. This makes them a kind of 'Goldilocks' abstraction: meaningful enough for the models to interpret reasonably accurately, while also being more efficient to process than full words or raw characters.

Token Limits Per OpenAI Model

In addition to pricing, tokens also come into play in terms of model scale limits. Each OpenAI model can only process up to a certain maximum token count per request; going over this amount will result in truncated responses.

For example, the base GPT-4 model tops out at 8,192 tokens per request, so asking it to generate many pages of text in one go would not complete. The extended GPT-4 32k variant supports up to 32,768 tokens per request before hitting its scale ceiling.

The GPT-3.5 Turbo model that we will use in this course allows up to 4,096 tokens per round trip. So you can comfortably submit a few paragraphs of text and get back a reasonable length response without needing to worry about truncation issues.
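A common pattern is to check a request against the model's cap before sending it, so a response is not silently truncated. The sketch below uses the per-request limits discussed in this section; `fits_in_context`, the limit table, and the model keys are illustrative names chosen here, not part of the OpenAI SDK:

```python
# Hypothetical guard against exceeding a model's per-request token cap.
# Limits are the per-request maximums quoted in this post.
MODEL_TOKEN_LIMITS = {
    "gpt-4": 8192,
    "gpt-4-32k": 32768,
    "gpt-3.5-turbo": 4096,
}

def fits_in_context(model: str, prompt_tokens: int, max_response_tokens: int) -> bool:
    """Return True if the prompt plus the reserved response budget fits the cap."""
    limit = MODEL_TOKEN_LIMITS[model]
    return prompt_tokens + max_response_tokens <= limit

print(fits_in_context("gpt-3.5-turbo", 3000, 1000))  # True: 3000 + 1000 <= 4096
print(fits_in_context("gpt-3.5-turbo", 3500, 1000))  # False: 4500 > 4096
```

The key point is that the prompt and the expected response share one budget: reserving room for the reply is what prevents cut-off output.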

How Tokens Are Used to Calculate OpenAI Pricing

In addition to limiting model scale, tokens also form the basis for how OpenAI calculates pricing for API usage. Essentially, you get charged per 1,000 tokens used in requests and responses for a given model.

The exact per-1,000-token pricing varies a lot based on model capability. At the high end, the 32k-context GPT-4 charges $0.06 per 1,000 prompt tokens and $0.12 per 1,000 completion tokens. This adds up quickly for requests spanning tens of thousands of tokens!

Fortunately, the GPT-3.5 Turbo model we will use charges only $0.002 per 1,000 round-trip tokens. This gets us strong results at a very affordable rate compared to the other options.

Pricing Comparison of Different OpenAI Models

To see the difference in pricing, here is a breakdown of some of the popular OpenAI models:

  • GPT-4: $0.03 per 1,000 prompt tokens, $0.06 per 1,000 completion tokens
  • GPT-3 Davinci: $0.02 per 1,000 prompt tokens, $0.04 per 1,000 completion tokens
  • GPT-3 Curie: $0.002 per 1,000 prompt tokens, $0.004 per 1,000 completion tokens
  • GPT-3 Ada: $0.0004 per 1,000 prompt tokens, $0.0008 per 1,000 completion tokens
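As a rough sketch, the list above can be encoded as a lookup table to compare what one request costs on each model. The rates are the ones quoted in this post (they may not reflect current OpenAI pricing), and `request_cost` is a hypothetical helper name:

```python
# Illustrative cost calculator using the per-1,000-token rates listed above.
# Values are (prompt, completion) dollars per 1K tokens, as quoted in this post.
PRICES_PER_1K = {
    "gpt-4":         (0.03,   0.06),
    "gpt-3-davinci": (0.02,   0.04),
    "gpt-3-curie":   (0.002,  0.004),
    "gpt-3-ada":     (0.0004, 0.0008),
}

def request_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Dollar cost of one request: each side is billed per 1,000 tokens."""
    prompt_rate, completion_rate = PRICES_PER_1K[model]
    return (prompt_tokens / 1000) * prompt_rate + (completion_tokens / 1000) * completion_rate

# The same 1,000-tokens-in / 1,000-tokens-out request priced on each model:
for model in PRICES_PER_1K:
    print(f"{model}: ${request_cost(model, 1000, 1000):.4f}")
```

Running the same request through each model makes the spread obvious: the identical round trip costs 75x more on GPT-4 than on GPT-3 Ada.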

As you can see, GPT-3 Ada is the most affordable but lower capability, while GPT-4 provides the most advanced results at a premium price. GPT-3.5 Turbo strikes a nice balance for most use cases.

GPT-3.5 Turbo Pricing for This Course

Since we will be leveraging GPT-3.5 Turbo through the OpenAI API, we get reliable results for only $0.002 per 1,000 round-trip tokens:

  • 1000 tokens for $0.002
  • 10,000 tokens for $0.02
  • 100,000 tokens for $0.20
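That linear scaling is easy to sketch, assuming the flat $0.002-per-1,000-token round-trip rate quoted here (`turbo_cost` is an illustrative name):

```python
# GPT-3.5 Turbo flat round-trip rate quoted in this post: $0.002 per 1K tokens.
RATE_PER_1K = 0.002

def turbo_cost(total_tokens: int) -> float:
    """Cost in dollars for a combined prompt + response token count."""
    return total_tokens / 1000 * RATE_PER_1K

for tokens in (1_000, 10_000, 100_000):
    print(f"{tokens:>7} tokens -> ${turbo_cost(tokens):.2f}")
```

Even a heavy day of experimentation at 100,000 tokens stays around twenty cents, which is why token anxiety is mostly unnecessary at this tier.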

This very reasonable pricing means we can prompt the model frequently without needing to worry much about token usage. It makes GPT-3.5 Turbo a very accessible way to benefit from AI assistance!

Conclusion

In summary, tokens provide the basis for OpenAI models to process natural language requests and responses. Tokens break input text down into semantically meaningful components that power the model logic.

Tokens also limit the scale of what requests a given model can handle, and determine pricing based on usage totals. Understanding these token concepts helps explain both the capabilities and costs associated with leveraging different OpenAI models.

FAQ

Q: What exactly is an OpenAI token?
A: An OpenAI token is a small chunk of text, roughly three-quarters of a word on average, that the models use to process natural language input and output.

Q: How do tokens relate to OpenAI pricing?
A: The number of tokens in your API requests and responses is used to calculate pricing. More tokens used means higher cost.

Q: What is the max tokens per request?
A: Each OpenAI model has a limit on max tokens per request, such as 4,096 for GPT-3.5 Turbo. Exceeding this will cut off responses.

Q: How much does the GPT-3.5 Turbo cost?
A: The GPT-3.5 Turbo used in this course costs just $0.002 per 1,000 tokens, making it very affordable.

Q: Can tokens be counted before sending a request?
A: Yes, OpenAI provides a handy tokenizer tool to estimate tokens before sending API requests.

Q: Do requests and responses have separate token counts?
A: Yes, tokens are counted separately for both the prompt/request and the response from the API.

Q: What OpenAI models have the highest token limits?
A: The 32k variant of GPT-4 can process up to 32,768 tokens per request, about 8x more than GPT-3.5 Turbo's 4,096.

Q: How can I reduce OpenAI costs?
A: Using cheaper models like GPT-3.5 Turbo or GPT-3 Ada, shortening or summarizing prompts, and reusing context across requests can all save on tokens.

Q: Does punctuation and whitespace count towards tokens?
A: Yes. Punctuation and whitespace are part of the text the tokenizer encodes, so they count toward your token total.

Q: Are tokens universal across all languages?
A: No, other languages like Chinese may tokenize text differently than English tokenization.