Generative models break down data into units called tokens for processing. Each model has a maximum number of tokens that it can handle in a prompt and response.
This page shows you how to use the Count Tokens API to get an estimate of the token count and the number of billable characters for a request to a Gemini model. There's no API for estimating the token count of a response.
Note that the Count Tokens API cannot be used for Imagen models.
What information is provided in the count?
Note the following about counting tokens and billable characters:
Counting the total tokens
This count helps you make sure that your requests stay within the model's context window.
The token count reflects the size of all files (for example, images) that are provided as part of the request input. It doesn't count the number of images or the number of seconds in a video.
For all Gemini models, a token is equivalent to about 4 characters, and 100 tokens correspond to roughly 60-80 English words.
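The 4-characters-per-token rule of thumb can be sketched as a quick client-side estimate. This is only an approximation, not the authoritative count the Count Tokens API returns, and `estimate_tokens` is a hypothetical helper name, not part of any SDK:

```python
import math

def estimate_tokens(text: str) -> int:
    """Rough token estimate at ~4 characters per token (approximation only).

    Use the Count Tokens API for the authoritative count before sending
    a request; this heuristic only gives a quick ballpark figure.
    """
    return math.ceil(len(text) / 4)

prompt = "The quick brown fox jumps over the lazy dog."
print(estimate_tokens(prompt))  # 44 characters -> estimate of 11 tokens
```

A heuristic like this is useful for cheap pre-checks (for example, truncating obviously oversized inputs) before calling the API for an exact count.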
Counting the total billable characters
This count is helpful for understanding and controlling your costs, because Vertex AI uses the number of characters as part of its pricing calculation.
The billable character count reflects the number of characters in the text that's provided as part of the request input.
For older Gemini models, tokens are not part of the pricing calculation; Gemini 2.0 models, however, are priced by token. Learn more about token limits per model and pricing per model.
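The difference between the two pricing schemes can be sketched as follows. The rates below are placeholders chosen for illustration, not real Vertex AI prices; check the pricing page for current values:

```python
# Hypothetical per-unit rates -- NOT real Vertex AI prices.
CHAR_PRICE = 0.000125 / 1000   # placeholder price per input character
TOKEN_PRICE = 0.0005 / 1000    # placeholder price per input token

def char_based_cost(billable_characters: int) -> float:
    # Older Gemini models: input cost scales with billable characters.
    return billable_characters * CHAR_PRICE

def token_based_cost(tokens: int) -> float:
    # Gemini 2.0 models: input cost scales with token count.
    return tokens * TOKEN_PRICE

# Example request: 400 characters, which is roughly 100 tokens
# at ~4 characters per token.
print(f"character-based: ${char_based_cost(400):.6f}")
print(f"token-based:     ${token_based_cost(100):.6f}")
```

The point of the sketch is which quantity drives the bill, not the dollar amounts: for character-priced models you estimate cost from the billable character count, and for token-priced models from the token count, both of which the Count Tokens API reports.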
Pricing and quota for counting tokens and billable characters
There's no charge or quota restriction for using the CountTokens API. The maximum quota for the CountTokens API is 3,000 requests per minute (RPM).