The Gemini family of models is considered multimodal because the models can process information from multiple modalities, including images, videos, and text. For example, you can send a Gemini model a photo of a plate of cookies and ask it for a recipe for those cookies.
Prompt | Response |
---|---|
Give me a recipe for these cookies. | **INGREDIENTS** - 1 c. (2 sticks) unsalted butter, softened - 3/4 c. granulated sugar - 3/4 c. brown sugar, packed - 1 tsp. vanilla extract - 2 large eggs - 2 1/4 c. all-purpose flour - 1 tsp. baking soda - 1 tsp. salt ... |
You can interact with the Gemini family of models using the Gemini API provided by Vertex AI on Google Cloud. For mobile and web apps, you can use the Vertex AI in Firebase SDKs to call the Gemini API and interact with the Gemini models directly from your app.
This page provides the following information about the Gemini models:
High-level comparison of the use cases for the various Gemini models, including their supported input types.
Comparison of the details for each model, for example maximum input tokens or maximum length of video.
Description of how Gemini models are versioned, specifically their stable, auto-updated, and preview versions.
Lists of available model names to include in your code during initialization.
List of supported languages for the Gemini models.
Available models
You can use any of the following Gemini models with Vertex AI in Firebase:
Gemini 1.5 Flash
Multimodal model that supports the same input and output types as 1.5 Pro, but with a long-context understanding of 1 million tokens. Gemini 1.5 Flash is specifically designed for high-volume, cost-effective applications.
Gemini 1.5 Pro
Multimodal model that supports adding image, audio, video, and PDF files in text or chat prompts for a text or code response. It also supports long-context understanding with 2 million tokens.
Gemini 1.0 Pro Vision
Multimodal model designed to handle text plus images and video for a text or code response. Cannot be used for chat.
Gemini 1.0 Pro
Model designed to handle natural language tasks, multiturn chat with text and code, and code generation.
Use cases and capabilities for each model
Each Gemini model has different capabilities to support various use cases. You can learn more about each of the Gemini models in the Google Cloud documentation.
Supported input and output for each model
Gemini 1.5 Flash | Gemini 1.5 Pro | Gemini 1.0 Pro Vision | Gemini 1.0 Pro | |
---|---|---|---|---|
Input types | ||||
Text | ||||
Code | ||||
Image | ||||
Video (frames only) | ||||
Video (frames and audio) | ||||
Audio | ||||
Output types | ||||
Text | ||||
Structured output (like JSON) using response schema | ||||
Code |
To learn about supported file types, see Supported input files and requirements for the Vertex AI Gemini API.
Supported capabilities and general features for each model
Gemini 1.5 Flash | Gemini 1.5 Pro | Gemini 1.0 Pro Vision | Gemini 1.0 Pro | ||
---|---|---|---|---|---|
Text generation from text-only input | |||||
Text generation from multimodal input | |||||
Structured output (like JSON) using response schema | |||||
Multi-turn chat | |||||
Function calling | |||||
Basic function calling | |||||
Parallel function calling | |||||
Function calling mode | |||||
Count tokens and billable characters | |||||
System instructions |
Detailed information about each model
Property | Gemini 1.5 Flash | Gemini 1.5 Pro | Gemini 1.0 Pro Vision | Gemini 1.0 Pro |
---|---|---|---|---|
Total token limit (combined input and output) * | 1,048,576 tokens | 2,097,152 tokens | 16,384 tokens | 32,760 tokens |
Output token limit * | 8,192 tokens | 8,192 tokens | 2,048 tokens | 8,192 tokens |
Maximum number of images per request | 3,000 images | 3,000 images | 16 images | N/A |
Max base64 encoded image size | 7 MB | 7 MB | 7 MB | N/A |
Maximum PDF size | 30 MB | 30 MB | 30 MB | N/A |
Maximum number of video files per request | 10 video files | 10 video files | 1 video file | N/A |
Maximum video length (frames only) | ~60 minutes of video | ~60 minutes of video | 2 minutes | N/A |
Maximum video length (frames and audio) | ~45 minutes of video | ~45 minutes of video | N/A | N/A |
Maximum number of audio files per request | 1 audio file | 1 audio file | N/A | N/A |
Maximum audio length | ~8.4 hours of audio | ~8.4 hours of audio | N/A | N/A |
* For all Gemini models, a token is equivalent to about 4 characters, so 100 tokens are about 60-80 English words. You can determine the total count of tokens in your requests using countTokens.
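The footnote's rule of thumb can be turned into a quick pre-flight estimate. The following is a minimal sketch in Python, not part of any SDK: the helper names `estimate_tokens` and `fits_total_limit` are hypothetical, the limits are copied from the table above, and the 4-characters-per-token ratio is only an approximation. For an exact count, rely on countTokens.

```python
import math

# Combined input + output token limits, copied from the table above.
TOTAL_TOKEN_LIMIT = {
    "Gemini 1.5 Flash": 1_048_576,
    "Gemini 1.5 Pro": 2_097_152,
    "Gemini 1.0 Pro Vision": 16_384,
    "Gemini 1.0 Pro": 32_760,
}

# Rule of thumb from the footnote; this is an approximation, not a tokenizer.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Rough estimate: about one token per 4 characters, rounded up."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def fits_total_limit(model: str, prompt: str, max_output_tokens: int) -> bool:
    """True if estimated prompt tokens plus requested output fit the total limit."""
    return estimate_tokens(prompt) + max_output_tokens <= TOTAL_TOKEN_LIMIT[model]


prompt = "Give me a recipe for these cookies."
print(estimate_tokens(prompt))
print(fits_total_limit("Gemini 1.0 Pro", prompt, 8_192))
```

Because the estimate covers text only, it says nothing about how images, video, or audio count toward the limit.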
Learn about supported file types, how to specify MIME type, and how to make sure that your files and multimodal requests meet the requirements and follow best practices in Supported input files and requirements for the Vertex AI Gemini API.
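As an illustration of how the per-request file limits in the table might be checked before sending a request, here is a minimal Python sketch. The function `check_media_counts` and the dictionary layout are hypothetical, and treating "N/A" as "this model does not accept that input type" is an assumption made for the example.

```python
# Per-request file-count limits copied from the table above.
# None stands for "N/A", assumed here to mean the input type is not accepted.
MEDIA_LIMITS = {
    "Gemini 1.5 Flash": {"images": 3000, "videos": 10, "audio": 1},
    "Gemini 1.5 Pro": {"images": 3000, "videos": 10, "audio": 1},
    "Gemini 1.0 Pro Vision": {"images": 16, "videos": 1, "audio": None},
    "Gemini 1.0 Pro": {"images": None, "videos": None, "audio": None},
}


def check_media_counts(model: str, counts: dict) -> list:
    """Return a list of problems; an empty list means no count limit is exceeded."""
    problems = []
    for kind, count in counts.items():
        if count == 0:
            continue  # nothing of this kind attached, so nothing to check
        limit = MEDIA_LIMITS[model][kind]
        if limit is None:
            problems.append(f"{model} does not accept {kind}")
        elif count > limit:
            problems.append(f"{model} allows at most {limit} {kind}, got {count}")
    return problems


for problem in check_media_counts(
    "Gemini 1.0 Pro Vision", {"images": 17, "videos": 1, "audio": 1}
):
    print(problem)
```

The sketch checks file counts only; size and duration limits (for example the 7 MB image size or the maximum video length) would need separate checks.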
Versioning of the models
The Gemini models are offered in stable, auto-updated, and preview versions.
Stable versions are considered Generally Available.
- Stable versions have model names appended with a specific three-digit version number, for example gemini-1.5-pro-002.
Auto-updated versions always point to the latest stable version of that model; if a new stable version is released, the auto-updated version automatically starts pointing to that new stable version.
- Auto-updated versions have model names with no appendage, for example gemini-1.5-pro.
Preview versions have new capabilities and are considered not stable. Note that preview versions always point to the latest preview version of that model; if a new preview version is released, any existing preview version automatically starts pointing to that new preview version.
- Preview versions have model names appended with -preview along with the model's initial release date (-MMDD), for example gemini-1.5-pro-preview-0409 (released on April 9, 2024).
Learn more about the available Gemini model versions and their lifecycle in the Google Cloud documentation.
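The naming conventions described in this section can be recognized mechanically. The sketch below is a hypothetical Python helper (`version_kind` is not part of any SDK) that classifies a model name by its suffix, assuming names follow exactly the three patterns described: a three-digit suffix for stable versions, a -preview-MMDD suffix for preview versions, and no suffix for auto-updated versions.

```python
import re

# Suffix patterns inferred from the naming rules in this section (assumptions).
PREVIEW_SUFFIX = re.compile(r"-preview-\d{4}$")  # e.g. -preview-0409
STABLE_SUFFIX = re.compile(r"-\d{3}$")           # e.g. -002


def version_kind(model_name: str) -> str:
    """Classify a model name as 'preview', 'stable', or 'auto-updated'."""
    if PREVIEW_SUFFIX.search(model_name):
        return "preview"
    if STABLE_SUFFIX.search(model_name):
        return "stable"
    return "auto-updated"


for name in ("gemini-1.5-pro-002", "gemini-1.5-pro", "gemini-1.5-pro-preview-0409"):
    print(name, "->", version_kind(name))
```

A check like this may be useful in a build step to flag code that uses an auto-updated or preview name where a fixed stable version is wanted.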
Available model names
Model names are the explicit values that you include in your code during initialization of the generative model (which is a required step to call the Gemini API). For initialization examples for your language, see the getting started guide.
Gemini 1.5 Flash model names
Model name | Description | Release stage | Initial release date | Discontinuation date |
---|---|---|---|---|
Stable versions | ||||
gemini-1.5-flash-002 | Latest stable version of Gemini 1.5 Flash | General Availability | 2024-09-24 | No earlier than 2025-09-24 |
gemini-1.5-flash-001 | Initial stable version of Gemini 1.5 Flash | General Availability | 2024-05-24 | No earlier than 2025-05-24 |
Auto-updated version | ||||
gemini-1.5-flash | Points to the latest stable version of 1.5 Flash (currently gemini-1.5-flash-002) | General Availability | 2024-09-24 | --- |
Gemini 1.5 Pro model names
Model name | Description | Release stage | Initial release date | Discontinuation date |
---|---|---|---|---|
Stable versions | ||||
gemini-1.5-pro-002 | Latest stable version of Gemini 1.5 Pro | General Availability | 2024-09-24 | No earlier than 2025-09-24 |
gemini-1.5-pro-001 | Initial stable version of Gemini 1.5 Pro | General Availability | 2024-05-24 | No earlier than 2025-05-24 |
Auto-updated version | ||||
gemini-1.5-pro | Points to the latest stable version of 1.5 Pro (currently gemini-1.5-pro-002) | General Availability | 2024-09-24 | --- |
Gemini 1.0 Pro Vision model names
Model name | Description | Release stage | Initial release date | Discontinuation date |
---|---|---|---|---|
Stable versions | ||||
gemini-1.0-pro-vision-001 | Latest stable version of Gemini 1.0 Pro Vision | General Availability | 2024-02-15 | No earlier than 2025-02-15 |
Auto-updated version | ||||
gemini-1.0-pro-vision | Points to the latest stable version of 1.0 Pro Vision (currently gemini-1.0-pro-vision-001) | General Availability | 2024-01-04 | --- |
Gemini 1.0 Pro model names
Model name | Description | Release stage | Initial release date | Discontinuation date |
---|---|---|---|---|
Stable versions | ||||
gemini-1.0-pro-002 | Latest stable version of Gemini 1.0 Pro | General Availability | 2024-04-09 | No earlier than 2025-04-09 |
gemini-1.0-pro-001 | Stable version of Gemini 1.0 Pro | General Availability | 2024-02-15 | No earlier than 2025-02-15 |
Auto-updated version | ||||
gemini-1.0-pro | Points to the latest stable version of 1.0 Pro (currently gemini-1.0-pro-002) | General Availability | 2024-02-15 | --- |
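To show how the tables above could be used in tooling, here is a minimal Python sketch that resolves an auto-updated name to the stable version it currently points to and checks whether a given date has reached that version's "no earlier than" discontinuation date. The names `resolve` and `past_earliest_discontinuation` are hypothetical, the data is a hand-copied snapshot of the tables, and the sketch assumes the auto-updated Gemini 1.0 Pro Vision name points to gemini-1.0-pro-vision-001, the only stable version listed for that model.

```python
from datetime import date

# Stable versions: name -> (initial release date, earliest discontinuation date).
STABLE_VERSIONS = {
    "gemini-1.5-flash-002": (date(2024, 9, 24), date(2025, 9, 24)),
    "gemini-1.5-flash-001": (date(2024, 5, 24), date(2025, 5, 24)),
    "gemini-1.5-pro-002": (date(2024, 9, 24), date(2025, 9, 24)),
    "gemini-1.5-pro-001": (date(2024, 5, 24), date(2025, 5, 24)),
    "gemini-1.0-pro-vision-001": (date(2024, 2, 15), date(2025, 2, 15)),
    "gemini-1.0-pro-002": (date(2024, 4, 9), date(2025, 4, 9)),
    "gemini-1.0-pro-001": (date(2024, 2, 15), date(2025, 2, 15)),
}

# Auto-updated names and the stable version each one pointed to in this snapshot.
AUTO_UPDATED = {
    "gemini-1.5-flash": "gemini-1.5-flash-002",
    "gemini-1.5-pro": "gemini-1.5-pro-002",
    "gemini-1.0-pro-vision": "gemini-1.0-pro-vision-001",
    "gemini-1.0-pro": "gemini-1.0-pro-002",
}


def resolve(model_name: str) -> str:
    """Map an auto-updated name to its stable version; return other names unchanged."""
    return AUTO_UPDATED.get(model_name, model_name)


def past_earliest_discontinuation(model_name: str, today: date) -> bool:
    """True if `today` is on or after the 'no earlier than' date for the resolved version."""
    _, earliest = STABLE_VERSIONS[resolve(model_name)]
    return today >= earliest


print(resolve("gemini-1.5-flash"))
print(past_earliest_discontinuation("gemini-1.0-pro-001", date(2025, 3, 1)))
```

Since auto-updated names move to new stable versions over time, a snapshot like this goes stale and would need refreshing from the current documentation.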
Supported languages
All the Gemini models can understand and respond in the following languages:
Arabic (ar), Bengali (bn), Bulgarian (bg), Chinese simplified and traditional (zh), Croatian (hr), Czech (cs), Danish (da), Dutch (nl), English (en), Estonian (et), Finnish (fi), French (fr), German (de), Greek (el), Hebrew (iw), Hindi (hi), Hungarian (hu), Indonesian (id), Italian (it), Japanese (ja), Korean (ko), Latvian (lv), Lithuanian (lt), Norwegian (no), Polish (pl), Portuguese (pt), Romanian (ro), Russian (ru), Serbian (sr), Slovak (sk), Slovenian (sl), Spanish (es), Swahili (sw), Swedish (sv), Thai (th), Turkish (tr), Ukrainian (uk), Vietnamese (vi)
Gemini 1.5 Pro and Gemini 1.5 Flash models can understand and respond in the following additional languages:
Afrikaans (af), Amharic (am), Assamese (as), Azerbaijani (az), Belarusian (be), Bosnian (bs), Catalan (ca), Cebuano (ceb), Corsican (co), Welsh (cy), Dhivehi (dv), Esperanto (eo), Basque (eu), Persian (fa), Filipino (Tagalog) (fil), Frisian (fy), Irish (ga), Scots Gaelic (gd), Galician (gl), Gujarati (gu), Hausa (ha), Hawaiian (haw), Hmong (hmn), Haitian Creole (ht), Armenian (hy), Igbo (ig), Icelandic (is), Javanese (jv), Georgian (ka), Kazakh (kk), Khmer (km), Kannada (kn), Krio (kri), Kurdish (ku), Kyrgyz (ky), Latin (la), Luxembourgish (lb), Lao (lo), Malagasy (mg), Maori (mi), Macedonian (mk), Malayalam (ml), Mongolian (mn), Meiteilon (Manipuri) (mni-Mtei), Marathi (mr), Malay (ms), Maltese (mt), Myanmar (Burmese) (my), Nepali (ne), Nyanja (Chichewa) (ny), Odia (Oriya) (or), Punjabi (pa), Pashto (ps), Sindhi (sd), Sinhala (Sinhalese) (si), Samoan (sm), Shona (sn), Somali (so), Albanian (sq), Sesotho (st), Sundanese (su), Tamil (ta), Telugu (te), Tajik (tg), Uyghur (ug), Urdu (ur), Uzbek (uz), Xhosa (xh), Yiddish (yi), Yoruba (yo), Zulu (zu)
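The two language lists can be combined into a simple lookup. The following Python sketch is illustrative only: `supports_language` is a hypothetical helper, the codes are copied from the lists above, and matching is done case-insensitively as a convenience.

```python
# Language codes supported by all Gemini models (from the first list above).
BASE_CODES = set(
    "ar bn bg zh hr cs da nl en et fi fr de el iw hi hu id it ja ko lv lt no "
    "pl pt ro ru sr sk sl es sw sv th tr uk vi".split()
)

# Additional codes for Gemini 1.5 Pro and Gemini 1.5 Flash (from the second list).
EXTRA_CODES = set(
    "af am as az be bs ca ceb co cy dv eo eu fa fil fy ga gd gl gu ha haw hmn "
    "ht hy ig is jv ka kk km kn kri ku ky la lb lo mg mi mk ml mn mni-mtei mr "
    "ms mt my ne ny or pa ps sd si sm sn so sq st su ta te tg ug ur uz xh yi "
    "yo zu".split()
)

MODELS_WITH_EXTRA = {"Gemini 1.5 Pro", "Gemini 1.5 Flash"}


def supports_language(model: str, code: str) -> bool:
    """True if the model is listed as understanding the given language code."""
    code = code.lower()
    if code in BASE_CODES:
        return True
    return model in MODELS_WITH_EXTRA and code in EXTRA_CODES


print(supports_language("Gemini 1.0 Pro", "af"))
print(supports_language("Gemini 1.5 Flash", "af"))
```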
Next steps
Try out the capabilities of the Gemini API
- Build multi-turn conversations (chat).
- Generate text from text-only prompts.
- Generate text from multimodal prompts (including text, images, PDFs, video, and audio).
- Generate structured output (like JSON) from both text and multimodal prompts.
- Use function calling to connect generative models to external systems and information.