Thinking

Gemini 2.5 models can use an internal "thinking process" that significantly improves their reasoning and multi-step planning abilities, making them highly effective for complex tasks such as coding, advanced mathematics, and data analysis.

You can configure how much thinking a model can do by setting a thinking budget. This configuration is particularly important if reducing latency or cost is a priority. Also review the comparison of task complexities later on this page to help decide how much thinking a model might need.

Use a thinking model

Use a thinking model just like you'd use any other Gemini model (initialize your chosen Gemini API provider, create a GenerativeModel instance, etc.). These models can be used for text or code generation tasks, such as generating structured output or analyzing multimodal input (like images, video, audio, or PDFs). You can even use thinking models when streaming the output.
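
For example, here's a minimal Swift sketch, assuming the Gemini Developer API as your provider and gemini-2.5-flash as the model; the thinking process is internal, so the call itself is unchanged:

import FirebaseAI

// Initialize the Gemini Developer API backend service
let ai = FirebaseAI.firebaseAI(backend: .googleAI())

// Create a `GenerativeModel` instance with a thinking-capable model
let model = ai.generativeModel(modelName: "gemini-2.5-flash")

// Call the model exactly like any other Gemini model
let response = try await model.generateContent("Why is the sky blue?")
print(response.text ?? "No text in response")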

Supported models

Only Gemini 2.5 models support this capability.

  • gemini-2.5-pro
  • gemini-2.5-flash
  • gemini-2.5-flash-lite-preview-06-17

Best practices for using thinking models

We recommend testing your prompt in Google AI Studio or Vertex AI Studio where you can view the full thinking process. You can identify any areas where the model may have gone astray so that you can refine your prompts to get more consistent and accurate responses.

Begin with a general prompt that describes the desired outcome, and observe the model's initial thoughts on how it determines its response. If the response isn't as expected, help the model generate a better response by using any of the following prompting techniques:

  • Provide step-by-step instructions
  • Provide several examples of input-output pairs
  • Provide guidance for how the output and responses should be phrased and formatted
  • Provide specific verification steps
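
For example, a single prompt might combine several of these techniques. Here's a hypothetical Swift sketch; the classification task, category names, and wording are purely illustrative:

// A hypothetical prompt combining step-by-step instructions, an
// input-output example, formatting guidance, and a verification step
let prompt = """
  Classify the customer message as BUG, FEATURE_REQUEST, or QUESTION.

  Steps:
  1. Identify what the customer wants.
  2. Choose exactly one category.
  3. Re-read the message to verify your choice before answering.

  Example:
  Message: "The app crashes when I tap Save." -> BUG

  Respond with only the category name in uppercase.

  Message: "Could you add a dark mode?"
  """

// Assumes `model` was created as shown earlier on this page
let response = try await model.generateContent(prompt)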

In addition to prompting, consider using these recommendations:

  • Set system instructions, which are like a "preamble" that you add before the model gets exposed to any further instructions from the prompt or end user. They let you steer the behavior of the model based on your specific needs and use cases (see the Swift sketch after this list).

  • Set a thinking budget to configure how much thinking the model can do. If you set a low budget, then the model won't "overthink" its response. If you set a high budget, then the model can think more if needed. Setting a thinking budget also reserves more of the total token output limit for the actual response.

  • (if using the Vertex AI Gemini API) Make sure to enable AI monitoring in the Firebase console so that you can monitor the latency of your requests that have thinking enabled. Note that thinking tokens aren't yet displayed in the monitoring dashboards.
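
For example, here's a minimal Swift sketch of setting system instructions when creating the GenerativeModel instance; the instruction text itself is just an illustration:

// A minimal sketch (Swift, Gemini Developer API); the instruction text is illustrative
let model = FirebaseAI.firebaseAI(backend: .googleAI()).generativeModel(
  modelName: "gemini-2.5-flash",
  systemInstruction: ModelContent(parts: "You are a concise assistant. Verify each step before answering.")
)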

Control the thinking budget

To control how much thinking the model can do to generate its response, you can specify the number of thinking budget tokens that it's allowed to use.

You can manually set the thinking budget in situations where you might need more or fewer tokens than the default thinking budget. Find more detailed guidance about task complexity and suggested budgets later in this section. Here's some high-level guidance:

  • Set a low thinking budget if latency is important or for less complex tasks
  • Set a high thinking budget for more complex tasks

Set the thinking budget


Set the thinking budget in a GenerationConfig as part of creating the GenerativeModel instance. The configuration is maintained for the lifetime of the instance. If you want to use different thinking budgets for different requests, then create GenerativeModel instances configured with each budget.

Learn about supported thinking budget values later in this section.

Swift

Set the thinking budget in a GenerationConfig as part of creating a GenerativeModel instance.


// ...

// Set the thinking configuration
// Use a thinking budget value appropriate for your model (example value shown here)
let generationConfig = GenerationConfig(
  thinkingConfig: ThinkingConfig(thinkingBudget: 1024)
)

// Specify the config as part of creating the `GenerativeModel` instance
let model = FirebaseAI.firebaseAI(backend: .googleAI()).generativeModel(
  modelName: "GEMINI_MODEL_NAME",
  generationConfig: generationConfig
)

// ...
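
Because the budget in a GenerationConfig is fixed for the lifetime of the instance, different requests that need different budgets require separate instances. A sketch of that pattern, with illustrative budget values:

// Hypothetical sketch: one `GenerativeModel` instance per thinking budget
let lowLatencyModel = FirebaseAI.firebaseAI(backend: .googleAI()).generativeModel(
  modelName: "GEMINI_MODEL_NAME",
  generationConfig: GenerationConfig(thinkingConfig: ThinkingConfig(thinkingBudget: 0))
)

let deepReasoningModel = FirebaseAI.firebaseAI(backend: .googleAI()).generativeModel(
  modelName: "GEMINI_MODEL_NAME",
  generationConfig: GenerationConfig(thinkingConfig: ThinkingConfig(thinkingBudget: 8192))
)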

Kotlin

Set the values of the parameters in a GenerationConfig as part of creating a GenerativeModel instance.


// ...

// Set the thinking configuration
// Use a thinking budget value appropriate for your model (example value shown here)
val generationConfig = generationConfig {
  thinkingConfig = thinkingConfig {
      thinkingBudget = 1024
  }
}

// Specify the config as part of creating the `GenerativeModel` instance
val model = Firebase.ai(backend = GenerativeBackend.googleAI()).generativeModel(
  modelName = "GEMINI_MODEL_NAME",
  generationConfig = generationConfig,
)

// ...

Java

Set the values of the parameters in a GenerationConfig as part of creating a GenerativeModel instance.


// ...

// Set the thinking configuration
// Use a thinking budget value appropriate for your model (example value shown here)
ThinkingConfig thinkingConfig = new ThinkingConfig.Builder()
    .setThinkingBudget(1024)
    .build();

GenerationConfig generationConfig = GenerationConfig.builder()
    .setThinkingConfig(thinkingConfig)
    .build();

// Specify the config as part of creating the `GenerativeModel` instance
GenerativeModelFutures model = GenerativeModelFutures.from(
        FirebaseAI.getInstance(GenerativeBackend.googleAI())
                .generativeModel(
                  /* modelName */ "GEMINI_MODEL_NAME",
                  /* generationConfig */ generationConfig
                )
);

// ...

Web

Set the values of the parameters in a GenerationConfig as part of creating a GenerativeModel instance.


// ...

const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });

// Set the thinking configuration
// Use a thinking budget value appropriate for your model (example value shown here)
const generationConfig = {
  thinkingConfig: {
    thinkingBudget: 1024
  }
};

// Specify the config as part of creating the `GenerativeModel` instance
const model = getGenerativeModel(ai, { model: "GEMINI_MODEL_NAME", generationConfig });

// ...

Dart

Set the values of the parameters in a GenerationConfig as part of creating a GenerativeModel instance.


// ...

// Set the thinking configuration
// Use a thinking budget value appropriate for your model (example value shown here)
final thinkingConfig = ThinkingConfig(thinkingBudget: 1024);

final generationConfig = GenerationConfig(
  // ...
  thinkingConfig: thinkingConfig
);

// Specify the config as part of creating the `GenerativeModel` instance
final model = FirebaseAI.googleAI().generativeModel(
  model: 'GEMINI_MODEL_NAME',
  generationConfig: generationConfig,
);

// ...

Unity

Support for setting a thinking budget for Unity is coming in its next release.

Supported thinking budget values

The following table lists the thinking budget values that you can set for each model by configuring the model's thinkingBudget.

| Model | Default value | Minimum value | Maximum value | Value to disable thinking | Value to enable dynamic thinking |
|---|---|---|---|---|---|
| Gemini 2.5 Pro | 8,192 | 128 | 32,768 | cannot be turned off | -1 |
| Gemini 2.5 Flash | 8,192 | 1 | 24,576 | 0 | -1 |
| Gemini 2.5 Flash-Lite | 0 (thinking is disabled by default) | 512 | 24,576 | 0 (or don't configure a thinking budget at all) | -1 |

Disable thinking

For some easier tasks, the thinking capability isn't necessary, and traditional inference is sufficient. Or if reducing latency is a priority, you may not want the model to take any more time than necessary to generate a response.

In these situations, you can disable (or turn off) thinking, as shown in the Swift sketch after this list:

  • Gemini 2.5 Pro: thinking cannot be disabled
  • Gemini 2.5 Flash: set thinkingBudget to 0 tokens
  • Gemini 2.5 Flash-Lite: thinking is disabled by default
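
For example, a minimal Swift sketch of disabling thinking for Gemini 2.5 Flash, following the same configuration pattern as the examples above:

// Disable thinking for Gemini 2.5 Flash by setting the budget to 0 tokens
let generationConfig = GenerationConfig(
  thinkingConfig: ThinkingConfig(thinkingBudget: 0)
)

let model = FirebaseAI.firebaseAI(backend: .googleAI()).generativeModel(
  modelName: "gemini-2.5-flash",
  generationConfig: generationConfig
)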

Enable dynamic thinking

You can let the model decide when and how much it thinks (called dynamic thinking) by setting thinkingBudget to -1. The model can use as many tokens as it decides is appropriate, up to its maximum token value listed above.
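
In Swift, that's the same configuration pattern with the budget set to -1 (a minimal sketch):

// Let the model decide when and how much to think (dynamic thinking)
let generationConfig = GenerationConfig(
  thinkingConfig: ThinkingConfig(thinkingBudget: -1)
)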

Task complexity

  • Easy tasks — thinking could be turned off
    Straightforward requests where complex reasoning isn't required, such as fact retrieval or classification. Examples:

    • "Where was DeepMind founded?"
    • "Is this email asking for a meeting or just providing information?"
  • Medium tasks — default budget or some additional thinking budget needed
    Common requests that benefit from a degree of step-by-step processing or deeper understanding. Examples:

    • "Create an analogy between photosynthesis and growing up."
    • "Compare and contrast electric cars and hybrid cars."
  • Hard tasks — maximum thinking budget may be needed
    Truly complex challenges, such as solving complex math problems or coding tasks. These types of tasks require the model to engage its full reasoning and planning capabilities, often involving many internal steps before providing an answer. Examples:

    • "Solve problem 1 in AIME 2025: Find the sum of all integer bases b > 9 for which 17b is a divisor of 97b."
    • "Write Python code for a web application that visualizes real-time stock market data, including user authentication. Make it as efficient as possible."

Pricing and counting thinking tokens

Thinking tokens use the same pricing as text-output tokens.

You can get the total number of thinking tokens from the thoughtsTokenCount field in the usageMetadata attribute of the response:

Swift

// ...

let response = try await model.generateContent("Why is the sky blue?")

if let usageMetadata = response.usageMetadata {
  print("Thoughts Token Count: \(usageMetadata.thoughtsTokenCount)")
}
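
If you want to compare thinking tokens with the visible output, the same usageMetadata exposes other counts as well. A sketch assuming the candidatesTokenCount and totalTokenCount fields from the Firebase AI Swift API:

// Field names assumed from the Firebase AI Swift API; verify against the current SDK
if let usageMetadata = response.usageMetadata {
  print("Thoughts Token Count: \(usageMetadata.thoughtsTokenCount)")
  print("Candidates Token Count: \(usageMetadata.candidatesTokenCount)")
  print("Total Token Count: \(usageMetadata.totalTokenCount)")
}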

Kotlin

// ...

val response = model.generateContent("Why is the sky blue?")

response.usageMetadata?.let { usageMetadata ->
    println("Thoughts Token Count: ${usageMetadata.thoughtsTokenCount}")
}

Java

// ...

ListenableFuture<GenerateContentResponse> response =
    model.generateContent("Why is the sky blue?");

Futures.addCallback(response, new FutureCallback<GenerateContentResponse>() {
    @Override
    public void onSuccess(GenerateContentResponse result) {
        UsageMetadata usageMetadata = result.getUsageMetadata();
        if (usageMetadata != null) {
            System.out.println("Thoughts Token Count: " +
                usageMetadata.getThoughtsTokenCount());
        }
    }

    @Override
    public void onFailure(Throwable t) {
        t.printStackTrace();
    }
}, executor);

Web

// ...

const response = await model.generateContent("Why is the sky blue?");

if (response?.usageMetadata?.thoughtsTokenCount != null) {
    console.log(`Thoughts Token Count: ${response.usageMetadata.thoughtsTokenCount}`);
}

Dart

// ...

final response = await model.generateContent([
  Content.text("Why is the sky blue?"),
]);

if (response.usageMetadata case final usageMetadata?) {
  print("Thoughts Token Count: ${usageMetadata.thoughtsTokenCount}");
}

Unity

Support for counting thinking tokens for Unity is coming in its next release.

Learn more about tokens in the count tokens guide.