Bidirectional streaming using the Gemini Live API


The Gemini Live API enables low-latency, bidirectional text and voice interactions with Gemini. Using the Live API, you can offer end users the experience of natural, human-like voice conversations, with the ability to interrupt the model's responses using text or voice commands. The model can process text and audio input (video is coming soon!), and it can provide text and audio output.

You can prototype with prompts and the Live API in Google AI Studio or Vertex AI Studio.

The Live API is a stateful API that creates a WebSocket connection to establish a session between the client and the Gemini server. For details, see the Live API reference documentation (Gemini Developer API | Vertex AI Gemini API).

Before you begin

Click your Gemini API provider to view provider-specific content and code on this page.

If you haven't already, complete the getting started guide, which describes how to set up your Firebase project, connect your app to Firebase, add the SDK, initialize the backend service for your chosen Gemini API provider, and create a LiveModel instance.

Models that support this capability

Which models support the Live API depends on your chosen Gemini API provider.

Note that regardless of the API provider, the Live API is not supported by the gemini-2.0-flash and gemini-2.5-flash models.

  • Gemini Developer API

    • gemini-live-2.5-flash (private GA*)
    • gemini-live-2.5-flash-preview
    • gemini-2.0-flash-live-001
    • gemini-2.0-flash-live-preview-04-09
  • Vertex AI Gemini API

    • gemini-live-2.5-flash (private GA*)
    • gemini-2.0-flash-live-preview-04-09 (available only in us-central1)

Note that in the 2.5 model names for the Live API, the live segment comes directly after the gemini segment.

* Contact your Google Cloud account team representative to request access.

Use the standard capabilities of the Live API

This section describes how to use the standard capabilities of the Live API, specifically how to stream various types of input and output:

Generate streamed text from streamed text input

Before trying this sample, complete the Before you begin section of this guide to set up your project and app.
In that section, you'll also click a button for your chosen Gemini API provider so that you see provider-specific content on this page.

You can send streamed text input and receive streamed text output. Make sure to create a LiveModel instance and set the response modality to Text.

Swift

The Live API isn't supported yet for Apple platform apps, but check back soon!

Kotlin


// Initialize the Gemini Developer API backend service
// Create a `LiveModel` instance with a model that supports the Live API
val model = Firebase.ai(backend = GenerativeBackend.googleAI()).liveModel(
    modelName = "gemini-2.0-flash-live-preview-04-09",
    // Configure the model to respond with text
    generationConfig = liveGenerationConfig {
        responseModality = ResponseModality.TEXT
    }
)

val session = model.connect()

// Provide a text prompt
val text = "tell a short story"

session.send(text)

var outputText = ""
session.receive().collect {
    if (it.turnComplete) {
        // Optional: stop receiving if you don't need to send more requests
        session.stopReceiving()
    }
    outputText = outputText + it.text
}

// Output received from the server.
println(outputText)

Java


ExecutorService executor = Executors.newFixedThreadPool(1);
// Initialize the Gemini Developer API backend service
// Create a `LiveModel` instance with a model that supports the Live API
LiveGenerativeModel lm = FirebaseAI.getInstance(GenerativeBackend.googleAI()).liveModel(
        "gemini-2.0-flash-live-preview-04-09",
        // Configure the model to respond with text
        new LiveGenerationConfig.Builder()
                .setResponseModalities(ResponseModality.TEXT)
                .build()
);
LiveModelFutures model = LiveModelFutures.from(lm);
ListenableFuture<LiveSession> sessionFuture = model.connect();
class LiveContentResponseSubscriber implements Subscriber<LiveContentResponse> {
    @Override
    public void onSubscribe(Subscription s) {
        s.request(Long.MAX_VALUE); // Request an unlimited number of items
    }
    @Override
    public void onNext(LiveContentResponse liveContentResponse) {
        // Handle the response from the server.
        System.out.println(liveContentResponse.getText());
    }
    @Override
    public void onError(Throwable t) {
        System.err.println("Error: " + t.getMessage());
    }
    @Override
    public void onComplete() {
        System.out.println("Done receiving messages!");
    }
}
Futures.addCallback(sessionFuture, new FutureCallback<LiveSession>() {
    @Override
    public void onSuccess(LiveSession ses) {
        LiveSessionFutures session = LiveSessionFutures.from(ses);
        // Provide a text prompt
        String text = "tell me a short story?";
        session.send(text);
        Publisher<LiveContentResponse> publisher = session.receive();
        publisher.subscribe(new LiveContentResponseSubscriber());
    }
    @Override
    public void onFailure(Throwable t) {
        // Handle exceptions
    }
}, executor);

Web

The Live API isn't supported yet for Web apps, but check back soon!

Dart


import 'package:firebase_ai/firebase_ai.dart';
import 'package:firebase_core/firebase_core.dart';
import 'firebase_options.dart';

late LiveModelSession _session;

await Firebase.initializeApp(
  options: DefaultFirebaseOptions.currentPlatform,
);

// Initialize the Gemini Developer API backend service
// Create a `LiveModel` instance with a model that supports the Live API
final model = FirebaseAI.googleAI().liveModel(
  model: 'gemini-2.0-flash-live-preview-04-09',
  // Configure the model to respond with text
  config: LiveGenerationConfig(responseModalities: [ResponseModality.text]),
);

_session = await model.connect();

// Provide a text prompt
final prompt = Content.text('tell a short story');
await _session.send(input: prompt, turnComplete: true);

// In a separate thread, receive the response
await for (final message in _session.receive()) {
  // Process the received message
}

Unity


using Firebase;
using Firebase.AI;

async Task SendTextReceiveText() {
  // Initialize the Gemini Developer API backend service
  // Create a `LiveModel` instance with a model that supports the Live API
  var model = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI()).GetLiveModel(
    modelName: "gemini-2.0-flash-live-preview-04-09",
    // Configure the model to respond with text
    liveGenerationConfig: new LiveGenerationConfig(
        responseModalities: new[] { ResponseModality.Text })
  );

  LiveSession session = await model.ConnectAsync();

  // Provide a text prompt
  var prompt = ModelContent.Text("tell a short story");
  await session.SendAsync(content: prompt, turnComplete: true);

  // Receive the response
  await foreach (var message in session.ReceiveAsync()) {
    // Process the received message
    if (!string.IsNullOrEmpty(message.Text)) {
      UnityEngine.Debug.Log("Received message: " + message.Text);
    }
  }
}

Generate streamed audio from streamed audio input

Before trying this sample, complete the Before you begin section of this guide to set up your project and app.
In that section, you'll also click a button for your chosen Gemini API provider so that you see provider-specific content on this page.

You can send streamed audio input and receive streamed audio output. Make sure to create a LiveModel instance and set the response modality to Audio.

Learn how to configure and customize the response voice later on this page.

Swift

The Live API isn't supported yet for Apple platform apps, but check back soon!

Kotlin


// Initialize the Gemini Developer API backend service
// Create a `LiveModel` instance with a model that supports the Live API
val model = Firebase.ai(backend = GenerativeBackend.googleAI()).liveModel(
    modelName = "gemini-2.0-flash-live-preview-04-09",
    // Configure the model to respond with audio
    generationConfig = liveGenerationConfig {
        responseModality = ResponseModality.AUDIO
    }
)

val session = model.connect()

// This is the recommended way.
// However, you can create your own recorder and handle the stream.
session.startAudioConversation()

Java


ExecutorService executor = Executors.newFixedThreadPool(1);
// Initialize the Gemini Developer API backend service
// Create a `LiveModel` instance with a model that supports the Live API
LiveGenerativeModel lm = FirebaseAI.getInstance(GenerativeBackend.googleAI()).liveModel(
        "gemini-2.0-flash-live-preview-04-09",
        // Configure the model to respond with audio
        new LiveGenerationConfig.Builder()
                .setResponseModalities(ResponseModality.AUDIO)
                .build()
);
LiveModelFutures model = LiveModelFutures.from(lm);
ListenableFuture<LiveSession> sessionFuture =  model.connect();

Futures.addCallback(sessionFuture, new FutureCallback<LiveSession>() {
    @Override
    public void onSuccess(LiveSession ses) {
        LiveSessionFutures session = LiveSessionFutures.from(ses);
        session.startAudioConversation();
    }
    @Override
    public void onFailure(Throwable t) {
        // Handle exceptions
    }
}, executor);

Web

The Live API isn't supported yet for Web apps, but check back soon!

Dart


import 'package:firebase_ai/firebase_ai.dart';
import 'package:firebase_core/firebase_core.dart';
import 'firebase_options.dart';
import 'package:your_audio_recorder_package/your_audio_recorder_package.dart';

late LiveModelSession _session;
final _audioRecorder = YourAudioRecorder();

await Firebase.initializeApp(
  options: DefaultFirebaseOptions.currentPlatform,
);

// Initialize the Gemini Developer API backend service
// Create a `LiveModel` instance with a model that supports the Live API
final model = FirebaseAI.googleAI().liveModel(
  model: 'gemini-2.0-flash-live-preview-04-09',
  // Configure the model to respond with audio
  config: LiveGenerationConfig(responseModalities: [ResponseModality.audio]),
);

_session = await model.connect();

final audioRecordStream = _audioRecorder.startRecordingStream();
// Map the Uint8List stream to InlineDataPart stream
final mediaChunkStream = audioRecordStream.map((data) {
  return InlineDataPart('audio/pcm', data);
});
await _session.startMediaStream(mediaChunkStream);

// In a separate thread, receive the audio response from the model
await for (final message in _session.receive()) {
  // Process the received message
}

Unity


using Firebase;
using Firebase.AI;

async Task SendTextReceiveAudio() {
  // Initialize the Gemini Developer API backend service
  // Create a `LiveModel` instance with a model that supports the Live API
  var model = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI()).GetLiveModel(
    modelName: "gemini-2.0-flash-live-preview-04-09",
    // Configure the model to respond with audio
    liveGenerationConfig: new LiveGenerationConfig(
        responseModalities: new[] { ResponseModality.Audio })
  );

  LiveSession session = await model.ConnectAsync();

  // Start a coroutine to send audio from the Microphone
  var recordingCoroutine = StartCoroutine(SendAudio(session));

  // Start receiving the response
  await ReceiveAudio(session);
}

IEnumerator SendAudio(LiveSession liveSession) {
  string microphoneDeviceName = null;
  int recordingFrequency = 16000;
  int recordingBufferSeconds = 2;

  var recordingClip = Microphone.Start(microphoneDeviceName, true,
                                       recordingBufferSeconds, recordingFrequency);

  int lastSamplePosition = 0;
  while (true) {
    if (!Microphone.IsRecording(microphoneDeviceName)) {
      yield break;
    }

    int currentSamplePosition = Microphone.GetPosition(microphoneDeviceName);

    if (currentSamplePosition != lastSamplePosition) {
      // The Microphone uses a circular buffer, so we need to check if the
      // current position wrapped around to the beginning, and handle it
      // accordingly.
      int sampleCount;
      if (currentSamplePosition > lastSamplePosition) {
        sampleCount = currentSamplePosition - lastSamplePosition;
      } else {
        sampleCount = recordingClip.samples - lastSamplePosition + currentSamplePosition;
      }

      if (sampleCount > 0) {
        // Get the audio chunk
        float[] samples = new float[sampleCount];
        recordingClip.GetData(samples, lastSamplePosition);

        // Send the data, discarding the resulting Task to avoid the warning
        _ = liveSession.SendAudioAsync(samples);

        lastSamplePosition = currentSamplePosition;
      }
    }

    // Wait for a short delay before reading the next sample from the Microphone
    const float MicrophoneReadDelay = 0.5f;
    yield return new WaitForSeconds(MicrophoneReadDelay);
  }
}

Queue<float> audioBuffer = new();

async Task ReceiveAudio(LiveSession liveSession) {
  int sampleRate = 24000;
  int channelCount = 1;

  // Create a looping AudioClip to fill with the received audio data
  int bufferSamples = (int)(sampleRate * channelCount);
  AudioClip clip = AudioClip.Create("StreamingPCM", bufferSamples, channelCount,
                                    sampleRate, true, OnAudioRead);

  // Attach the clip to an AudioSource and start playing it
  AudioSource audioSource = GetComponent<AudioSource>();
  audioSource.clip = clip;
  audioSource.loop = true;
  audioSource.Play();

  // Start receiving the response
  await foreach (var message in liveSession.ReceiveAsync()) {
    // Process the received message
    foreach (float[] pcmData in message.AudioAsFloat) {
      lock (audioBuffer) {
        foreach (float sample in pcmData) {
          audioBuffer.Enqueue(sample);
        }
      }
    }
  }
}

// This method is called by the AudioClip to load audio data.
private void OnAudioRead(float[] data) {
  int samplesToProvide = data.Length;
  int samplesProvided = 0;

  lock(audioBuffer) {
    while (samplesProvided < samplesToProvide && audioBuffer.Count > 0) {
      data[samplesProvided] = audioBuffer.Dequeue();
      samplesProvided++;
    }
  }

  while (samplesProvided < samplesToProvide) {
    data[samplesProvided] = 0.0f;
    samplesProvided++;
  }
}



Build more engaging and interactive experiences

This section describes how to create and manage more engaging or interactive capabilities of the Live API.

Change the voice of the responses

The Live API uses Chirp 3 to support synthesized speech responses. When using Firebase AI Logic, you can send audio in a variety of HD voices and languages. For a full list and demos of what each voice sounds like, see Chirp 3: HD voices.

To specify a voice, set the voice name within the speechConfig object as part of the model configuration. If you don't specify a voice, the default is Puck.

Before trying this sample, complete the Before you begin section of this guide to set up your project and app.
In that section, you'll also click a button for your chosen Gemini API provider so that you see provider-specific content on this page.

Swift

The Live API isn't supported yet for Apple platform apps, but check back soon!

Kotlin


// ...

val model = Firebase.ai(backend = GenerativeBackend.googleAI()).liveModel(
    modelName = "gemini-2.0-flash-live-preview-04-09",
    // Configure the model to use a specific voice for its audio response
    generationConfig = liveGenerationConfig {
        responseModality = ResponseModality.AUDIO
        speechConfig = SpeechConfig(voice = Voices.FENRIR)
    }
)

// ...

Java


// ...

LiveGenerativeModel model = FirebaseAI.getInstance(GenerativeBackend.googleAI()).liveModel(
    "gemini-2.0-flash-live-preview-04-09",
    // Configure the model to use a specific voice for its audio response
    new LiveGenerationConfig.Builder()
        .setResponseModalities(ResponseModality.AUDIO)
        .setSpeechConfig(new SpeechConfig(Voices.FENRIR))
        .build()
);

// ...

Web

The Live API isn't supported yet for Web apps, but check back soon!

Dart


// ...

final model = FirebaseAI.googleAI().liveGenerativeModel(
  model: 'gemini-2.0-flash-live-preview-04-09',
  // Configure the model to use a specific voice for its audio response
  config: LiveGenerationConfig(
    responseModalities: [ResponseModality.audio],
    speechConfig: SpeechConfig(voiceName: 'Fenrir'),
  ),
);

// ...

Unity


var model = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI()).GetLiveModel(
  modelName: "gemini-2.0-flash-live-preview-04-09",
  liveGenerationConfig: new LiveGenerationConfig(
    responseModalities: new[] { ResponseModality.Audio },
    speechConfig: SpeechConfig.UsePrebuiltVoice("Fenrir"))
);

For the best results when prompting the model to respond in a language other than English, include the following in your system instructions:

RESPOND IN LANGUAGE. YOU MUST RESPOND UNMISTAKABLY IN LANGUAGE.
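
For example, here's a minimal Kotlin sketch of pinning the response language to French. It assumes the liveModel builder accepts a systemInstruction parameter, as the standard generativeModel builder does; check your SDK version before relying on it.

// Sketch only: assumes `liveModel` accepts a `systemInstruction` parameter,
// mirroring the standard `generativeModel` builder.
val model = Firebase.ai(backend = GenerativeBackend.googleAI()).liveModel(
    modelName = "gemini-2.0-flash-live-preview-04-09",
    generationConfig = liveGenerationConfig {
        responseModality = ResponseModality.AUDIO
    },
    // The system instruction follows the template above, with LANGUAGE filled in.
    systemInstruction = content { text("RESPOND IN FRENCH. YOU MUST RESPOND UNMISTAKABLY IN FRENCH.") }
)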

Maintain context across sessions and requests

You can use a chat structure to maintain context across sessions and requests. Note that this only works with text input and text output.

This approach is best for short contexts; you can send turn-by-turn interactions to represent the exact sequence of events. For longer contexts, we recommend providing a single summary message to free up the context window for subsequent interactions, as in the sketch below.
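
For example, here's a minimal Kotlin sketch of seeding a new session with a summary of an earlier conversation. It uses only the connect() and send() calls shown earlier on this page; the summary text itself is a hypothetical placeholder.

val session = model.connect()

// Hypothetical summary: condense earlier turns into one message instead of
// replaying the full turn-by-turn history, freeing up the context window.
val conversationSummary = """
    Summary of the previous session:
    - The user asked for a bedtime story about a lighthouse keeper.
    - The model told one; the user asked for a shorter, funnier version.
""".trimIndent()

// Seed the new session with the summary, then continue the conversation.
session.send(conversationSummary)
session.send("Continue the story from where we left off, but set it in winter.")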

Handle interruptions

Firebase AI Logic doesn't yet support handling interruptions. Check back soon!

Use function calling (tools)

You can define tools (like available functions) to use with the Live API just like you can with the standard content generation methods. This section describes some nuances of using the Live API with function calling. For a complete description and examples of function calling, see the function calling guide.

From a single prompt, the model can generate multiple function calls and the code necessary to chain their outputs. This code executes in a sandbox environment, generating subsequent BidiGenerateContentToolCall messages. Execution pauses until the results of each function call are available, which ensures sequential processing.

Also, using the Live API with function calling is particularly powerful because the model can request follow-up or clarifying information from the user. For example, if the model doesn't have enough information to provide a parameter value to a function it wants to call, it can ask the user to provide more or clarifying information.

The client should respond with BidiGenerateContentToolResponse.
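
As an illustrative Kotlin sketch only: the exact Live API function-calling surface varies by SDK version, and this assumes that the liveModel builder accepts a tools parameter (as generativeModel does) and that startAudioConversation accepts a handler invoked for each function call. The fetchWeather tool and its hard-coded response are hypothetical.

// Hypothetical tool: the name, parameters, and handler are illustrative only.
val fetchWeatherTool = FunctionDeclaration(
    name = "fetchWeather",
    description = "Get the weather conditions for a specific city on a specific date.",
    parameters = mapOf(
        "city" to Schema.string("The city of the location."),
        "date" to Schema.string("The date to get the weather for, in YYYY-MM-DD format.")
    )
)

// Assumes `liveModel` accepts a `tools` parameter, as `generativeModel` does.
val model = Firebase.ai(backend = GenerativeBackend.googleAI()).liveModel(
    modelName = "gemini-2.0-flash-live-preview-04-09",
    generationConfig = liveGenerationConfig { responseModality = ResponseModality.AUDIO },
    tools = listOf(Tool.functionDeclarations(listOf(fetchWeatherTool)))
)

val session = model.connect()

// Assumes the audio conversation accepts a function-call handler that returns
// each function's result to the model.
session.startAudioConversation { functionCall ->
    when (functionCall.name) {
        "fetchWeather" -> {
            // Call your real weather service here; this response is hard-coded.
            FunctionResponsePart(
                "fetchWeather",
                JsonObject(mapOf("weather" to JsonPrimitive("sunny, 24°C")))
            )
        }
        else -> FunctionResponsePart(
            functionCall.name,
            JsonObject(mapOf("error" to JsonPrimitive("unknown function")))
        )
    }
}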



Limitations and requirements

Note the following limitations and requirements of the Live API.

Transcription

Firebase AI Logic doesn't yet support transcription. Check back soon!

Languages

Audio formats

The Live API supports the following audio formats:

  • Input audio format: Raw 16-bit PCM audio at 16kHz, little-endian
  • Output audio format: Raw 16-bit PCM audio at 24kHz, little-endian
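
For example, if your recorder produces normalized float samples (as in the Unity example above), a small Kotlin helper along these lines (hypothetical, not part of the SDK) converts them into the raw 16-bit little-endian PCM bytes expected for input:

import java.nio.ByteBuffer
import java.nio.ByteOrder

// Hypothetical helper, not part of the Firebase AI SDK: converts normalized
// float samples in [-1.0, 1.0] into raw 16-bit little-endian PCM bytes.
// The samples must already be at the expected rate (16kHz for input audio).
fun floatsToPcm16LittleEndian(samples: FloatArray): ByteArray {
    val buffer = ByteBuffer.allocate(samples.size * 2).order(ByteOrder.LITTLE_ENDIAN)
    for (sample in samples) {
        // Clamp to the valid range, then scale to the signed 16-bit range.
        val clamped = sample.coerceIn(-1.0f, 1.0f)
        buffer.putShort((clamped * Short.MAX_VALUE).toInt().toShort())
    }
    return buffer.array()
}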

Rate limits

The Live API has rate limits on the number of concurrent sessions per Firebase project, as well as on tokens per minute (TPM).

  • Gemini Developer API

    • Limits vary depending on the project's Gemini Developer API "usage tier" (see its rate limits documentation)
  • Vertex AI Gemini API

    • 5,000 concurrent sessions per Firebase project
    • 4 million tokens per minute

Session length

The default length for a session is 10 minutes. When the session duration exceeds the limit, the connection is terminated.

The model is also limited by the context size. Sending large chunks of input may result in earlier session termination.

Voice activity detection (VAD)

The model automatically performs voice activity detection (VAD) on a continuous audio input stream. VAD is enabled by default.

Token counting

You can't use the CountTokens API with the Live API.

