The Gemini Live API enables low-latency, bidirectional real-time voice and video interactions with Gemini models.
The Live API and its dedicated family of models can process continuous streams of audio, video, or text to deliver immediate, human-like spoken responses, creating a natural conversational experience for your users.
This page describes how to get started with the most common capability, streaming audio input and output, but the Live API supports many other features and configuration options.
The Live API is a stateful API that creates a WebSocket connection to establish a session between the client and the Gemini server. For details, see the Live API reference documentation (Gemini Developer API | Vertex AI Gemini API).
Check out useful resources
Swift - coming soon! | Android - quickstart app | Web - quickstart app | Flutter - quickstart app | Unity - coming soon!
See the Gemini Live API in action in a deployed app: check out the Flutter AI Playground app from the Firebase console.
Before you begin
If you haven't already, complete the getting started guide, which describes how to set up your Firebase project, connect your app to Firebase, add the SDK, initialize the backend service for your chosen Gemini API provider, and create a LiveModel instance.
You can prototype prompts and the Live API in Google AI Studio or Vertex AI Studio.
Models that support this capability
The Gemini 2.5 Flash Live models are the native-audio models that support the Gemini Live API. Although the model names differ depending on the Gemini API provider, the models behave the same and offer the same capabilities.
Gemini Developer API
gemini-2.5-flash-native-audio-preview-12-2025
gemini-2.5-flash-native-audio-preview-09-2025
Although these models are in preview, they are available in the free tier of the Gemini Developer API.
Vertex AI Gemini API
gemini-live-2.5-flash-native-audio (released December 2025)
gemini-live-2.5-flash-preview-native-audio-09-2025
When using the Vertex AI Gemini API, Live API models are not supported in the global location.
Stream audio input and output
Click your Gemini API provider to view provider-specific content and code on this page.
The following examples show a basic implementation of sending streamed audio input and receiving streamed audio output.
To learn about the other options and capabilities of the Live API, see the "What else can you do?" section later on this page.
Swift
To use the Live API, create a LiveModel instance and set the response modality to audio.
import FirebaseAILogic
// Initialize the Gemini Developer API backend service
// Create a `liveModel` instance with a model that supports the Live API
let liveModel = FirebaseAI.firebaseAI(backend: .googleAI()).liveModel(
  modelName: "gemini-2.5-flash-native-audio-preview-12-2025",
  // Configure the model to respond with audio
  generationConfig: LiveGenerationConfig(
    responseModalities: [.audio]
  )
)

do {
  let session = try await liveModel.connect()

  // Load the audio file, or tap a microphone
  guard let audioFile = NSDataAsset(name: "audio.pcm") else {
    fatalError("Failed to load audio file")
  }

  // Provide the audio data
  await session.sendAudioRealtime(audioFile.data)

  var outputText = ""
  for try await message in session.responses {
    if case let .content(content) = message.payload {
      content.modelTurn?.parts.forEach { part in
        if let part = part as? InlineDataPart, part.mimeType.starts(with: "audio/pcm") {
          // Handle 16-bit PCM audio data at 24 kHz
          playAudio(part.data)
        }
      }

      // Optional: close the session if you don't need to send more requests
      if content.isTurnComplete {
        await session.close()
      }
    }
  }
} catch {
  fatalError(error.localizedDescription)
}
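The Swift example above loads a bundled audio.pcm asset, but, as its comment notes, you can tap a microphone instead. The sketch below shows one possible way to do that with AVFoundation; it is illustrative only and not part of the Firebase AI Logic SDK. It assumes microphone permission has already been granted (and, on iOS, that the AVAudioSession is configured for recording), that the session from the example is reachable (for example, stored as a property), and that the Live API expects 16-bit PCM input at 16 kHz (the Unity sample below also records at 16 kHz). The captureEngine, sendFormat, and converter names are our own.

import AVFoundation

// Sketch: stream live microphone audio to the Live API session.
// Assumes `session` is the LiveSession from the example above.
let captureEngine = AVAudioEngine()
let inputFormat = captureEngine.inputNode.outputFormat(forBus: 0)
// Target format: 16-bit PCM at 16 kHz, mono
let sendFormat = AVAudioFormat(commonFormat: .pcmFormatInt16,
                               sampleRate: 16000,
                               channels: 1,
                               interleaved: true)!
let converter = AVAudioConverter(from: inputFormat, to: sendFormat)!

captureEngine.inputNode.installTap(onBus: 0, bufferSize: 4096, format: inputFormat) { buffer, _ in
  // Convert the captured buffer to the target format
  let ratio = sendFormat.sampleRate / inputFormat.sampleRate
  let capacity = AVAudioFrameCount(Double(buffer.frameLength) * ratio) + 1
  guard let converted = AVAudioPCMBuffer(pcmFormat: sendFormat, frameCapacity: capacity) else { return }

  var consumed = false
  var conversionError: NSError?
  converter.convert(to: converted, error: &conversionError) { _, status in
    if consumed {
      status.pointee = .noDataNow
      return nil
    }
    consumed = true
    status.pointee = .haveData
    return buffer
  }
  guard conversionError == nil, let int16Channel = converted.int16ChannelData else { return }

  // Package the converted samples as raw bytes and send them to the session.
  // For simplicity each chunk is sent in its own Task; a real app should
  // serialize sends (for example, through an AsyncStream) to guarantee ordering.
  let chunk = Data(bytes: int16Channel[0],
                   count: Int(converted.frameLength) * MemoryLayout<Int16>.size)
  Task { await session.sendAudioRealtime(chunk) }
}

try captureEngine.start()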
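The Swift example above also calls a playAudio(_:) helper that the SDK does not provide. The sketch below is one minimal way to implement it with AVFoundation for the 16-bit, 24 kHz mono PCM chunks the model returns; the playbackEngine, playerNode, and startPlayback() names are our own, and startPlayback() is assumed to be called once before the response loop.

import AVFoundation

// Sketch: queue the model's PCM chunks on a long-lived player node
let playbackEngine = AVAudioEngine()
let playerNode = AVAudioPlayerNode()
let playbackFormat = AVAudioFormat(commonFormat: .pcmFormatFloat32,
                                   sampleRate: 24000,
                                   channels: 1,
                                   interleaved: false)!

// Call once before reading responses
func startPlayback() throws {
  playbackEngine.attach(playerNode)
  playbackEngine.connect(playerNode, to: playbackEngine.mainMixerNode, format: playbackFormat)
  try playbackEngine.start()
  playerNode.play()
}

func playAudio(_ pcmData: Data) {
  // Convert the incoming Int16 samples to Float32 in [-1, 1] for the player node
  let sampleCount = pcmData.count / MemoryLayout<Int16>.size
  guard sampleCount > 0,
        let buffer = AVAudioPCMBuffer(pcmFormat: playbackFormat,
                                      frameCapacity: AVAudioFrameCount(sampleCount)) else { return }
  buffer.frameLength = AVAudioFrameCount(sampleCount)

  pcmData.withUnsafeBytes { rawBytes in
    let int16Samples = rawBytes.bindMemory(to: Int16.self)
    let channel = buffer.floatChannelData![0]
    for i in 0..<sampleCount {
      channel[i] = Float(int16Samples[i]) / Float(Int16.max)
    }
  }

  // Chunks are queued on the player node and played back to back
  playerNode.scheduleBuffer(buffer, completionHandler: nil)
}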
Kotlin
To use the Live API, create a LiveModel instance and set the response modality to AUDIO.
// Initialize the Gemini Developer API backend service
// Create a `liveModel` instance with a model that supports the Live API
val liveModel = Firebase.ai(backend = GenerativeBackend.googleAI()).liveModel(
    modelName = "gemini-2.5-flash-native-audio-preview-12-2025",
    // Configure the model to respond with audio
    generationConfig = liveGenerationConfig {
        responseModality = ResponseModality.AUDIO
    }
)
val session = liveModel.connect()
// This is the recommended approach.
// However, you can create your own recorder and handle the stream.
session.startAudioConversation()
Java
To use the Live API, create a LiveModel instance and set the response modality to AUDIO.
ExecutorService executor = Executors.newFixedThreadPool(1);
// Initialize the Gemini Developer API backend service
// Create a `liveModel` instance with a model that supports the Live API
LiveGenerativeModel lm = FirebaseAI.getInstance(GenerativeBackend.googleAI()).liveModel(
    "gemini-2.5-flash-native-audio-preview-12-2025",
    // Configure the model to respond with audio
    new LiveGenerationConfig.Builder()
        .setResponseModality(ResponseModality.AUDIO)
        .build()
);
LiveModelFutures liveModel = LiveModelFutures.from(lm);
ListenableFuture<LiveSession> sessionFuture = liveModel.connect();
Futures.addCallback(sessionFuture, new FutureCallback<LiveSession>() {
    @Override
    public void onSuccess(LiveSession ses) {
        LiveSessionFutures session = LiveSessionFutures.from(ses);
        session.startAudioConversation();
    }
    @Override
    public void onFailure(Throwable t) {
        // Handle exceptions
    }
}, executor);
Web
To use the Live API, create a LiveGenerativeModel instance and set the response modality to AUDIO.
import { initializeApp } from "firebase/app";
import { getAI, getLiveGenerativeModel, GoogleAIBackend, ResponseModality, startAudioConversation } from "firebase/ai";
// TODO(developer) Replace the following with your app's Firebase configuration
// See: https://firebase.google.com/docs/web/learn-more#config-object
const firebaseConfig = {
  // ...
};
// Initialize FirebaseApp
const firebaseApp = initializeApp(firebaseConfig);
// Initialize the Gemini Developer API backend service
const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });
// Create a `LiveGenerativeModel` instance with a model that supports the Live API
const liveModel = getLiveGenerativeModel(ai, {
  model: "gemini-2.5-flash-native-audio-preview-12-2025",
  // Configure the model to respond with audio
  generationConfig: {
    responseModalities: [ResponseModality.AUDIO],
  },
});
const session = await liveModel.connect();
// Start the audio conversation
const audioConversationController = await startAudioConversation(session);
// ... Later, to stop the audio conversation
// await audioConversationController.stop()
Dart
To use the Live API, create a LiveGenerativeModel instance and set the response modality to audio.
import 'package:firebase_ai/firebase_ai.dart';
import 'package:firebase_core/firebase_core.dart';
import 'firebase_options.dart';
import 'package:your_audio_recorder_package/your_audio_recorder_package.dart';
late LiveModelSession _session;
final _audioRecorder = YourAudioRecorder();
await Firebase.initializeApp(
  options: DefaultFirebaseOptions.currentPlatform,
);
// Initialize the Gemini Developer API backend service
// Create a `liveGenerativeModel` instance with a model that supports the Live API
final liveModel = FirebaseAI.googleAI().liveGenerativeModel(
  model: 'gemini-2.5-flash-native-audio-preview-12-2025',
  // Configure the model to respond with audio
  liveGenerationConfig: LiveGenerationConfig(
    responseModalities: [ResponseModalities.audio],
  ),
);
_session = await liveModel.connect();
final audioRecordStream = _audioRecorder.startRecordingStream();
// Map the Uint8List stream to InlineDataPart stream
final mediaChunkStream = audioRecordStream.map((data) {
  return InlineDataPart('audio/pcm', data);
});
await _session.startMediaStream(mediaChunkStream);
// In a separate thread, receive the audio response from the model
await for (final message in _session.receive()) {
  // Process the received message
}
Unity
To use the Live API, create a LiveModel instance and set the response modality to Audio.
using Firebase;
using Firebase.AI;
async Task SendTextReceiveAudio() {
  // Initialize the Gemini Developer API backend service
  // Create a `LiveModel` instance with a model that supports the Live API
  var liveModel = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI()).GetLiveModel(
    modelName: "gemini-2.5-flash-native-audio-preview-12-2025",
    // Configure the model to respond with audio
    liveGenerationConfig: new LiveGenerationConfig(
      responseModalities: new[] { ResponseModality.Audio })
  );

  LiveSession session = await liveModel.ConnectAsync();

  // Start a coroutine to send audio from the Microphone
  var recordingCoroutine = StartCoroutine(SendAudio(session));

  // Start receiving the response
  await ReceiveAudio(session);
}
IEnumerator SendAudio(LiveSession liveSession) {
  string microphoneDeviceName = null;
  int recordingFrequency = 16000;
  int recordingBufferSeconds = 2;

  var recordingClip = Microphone.Start(microphoneDeviceName, true,
      recordingBufferSeconds, recordingFrequency);

  int lastSamplePosition = 0;
  while (true) {
    if (!Microphone.IsRecording(microphoneDeviceName)) {
      yield break;
    }

    int currentSamplePosition = Microphone.GetPosition(microphoneDeviceName);
    if (currentSamplePosition != lastSamplePosition) {
      // The Microphone uses a circular buffer, so we need to check if the
      // current position wrapped around to the beginning, and handle it
      // accordingly.
      int sampleCount;
      if (currentSamplePosition > lastSamplePosition) {
        sampleCount = currentSamplePosition - lastSamplePosition;
      } else {
        sampleCount = recordingClip.samples - lastSamplePosition + currentSamplePosition;
      }

      if (sampleCount > 0) {
        // Get the audio chunk
        float[] samples = new float[sampleCount];
        recordingClip.GetData(samples, lastSamplePosition);

        // Send the data, discarding the resulting Task to avoid the warning
        _ = liveSession.SendAudioAsync(samples);

        lastSamplePosition = currentSamplePosition;
      }
    }

    // Wait for a short delay before reading the next sample from the Microphone
    const float MicrophoneReadDelay = 0.5f;
    yield return new WaitForSeconds(MicrophoneReadDelay);
  }
}
Queue<float> audioBuffer = new();
async Task ReceiveAudio(LiveSession liveSession) {
  int sampleRate = 24000;
  int channelCount = 1;

  // Create a looping AudioClip to fill with the received audio data
  int bufferSamples = (int)(sampleRate * channelCount);
  AudioClip clip = AudioClip.Create("StreamingPCM", bufferSamples, channelCount,
      sampleRate, true, OnAudioRead);

  // Attach the clip to an AudioSource and start playing it
  AudioSource audioSource = GetComponent<AudioSource>();
  audioSource.clip = clip;
  audioSource.loop = true;
  audioSource.Play();

  // Start receiving the response
  await foreach (var message in liveSession.ReceiveAsync()) {
    // Process the received message
    foreach (float[] pcmData in message.AudioAsFloat) {
      lock (audioBuffer) {
        foreach (float sample in pcmData) {
          audioBuffer.Enqueue(sample);
        }
      }
    }
  }
}
// This method is called by the AudioClip to load audio data.
private void OnAudioRead(float[] data) {
  int samplesToProvide = data.Length;
  int samplesProvided = 0;

  lock (audioBuffer) {
    while (samplesProvided < samplesToProvide && audioBuffer.Count > 0) {
      data[samplesProvided] = audioBuffer.Dequeue();
      samplesProvided++;
    }
  }

  while (samplesProvided < samplesToProvide) {
    data[samplesProvided] = 0.0f;
    samplesProvided++;
  }
}
Pricing and counting tokens
You can find pricing information for the Live API models in the documentation for your chosen Gemini API provider: Gemini Developer API | Vertex AI Gemini API.
Regardless of which Gemini API provider you use, the Live API does not support the Count Tokens API.
What else can you do?
Learn more about the full set of Live API capabilities, such as streaming different input modalities (audio, text, or video + audio).
Customize your implementation with various configuration options, such as adding transcriptions or setting the response voice.
Enhance your implementation by giving the model access to tools, such as function calling and grounding with Google Search. Official documentation on using tools with the Live API is coming soon!
Learn about the limits and specifications for using the Live API, such as session duration, rate limits, and supported languages.