Get started with the Gemini Live API using Firebase AI Logic


The Gemini Live API enables low-latency, bidirectional real-time voice and video interactions with Gemini models.

The Live API and its special family of models can process continuous streams of audio, video, or text to deliver immediate, human-like spoken responses, giving your users a natural conversational experience.

This page describes how to get started with the most common capability, streaming audio input and output, but the Live API supports many other capabilities and configuration options.

The Live API is a stateful API that creates a WebSocket connection to establish a session between the client and the Gemini server. For details, see the Live API reference documentation (Gemini Developer API | Vertex AI Gemini API).
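
To make that lifecycle concrete, here is a minimal Swift sketch of a session, reusing only the calls from the full Swift example later on this page (connect, send audio, iterate responses, close). Treat it as an outline of the flow, not a complete implementation: error handling and audio playback are omitted, and the audioChunk parameter is a placeholder for your own audio data.

import FirebaseAILogic

// A minimal sketch of the session lifecycle. Model creation matches the
// Swift example later on this page.
func runLiveSessionSketch(audioChunk: Data) async throws {
  let liveModel = FirebaseAI.firebaseAI(backend: .googleAI()).liveModel(
    modelName: "gemini-2.5-flash-native-audio-preview-12-2025",
    generationConfig: LiveGenerationConfig(responseModalities: [.audio])
  )

  // 1. Open the stateful, WebSocket-backed session
  let session = try await liveModel.connect()

  // 2. Stream input to the model (audio in this case)
  await session.sendAudioRealtime(audioChunk)

  // 3. Consume the model's streamed responses as they arrive
  for try await message in session.responses {
    // Handle the audio or text parts of `message` here
    // (see the full Swift example below for details)
    _ = message
  }

  // 4. Close the session once the conversation is done
  await session.close()
}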


Before you begin

If you haven't already, complete the getting started guide, which describes how to set up your Firebase project, connect your app to Firebase, add the SDK, initialize the backend service for your chosen Gemini API provider, and create a LiveModel instance.

You can prototype with prompts and the Live API in Google AI Studio or Vertex AI Studio.

Models that support this capability

The Gemini 2.5 Flash Live models are the native audio models that support the Gemini Live API. Although the models have different names depending on your Gemini API provider, their behavior and capabilities are the same.

  • Gemini Developer API

    • gemini-2.5-flash-native-audio-preview-12-2025
    • gemini-2.5-flash-native-audio-preview-09-2025

    Although these models are in preview, they are available in the free tier of the Gemini Developer API.

  • Vertex AI Gemini API

    • gemini-live-2.5-flash-native-audio (released December 2025)
    • gemini-live-2.5-flash-preview-native-audio-09-2025

    When using the Vertex AI Gemini API, the Live API models are supported in the global location (a short sketch of both provider configurations follows this list).
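
To illustrate those provider-specific model names, the hedged Swift sketch below creates a live model for each backend. The .googleAI() call and the liveModel signature match the examples later on this page; the .vertexAI() backend and its location handling are assumptions to verify against the SDK reference for your setup.

import FirebaseAILogic

// Gemini Developer API: model name from the list above
let developerApiModel = FirebaseAI.firebaseAI(backend: .googleAI()).liveModel(
  modelName: "gemini-2.5-flash-native-audio-preview-12-2025",
  generationConfig: LiveGenerationConfig(responseModalities: [.audio])
)

// Vertex AI Gemini API: model name from the list above.
// Note: `.vertexAI()` is assumed here; Live API models on Vertex AI are
// supported in the `global` location, so configure the location accordingly.
let vertexApiModel = FirebaseAI.firebaseAI(backend: .vertexAI()).liveModel(
  modelName: "gemini-live-2.5-flash-native-audio",
  generationConfig: LiveGenerationConfig(responseModalities: [.audio])
)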

Stream audio input and output


The following examples show a basic implementation of sending streaming audio input and receiving streaming audio output.

For the other options and capabilities of the Live API, see the "What else can you do?" section later on this page.

Swift

To use the Live API, create a LiveModel instance and set the response modality to audio.


import FirebaseAILogic

// Initialize the Gemini Developer API backend service
// Create a `liveModel` instance with a model that supports the Live API
let liveModel = FirebaseAI.firebaseAI(backend: .googleAI()).liveModel(
  modelName: "gemini-2.5-flash-native-audio-preview-12-2025",
  // Configure the model to respond with audio
  generationConfig: LiveGenerationConfig(
    responseModalities: [.audio]
  )
)

do {
  let session = try await liveModel.connect()

  // Load the audio file, or capture audio from a microphone instead
  guard let audioFile = NSDataAsset(name: "audio.pcm") else {
    fatalError("Failed to load audio file")
  }

  // Provide the audio data
  await session.sendAudioRealtime(audioFile.data)

  var outputText = ""
  for try await message in session.responses {
    if case let .content(content) = message.payload {
      content.modelTurn?.parts.forEach { part in
        if let part = part as? InlineDataPart, part.mimeType.starts(with: "audio/pcm") {
          // Handle 16-bit PCM audio data at 24 kHz
          playAudio(part.data)
        }
      }
      // Optional: close the session if you don't need to send more requests.
      if content.isTurnComplete {
        await session.close()
      }
    }
  }
} catch {
  fatalError(error.localizedDescription)
}

Kotlin

To use the Live API, create a LiveModel instance and set the response modality to AUDIO.


// Initialize the Gemini Developer API backend service
// Create a `liveModel` instance with a model that supports the Live API
val liveModel = Firebase.ai(backend = GenerativeBackend.googleAI()).liveModel(
    modelName = "gemini-2.5-flash-native-audio-preview-12-2025",
    // Configure the model to respond with audio
    generationConfig = liveGenerationConfig {
        responseModality = ResponseModality.AUDIO
    }
)

val session = liveModel.connect()

// This is the recommended approach.
// However, you can create your own recorder and handle the stream.
session.startAudioConversation()

Java

To use the Live API, create a LiveModel instance and set the response modality to AUDIO.


ExecutorService executor = Executors.newFixedThreadPool(1);
// Initialize the Gemini Developer API backend service
// Create a `liveModel` instance with a model that supports the Live API
LiveGenerativeModel lm = FirebaseAI.getInstance(GenerativeBackend.googleAI()).liveModel(
        "gemini-2.5-flash-native-audio-preview-12-2025",
        // Configure the model to respond with audio
        new LiveGenerationConfig.Builder()
                .setResponseModality(ResponseModality.AUDIO)
                .build()
);
LiveModelFutures liveModel = LiveModelFutures.from(lm);

ListenableFuture<LiveSession> sessionFuture =  liveModel.connect();

Futures.addCallback(sessionFuture, new FutureCallback<LiveSession>() {
    @Override
    public void onSuccess(LiveSession ses) {
        LiveSessionFutures session = LiveSessionFutures.from(ses);
        session.startAudioConversation();
    }
    @Override
    public void onFailure(Throwable t) {
        // Handle exceptions
    }
}, executor);

Web

To use the Live API, create a LiveGenerativeModel instance and set the response modality to AUDIO.


import { initializeApp } from "firebase/app";
import { getAI, getLiveGenerativeModel, GoogleAIBackend, ResponseModality, startAudioConversation } from "firebase/ai";

// TODO(developer) Replace the following with your app's Firebase configuration
// See: https://firebase.google.com/docs/web/learn-more#config-object
const firebaseConfig = {
  // ...
};

// Initialize FirebaseApp
const firebaseApp = initializeApp(firebaseConfig);

// Initialize the Gemini Developer API backend service
const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });

// Create a `LiveGenerativeModel` instance with a model that supports the Live API
const liveModel = getLiveGenerativeModel(ai, {
  model: "gemini-2.5-flash-native-audio-preview-12-2025",
  // Configure the model to respond with audio
  generationConfig: {
    responseModalities: [ResponseModality.AUDIO],
  },
});

const session = await liveModel.connect();

// Start the audio conversation
const audioConversationController = await startAudioConversation(session);

// ... Later, to stop the audio conversation
// await audioConversationController.stop()

Dart

To use the Live API, create a LiveGenerativeModel instance and set the response modality to audio.


import 'package:firebase_ai/firebase_ai.dart';
import 'package:firebase_core/firebase_core.dart';
import 'firebase_options.dart';
import 'package:your_audio_recorder_package/your_audio_recorder_package.dart';

late LiveModelSession _session;
final _audioRecorder = YourAudioRecorder();

await Firebase.initializeApp(
  options: DefaultFirebaseOptions.currentPlatform,
);

// Initialize the Gemini Developer API backend service
// Create a `liveGenerativeModel` instance with a model that supports the Live API
final liveModel = FirebaseAI.googleAI().liveGenerativeModel(
  model: 'gemini-2.5-flash-native-audio-preview-12-2025',
  // Configure the model to respond with audio
  liveGenerationConfig: LiveGenerationConfig(
    responseModalities: [ResponseModalities.audio],
  ),
);

_session = await liveModel.connect();

final audioRecordStream = _audioRecorder.startRecordingStream();
// Map the Uint8List stream to InlineDataPart stream
final mediaChunkStream = audioRecordStream.map((data) {
  return InlineDataPart('audio/pcm', data);
});
await _session.startMediaStream(mediaChunkStream);

// In a separate thread, receive the audio response from the model
await for (final message in _session.receive()) {
   // Process the received message
}

Unity

To use the Live API, create a LiveModel instance and set the response modality to Audio.


using Firebase;
using Firebase.AI;

async Task SendTextReceiveAudio() {
  // Initialize the Gemini Developer API backend service
  // Create a `LiveModel` instance with a model that supports the Live API
  var liveModel = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI()).GetLiveModel(
      modelName: "gemini-2.5-flash-native-audio-preview-12-2025",
      // Configure the model to respond with audio
      liveGenerationConfig: new LiveGenerationConfig(
          responseModalities: new[] { ResponseModality.Audio })
    );

  LiveSession session = await liveModel.ConnectAsync();

  // Start a coroutine to send audio from the Microphone
  var recordingCoroutine = StartCoroutine(SendAudio(session));

  // Start receiving the response
  await ReceiveAudio(session);
}

IEnumerator SendAudio(LiveSession liveSession) {
  string microphoneDeviceName = null;
  int recordingFrequency = 16000;
  int recordingBufferSeconds = 2;

  var recordingClip = Microphone.Start(microphoneDeviceName, true,
                                       recordingBufferSeconds, recordingFrequency);

  int lastSamplePosition = 0;
  while (true) {
    if (!Microphone.IsRecording(microphoneDeviceName)) {
      yield break;
    }

    int currentSamplePosition = Microphone.GetPosition(microphoneDeviceName);

    if (currentSamplePosition != lastSamplePosition) {
      // The Microphone uses a circular buffer, so we need to check if the
      // current position wrapped around to the beginning, and handle it
      // accordingly.
      int sampleCount;
      if (currentSamplePosition > lastSamplePosition) {
        sampleCount = currentSamplePosition - lastSamplePosition;
      } else {
        sampleCount = recordingClip.samples - lastSamplePosition + currentSamplePosition;
      }

      if (sampleCount > 0) {
        // Get the audio chunk
        float[] samples = new float[sampleCount];
        recordingClip.GetData(samples, lastSamplePosition);

        // Send the data, discarding the resulting Task to avoid the warning
        _ = liveSession.SendAudioAsync(samples);

        lastSamplePosition = currentSamplePosition;
      }
    }

    // Wait for a short delay before reading the next sample from the Microphone
    const float MicrophoneReadDelay = 0.5f;
    yield return new WaitForSeconds(MicrophoneReadDelay);
  }
}

Queue<float> audioBuffer = new();

async Task ReceiveAudio(LiveSession liveSession) {
  int sampleRate = 24000;
  int channelCount = 1;

  // Create a looping AudioClip to fill with the received audio data
  int bufferSamples = (int)(sampleRate * channelCount);
  AudioClip clip = AudioClip.Create("StreamingPCM", bufferSamples, channelCount,
                                    sampleRate, true, OnAudioRead);

  // Attach the clip to an AudioSource and start playing it
  AudioSource audioSource = GetComponent<AudioSource>();
  audioSource.clip = clip;
  audioSource.loop = true;
  audioSource.Play();

  // Start receiving the response
  await foreach (var message in liveSession.ReceiveAsync()) {
    // Process the received message
    foreach (float[] pcmData in message.AudioAsFloat) {
      lock (audioBuffer) {
        foreach (float sample in pcmData) {
          audioBuffer.Enqueue(sample);
        }
      }
    }
  }
}

// This method is called by the AudioClip to load audio data.
private void OnAudioRead(float[] data) {
  int samplesToProvide = data.Length;
  int samplesProvided = 0;

  lock(audioBuffer) {
    while (samplesProvided < samplesToProvide && audioBuffer.Count > 0) {
      data[samplesProvided] = audioBuffer.Dequeue();
      samplesProvided++;
    }
  }

  while (samplesProvided < samplesToProvide) {
    data[samplesProvided] = 0.0f;
    samplesProvided++;
  }
}



Pricing and token counting

You can find pricing information for the Live API models in the documentation for your chosen Gemini API provider: Gemini Developer API | Vertex AI Gemini API.

Regardless of which Gemini API provider you use, the Live API supports the Count Tokens API.



What else can you do?

  • Learn more about the full set of Live API capabilities, such as streaming different input modalities (audio, text, or video + audio).

  • Customize your implementation with various configuration options, such as adding transcription or setting the response voice.

  • Enhance your implementation by giving the model access to tools, such as function calling and grounding with Google Search. Official documentation on using tools with the Live API is coming soon!

  • Learn about the limits and specifications for using the Live API, such as session duration, rate limits, supported languages, and more.