Gemini Live API 可处理连续的音频或文本流,这些流称为会话。您可以管理会话生命周期,从初始握手到正常终止。
会话限制
对于 Live API,会话是指通过连接持续流式传输输入和输出的持久连接。
如果会话超出以下任何限制,连接就会终止。不过,Live API 提供了一些选项(见下文)来处理这些与会话相关的限制。
会话上下文窗口的上限为 12.8 万个 token。
由于存在此上下文窗口限制,以下是根据输入模态性得出的会话时长上限近似值:
- 纯音频输入会话的时长上限为
15 分钟 。 - 视频和音频输入时长限制为
2 分钟 。
- 纯音频输入会话的时长上限为
连接时长限制为大约
10 分钟 。在连接结束前约
60 秒 ,您会收到即将断开通知。
以下是一些用于处理与会话相关的限制的选项:
启动会话
如需查看显示如何开始会话的完整代码段,请参阅 Live API 的入门指南。
在会话期间更新
Live API 模型支持以下高级功能,可用于会话中更新:
添加增量内容更新
您可以在有效会话期间添加增量更新。使用此方法可发送文本输入、建立会话上下文或恢复会话上下文。
对于较长的上下文,建议提供单个消息摘要,以释放上下文窗口,以便进行后续互动。
对于简短的上下文,您可以发送精细(导航)互动来表示确切的事件序列,如下面的代码段所示。
Swift
// Define initial turns (history/context).
let turns: [ModelContent] = [
ModelContent(role: "user", parts: [TextPart("What is the capital of France?")]),
ModelContent(role: "model", parts: [TextPart("Paris")]),
]
// Send history, keeping the conversational turn OPEN (false).
await session.sendContent(turns, turnComplete: false)
// Define the new user query.
let newTurn: [ModelContent] = [
ModelContent(role: "user", parts: [TextPart("What is the capital of Germany?")]),
]
// Send the final query, CLOSING the turn (true) to trigger the model response.
await session.sendContent(newTurn, turnComplete: true)
Kotlin
Not yet supported for Android apps - check back soon!
Java
Not yet supported for Android apps - check back soon!
Web
const turns = [{ text: "Hello from the user!" }];
await session.send(
turns,
false // turnComplete: false
);
console.log("Sent history. Waiting for next input...");
// Define the new user query.
const newTurn [{ text: "And what is the capital of Germany?" }];
// Send the final query, CLOSING the turn (true) to trigger the model response.
await session.send(
newTurn,
true // turnComplete: true
);
console.log("Sent final query. Model response expected now.");
Dart
// Define initial turns (history/context).
final List turns = [
Content(
"user",
[Part.text("What is the capital of France?")],
),
Content(
"model",
[Part.text("Paris")],
),
];
// Send history, keeping the conversational turn OPEN (false).
await session.send(
input: turns,
turnComplete: false,
);
// Define the new user query.
final List newTurn = [
Content(
"user",
[Part.text("What is the capital of Germany?")],
),
];
// Send the final query, CLOSING the turn (true) to trigger the model response.
await session.send(
input: newTurn,
turnComplete: true,
);
Unity
// Define initial turns (history/context).
List turns = new List {
new ModelContent("user", new ModelContent.TextPart("What is the capital of France?") ),
new ModelContent("model", new ModelContent.TextPart("Paris") ),
};
// Send history, keeping the conversational turn OPEN (false).
foreach (ModelContent turn in turns)
{
await session.SendAsync(
content: turn,
turnComplete: false
);
}
// Define the new user query.
ModelContent newTurn = ModelContent.Text("What is the capital of Germany?");
// Send the final query, CLOSING the turn (true) to trigger the model response.
await session.SendAsync(
content: newTurn,
turnComplete: true
);
在会话中途更新系统指令
| 仅在将 Vertex AI Gemini API 用作 API 提供方时可用。 |
您可以在活跃会话期间更新系统指令。使用此方法可调整模型的回答,例如更改回答语言或修改语气。
如需在会话中途更新系统指令,您可以使用 system 角色发送文本内容。更新后的系统指令将在剩余会话期间保持有效。
Swift
await session.sendContent(
[ModelContent(
role: "system",
parts: [TextPart("new system instruction")]
)],
turnComplete: false
)
Kotlin
Not yet supported for Android apps - check back soon!
Java
Not yet supported for Android apps - check back soon!
Web
Not yet supported for Web apps - check back soon!
Dart
try {
await _session.send(
input: Content(
'system',
[Part.text('new system instruction')],
),
turnComplete: false,
);
} catch (e) {
print('Failed to update system instructions: $e');
}
Unity
try
{
await session.SendAsync(
content: new ModelContent(
"system",
new ModelContent.TextPart("new system instruction")
),
turnComplete: false
);
}
catch (Exception e)
{
Debug.LogError($"Failed to update system instructions: {e.Message}");
}
压缩上下文窗口
|
点击您的 Gemini API 提供商,以查看此页面上特定于提供商的内容和代码。 |
Live API 会话上下文窗口用于存储实时流式数据(对于音频为每秒 25 个 token [TPS],对于视频为每秒 258 个 token)以及其他内容,包括文本输入和模型输出。所有 Live API 模型的会话上下文窗口限制为 12.8 万个 token。
默认情况下,由于存在此上下文窗口限制,以下是根据输入模态性得出的会话时长上限近似值:
- 纯音频输入会话的时长上限为
15 分钟 。 - 视频和音频输入时长限制为
2 分钟 。
在长时间运行的会话中,随着对话的进行,音频和/或视频令牌的历史记录会不断累积。如果此历史记录超出模型的限制,模型可能会产生幻觉、运行缓慢,或者会话可能会被强制终止。
如需启用更长的会话,您可以将 contextWindowCompression 字段设置为 LiveGenerationConfig,以启用上下文窗口压缩。启用后,服务器会使用滑动窗口机制自动舍弃最旧的对话轮次或对其进行总结,以使上下文大小保持在默认或指定限值内。系统指令不会被舍弃,并且始终位于上下文窗口的开头。
从用户的角度来看,由于“内存”会不断得到管理,因此会话时长理论上是无限的。
您可以配置滑动窗口机制,还可以选择性地配置触发压缩的令牌数量(请参阅下方的可用设置和值)。以下是有关使用这些设置的一些高级注意事项:
将
targetTokens设置得非常低会为连续流释放更多上下文空间,但模型会迅速“忘记”较早的对话轮次。将
targetTokens设置为更接近triggerTokens可保留更多内存,但会更频繁地触发压缩例程。
| 设置 | 如果未在配置中设置,则滑动窗口的默认值 | 最小值 | 最大值 |
|---|---|---|---|
triggerTokens触发压缩之前的上下文长度 |
模型上下文窗口限制的 80% | 5,000 | 128,000 |
targetTokens要保留的目标令牌数量 |
triggerTokens 值的 50%
|
0 | 128,000 |
Swift
// Initialize the Gemini Developer API backend service
let liveModel = FirebaseAI.firebaseAI(backend: .googleAI()).liveModel(
modelName: "gemini-2.5-flash-native-audio-preview-12-2025",
// Enable context window compression.
// (Optional) Configure the number of tokens in the context window that triggers the compression.
generationConfig: LiveGenerationConfig(
responseModalities: [.audio],
contextWindowCompression: ContextWindowCompressionConfig(
triggerTokens: 10000,
slidingWindow: SlidingWindow(
targetTokens: 2000,
)
)
)
)
Kotlin
// Initialize the Gemini Developer API backend service
val liveModel = Firebase.ai(backend = GenerativeBackend.googleAI()).liveModel(
modelName = "gemini-2.5-flash-native-audio-preview-12-2025",
// Enable context window compression.
// (Optional) Configure the number of tokens in the context window that triggers the compression.
generationConfig = liveGenerationConfig {
responseModality = ResponseModality.AUDIO,
contextWindowCompression = ContextWindowCompressionConfig(
triggerTokens = 10000,
slidingWindow = SlidingWindow(targetTokens = 2000)
)
}
)
Java
// Initialize the Gemini Developer API backend service
LiveGenerativeModel lm = FirebaseAI.getInstance(GenerativeBackend.googleAI()).liveModel(
"gemini-2.5-flash-native-audio-preview-12-2025",
// Enable context window compression.
// (Optional) Configure the number of tokens in the context window that triggers the compression.
new LiveGenerationConfig.Builder()
.setResponseModality(ResponseModality.AUDIO)
.setContextWindowCompression(
new ContextWindowCompressionConfig(10000, new SlidingWindow(2000))
)
.build()
);
Web
const ai = getAI(firebaseApp, { backend: new GoogleAIBackend() });
const liveModel = getLiveGenerativeModel(ai, {
model: "gemini-2.5-flash-native-audio-preview-12-2025",
// Enable context window compression.
// (Optional) Configure the number of tokens in the context window that triggers the compression.
generationConfig: {
responseModalities: [ResponseModality.AUDIO],
contextWindowCompression: {
triggerTokens: 10000,
slidingWindow: {
targetTokens: 2000,
},
},
},
});
Dart
final _liveModel = FirebaseAI.googleAI().liveGenerativeModel(
model: 'gemini-2.5-flash-native-audio-preview-12-2025',
// Enable context window compression.
// (Optional) Configure the number of tokens in the context window that triggers the compression.
liveGenerationConfig: LiveGenerationConfig(
responseModalities: [ResponseModalities.audio],
contextWindowCompression: ContextWindowCompressionConfig(
triggerTokens: 10000,
slidingWindow: SlidingWindow(targetTokens: 2000),
),
),
);
Unity
var liveModel = FirebaseAI.GetInstance(FirebaseAI.Backend.GoogleAI()).GetLiveModel(
modelName: "gemini-2.5-flash-native-audio-preview-12-2025",
// Enable context window compression.
// (Optional) Configure the number of tokens in the context window that triggers the compression.
liveGenerationConfig: new LiveGenerationConfig(
responseModalities: new[] { ResponseModality.Audio },
contextWindowCompression: new ContextWindowCompressionConfig(
triggerTokens: 10000,
slidingWindow: new SlidingWindow(targetTokens: 2000)
)
)
);
检测会话何时结束
单个连续 WebSocket 连接的最长持续时间约为
以下示例展示了如何通过监听 going away 通知来检测即将发生的连接终止:
Swift
for try await response in session.responses {
switch response.payload {
case .goingAwayNotice(let goingAwayNotice):
// Prepare for the session to close soon
if let timeLeft = goingAwayNotice.timeLeft {
print("Server going away in \(timeLeft) seconds")
}
}
}
Kotlin
for (response in session.responses) {
when (val message = response.payload) {
is LiveServerGoAway -> {
// Prepare for the session to close soon
val remaining = message.timeLeft
logger.info("Server going away in $remaining")
}
}
}
Java
session.getResponses().forEach(response -> {
if (response.getPayload() instanceof LiveServerResponse.GoingAwayNotice) {
LiveServerResponse.GoingAwayNotice notice = (LiveServerResponse.GoingAwayNotice) response.getPayload();
// Prepare for the session to close soon
Duration timeLeft = notice.getTimeLeft();
}
});
Web
for await (const message of session.receive()) {
switch (message.type) {
...
case "goingAwayNotice":
console.log("Server going away. Time left:", message.timeLeft);
break;
}
}
Dart
Future _handleLiveServerMessage(LiveServerResponse response) async {
final message = response.message;
if (message is GoingAwayNotice) {
// Prepare for the session to close soon
developer.log('Server going away. Time left: ${message.timeLeft}');
}
}
Unity
foreach (var response in session.Responses) {
if (response.Payload is LiveSessionGoingAway notice) {
// Prepare for the session to close soon
TimeSpan timeLeft = notice.TimeLeft;
Debug.Log($"Server going away notice received. Remaining: {timeLeft}");
}
}
继续会话
Live API 支持会话恢复,以防止丢失对话上下文。每个会话都有一个句柄,可用于以下方面:
在达到连接时间限制之前保持会话
单个连续 WebSocket 连接的最长持续时间约为
10 分钟 。您可以通过监听 going away 通知来检测连接何时即将结束,然后使用会话句柄建立新连接来延长会话。在连接断开后立即恢复会话
如果连接在达到最长连接时间限制之前终止或断开(例如从 WiFi 切换到 5G),服务器会将会话状态保留大约
10 分钟 。在此时间范围内,您可以使用会话句柄建立新连接,从而恢复会话。在长时间段后恢复会话
连接结束后,服务器会将会话状态保留几个小时。 在此时间范围内,您可以使用会话句柄建立新连接,从而恢复会话。请注意,对于这两种 Gemini API 提供商,此时间范围有所不同:Gemini Developer API 为
2 小时 | Vertex AI Gemini API 为24 小时 。
默认情况下,会话恢复功能处于停用状态。 如需启用会话恢复,请在建立新连接时传递一个空的恢复配置。启用后,服务器会定期发送包含会话恢复句柄的更新。如果会话断开连接,您可以重新连接并传递此句柄,以恢复上下文保持不变的会话。
以下示例展示了两种恢复会话的方式:
Swift
// Local variable to save the active session handle
var activeSessionHandle: String?
// Initialize the session. Passing an empty config requests the server to send SessionResumptionUpdate
var session = try await liveModel.connect(
sessionResumption: SessionResumptionConfig()
)
// Start receiving responses
for try await message in session.responses {
// Check for new session handles inside your message handling loop
switch message.payload {
case let .sessionResumptionUpdate(updateMessage):
guard let newHandle = updateMessage.newHandle, updateMessage.resumable else {
continue
}
activeSessionHandle = newHandle
print("SessionResumptionUpdate: handle \(newHandle)")
// ... handle other LiveServerMessage types ...
default:
break
}
}
// The following are alternative options to resume a session. Choose only one.
// Option 1: Create and connect a session to resume with the saved handle
if let handle = activeSessionHandle {
session = try await liveModel.connect(
sessionResumption: SessionResumptionConfig(handle: handle)
)
}
// Option 2: Resume the session directly on an existing session object
if let handle = activeSessionHandle {
try await session.resumeSession(
sessionResumption: SessionResumptionConfig(handle: handle)
)
}
Kotlin
// Local variable to save the active session handle
var activeSessionHandle: String? = null
// Initialize the session. Passing an empty config requests the server to send SessionResumptionUpdate
var session = liveModel.connect(
sessionResumption = SessionResumptionConfig()
)
// Start receiving responses
session.receive().collect { message ->
// Process other received response types...
// Check for new session handles inside your message handling loop
if (message is LiveSessionResumptionUpdate) {
if (message.resumable == true && message.newHandle != null) {
activeSessionHandle = message.newHandle
Log.d("TAG", "SessionResumptionUpdate: handle ${message.newHandle}")
}
}
}
// The following are alternative options to resume a session. Choose only one.
// Option 1: Create and connect a session to resume with the saved handle
activeSessionHandle?.let { handle ->
session = liveModel.connect(
sessionResumption = SessionResumptionConfig(handle = handle)
)
}
// Option 2: Resume the session directly on an existing session object
activeSessionHandle?.let { handle ->
session.resumeSession(
sessionResumption = SessionResumptionConfig(handle = handle)
)
}
Java
For Java, session resumption is not yet supported. Check back soon!
Web
// Local variable to save the active session handle
let activeSessionHandle = null;
// Initialize the session. Passing an empty object requests the server to send SessionResumptionUpdate
let session = await liveModel.connect({});
// Start receiving responses
for await (const message of session.receive()) {
// Process other received response types...
// Check for new session handles inside your message handling loop
if (message.type === 'sessionResumptionUpdate') {
if (message.resumable && message.newHandle) {
activeSessionHandle = message.newHandle;
console.log(`SessionResumptionUpdate: handle ${activeSessionHandle}`);
}
}
}
// The following are alternative options to resume a session. Choose only one.
// Option 1: Create and connect a session to resume with the saved handle
if (activeSessionHandle) {
session = await liveModel.connect({
handle: activeSessionHandle
});
}
// Option 2: Resume the session directly on an existing session object
if (activeSessionHandle) {
await session.resumeSession({
handle: activeSessionHandle
});
}
Dart
// Local variable to save the active session handle
String? _activeSessionHandle;
// Initialize the session. Passing an empty config requests the server to send SessionResumptionUpdate
var _session = await _liveModel.connect(
sessionResumption: SessionResumptionConfig(),
);
// Start receiving responses
await for (final message in _session.receive()) {
// Process other received response types...
// Check for new session handles inside your message handling loop
if (message is SessionResumptionUpdate &&
message.resumable != null &&
message.resumable!) {
_activeSessionHandle = message.newHandle;
log('SessionResumptionUpdate: handle ${message.newHandle}');
}
}
// The following are alternative options to resume a session. Choose only one.
// Option 1: Create and connect a session to resume with the saved handle
if (_activeSessionHandle != null) {
_session = await _liveModel.connect(
sessionResumption: SessionResumptionConfig.resume(_activeSessionHandle!),
);
}
// Option 2: Alternatively, resume the session directly on an existing session object
if (_activeSessionHandle != null) {
await _session.resumeSession(
sessionResumption: SessionResumptionConfig.resume(_activeSessionHandle!),
);
}
Unity
// Local variable to save the active session handle
string activeSessionHandle = null;
// Initialize the session. Passing an empty config requests the server to send SessionResumptionUpdate
var session = await liveModel.ConnectAsync(
sessionResumption: new SessionResumptionConfig()
);
// Start receiving responses
await foreach (var response in session.ReceiveAsync())
{
// Process other received response types...
// Check for new session handles inside your message handling loop
if (response.Message is LiveSessionResumptionUpdate updateMessage)
{
if (updateMessage.Resumable == true && !string.IsNullOrEmpty(updateMessage.NewHandle))
{
activeSessionHandle = updateMessage.NewHandle;
Debug.Log($"SessionResumptionUpdate: handle {activeSessionHandle}");
}
}
}
// The following are alternative options to resume a session. Choose only one.
// Option 1: Create and connect a session to resume with the saved handle
if (!string.IsNullOrEmpty(activeSessionHandle)) {
session = await liveModel.ConnectAsync(
sessionResumption: new SessionResumptionConfig(activeSessionHandle)
);
}
// Option 2: Resume the session directly on an existing session object
if (!string.IsNullOrEmpty(activeSessionHandle)) {
await session.ResumeSessionAsync(
sessionResumption: new SessionResumptionConfig(activeSessionHandle)
);
}