Chat with a PDF file

This tutorial shows how to build a conversational application that lets users extract information from PDF documents using natural language.

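Set up the project
Import the required dependencies
Configure Genkit and the default model
Load and parse the PDF
Set up the prompt
Implement the UI
Implement the chat loop
Run the app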

Prerequisites

Before you begin, make sure Node.js and npm are installed; the commands in this tutorial rely on both.

Implementation steps

After setting up the prerequisites, you can build the project.

1. Set up the project

Create a directory structure and a file to hold the source code.

$ mkdir -p chat-with-a-pdf/src && \
cd chat-with-a-pdf && \
touch src/index.ts

Initialize a new npm project for your TypeScript code.
```
$ npm init -y
```

Install the pdf-parse module and its type definitions.

$ npm i pdf-parse && npm i -D @types/pdf-parse

Install the following Genkit dependencies to use Genkit in your project:
```
$ npm i genkit @genkit-ai/googleai
```
- genkit provides Genkit core capabilities.
- @genkit-ai/googleai provides access to the Google AI Gemini models.

Get and configure the model API key

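To use the Gemini API, which this tutorial relies on, you must first configure an API key. If you don't already have one, create a key in Google AI Studio.

The Gemini API provides a generous free tier and does not require a credit card to get started.

After creating your API key, set the GEMINI_API_KEY environment variable to your key with the following command: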
```
$ export GEMINI_API_KEY=<your API key>
```
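Note: Although this tutorial uses the Gemini API from AI Studio, Genkit supports a variety of model providers, including:
- Gemini from Vertex AI.
- Anthropic's Claude 3 models and Llama 3.1 through the Vertex AI Model Garden, as well as through community plugins.
- Open source models through Ollama.
- Community-supported providers such as OpenAI and Cohere.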
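
Since the key is supplied through an environment variable, it can be convenient to check for it up front and print a readable message. The following is a minimal sketch, assuming the key is provided only through GEMINI_API_KEY as in this tutorial; `requireApiKey` and the `Env` type are hypothetical names, not part of Genkit. The helper takes the environment as a parameter so it can be tried without touching the real process environment.

```typescript
// Hypothetical helper (not part of Genkit): returns the API key, or throws
// a readable error when the variable is unset or blank.
type Env = Record<string, string | undefined>;

function requireApiKey(env: Env, name = "GEMINI_API_KEY"): string {
  const value = env[name]?.trim();
  if (!value) {
    throw new Error(`${name} is not set. Export it before running the app.`);
  }
  return value;
}

// Demo with an in-memory environment object.
const demoEnv: Env = { GEMINI_API_KEY: "  demo-key  " };
console.log(requireApiKey(demoEnv)); // prints "demo-key"
```

In the app itself, you might call `requireApiKey(process.env)` before configuring Genkit.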

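2. Import the required dependencies

In the index.ts file that you created, add the following lines to import the dependencies required for this project: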

import { gemini20Flash, googleAI } from '@genkit-ai/googleai';
import { genkit } from 'genkit/beta'; // chat is a beta feature
import pdf from 'pdf-parse';
import fs from 'fs';
import { createInterface } from "node:readline/promises";

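The first line imports the gemini20Flash model and the googleAI plugin from the @genkit-ai/googleai package, which gives you access to Google's Gemini models.
The second line imports genkit from genkit/beta, because chat is a beta feature.
The next two lines import the pdf-parse library for parsing PDF files and the fs module for file system operations.
The final line imports the createInterface function from the node:readline/promises module, which is used to create a command-line interface for user interaction.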

3. Configure Genkit and the default model

Add the following lines to configure Genkit and set Gemini 2.0 Flash as the default model.

const ai = genkit({
  plugins: [googleAI()],
  model: gemini20Flash,
});

Then add a skeleton for your code, including error handling.

(async () => {
  try {
    // Step 1: get command line arguments

    // Step 2: load PDF file

    // Step 3: construct prompt

    // Step 4: start chat

    // Step 5: chat loop

  } catch (error) {
    console.error("Error parsing PDF or interacting with Genkit:", error);
  }
})(); // <-- don't forget the trailing parentheses to call the function!
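
Under TypeScript's strict settings, the `error` value caught by the skeleton has type `unknown`. Passing it straight to console.error works, but if you want a shorter message you need to narrow the type first. The sketch below shows one way to do that; `describeError` is a hypothetical helper introduced only for illustration.

```typescript
// Hypothetical helper: turn any thrown value into a printable message.
function describeError(error: unknown): string {
  if (error instanceof Error) {
    // Standard Error objects carry a name and a message.
    return `${error.name}: ${error.message}`;
  }
  if (typeof error === "string") {
    return error;
  }
  // Anything else (numbers, plain objects, ...) falls back to String().
  return String(error);
}

console.log(describeError(new Error("boom"))); // prints "Error: boom"
console.log(describeError("plain string"));    // prints "plain string"
```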

4. Load and parse the PDF

Add code to read the PDF file name that was passed in from the command line.

    // Step 1: get command line arguments
    const filename = process.argv[2];
    if (!filename) {
      console.error("Please provide a filename as a command line argument.");
      process.exit(1);
    }
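
The same check can be pulled into a small function, which makes it easier to try without running the whole app. This is a sketch under assumptions: `parseArgs` and `CliArgs` are hypothetical names, the input array is expected to follow the process.argv layout, and the `.pdf` extension check is an extra guard that the tutorial's code does not perform.

```typescript
interface CliArgs {
  filename: string;
  promptPrefix: string | undefined;
}

// Hypothetical helper: argv follows the process.argv layout, that is
// [runtime, script, filename, optional prompt].
function parseArgs(argv: readonly string[]): CliArgs {
  const [, , filename, promptPrefix] = argv;
  if (!filename) {
    throw new Error("Please provide a filename as a command line argument.");
  }
  // Assumption: PDF files are named with a .pdf extension.
  if (!filename.toLowerCase().endsWith(".pdf")) {
    throw new Error(`Expected a .pdf file, got: ${filename}`);
  }
  return { filename, promptPrefix };
}

const demoArgs = parseArgs(["node", "src/index.ts", "report.pdf"]);
console.log(demoArgs.filename); // prints "report.pdf"
```

In the app, you might call `parseArgs(process.argv)` inside the try block so that the catch clause reports a missing or unexpected file name.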

Add code to load the contents of the PDF file.

    // Step 2: load PDF file
    const dataBuffer = fs.readFileSync(filename);
    const { text } = await pdf(dataBuffer);
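
Extracted PDF text often contains irregular whitespace, and a PDF without a text layer (for example, a scanned document) may yield little or no text at all. The following sketch tidies the text and fails early when nothing was extracted; `cleanExtractedText` is a hypothetical helper, and the specific whitespace rules are assumptions rather than anything pdf-parse requires.

```typescript
// Hypothetical helper: tidy text extracted from a PDF before prompting.
function cleanExtractedText(raw: string): string {
  const cleaned = raw
    .replace(/\r\n?/g, "\n")     // normalize line endings
    .replace(/[ \t]+/g, " ")     // collapse runs of spaces and tabs
    .replace(/ ?\n ?/g, "\n")    // drop spaces around line breaks
    .replace(/\n{3,}/g, "\n\n")  // keep at most one blank line in a row
    .trim();
  if (cleaned.length === 0) {
    throw new Error("No text could be extracted from the PDF.");
  }
  return cleaned;
}

const sample = "Title \r\n\r\n\r\n  Body\ttext  ";
console.log(JSON.stringify(cleanExtractedText(sample))); // prints "Title\n\nBody text"
```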

5. Set up the prompt

Add code to set up the prompt:

    // Step 3: construct prompt
    const prefix = process.argv[3] || "Sample prompt: Answer the user's questions about the contents of this PDF file.";
    const prompt = `
      ${prefix}
      Context:
      ${text}
    `;

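The first const declaration uses the prompt passed as the second command-line argument, or falls back to a default prompt if the user did not provide one.
The second const declaration interpolates the prompt prefix and the full text of the PDF file into the prompt for the model.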
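
Because the full text of the PDF goes into the prompt, a very long document can exceed what the model accepts in a single request. The sketch below caps the context by character count. `buildPrompt` is a hypothetical helper, the default `maxChars` value is an arbitrary placeholder rather than a documented limit, and a character count is only a rough proxy for the model's token limit.

```typescript
// Hypothetical helper: build the system prompt, truncating very long text.
function buildPrompt(prefix: string, text: string, maxChars = 200000): string {
  const truncated = text.length > maxChars;
  const context = truncated ? text.slice(0, maxChars) : text;
  // Tell the model when it is only seeing part of the document.
  const notice = truncated ? "\n[Context truncated]" : "";
  return `${prefix}\nContext:\n${context}${notice}`;
}

// Demo with a tiny limit so the truncation is visible.
console.log(buildPrompt("Answer questions about this PDF.", "0123456789", 4));
// prints:
// Answer questions about this PDF.
// Context:
// 0123
// [Context truncated]
```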

6. Implement the UI

Add the following code to start the chat and implement the UI:

    // Step 4: start chat
    const chat = ai.chat({ system: prompt });
    const readline = createInterface(process.stdin, process.stdout);
    console.log("You're chatting with Gemini. Ctrl-C to quit.\n");

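The first const declaration starts the chat with the model by calling the chat method and passing the prompt (which includes the full text of the PDF file) as the system prompt. The rest of the code creates a readline interface for text input and then displays a message to the user.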

7. Implement the chat loop

Below the Step 5 comment, add code to receive user input and send it to the model using chat.send. This part of the app loops until the user presses CTRL + C.

    // Step 5: chat loop
    while (true) {
      const userInput = await readline.question("> ");
      const { text } = await chat.send(userInput);
      console.log(text);
    }
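
The loop above ends only on CTRL + C and sends every line to the model, including empty ones. The sketch below shows one possible variation that skips blank input and exits on "exit" or "quit". `classifyInput` and `chatLoop` are hypothetical names, and the loop takes its `ask` and `send` functions as parameters so it can be tried here with scripted input instead of readline and Genkit.

```typescript
type InputAction = "quit" | "skip" | "send";

// Hypothetical helper: decide what to do with one line of user input.
function classifyInput(line: string): InputAction {
  const trimmed = line.trim().toLowerCase();
  if (trimmed === "exit" || trimmed === "quit") return "quit";
  if (trimmed === "") return "skip";
  return "send";
}

// Hypothetical loop: `ask` reads a line, `send` returns the model's reply.
async function chatLoop(
  ask: (promptText: string) => Promise<string>,
  send: (message: string) => Promise<string>,
  print: (reply: string) => void,
): Promise<void> {
  while (true) {
    const line = await ask("> ");
    const action = classifyInput(line);
    if (action === "quit") return;
    if (action === "skip") continue;
    print(await send(line));
  }
}

// Demo with scripted input and an echoing stand-in for the model.
const scripted = ["hello", "", "quit"];
void chatLoop(
  async () => scripted.shift() ?? "quit",
  async (message) => `echo: ${message}`,
  (reply) => console.log(reply), // prints "echo: hello"
);
```

In the tutorial's code, `ask` could wrap `readline.question` and `send` could be `async (m) => (await chat.send(m)).text`, with a call to `readline.close()` after the loop returns.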

8. Run the app

To run the app, open a terminal in the root folder of your project and run the following command:

npx tsx src/index.ts path/to/some.pdf

You can then start chatting with the PDF file.