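Chat with a PDF file

This tutorial demonstrates how to build a conversational application that lets users extract information from PDF documents using natural language.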

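Set up your project
Import the required dependencies
Configure Genkit and the default model
Load and parse the PDF file
Set up the prompt
Implement the UI
Implement the chat loop
Run the app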

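Before you begin

Before you start, make sure the prerequisites are set up. The commands in this tutorial assume that Node.js and npm are installed.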

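Implementation steps

After setting up your dependencies, you can build the project.

1. Set up your project

Create a directory structure and a file to hold your source code.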

```
$ mkdir -p chat-with-a-pdf/src && \
cd chat-with-a-pdf && \
touch src/index.ts
```

Initialize a new TypeScript project.
```
$ npm init -y
```

Install the pdf-parse module and its type definitions.

```
$ npm i pdf-parse && npm i -D @types/pdf-parse
```

Install the following Genkit dependencies to use Genkit in your project:
```
$ npm i genkit @genkit-ai/googleai
```
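- genkit provides Genkit core capabilities.
- @genkit-ai/googleai provides access to the Google AI Gemini models.

Get and configure your model API key

To use the Gemini API, which this tutorial uses, you must first configure an API key. If you don't already have one, create a key in Google AI Studio.

The Gemini API provides a generous free-of-charge tier and does not require a credit card to get started.

After creating your API key, set the GEMINI_API_KEY environment variable to your key with the following command: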
```
$ export GEMINI_API_KEY=<your API key>
```
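
To confirm that the variable is actually visible to Node.js before doing anything else, an early check along the following lines may help. This is a minimal sketch: the helper name `requireApiKey` is hypothetical and not part of Genkit, and it only inspects the environment object passed to it.

```typescript
// Hypothetical helper (not part of Genkit): fail fast when the key is missing.
function requireApiKey(env: Record<string, string | undefined>): string {
  // Treat an empty or whitespace-only value the same as an unset variable.
  const key = env.GEMINI_API_KEY?.trim();
  if (!key) {
    throw new Error("GEMINI_API_KEY is not set. Export it before running the app.");
  }
  return key;
}

// Possible usage, once at startup:
// requireApiKey(process.env);
```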
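Note: Although this tutorial uses the Gemini API from AI Studio, Genkit supports a wide variety of model providers, including:
- Gemini from Vertex AI.
- Anthropic's Claude 3 models and Llama 3.1, through the Vertex AI Model Garden and community plugins.
- Open source models through Ollama.
- Community-supported providers such as OpenAI and Cohere.

2. Import the required dependencies

In the index.ts file that you created, add the following lines to import the dependencies required for this project: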

```
import { gemini20Flash, googleAI } from '@genkit-ai/googleai';
import { genkit } from 'genkit/beta'; // chat is a beta feature
import pdf from 'pdf-parse';
import fs from 'fs';
import { createInterface } from "node:readline/promises";
```

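The first line imports the gemini20Flash model and the googleAI plugin from the @genkit-ai/googleai package, enabling access to Google's Gemini models.
The second line imports genkit from genkit/beta, because chat is a beta feature.
The next two lines import the pdf-parse library for parsing PDF files and the fs module for file system operations.
The last line imports the createInterface function from the node:readline/promises module, which is used to create a command line interface for user interaction.

3. Configure Genkit and the default model

Add the following lines to configure Genkit and set Gemini 2.0 Flash as the default model.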

```
const ai = genkit({
  plugins: [googleAI()],
  model: gemini20Flash,
});
```

Then add a skeleton for the rest of the code, including error handling. The later steps fill in each numbered comment.

```
(async () => {
  try {
    // Step 1: get command line arguments

    // Step 2: load PDF file

    // Step 3: construct prompt

    // Step 4: start chat

    // Step 5: chat loop

  } catch (error) {
    console.error("Error parsing PDF or interacting with Genkit:", error);
  }
})(); // <-- don't forget the trailing parentheses to call the function!
```

4. Load and parse the PDF

Under Step 1, add code to read the PDF file name that was passed in from the command line.

```
    // Step 1: get command line arguments
    const filename = process.argv[2];
    if (!filename) {
      console.error("Please provide a filename as a command line argument.");
      process.exit(1);
    }
```
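
The tutorial reads `process.argv[2]` here and `process.argv[3]` in a later step. One way to keep both lookups in one place is sketched below. The `parseArgs` function and the `CliArgs` type are hypothetical names introduced for illustration, not part of Genkit or Node.js.

```typescript
// Hypothetical helper: split process.argv into the values this tutorial uses.
interface CliArgs {
  filename: string;
  promptPrefix?: string;
}

function parseArgs(argv: string[]): CliArgs | null {
  // argv[0] is the Node.js binary and argv[1] is the script path,
  // so user-supplied arguments start at index 2.
  const [filename, promptPrefix] = argv.slice(2);
  if (!filename) return null; // caller decides how to report the missing file name
  return { filename, promptPrefix };
}

// Possible usage:
// const args = parseArgs(process.argv);
```

Returning `null` instead of calling `process.exit` inside the helper keeps it easy to exercise without ending the process.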

Under Step 2, add code to load the contents of the PDF file.

```
    // Step 2: load PDF file
    const dataBuffer = fs.readFileSync(filename);
    const { text } = await pdf(dataBuffer);
```
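
If the file passed in is not a PDF, the parse step fails and the error surfaces in the catch block. A cheap pre-check on the file's first bytes can produce a clearer message. The sketch below assumes the file starts with the `%PDF-` header; `looksLikePdf` is a hypothetical helper, not part of pdf-parse, and this strict check may reject unusual files that carry extra bytes before the header.

```typescript
// Hypothetical pre-check (not part of pdf-parse): look for the "%PDF-" header.
function looksLikePdf(data: Uint8Array): boolean {
  const magic = [0x25, 0x50, 0x44, 0x46, 0x2d]; // ASCII codes for "%PDF-"
  return data.length >= magic.length && magic.every((byte, i) => data[i] === byte);
}

// Possible usage with the buffer read in Step 2 (a Node.js Buffer is a Uint8Array):
// if (!looksLikePdf(dataBuffer)) { console.error("Not a PDF file."); process.exit(1); }
```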

5. Set up the prompt

Under Step 3, add code to set up the prompt:

```
    // Step 3: construct prompt
    const prefix = process.argv[3] || "Sample prompt: Answer the user's questions about the contents of this PDF file.";
    const prompt = `
      ${prefix}
      Context:
      ${text}
    `;
```

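The first const declaration defines a default prompt, which is used when the user does not pass a custom prompt on the command line.
The second const declaration interpolates the prompt prefix and the full text of the PDF file into the prompt for the model.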
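
Because the full text of the PDF goes into the prompt, a very long document could exceed what the model accepts as input. The sketch below shows the same prompt assembly as a function with an optional character cap. `buildPrompt` and `maxChars` are hypothetical names, and a character count is only a rough stand-in for the model's actual token limit.

```typescript
// Hypothetical helper: assemble the prompt, optionally capping the PDF text.
function buildPrompt(prefix: string, pdfText: string, maxChars = Infinity): string {
  // Keep only the first maxChars characters when the text is longer than the cap.
  const context = pdfText.length > maxChars ? pdfText.slice(0, maxChars) : pdfText;
  return `${prefix}\nContext:\n${context}`;
}

// Possible usage in Step 3 (the cap value here is an arbitrary example):
// const prompt = buildPrompt(prefix, text, 200_000);
```

Truncating from the start of the document is the simplest policy; it silently drops later pages, so it suits a quick experiment more than a finished app.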

6. Implement the UI

Under Step 4, add the following code to start the chat and implement the UI:

```
    // Step 4: start chat
    const chat = ai.chat({ system: prompt });
    const readline = createInterface(process.stdin, process.stdout);
    console.log("You're chatting with Gemini. Ctrl-C to quit.\n");
```

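The first const declaration starts the conversation with the model by calling the chat method and passing the prompt, which includes the full text of the PDF file, as the system prompt. The rest of the code creates a readline interface for text input and then displays a message to the user.

7. Implement the chat loop

Under Step 5, add code that receives user input and sends it to the model with chat.send. This part of the app loops until the user presses CTRL + C.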

```
    // Step 5: chat loop
    while (true) {
      const userInput = await readline.question("> ");
      const { text } = await chat.send(userInput);
      console.log(text);
    }
```
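
The loop above only ends when the process is interrupted. As a variation, the sketch below stops when the user types `/quit` and skips empty lines. The `runChatLoop` function, the `Ask` and `Send` types, and the `/quit` command are all hypothetical, introduced here so the loop can be exercised without a model or a terminal.

```typescript
// Hypothetical variant of the chat loop: stops when the user types "/quit".
type Ask = (promptText: string) => Promise<string>;
type Send = (message: string) => Promise<{ text: string }>;

async function runChatLoop(
  ask: Ask,
  send: Send,
  log: (line: string) => void,
): Promise<number> {
  let turns = 0; // number of messages actually sent to the model
  while (true) {
    const userInput = (await ask("> ")).trim();
    if (userInput === "/quit") break; // leave the loop without Ctrl-C
    if (userInput === "") continue; // skip empty lines instead of sending them
    const { text } = await send(userInput);
    log(text);
    turns += 1;
  }
  return turns;
}

// In the tutorial's code this could be wired up roughly as:
// await runChatLoop((q) => readline.question(q), (m) => chat.send(m), console.log);
// readline.close();
```

Passing the input and send functions as parameters is a design choice made for this sketch; the tutorial's inline loop is equivalent when no exit command is needed.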

8. Run the app

To run the app, open a terminal in the root folder of your project, then run the following command:

```
$ npx tsx src/index.ts path/to/some.pdf
```

You can then start chatting with the PDF file.