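Search with vector embeddings

This page shows you how to use Cloud Firestore to perform K-nearest neighbor (KNN) vector searches with the following techniques:

Store vector values
Create and manage KNN vector indexes
Make a K-nearest-neighbor (KNN) query using one of the supported vector distance functions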

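Store vector embeddings

You can create vector values, such as text embeddings, from your Cloud Firestore data and store them in Cloud Firestore documents.

Write operation with a vector embedding

The following example shows how to store a vector embedding in a Cloud Firestore document.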

Python

from google.cloud import firestore
from google.cloud.firestore_v1.vector import Vector

firestore_client = firestore.Client()
collection = firestore_client.collection("coffee-beans")
doc = {
  "name": "Kahawa coffee beans",
  "description": "Information about the Kahawa coffee beans.",
  "embedding_field": Vector([1.0, 2.0, 3.0])
}

collection.add(doc)

Node.js

import {
  Firestore,
  FieldValue,
} from "@google-cloud/firestore";

const db = new Firestore();
const coll = db.collection('coffee-beans');
await coll.add({
  name: "Kahawa coffee beans",
  description: "Information about the Kahawa coffee beans.",
  embedding_field: FieldValue.vector([1.0, 2.0, 3.0])
});

Compute vector embeddings with Cloud Functions

To calculate and store vector embeddings whenever a document is created or updated, you can set up a Cloud Function.

Python

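import functions_framework
from cloudevents.http import CloudEvent
from google.cloud import firestore
from google.events.cloud import firestore as firestore_events

firestore_client = firestore.Client()

@functions_framework.cloud_event
def store_embedding(cloud_event: CloudEvent) -> None:
  """Triggered by a change to a Firestore document."""
  # Decode the event payload in place; ParseFromString returns the
  # number of bytes read, not the parsed message.
  payload = firestore_events.DocumentEventData()
  payload._pb.ParseFromString(cloud_event.data)

  # from_payload and calculate_embedding are helpers you supply;
  # calculate_embedding should return a Vector.
  collection_id, doc_id = from_payload(payload)
  # Call a function to calculate the embedding
  embedding = calculate_embedding(payload)
  # Update the document
  doc = firestore_client.collection(collection_id).document(doc_id)
  doc.set({"embedding_field": embedding}, merge=True)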
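The Python function calls `from_payload`, which the sample does not define. One possible implementation, assuming you take the document's resource name from the decoded event (for example `payload.value.name` for a create or update), is to split that name into a collection path and a document ID. `split_document_name` is a hypothetical helper; the layout `projects/PROJECT/databases/DATABASE/documents/...` is the standard Firestore document resource name format.

```python
def split_document_name(name: str) -> tuple[str, str]:
    """Split a Firestore document resource name into (collection_path, doc_id).

    Hypothetical helper. Expects names such as
    "projects/p/databases/(default)/documents/coffee-beans/abc123".
    """
    parts = name.split("/")
    if (len(parts) < 7 or parts[0] != "projects"
            or parts[2] != "databases" or parts[4] != "documents"):
        raise ValueError(f"not a document resource name: {name!r}")
    relative = parts[5:]
    # A document path alternates collection/doc, so it has an even length.
    if len(relative) % 2 != 0:
        raise ValueError(f"not a document path: {'/'.join(relative)!r}")
    return "/".join(relative[:-1]), relative[-1]


print(split_document_name(
    "projects/p/databases/(default)/documents/users/alice/notes/n1"))
```

The Python client's `collection()` accepts a slash-separated path, so a result such as `("users/alice/notes", "n1")` also works for documents in subcollections.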

Node.js

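import { FirestoreEvent } from "firebase-functions/v2/firestore";
import { QueryDocumentSnapshot } from "firebase-admin/firestore";

/**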
 * A vector embedding will be computed from the
 * value of the `content` field. The vector value
 * will be stored in the `embedding` field. The
 * field names `content` and `embedding` are arbitrary
 * field names chosen for this example.
 */
async function storeEmbedding(event: FirestoreEvent<any>): Promise<void> {
  // Get the previous value of the document's `content` field.
  const previousDocumentSnapshot = event.data.before as QueryDocumentSnapshot;
  const previousContent = previousDocumentSnapshot.get("content");

  // Get the current value of the document's `content` field.
  const currentDocumentSnapshot = event.data.after as QueryDocumentSnapshot;
  const currentContent = currentDocumentSnapshot.get("content");

  // Don't update the embedding if the content field did not change
  if (previousContent === currentContent) {
    return;
  }

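  // Call a function to calculate the embedding for the value
  // of the `content` field. calculateEmbedding is a helper you supply;
  // it should return a vector value (for example, FieldValue.vector(...))
  // so that the field can be queried with findNearest.
  const embeddingVector = calculateEmbedding(currentContent);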

  // Update the `embedding` field on the document.
  await currentDocumentSnapshot.ref.update({
    embedding: embeddingVector,
  });
}
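The Node.js function returns early when `content` has not changed. That check matters because the function's own write to the document is itself a document change, so a trigger that fires on updates would otherwise run again on its own output, indefinitely. The Python function above has no such check. Below is a minimal sketch of the same decision as a plain Python function over field dictionaries; `needs_new_embedding` is a hypothetical name, and treating a missing `content` field as "nothing to embed" is an assumption of this sketch, not part of the original sample.

```python
def needs_new_embedding(before, after, source_field="content"):
    """Decide whether a document write should produce a new embedding.

    before/after are the document's fields as dicts, or None when the
    document did not exist before (create) or no longer exists (delete).
    """
    if after is None:
        # Deleted document: nothing to embed.
        return False
    previous = before.get(source_field) if before is not None else None
    current = after.get(source_field)
    # Skip writes that didn't change the source field, including the
    # function's own write of the embedding field.
    return current is not None and current != previous


# Create: embed. Re-trigger caused by writing the embedding: skip.
print(needs_new_embedding(None, {"content": "dark roast"}))
print(needs_new_embedding({"content": "dark roast"},
                          {"content": "dark roast", "embedding": [0.1]}))
```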

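Create and manage vector indexes

Before you can perform a nearest-neighbor search with your vector embeddings, you must create a corresponding index. The following examples show how to create and manage vector indexes.

Create a single-field vector index

To create a single-field vector index, use gcloud alpha firestore indexes composite create.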

gcloud

gcloud alpha firestore indexes composite create \
--collection-group=collection-group \
--query-scope=COLLECTION \
--field-config field-path=vector-field,vector-config='vector-configuration' \
--database=database-id

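where:

collection-group is the ID of the collection group.
vector-field is the name of the field that contains the vector embedding.
database-id is the ID of the database.
vector-configuration contains the vector dimension and the index type. The dimension is an integer up to 2,048. The index type must be flat. Format the index configuration as {"dimension":"DIMENSION", "flat": "{}"}, where DIMENSION is the number of dimensions.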
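To avoid hand-writing the JSON, you can generate the vector-config value. The following sketch builds a --field-config argument in the documented {"dimension":"DIMENSION", "flat": "{}"} shape and rejects dimensions above the 2,048 limit. `vector_config_arg` is a hypothetical helper, and rejecting dimensions below 1 is an assumption of this sketch.

```python
import json
import shlex

MAX_VECTOR_DIMENSION = 2048  # maximum dimension stated on this page


def vector_config_arg(field_path: str, dimension: int) -> str:
    """Build a --field-config value for a flat vector index (hypothetical helper)."""
    # Assumption: the dimension must be a positive integer.
    if not isinstance(dimension, int) or not 1 <= dimension <= MAX_VECTOR_DIMENSION:
        raise ValueError(
            f"dimension must be an integer in 1..{MAX_VECTOR_DIMENSION}, got {dimension!r}")
    # The dimension is written as a JSON string, as in the documented format.
    config = json.dumps({"dimension": str(dimension), "flat": "{}"})
    # shlex.quote wraps the JSON in single quotes for a POSIX shell,
    # matching the quoting in the gcloud examples.
    return f"field-path={field_path},vector-config={shlex.quote(config)}"


print(vector_config_arg("embedding_field", 1024))
# field-path=embedding_field,vector-config='{"dimension": "1024", "flat": "{}"}'
```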

Create a composite vector index

The following example creates a composite vector index on the field color and a vector embedding field.

gcloud

gcloud alpha firestore indexes composite create \
--collection-group=collection-group \
--query-scope=COLLECTION \
--field-config=order=ASCENDING,field-path="color" \
--field-config field-path=vector-field,vector-config='{"dimension":"1024", "flat": "{}"}' \
--database=database-id
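If you create indexes from a script, one option is to build the argument list in Python and pass it to subprocess.run, which avoids shell-quoting the JSON entirely. The sketch below only mirrors the flags shown in the command above and does not run gcloud; `composite_vector_index_args` is a hypothetical helper, and the collection group and field names in the usage line are examples.

```python
import json


def composite_vector_index_args(collection_group, database_id, vector_field,
                                dimension, ascending_fields=()):
    """Return argv for the composite vector index command (hypothetical helper)."""
    args = [
        "gcloud", "alpha", "firestore", "indexes", "composite", "create",
        f"--collection-group={collection_group}",
        "--query-scope=COLLECTION",
    ]
    # One ascending field config per scalar field, such as "color".
    for field in ascending_fields:
        args.append(f"--field-config=order=ASCENDING,field-path={field}")
    # No shell is involved, so the JSON needs no extra quoting.
    vector_config = json.dumps({"dimension": str(dimension), "flat": "{}"})
    args.append(f"--field-config=field-path={vector_field},vector-config={vector_config}")
    args.append(f"--database={database_id}")
    return args


args = composite_vector_index_args(
    "coffee-beans", "(default)", "embedding_field", 1024, ["color"])
print(args)
# To run it (requires an installed, authenticated gcloud):
#   import subprocess; subprocess.run(args, check=True)
```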

List all vector indexes

gcloud

gcloud alpha firestore indexes composite list --database=database-id

Replace database-id with the ID of the database.

Delete a vector index

gcloud

gcloud alpha firestore indexes composite delete index-id --database=database-id

where:

index-id is the ID of the index to delete. Use indexes composite list to retrieve the index ID.
database-id is the ID of the database.
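When scripting deletions, you can extract index IDs from the list command's machine-readable output. This sketch assumes you run gcloud alpha firestore indexes composite list --database=database-id --format=json and that each returned index has a name of the form projects/PROJECT/databases/DATABASE/collectionGroups/GROUP/indexes/INDEX_ID (the Firestore Admin API index resource name). `index_ids` is a hypothetical helper.

```python
import json


def index_ids(list_output: str) -> list[str]:
    """Extract index IDs from `indexes composite list --format=json` output.

    Assumes each entry has a "name" like
    projects/PROJECT/databases/DATABASE/collectionGroups/GROUP/indexes/INDEX_ID.
    """
    ids = []
    for index in json.loads(list_output):
        parts = index["name"].split("/")
        if len(parts) != 8 or parts[6] != "indexes":
            raise ValueError(f"unexpected index name: {index['name']!r}")
        ids.append(parts[7])
    return ids


sample = '[{"name": "projects/p/databases/(default)/collectionGroups/coffee-beans/indexes/INDEX_ID"}]'
print(index_ids(sample))
```

Each ID can then be passed as index-id to the delete or describe command.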

Describe a vector index

gcloud

gcloud alpha firestore indexes composite describe index-id --database=database-id

where:

index-id is the ID of the index to describe. Use indexes composite list to retrieve the index ID.
database-id is the ID of the database.

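Make a nearest-neighbor query

You can perform a similarity search to find the nearest neighbors of a vector embedding. Similarity searches require a vector index. If the index doesn't exist, Cloud Firestore suggests an index to create with the gcloud CLI.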

Python

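from google.cloud import firestore
from google.cloud.firestore_v1.base_vector_query import DistanceMeasure
from google.cloud.firestore_v1.vector import Vector

firestore_client = firestore.Client()
collection = firestore_client.collection("coffee-beans")

# Requires a single-field vector index
vector_query = collection.find_nearest(
   vector_field="embedding_field",
   query_vector=Vector([3.0, 1.0, 2.0]),
   distance_measure=DistanceMeasure.EUCLIDEAN,
   limit=5)

# Run the query and fetch the matching documents
docs = vector_query.get()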

Node.js

import {
  Firestore,
  FieldValue,
  VectorQuery,
  VectorQuerySnapshot,
} from "@google-cloud/firestore";

const db = new Firestore();
const coll = db.collection('coffee-beans');

// Requires single-field vector index
const vectorQuery: VectorQuery = coll.findNearest('embedding_field', FieldValue.vector([3.0, 1.0, 2.0]), {
  limit: 5,
  distanceMeasure: 'EUCLIDEAN'
});

const vectorQuerySnapshot: VectorQuerySnapshot = await vectorQuery.get();
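Conceptually, a nearest-neighbor query ranks documents by their distance to the query vector and keeps the closest limit results. The pure-Python sketch below does this by brute force over in-memory dictionaries with EUCLIDEAN distance, using made-up sample documents. It illustrates the semantics only; it is not how Cloud Firestore executes the query, and skipping documents that lack the vector field is this sketch's own choice.

```python
import heapq
import math


def nearest_neighbors(docs, vector_field, query_vector, limit):
    """Return the IDs of the `limit` docs closest to query_vector (EUCLIDEAN).

    docs maps document ID -> dict of fields. Documents without the vector
    field are skipped; ties are broken by document ID.
    """
    scored = [
        (math.dist(fields[vector_field], query_vector), doc_id)
        for doc_id, fields in docs.items()
        if vector_field in fields
    ]
    return [doc_id for _, doc_id in heapq.nsmallest(limit, scored)]


# Made-up sample documents.
coffee_beans = {
    "kahawa": {"embedding_field": [1.0, 2.0, 3.0]},
    "other": {"embedding_field": [3.0, 1.0, 2.0]},
    "far": {"embedding_field": [0.0, 0.0, 0.0]},
}
print(nearest_neighbors(coffee_beans, "embedding_field", [3.0, 1.0, 2.0], limit=2))
# ['other', 'kahawa']
```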

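Vector distance

Nearest-neighbor queries support the following options for vector distance:

EUCLIDEAN: Measures the Euclidean distance between the vectors. To learn more, see Euclidean distance.
COSINE: Compares vectors based on the angle between them, which lets you measure similarity that isn't based on the vectors' magnitude. We recommend using DOT_PRODUCT with unit-normalized vectors instead of COSINE distance; it is mathematically equivalent and performs better. To learn more, see Cosine similarity.
DOT_PRODUCT: Similar to COSINE, but is affected by the magnitude of the vectors. To learn more, see Dot product.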
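The following pure-Python sketch shows the math behind the three options and why unit-normalizing vectors lets DOT_PRODUCT stand in for COSINE. It uses the common definition cosine distance = 1 - cosine similarity; how Cloud Firestore computes, scores, or orders results is not shown here, and the function names are illustrative.

```python
import math


def dot_product(a, b):
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum(x * y for x, y in zip(a, b))


def euclidean_distance(a, b):
    return math.dist(a, b)  # raises ValueError on a dimension mismatch


def cosine_distance(a, b):
    # 1 - cosine similarity: 0 for vectors pointing in the same direction.
    return 1.0 - dot_product(a, b) / (math.hypot(*a) * math.hypot(*b))


def unit(v):
    norm = math.hypot(*v)
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


a, b = [1.0, 2.0, 3.0], [3.0, 1.0, 2.0]
scaled_a = [2.0 * x for x in a]

# COSINE ignores magnitude (same value up to rounding); DOT_PRODUCT does not.
print(cosine_distance(a, b), cosine_distance(scaled_a, b))
print(dot_product(a, b), dot_product(scaled_a, b))  # 11.0 22.0

# For unit vectors, cosine similarity equals the plain dot product.
print(1.0 - cosine_distance(a, b), dot_product(unit(a), unit(b)))
```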

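Pre-filter data

To pre-filter data before finding the nearest neighbors, you can combine a similarity search with other filters, except inequality filters. The and and or composite filters are supported. For field filters, the following filters are supported:

== equal to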
in
array_contains
array_contains_any

Python

# Similarity search with pre-filter
# Requires composite vector index
vector_query = collection.where("color", "==", "red").find_nearest(
   vector_field="embedding_field",
   query_vector=Vector([3.0, 1.0, 2.0]),
   distance_measure=DistanceMeasure.EUCLIDEAN,
   limit=5)

docs = vector_query.get()

Node.js

// Similarity search with pre-filter
// Requires composite vector index
const preFilteredVectorQuery: VectorQuery = coll
  .where("color", "==", "red")
  .findNearest("embedding_field", FieldValue.vector([3.0, 1.0, 2.0]), {
    limit: 5,
    distanceMeasure: "EUCLIDEAN",
  });

const vectorQueryResults: VectorQuerySnapshot = await preFilteredVectorQuery.get();
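The following sketch evaluates the supported pre-filter operators against plain dictionaries to show which documents survive before the nearest-neighbor ranking runs. It combines a list of filters as an AND, mirrors only the four operators listed above, and treats a missing field as a non-match. It is an illustration, not the client library's filter implementation, and `matches` and `prefilter` are hypothetical names.

```python
SUPPORTED_OPS = {"==", "in", "array_contains", "array_contains_any"}


def matches(fields, field, op, value):
    """Evaluate one pre-filter against a document's fields (a plain dict)."""
    if op not in SUPPORTED_OPS:
        raise ValueError(f"{op!r} is not a supported pre-filter operator")
    if field not in fields:
        return False
    actual = fields[field]
    if op == "==":
        return actual == value
    if op == "in":
        return actual in value
    if op == "array_contains":
        return isinstance(actual, list) and value in actual
    # array_contains_any
    return isinstance(actual, list) and any(v in actual for v in value)


def prefilter(docs, filters):
    """Keep the docs that satisfy every (field, op, value) filter."""
    return {
        doc_id: fields
        for doc_id, fields in docs.items()
        if all(matches(fields, f, op, v) for f, op, v in filters)
    }


docs = {
    "a": {"color": "red", "tags": ["dark"]},
    "b": {"color": "blue", "tags": ["light", "dark"]},
    "c": {"color": "red", "tags": ["light"]},
}
print(sorted(prefilter(docs, [("color", "==", "red")])))  # ['a', 'c']
```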

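Limitations

When working with vector embeddings, note the following limitations:

The maximum supported embedding dimension is 2,048. To store larger embeddings, use dimensionality reduction.
The maximum number of documents returned from a nearest-neighbor query is 1,000.
Vector search doesn't support real-time snapshot listeners.
You can't use inequality filters to pre-filter data.
Only the Python and Node.js client libraries support vector search.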
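If you build vector queries dynamically, you can check them against these limits before sending them. This sketch encodes the dimension limit, the 1,000-document limit, and the allowed pre-filter operators from this page. `vector_query_problems` is a hypothetical helper; the server remains the authority on what it accepts.

```python
MAX_EMBEDDING_DIMENSION = 2048
MAX_NEAREST_LIMIT = 1000
PREFILTER_OPS = {"==", "in", "array_contains", "array_contains_any"}


def vector_query_problems(query_vector, limit, filter_ops=()):
    """List the reasons a vector query would exceed the limits on this page."""
    problems = []
    if len(query_vector) > MAX_EMBEDDING_DIMENSION:
        problems.append(
            f"dimension {len(query_vector)} exceeds {MAX_EMBEDDING_DIMENSION}; "
            "reduce the dimensionality first")
    if limit > MAX_NEAREST_LIMIT:
        problems.append(f"limit {limit} exceeds {MAX_NEAREST_LIMIT}")
    for op in filter_ops:
        if op not in PREFILTER_OPS:
            problems.append(f"{op!r} cannot be used to pre-filter a vector query")
    return problems


print(vector_query_problems([0.0] * 3, limit=5, filter_ops=["=="]))  # []
print(vector_query_problems([0.0] * 4096, limit=5000, filter_ops=[">"]))
```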

What's next

Read about best practices for Cloud Firestore.
Understand reads and writes at scale.
