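Recognize Text in Images with Firebase ML on iOS

You can use Firebase ML to recognize text in images. Firebase ML has both a general-purpose API suitable for recognizing text in images, such as the text of a street sign, and an API optimized for recognizing the text of documents.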

Before you begin

Use Swift Package Manager to install and manage Firebase dependencies.

In Xcode, with your app project open, navigate to File > Add Packages.
When prompted, add the Firebase Apple platforms SDK repository:

  https://github.com/firebase/firebase-ios-sdk.git

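Choose the Firebase ML library.
Add the -ObjC flag to the Other Linker Flags section of your target's build settings.
When finished, Xcode automatically begins resolving and downloading your dependencies in the background.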

Next, perform some in-app setup:

In your app, import Firebase:

Swift

import FirebaseMLModelDownloader

Objective-C

@import FirebaseMLModelDownloader;

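If you have not already enabled Cloud-based APIs for your project, do so now:
1. Open the Firebase ML APIs page in the Firebase console.
2. If you have not already upgraded your project to the pay-as-you-go Blaze pricing plan, click Upgrade to do so. You are prompted to upgrade only if your project isn't on the Blaze pricing plan.

  Only projects on the Blaze pricing plan can use Cloud-based APIs.
3. If Cloud-based APIs aren't already enabled, click Enable Cloud-based APIs.

Before you deploy to production an app that uses a Cloud API, take some additional steps to prevent and mitigate the effect of unauthorized API access.

Now you are ready to start recognizing text in images.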

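Input image guidelines

For Firebase ML to accurately recognize text, input images must contain text that is represented by sufficient pixel data. Ideally, for Latin text, each character should be at least 16x16 pixels. For Chinese, Japanese, and Korean text, each character should be 24x24 pixels. For all languages, there is generally no accuracy benefit for characters to be larger than 24x24 pixels.

So, for example, a 640x480 image might work well to scan a business card that occupies the full width of the image. To scan a document printed on letter-sized paper, a 720x1280 pixel image might be required.

Poor image focus can hurt text recognition accuracy. If you aren't getting acceptable results, try asking the user to recapture the image.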
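
The size guidelines above can be turned into a quick sanity check before sending an image for recognition. The following is a minimal sketch, not part of the Firebase SDK: the names `ScriptFamily`, `estimatedCharacterWidth`, and `hasEnoughPixelData` are hypothetical, and the estimate assumes one line of equally wide characters spanning the full image width, which real text only approximates.

```swift
import Foundation

/// Script families that the guidelines above treat differently (hypothetical type).
enum ScriptFamily {
    case latin
    case cjk  // Chinese, Japanese, and Korean

    /// Recommended minimum character size in pixels, per side.
    var minimumCharacterPixels: Double {
        switch self {
        case .latin: return 16
        case .cjk: return 24
        }
    }
}

/// Rough per-character width, assuming one line of equally wide characters
/// spans the full image width. Margins and variable glyph widths are ignored.
func estimatedCharacterWidth(imageWidthPixels: Int, charactersPerLine: Int) -> Double {
    precondition(charactersPerLine > 0, "charactersPerLine must be positive")
    return Double(imageWidthPixels) / Double(charactersPerLine)
}

/// Returns true when the estimate meets the recommended minimum for the script.
func hasEnoughPixelData(imageWidthPixels: Int, charactersPerLine: Int, script: ScriptFamily) -> Bool {
    let width = estimatedCharacterWidth(imageWidthPixels: imageWidthPixels,
                                        charactersPerLine: charactersPerLine)
    return width >= script.minimumCharacterPixels
}

// A 640-pixel-wide image with about 30 Latin characters per line:
// 640 / 30 is roughly 21.3 pixels per character.
print(hasEnoughPixelData(imageWidthPixels: 640, charactersPerLine: 30, script: .latin))  // true
```

The same image would fall short for Chinese, Japanese, or Korean text at that density, since roughly 21 pixels per character is below the 24-pixel recommendation.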

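Recognize text in images

To recognize text in an image, run the text recognizer as described below.

1. Run the text recognizer

To recognize text in an image, pass the image as a UIImage or a CMSampleBufferRef to VisionTextRecognizer's process(_:completion:) method, as described in the following steps.

First, get an instance of VisionTextRecognizer by calling cloudTextRecognizer: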

Swift

let vision = Vision.vision()
let textRecognizer = vision.cloudTextRecognizer()

// Or, to provide language hints to assist with language detection:
// See https://cloud.google.com/vision/docs/languages for supported languages
let options = VisionCloudTextRecognizerOptions()
options.languageHints = ["en", "hi"]
let textRecognizer = vision.cloudTextRecognizer(options: options)

Objective-C

FIRVision *vision = [FIRVision vision];
FIRVisionTextRecognizer *textRecognizer = [vision cloudTextRecognizer];

// Or, to provide language hints to assist with language detection:
// See https://cloud.google.com/vision/docs/languages for supported languages
FIRVisionCloudTextRecognizerOptions *options =
        [[FIRVisionCloudTextRecognizerOptions alloc] init];
options.languageHints = @[@"en", @"hi"];
FIRVisionTextRecognizer *textRecognizer = [vision cloudTextRecognizerWithOptions:options];

To call Cloud Vision, the image must be formatted as a Base64-encoded string. To process a UIImage:

Swift

guard let imageData = uiImage.jpegData(compressionQuality: 1.0) else { return }
let base64encodedImage = imageData.base64EncodedString()

Objective-C

NSData *imageData = UIImageJPEGRepresentation(uiImage, 1.0f);
NSString *base64encodedImage =
  [imageData base64EncodedStringWithOptions:NSDataBase64Encoding76CharacterLineLength];
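
Note that the two snippets above do not produce identical strings: the Swift call uses the default options, while the Objective-C call asks for lines wrapped at 76 characters. The sketch below uses only Foundation and a stand-in payload (not real image data) to show the difference. Which form the receiving service expects is not stated here, so treat the choice as something to verify.

```swift
import Foundation

// Stand-in payload; in an app this would be the JPEG data from the UIImage.
let payload = Data(repeating: 0xAB, count: 100)

// Default options: one unbroken Base64 string.
let unwrapped = payload.base64EncodedString()

// Swift spelling of NSDataBase64Encoding76CharacterLineLength:
// the output is wrapped after every 76 characters.
let wrapped = payload.base64EncodedString(options: .lineLength76Characters)

let newline = UInt8(ascii: "\n")
print(unwrapped.utf8.contains(newline))  // false
print(wrapped.utf8.contains(newline))    // true

// Decoding the wrapped form requires skipping the line-break characters.
let decoded = Data(base64Encoded: wrapped, options: .ignoreUnknownCharacters)
print(decoded == payload)  // true
```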

Then, pass the image to the process(_:completion:) method:

Swift

textRecognizer.process(visionImage) { result, error in
  guard error == nil, let result = result else {
    // ...
    return
  }

  // Recognized text
}

Objective-C

[textRecognizer processImage:image
                  completion:^(FIRVisionText *_Nullable result,
                               NSError *_Nullable error) {
  if (error != nil || result == nil) {
    // ...
    return;
  }

  // Recognized text
}];

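2. Extract text from blocks of recognized text

If the text recognition operation succeeds, it returns a VisionText object. A VisionText object contains the full text recognized in the image and zero or more VisionTextBlock objects.

Each VisionTextBlock represents a rectangular block of text, which contains zero or more VisionTextLine objects. Each VisionTextLine object contains zero or more VisionTextElement objects, which represent words and word-like entities (dates, numbers, and so on).

For each VisionTextBlock, VisionTextLine, and VisionTextElement object, you can get the text recognized in the region and the bounding coordinates of the region.

For example: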

Swift

let resultText = result.text
for block in result.blocks {
    let blockText = block.text
    let blockConfidence = block.confidence
    let blockLanguages = block.recognizedLanguages
    let blockCornerPoints = block.cornerPoints
    let blockFrame = block.frame
    for line in block.lines {
        let lineText = line.text
        let lineConfidence = line.confidence
        let lineLanguages = line.recognizedLanguages
        let lineCornerPoints = line.cornerPoints
        let lineFrame = line.frame
        for element in line.elements {
            let elementText = element.text
            let elementConfidence = element.confidence
            let elementLanguages = element.recognizedLanguages
            let elementCornerPoints = element.cornerPoints
            let elementFrame = element.frame
        }
    }
}

Objective-C

NSString *resultText = result.text;
for (FIRVisionTextBlock *block in result.blocks) {
  NSString *blockText = block.text;
  NSNumber *blockConfidence = block.confidence;
  NSArray<FIRVisionTextRecognizedLanguage *> *blockLanguages = block.recognizedLanguages;
  NSArray<NSValue *> *blockCornerPoints = block.cornerPoints;
  CGRect blockFrame = block.frame;
  for (FIRVisionTextLine *line in block.lines) {
    NSString *lineText = line.text;
    NSNumber *lineConfidence = line.confidence;
    NSArray<FIRVisionTextRecognizedLanguage *> *lineLanguages = line.recognizedLanguages;
    NSArray<NSValue *> *lineCornerPoints = line.cornerPoints;
    CGRect lineFrame = line.frame;
    for (FIRVisionTextElement *element in line.elements) {
      NSString *elementText = element.text;
      NSNumber *elementConfidence = element.confidence;
      NSArray<FIRVisionTextRecognizedLanguage *> *elementLanguages = element.recognizedLanguages;
      NSArray<NSValue *> *elementCornerPoints = element.cornerPoints;
      CGRect elementFrame = element.frame;
    }
  }
}
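
One common use of this hierarchy is to keep only the elements the recognizer is reasonably sure about. The following sketch walks the same block, line, and element nesting, but with hypothetical stand-in structs (`Block`, `Line`, `Element`) that mirror only the properties used here, so it runs without the Firebase SDK. Treating a missing confidence value as "skip this element" is an assumption, not documented behavior.

```swift
import Foundation

// Hypothetical stand-ins mirroring only the `text`, `confidence`, `lines`,
// and `elements` properties used above. These are not the Firebase types.
struct Element {
    let text: String
    let confidence: Double?
}
struct Line {
    let elements: [Element]
}
struct Block {
    let lines: [Line]
}

/// Collects the text of every element whose confidence meets `threshold`.
/// Elements without a confidence value are skipped (an assumption of this sketch).
func confidentWords(in blocks: [Block], threshold: Double) -> [String] {
    var words: [String] = []
    for block in blocks {
        for line in block.lines {
            for element in line.elements {
                if let confidence = element.confidence, confidence >= threshold {
                    words.append(element.text)
                }
            }
        }
    }
    return words
}

let sample = [
    Block(lines: [
        Line(elements: [
            Element(text: "STOP", confidence: 0.98),
            Element(text: "5T0P", confidence: 0.41),
            Element(text: "AHEAD", confidence: nil)
        ])
    ])
]
print(confidentWords(in: sample, threshold: 0.8))  // ["STOP"]
```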

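Next steps

Before you deploy to production an app that uses a Cloud API, take some additional steps to prevent and mitigate the effect of unauthorized API access.

Recognize text in images of documents

To recognize the text of a document, configure and run the document text recognizer as described below.

The document text recognition API, described below, provides an interface that is intended to be more convenient for working with images of documents. However, if you prefer the interface provided by the sparse text API, you can use it instead to scan documents by configuring the cloud text recognizer to use the dense text model.

To use the document text recognition API:

1. Run the text recognizer

To recognize text in a document image, pass the image as a UIImage or a CMSampleBufferRef to VisionDocumentTextRecognizer's process(_:completion:) method, as described in the following steps.

First, get an instance of VisionDocumentTextRecognizer by calling cloudDocumentTextRecognizer: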

Swift

let vision = Vision.vision()
let textRecognizer = vision.cloudDocumentTextRecognizer()

// Or, to provide language hints to assist with language detection:
// See https://cloud.google.com/vision/docs/languages for supported languages
let options = VisionCloudDocumentTextRecognizerOptions()
options.languageHints = ["en", "hi"]
let textRecognizer = vision.cloudDocumentTextRecognizer(options: options)

Objective-C

FIRVision *vision = [FIRVision vision];
FIRVisionDocumentTextRecognizer *textRecognizer = [vision cloudDocumentTextRecognizer];

// Or, to provide language hints to assist with language detection:
// See https://cloud.google.com/vision/docs/languages for supported languages
FIRVisionCloudDocumentTextRecognizerOptions *options =
        [[FIRVisionCloudDocumentTextRecognizerOptions alloc] init];
options.languageHints = @[@"en", @"hi"];
FIRVisionDocumentTextRecognizer *textRecognizer = [vision cloudDocumentTextRecognizerWithOptions:options];

To call Cloud Vision, the image must be formatted as a Base64-encoded string. To process a UIImage:

Swift

guard let imageData = uiImage.jpegData(compressionQuality: 1.0) else { return }
let base64encodedImage = imageData.base64EncodedString()

Objective-C

NSData *imageData = UIImageJPEGRepresentation(uiImage, 1.0f);
NSString *base64encodedImage =
  [imageData base64EncodedStringWithOptions:NSDataBase64Encoding76CharacterLineLength];

Then, pass the image to the process(_:completion:) method:

Swift

textRecognizer.process(visionImage) { result, error in
  guard error == nil, let result = result else {
    // ...
    return
  }

  // Recognized text
}

Objective-C

[textRecognizer processImage:image
                  completion:^(FIRVisionDocumentText *_Nullable result,
                               NSError *_Nullable error) {
  if (error != nil || result == nil) {
    // ...
    return;
  }

  // Recognized text
}];

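2. Extract text from blocks of recognized text

If the text recognition operation succeeds, it returns a VisionDocumentText object. A VisionDocumentText object contains the full text recognized in the image and a hierarchy of objects that reflect the structure of the recognized document: blocks contain paragraphs, paragraphs contain words, and words contain symbols.

For each VisionDocumentTextBlock, VisionDocumentTextParagraph, VisionDocumentTextWord, and VisionDocumentTextSymbol object, you can get the text recognized in the region and the bounding coordinates of the region.

For example: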

Swift

let resultText = result.text
for block in result.blocks {
    let blockText = block.text
    let blockConfidence = block.confidence
    let blockRecognizedLanguages = block.recognizedLanguages
    let blockBreak = block.recognizedBreak
    let blockCornerPoints = block.cornerPoints
    let blockFrame = block.frame
    for paragraph in block.paragraphs {
        let paragraphText = paragraph.text
        let paragraphConfidence = paragraph.confidence
        let paragraphRecognizedLanguages = paragraph.recognizedLanguages
        let paragraphBreak = paragraph.recognizedBreak
        let paragraphCornerPoints = paragraph.cornerPoints
        let paragraphFrame = paragraph.frame
        for word in paragraph.words {
            let wordText = word.text
            let wordConfidence = word.confidence
            let wordRecognizedLanguages = word.recognizedLanguages
            let wordBreak = word.recognizedBreak
            let wordCornerPoints = word.cornerPoints
            let wordFrame = word.frame
            for symbol in word.symbols {
                let symbolText = symbol.text
                let symbolConfidence = symbol.confidence
                let symbolRecognizedLanguages = symbol.recognizedLanguages
                let symbolBreak = symbol.recognizedBreak
                let symbolCornerPoints = symbol.cornerPoints
                let symbolFrame = symbol.frame
            }
        }
    }
}

Objective-C

NSString *resultText = result.text;
for (FIRVisionDocumentTextBlock *block in result.blocks) {
  NSString *blockText = block.text;
  NSNumber *blockConfidence = block.confidence;
  NSArray<FIRVisionTextRecognizedLanguage *> *blockRecognizedLanguages = block.recognizedLanguages;
  FIRVisionTextRecognizedBreak *blockBreak = block.recognizedBreak;
  CGRect blockFrame = block.frame;
  for (FIRVisionDocumentTextParagraph *paragraph in block.paragraphs) {
    NSString *paragraphText = paragraph.text;
    NSNumber *paragraphConfidence = paragraph.confidence;
    NSArray<FIRVisionTextRecognizedLanguage *> *paragraphRecognizedLanguages = paragraph.recognizedLanguages;
    FIRVisionTextRecognizedBreak *paragraphBreak = paragraph.recognizedBreak;
    CGRect paragraphFrame = paragraph.frame;
    for (FIRVisionDocumentTextWord *word in paragraph.words) {
      NSString *wordText = word.text;
      NSNumber *wordConfidence = word.confidence;
      NSArray<FIRVisionTextRecognizedLanguage *> *wordRecognizedLanguages = word.recognizedLanguages;
      FIRVisionTextRecognizedBreak *wordBreak = word.recognizedBreak;
      CGRect wordFrame = word.frame;
      for (FIRVisionDocumentTextSymbol *symbol in word.symbols) {
        NSString *symbolText = symbol.text;
        NSNumber *symbolConfidence = symbol.confidence;
        NSArray<FIRVisionTextRecognizedLanguage *> *symbolRecognizedLanguages = symbol.recognizedLanguages;
        FIRVisionTextRecognizedBreak *symbolBreak = symbol.recognizedBreak;
        CGRect symbolFrame = symbol.frame;
      }
    }
  }
}
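
To make the paragraph, word, and symbol nesting concrete, the sketch below rebuilds paragraph text from individual symbols. It uses hypothetical stand-in structs (`Symbol`, `Word`, `Paragraph`) rather than the Firebase types, and it joins words with a single space, which is a simplification that ignores the `recognizedBreak` information available on the real objects.

```swift
import Foundation

// Hypothetical stand-ins mirroring only the nesting and `text` property
// used above. These are not the Firebase types.
struct Symbol {
    let text: String
}
struct Word {
    let symbols: [Symbol]
}
struct Paragraph {
    let words: [Word]
}

/// Rebuilds a word by concatenating its symbols in order.
func wordText(_ word: Word) -> String {
    return word.symbols.map { $0.text }.joined()
}

/// Rebuilds a paragraph by joining its words with single spaces.
/// This ignores `recognizedBreak`, so hyphenation and line breaks are lost.
func paragraphText(_ paragraph: Paragraph) -> String {
    return paragraph.words.map { wordText($0) }.joined(separator: " ")
}

let paragraph = Paragraph(words: [
    Word(symbols: [Symbol(text: "H"), Symbol(text: "i")]),
    Word(symbols: [Symbol(text: "t"), Symbol(text: "h"), Symbol(text: "e"),
                   Symbol(text: "r"), Symbol(text: "e")])
])
print(paragraphText(paragraph))  // Hi there
```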

Next steps

Before you deploy to production an app that uses a Cloud API, take some additional steps to prevent and mitigate the effect of unauthorized API access.
