Recognize Text in Images with ML Kit on Android

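You can use ML Kit to recognize text in images. ML Kit has both a general-purpose API suitable for recognizing text in images, such as the text of a street sign, and an API optimized for recognizing the text of documents. The general-purpose API has both on-device and cloud-based models. Document text recognition is available only as a cloud-based model. See the overview for a comparison of the cloud and on-device models.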


  1. If you haven't already, add Firebase to your Android project.
  2. Add the dependencies for the ML Kit Android libraries to your module (app-level) Gradle file (usually app/build.gradle):
    apply plugin: 'com.android.application'
    apply plugin: 'com.google.gms.google-services'
    dependencies {
      // ...
      implementation 'com.google.firebase:firebase-ml-vision:24.0.3'
    }
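  3. Optional but recommended: If you use the on-device API, configure your app to automatically download the ML model to the device after your app is installed.

    To do so, add the following declaration to your app's AndroidManifest.xml file:

    <application ...>
      <meta-data
          android:name="com.google.firebase.ml.vision.DEPENDENCIES"
          android:value="ocr" />
      <!-- To use multiple models: android:value="ocr,model2,model3" -->
    </application>

    If you do not enable install-time model downloads, the model is downloaded the first time you run the on-device detector. Requests you make before the download has completed produce no results.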
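  4. If you want to use the cloud-based model, and you have not already enabled the cloud-based APIs for your project, do so now:

    1. Open the ML Kit APIs page of the Firebase console.
    2. If you have not already upgraded your project to a Blaze pricing plan, click Upgrade to do so. (You are prompted to upgrade only if your project isn't on the Blaze plan.)

      Only Blaze-level projects can use cloud-based APIs.

    3. If cloud-based APIs aren't already enabled, click Enable Cloud-based APIs.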




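  • For ML Kit to accurately recognize text, input images must contain text that is represented by sufficient pixel data. Ideally, for Latin text, each character should be at least 16x16 pixels. For Chinese, Japanese, and Korean text (only supported by the cloud-based APIs), each character should be 24x24 pixels. For all languages, there is generally no accuracy benefit for characters to be larger than 24x24 pixels.

    So, for example, a 640x480 image might work well to scan a business card that occupies the full width of the image. To scan a printed document, a 720x1280 pixel image might be required.

  • Poor image focus can hurt text recognition accuracy. If you aren't getting acceptable results, try recapturing the image.

  • If you are recognizing text in a real-time application, you might also want to consider the overall dimensions of the input images. Smaller images can be processed faster, so to reduce latency, capture images at lower resolutions (keeping in mind the above accuracy requirements) and ensure that the text occupies as much of the image as possible. Also see Tips to improve real-time performance.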
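
As a rough way to apply the size guidance above, the following Java sketch estimates how many pixels each character receives when a line of text spans part of an image. The class and method names are hypothetical, and the model behind them (evenly spaced, roughly square characters) is a simplifying assumption; none of this is part of the ML Kit API.

```java
/** Rough per-character pixel estimate; illustrative only, not an ML Kit API. */
final class CharacterSizeEstimate {
    // Minimum character sizes taken from the guidelines above.
    static final int MIN_LATIN_PX = 16;
    static final int MIN_CJK_PX = 24;

    private static void check(double textWidthFraction, int charsPerLine) {
        if (charsPerLine <= 0 || textWidthFraction <= 0 || textWidthFraction > 1) {
            throw new IllegalArgumentException("Invalid text dimensions");
        }
    }

    /**
     * Approximate width in pixels of one character, assuming the longest line
     * holds charsPerLine evenly spaced characters that together span
     * textWidthFraction (0..1] of the image width.
     */
    static double pixelsPerCharacter(int imageWidthPx, double textWidthFraction, int charsPerLine) {
        check(textWidthFraction, charsPerLine);
        if (imageWidthPx <= 0) {
            throw new IllegalArgumentException("Image width must be positive");
        }
        return imageWidthPx * textWidthFraction / charsPerLine;
    }

    /** Smallest image width that gives every character at least minCharPx pixels. */
    static int minimumImageWidth(int minCharPx, double textWidthFraction, int charsPerLine) {
        check(textWidthFraction, charsPerLine);
        return (int) Math.ceil(minCharPx * charsPerLine / textWidthFraction);
    }

    public static void main(String[] args) {
        // A 40-character line that fills a 640-pixel-wide image.
        double px = pixelsPerCharacter(640, 1.0, 40);
        System.out.println("Pixels per character: " + px);
        System.out.println("Meets Latin minimum: " + (px >= MIN_LATIN_PX));
        // Width needed for an 80-character line at the Latin minimum.
        System.out.println("Width for 80 characters: "
                + minimumImageWidth(MIN_LATIN_PX, 1.0, 80));
    }
}
```

An estimate like this can help you pick a capture resolution before testing with real images; actual accuracy also depends on focus, font, and contrast.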


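To recognize text in images using either an on-device or cloud-based model, run the text recognizer as described below.

1. Run the text recognizer

To recognize text in an image, create a FirebaseVisionImage object from either a Bitmap, media.Image, ByteBuffer, byte array, or a file on the device. Then, pass the FirebaseVisionImage object to the FirebaseVisionTextRecognizer's processImage method.

  1. Create a FirebaseVisionImage object from your image.

    • To create a FirebaseVisionImage object from a media.Image object, such as when capturing an image from a device's camera, pass the media.Image object and the image's rotation to FirebaseVisionImage.fromMediaImage().

      If you use the CameraX library, the OnImageCapturedListener and ImageAnalysis.Analyzer classes calculate the rotation value for you, so you just need to convert the rotation to one of ML Kit's ROTATION_ constants before calling FirebaseVisionImage.fromMediaImage():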


      private class YourAnalyzer implements ImageAnalysis.Analyzer {

          // Maps a rotation in degrees to the matching ML Kit ROTATION_ constant.
          private int degreesToFirebaseRotation(int degrees) {
              switch (degrees) {
                  case 0:
                      return FirebaseVisionImageMetadata.ROTATION_0;
                  case 90:
                      return FirebaseVisionImageMetadata.ROTATION_90;
                  case 180:
                      return FirebaseVisionImageMetadata.ROTATION_180;
                  case 270:
                      return FirebaseVisionImageMetadata.ROTATION_270;
                  default:
                      throw new IllegalArgumentException(
                              "Rotation must be 0, 90, 180, or 270.");
              }
          }

          @Override
          public void analyze(ImageProxy imageProxy, int degrees) {
              if (imageProxy == null || imageProxy.getImage() == null) {
                  return;
              }
              Image mediaImage = imageProxy.getImage();
              int rotation = degreesToFirebaseRotation(degrees);
              FirebaseVisionImage image =
                      FirebaseVisionImage.fromMediaImage(mediaImage, rotation);
              // Pass image to an ML Kit Vision API
              // ...
          }
      }


      private class YourImageAnalyzer : ImageAnalysis.Analyzer {
          // Maps a rotation in degrees to the matching ML Kit ROTATION_ constant.
          private fun degreesToFirebaseRotation(degrees: Int): Int = when(degrees) {
              0 -> FirebaseVisionImageMetadata.ROTATION_0
              90 -> FirebaseVisionImageMetadata.ROTATION_90
              180 -> FirebaseVisionImageMetadata.ROTATION_180
              270 -> FirebaseVisionImageMetadata.ROTATION_270
              else -> throw Exception("Rotation must be 0, 90, 180, or 270.")
          }

          override fun analyze(imageProxy: ImageProxy?, degrees: Int) {
              val mediaImage = imageProxy?.image
              val imageRotation = degreesToFirebaseRotation(degrees)
              if (mediaImage != null) {
                  val image = FirebaseVisionImage.fromMediaImage(mediaImage, imageRotation)
                  // Pass image to an ML Kit Vision API
                  // ...
              }
          }
      }

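      If you don't use a camera library that gives you the image's rotation, you can calculate it from the device's rotation and the orientation of the camera sensor in the device: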


      private static final SparseIntArray ORIENTATIONS = new SparseIntArray();
      static {
          ORIENTATIONS.append(Surface.ROTATION_0, 90);
          ORIENTATIONS.append(Surface.ROTATION_90, 0);
          ORIENTATIONS.append(Surface.ROTATION_180, 270);
          ORIENTATIONS.append(Surface.ROTATION_270, 180);
      }

      /**
       * Get the angle by which an image must be rotated given the device's current
       * orientation.
       */
      @RequiresApi(api = Build.VERSION_CODES.LOLLIPOP)
      private int getRotationCompensation(String cameraId, Activity activity, Context context)
              throws CameraAccessException {
          // Get the device's current rotation relative to its "native" orientation.
          // Then, from the ORIENTATIONS table, look up the angle the image must be
          // rotated to compensate for the device's rotation.
          int deviceRotation = activity.getWindowManager().getDefaultDisplay().getRotation();
          int rotationCompensation = ORIENTATIONS.get(deviceRotation);

          // On most devices, the sensor orientation is 90 degrees, but for some
          // devices it is 270 degrees. For devices with a sensor orientation of
          // 270, rotate the image an additional 180 ((270 + 270) % 360) degrees.
          CameraManager cameraManager = (CameraManager) context.getSystemService(CAMERA_SERVICE);
          int sensorOrientation = cameraManager
                  .getCameraCharacteristics(cameraId)
                  .get(CameraCharacteristics.SENSOR_ORIENTATION);
          rotationCompensation = (rotationCompensation + sensorOrientation + 270) % 360;

          // Return the corresponding FirebaseVisionImageMetadata rotation value.
          int result;
          switch (rotationCompensation) {
              case 0:
                  result = FirebaseVisionImageMetadata.ROTATION_0;
                  break;
              case 90:
                  result = FirebaseVisionImageMetadata.ROTATION_90;
                  break;
              case 180:
                  result = FirebaseVisionImageMetadata.ROTATION_180;
                  break;
              case 270:
                  result = FirebaseVisionImageMetadata.ROTATION_270;
                  break;
              default:
                  result = FirebaseVisionImageMetadata.ROTATION_0;
                  Log.e(TAG, "Bad rotation value: " + rotationCompensation);
          }
          return result;
      }


      private val ORIENTATIONS = SparseIntArray()
      init {
          ORIENTATIONS.append(Surface.ROTATION_0, 90)
          ORIENTATIONS.append(Surface.ROTATION_90, 0)
          ORIENTATIONS.append(Surface.ROTATION_180, 270)
          ORIENTATIONS.append(Surface.ROTATION_270, 180)
      }

      /**
       * Get the angle by which an image must be rotated given the device's current
       * orientation.
       */
      @RequiresApi(api = Build.VERSION_CODES.LOLLIPOP)
      @Throws(CameraAccessException::class)
      private fun getRotationCompensation(cameraId: String, activity: Activity, context: Context): Int {
          // Get the device's current rotation relative to its "native" orientation.
          // Then, from the ORIENTATIONS table, look up the angle the image must be
          // rotated to compensate for the device's rotation.
          val deviceRotation = activity.windowManager.defaultDisplay.rotation
          var rotationCompensation = ORIENTATIONS.get(deviceRotation)

          // On most devices, the sensor orientation is 90 degrees, but for some
          // devices it is 270 degrees. For devices with a sensor orientation of
          // 270, rotate the image an additional 180 ((270 + 270) % 360) degrees.
          val cameraManager = context.getSystemService(CAMERA_SERVICE) as CameraManager
          val sensorOrientation = cameraManager
                  .getCameraCharacteristics(cameraId)
                  .get(CameraCharacteristics.SENSOR_ORIENTATION)!!
          rotationCompensation = (rotationCompensation + sensorOrientation + 270) % 360

          // Return the corresponding FirebaseVisionImageMetadata rotation value.
          val result: Int
          when (rotationCompensation) {
              0 -> result = FirebaseVisionImageMetadata.ROTATION_0
              90 -> result = FirebaseVisionImageMetadata.ROTATION_90
              180 -> result = FirebaseVisionImageMetadata.ROTATION_180
              270 -> result = FirebaseVisionImageMetadata.ROTATION_270
              else -> {
                  result = FirebaseVisionImageMetadata.ROTATION_0
                  Log.e(TAG, "Bad rotation value: $rotationCompensation")
              }
          }
          return result
      }

      Then, pass the media.Image object and the rotation value to FirebaseVisionImage.fromMediaImage():


      FirebaseVisionImage image = FirebaseVisionImage.fromMediaImage(mediaImage, rotation);


      val image = FirebaseVisionImage.fromMediaImage(mediaImage, rotation)
    • To create a FirebaseVisionImage object from a file URI, pass the app context and file URI to FirebaseVisionImage.fromFilePath(). This is useful when you use an ACTION_GET_CONTENT intent to prompt the user to select an image from their gallery app.


      FirebaseVisionImage image;
      try {
          image = FirebaseVisionImage.fromFilePath(context, uri);
      } catch (IOException e) {
          e.printStackTrace();
      }


      val image: FirebaseVisionImage
      try {
          image = FirebaseVisionImage.fromFilePath(context, uri)
      } catch (e: IOException) {
          e.printStackTrace()
      }
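    • To create a FirebaseVisionImage object from a ByteBuffer or a byte array, first calculate the image rotation as described above for media.Image input.

      Then, create a FirebaseVisionImageMetadata object that contains the image's height, width, color encoding format, and rotation:


      FirebaseVisionImageMetadata metadata = new FirebaseVisionImageMetadata.Builder()
              .setWidth(480)   // 480x360 is typically sufficient for
              .setHeight(360)  // image recognition
              .setFormat(FirebaseVisionImageMetadata.IMAGE_FORMAT_NV21)
              .setRotation(rotation)
              .build();


      val metadata = FirebaseVisionImageMetadata.Builder()
              .setWidth(480) // 480x360 is typically sufficient for
              .setHeight(360) // image recognition
              .setFormat(FirebaseVisionImageMetadata.IMAGE_FORMAT_NV21)
              .setRotation(rotation)
              .build()

      Use the buffer or array, and the metadata object, to create a FirebaseVisionImage object: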


      FirebaseVisionImage image = FirebaseVisionImage.fromByteBuffer(buffer, metadata);
      // Or: FirebaseVisionImage image = FirebaseVisionImage.fromByteArray(byteArray, metadata);


      val image = FirebaseVisionImage.fromByteBuffer(buffer, metadata)
      // Or: val image = FirebaseVisionImage.fromByteArray(byteArray, metadata)
    • To create a FirebaseVisionImage object from a Bitmap object:


      FirebaseVisionImage image = FirebaseVisionImage.fromBitmap(bitmap);


      val image = FirebaseVisionImage.fromBitmap(bitmap)

      The image represented by the Bitmap object must be upright, with no additional rotation required.
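
To see how the rotation arithmetic used for media.Image and ByteBuffer input behaves, the following Java sketch reproduces the lookup table and the `(table value + sensor orientation + 270) % 360` formula from getRotationCompensation with plain integers, so it can run without an Android device. The class name and the use of degrees as map keys (in place of the Surface.ROTATION_ constants) are assumptions made for illustration.

```java
import java.util.Map;

/** Off-device sketch of the rotation arithmetic; not part of the ML Kit API. */
final class RotationCompensationSketch {
    // Same values as the ORIENTATIONS table above, keyed by display rotation
    // in degrees as a stand-in for the Surface.ROTATION_* constants.
    private static final Map<Integer, Integer> ORIENTATIONS =
            Map.of(0, 90, 90, 0, 180, 270, 270, 180);

    /** Returns the image rotation in degrees for the given display and sensor angles. */
    static int rotationCompensation(int displayRotationDegrees, int sensorOrientationDegrees) {
        Integer base = ORIENTATIONS.get(displayRotationDegrees);
        if (base == null) {
            throw new IllegalArgumentException("Display rotation must be 0, 90, 180, or 270.");
        }
        return (base + sensorOrientationDegrees + 270) % 360;
    }

    public static void main(String[] args) {
        int[] displayRotations = {0, 90, 180, 270};
        int[] sensorOrientations = {90, 270};
        for (int sensor : sensorOrientations) {
            for (int display : displayRotations) {
                System.out.println("sensor=" + sensor + " display=" + display
                        + " -> rotate " + rotationCompensation(display, sensor));
            }
        }
    }
}
```

For example, with a sensor orientation of 90 and a display rotation of 0, the formula gives (90 + 90 + 270) % 360 = 90, which corresponds to ROTATION_90.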

  2. Get an instance of FirebaseVisionTextRecognizer.

    To use the on-device model:

    FirebaseVisionTextRecognizer detector = FirebaseVision.getInstance()
            .getOnDeviceTextRecognizer();


    val detector = FirebaseVision.getInstance()
            .onDeviceTextRecognizer



    To use the cloud-based model:

    FirebaseVisionTextRecognizer detector = FirebaseVision.getInstance()
            .getCloudTextRecognizer();
    // Or, to change the default settings:
    //   FirebaseVisionTextRecognizer detector = FirebaseVision.getInstance()
    //          .getCloudTextRecognizer(options);
    // Or, to provide language hints to assist with language detection:
    // See https://cloud.google.com/vision/docs/languages for supported languages
    FirebaseVisionCloudTextRecognizerOptions options = new FirebaseVisionCloudTextRecognizerOptions.Builder()
            .setLanguageHints(Arrays.asList("en", "hi"))
            .build();


    val detector = FirebaseVision.getInstance().cloudTextRecognizer
    // Or, to change the default settings:
    // val detector = FirebaseVision.getInstance().getCloudTextRecognizer(options)
    // Or, to provide language hints to assist with language detection:
    // See https://cloud.google.com/vision/docs/languages for supported languages
    val options = FirebaseVisionCloudTextRecognizerOptions.Builder()
            .setLanguageHints(listOf("en", "hi"))
            .build()
  3. Finally, pass the image to the processImage method:


    Task<FirebaseVisionText> result =
            detector.processImage(image)
                    .addOnSuccessListener(new OnSuccessListener<FirebaseVisionText>() {
                        @Override
                        public void onSuccess(FirebaseVisionText firebaseVisionText) {
                            // Task completed successfully
                            // ...
                        }
                    })
                    .addOnFailureListener(
                            new OnFailureListener() {
                                @Override
                                public void onFailure(@NonNull Exception e) {
                                    // Task failed with an exception
                                    // ...
                                }
                            });


    val result = detector.processImage(image)
            .addOnSuccessListener { firebaseVisionText ->
                // Task completed successfully
                // ...
            }
            .addOnFailureListener { e ->
                // Task failed with an exception
                // ...
            }

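2. Extract text from blocks of recognized text

If the text recognition operation succeeds, a FirebaseVisionText object is passed to the success listener. A FirebaseVisionText object contains the full text recognized in the image and zero or more TextBlock objects.

Each TextBlock represents a block of text, which contains zero or more Line objects. Each Line object contains zero or more Element objects, which represent words and word-like entities (dates, numbers, and so on).

For each TextBlock, Line, and Element object, you can get the text recognized in the region and the bounding coordinates of the region.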



// Here, result is the FirebaseVisionText passed to the success listener.
String resultText = result.getText();
for (FirebaseVisionText.TextBlock block: result.getTextBlocks()) {
    String blockText = block.getText();
    Float blockConfidence = block.getConfidence();
    List<RecognizedLanguage> blockLanguages = block.getRecognizedLanguages();
    Point[] blockCornerPoints = block.getCornerPoints();
    Rect blockFrame = block.getBoundingBox();
    for (FirebaseVisionText.Line line: block.getLines()) {
        String lineText = line.getText();
        Float lineConfidence = line.getConfidence();
        List<RecognizedLanguage> lineLanguages = line.getRecognizedLanguages();
        Point[] lineCornerPoints = line.getCornerPoints();
        Rect lineFrame = line.getBoundingBox();
        for (FirebaseVisionText.Element element: line.getElements()) {
            String elementText = element.getText();
            Float elementConfidence = element.getConfidence();
            List<RecognizedLanguage> elementLanguages = element.getRecognizedLanguages();
            Point[] elementCornerPoints = element.getCornerPoints();
            Rect elementFrame = element.getBoundingBox();
        }
    }
}


// Here, result is the FirebaseVisionText passed to the success listener.
val resultText = result.text
for (block in result.textBlocks) {
    val blockText = block.text
    val blockConfidence = block.confidence
    val blockLanguages = block.recognizedLanguages
    val blockCornerPoints = block.cornerPoints
    val blockFrame = block.boundingBox
    for (line in block.lines) {
        val lineText = line.text
        val lineConfidence = line.confidence
        val lineLanguages = line.recognizedLanguages
        val lineCornerPoints = line.cornerPoints
        val lineFrame = line.boundingBox
        for (element in line.elements) {
            val elementText = element.text
            val elementConfidence = element.confidence
            val elementLanguages = element.recognizedLanguages
            val elementCornerPoints = element.cornerPoints
            val elementFrame = element.boundingBox
        }
    }
}
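
In practice you usually walk this hierarchy to pull out specific pieces of text. The following Java sketch shows one way to do that: it flattens the block, line, and element nesting and keeps only elements made up of digits. Because FirebaseVisionText objects are produced by the recognizer, the sketch uses hypothetical stand-in classes (Block, Line, Element) that mirror only the nesting; in an app you would iterate over getTextBlocks(), getLines(), and getElements() instead.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative traversal using stand-in types, not the real ML Kit classes. */
final class TextHierarchySketch {
    // Hypothetical stand-ins that mirror only the block -> line -> element nesting.
    static final class Element {
        final String text;
        Element(String text) { this.text = text; }
    }

    static final class Line {
        final List<Element> elements;
        Line(List<Element> elements) { this.elements = elements; }
    }

    static final class Block {
        final List<Line> lines;
        Block(List<Line> lines) { this.lines = lines; }
    }

    /** Collects, in reading order, the text of every element that contains only digits. */
    static List<String> numericElements(List<Block> blocks) {
        List<String> result = new ArrayList<>();
        for (Block block : blocks) {
            for (Line line : block.lines) {
                for (Element element : line.elements) {
                    if (element.text.matches("\\d+")) {
                        result.add(element.text);
                    }
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Line first = new Line(List.of(new Element("Invoice"), new Element("1042")));
        Line second = new Line(List.of(new Element("Total"), new Element("37")));
        List<Block> blocks = List.of(new Block(List.of(first)), new Block(List.of(second)));
        System.out.println(numericElements(blocks));
    }
}
```

The same loop shape works for other filters, such as matching elements against a date pattern or selecting those whose bounding box falls inside a region of interest.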


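If you want to use the on-device model to recognize text in a real-time application, follow these guidelines to achieve the best frame rates:

  • Throttle calls to the text recognizer. If a new video frame becomes available while the text recognizer is running, drop the frame.
  • If you are using the output of the text recognizer to overlay graphics on the input image, first get the result from ML Kit, then render the image and overlay in a single step. By doing so, you render to the display surface only once for each input frame.
  • If you use the Camera2 API, capture images in ImageFormat.YUV_420_888 format.

    If you use the older Camera API, capture images in ImageFormat.NV21 format.

  • Consider capturing images at a lower resolution. However, also keep in mind this API's image dimension requirements.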
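
The throttling guideline can be sketched with a single atomic flag. The following Java class is a hypothetical helper, not part of ML Kit: it forwards a frame to a recognizer callback only when no earlier frame is still being processed, and drops it otherwise. It assumes the callback starts asynchronous work (for example, a call to processImage) and that you call onRecognitionFinished() from both the success and failure listeners.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;

/** Drops frames while an earlier frame is still being recognized; illustrative only. */
final class FrameThrottle<F> {
    private final AtomicBoolean busy = new AtomicBoolean(false);
    private final Consumer<F> recognizer;

    FrameThrottle(Consumer<F> recognizer) {
        this.recognizer = recognizer;
    }

    /** Returns true if the frame was handed to the recognizer, false if it was dropped. */
    boolean offer(F frame) {
        if (!busy.compareAndSet(false, true)) {
            return false; // Recognizer still running: drop this frame.
        }
        try {
            recognizer.accept(frame);
        } catch (RuntimeException e) {
            busy.set(false); // Do not stay blocked if the call could not start.
            throw e;
        }
        return true;
    }

    /** Call when recognition completes, whether it succeeded or failed. */
    void onRecognitionFinished() {
        busy.set(false);
    }

    public static void main(String[] args) {
        List<String> processed = new ArrayList<>();
        FrameThrottle<String> throttle = new FrameThrottle<>(processed::add);
        System.out.println(throttle.offer("frame-1")); // accepted
        System.out.println(throttle.offer("frame-2")); // dropped, frame-1 in flight
        throttle.onRecognitionFinished();
        System.out.println(throttle.offer("frame-3")); // accepted again
        System.out.println(processed);
    }
}
```

Using compareAndSet keeps the check and the state change atomic, so two camera callbacks arriving at nearly the same time cannot both start recognition.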



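To recognize the text of a document, configure and run the cloud-based document text recognizer as described below.

The document text recognition API, described below, provides an interface that is intended to be more convenient for working with images of documents. However, if you prefer the interface provided by the FirebaseVisionTextRecognizer API, you can use it instead to scan documents by configuring the cloud text recognizer to use the dense text model.

To use the document text recognition API:

1. Run the text recognizer

To recognize text in an image, create a FirebaseVisionImage object from either a Bitmap, media.Image, ByteBuffer, byte array, or a file on the device. Then, pass the FirebaseVisionImage object to the FirebaseVisionDocumentTextRecognizer's processImage method.

  1. Create a FirebaseVisionImage object from your image.

    • To create a FirebaseVisionImage object from a media.Image object, such as when capturing an image from a device's camera, pass the media.Image object and the image's rotation to FirebaseVisionImage.fromMediaImage().

      If you use the CameraX library, the OnImageCapturedListener and ImageAnalysis.Analyzer classes calculate the rotation value for you, so you just need to convert the rotation to one of ML Kit's ROTATION_ constants before calling FirebaseVisionImage.fromMediaImage(), as in the analyzer example earlier on this page.


      If you don't use a camera library that gives you the image's rotation, you can calculate it from the device's rotation and the orientation of the camera sensor in the device, as in the getRotationCompensation example earlier on this page.


      Then, pass the media.Image object and the rotation value to FirebaseVisionImage.fromMediaImage():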


      FirebaseVisionImage image = FirebaseVisionImage.fromMediaImage(mediaImage, rotation);


      val image = FirebaseVisionImage.fromMediaImage(mediaImage, rotation)
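    • To create a FirebaseVisionImage object from a file URI, pass the app context and file URI to FirebaseVisionImage.fromFilePath(). This is useful when you use an ACTION_GET_CONTENT intent to prompt the user to select an image from their gallery app.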


  2. Get an instance of FirebaseVisionDocumentTextRecognizer:


    FirebaseVisionDocumentTextRecognizer detector = FirebaseVision.getInstance()
            .getCloudDocumentTextRecognizer();

    // Or, to provide language hints to assist with language detection:
    // See https://cloud.google.com/vision/docs/languages for supported languages
    FirebaseVisionCloudDocumentRecognizerOptions options =
            new FirebaseVisionCloudDocumentRecognizerOptions.Builder()
                    .setLanguageHints(Arrays.asList("en", "hi"))
                    .build();
    FirebaseVisionDocumentTextRecognizer detector = FirebaseVision.getInstance()
            .getCloudDocumentTextRecognizer(options);


    val detector = FirebaseVision.getInstance()
            .cloudDocumentTextRecognizer

    // Or, to provide language hints to assist with language detection:
    // See https://cloud.google.com/vision/docs/languages for supported languages
    val options = FirebaseVisionCloudDocumentRecognizerOptions.Builder()
            .setLanguageHints(listOf("en", "hi"))
            .build()
    val detector = FirebaseVision.getInstance()
            .getCloudDocumentTextRecognizer(options)

  3. Finally, pass the image to the processImage method:


    detector.processImage(image)
            .addOnSuccessListener(new OnSuccessListener<FirebaseVisionDocumentText>() {
                @Override
                public void onSuccess(FirebaseVisionDocumentText result) {
                    // Task completed successfully
                    // ...
                }
            })
            .addOnFailureListener(new OnFailureListener() {
                @Override
                public void onFailure(@NonNull Exception e) {
                    // Task failed with an exception
                    // ...
                }
            });


    detector.processImage(image)
            .addOnSuccessListener { firebaseVisionDocumentText ->
                // Task completed successfully
                // ...
            }
            .addOnFailureListener { e ->
                // Task failed with an exception
                // ...
            }

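2. Extract text from blocks of recognized text

If the text recognition operation succeeds, it returns a FirebaseVisionDocumentText object. A FirebaseVisionDocumentText object contains the full text recognized in the image and a hierarchy of objects that reflect the structure of the recognized document: Block, Paragraph, Word, and Symbol.

For each Block, Paragraph, Word, and Symbol object, you can get the text recognized in the region and the bounding coordinates of the region.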



String resultText = result.getText();
for (FirebaseVisionDocumentText.Block block: result.getBlocks()) {
    String blockText = block.getText();
    Float blockConfidence = block.getConfidence();
    List<RecognizedLanguage> blockRecognizedLanguages = block.getRecognizedLanguages();
    Rect blockFrame = block.getBoundingBox();
    for (FirebaseVisionDocumentText.Paragraph paragraph: block.getParagraphs()) {
        String paragraphText = paragraph.getText();
        Float paragraphConfidence = paragraph.getConfidence();
        List<RecognizedLanguage> paragraphRecognizedLanguages = paragraph.getRecognizedLanguages();
        Rect paragraphFrame = paragraph.getBoundingBox();
        for (FirebaseVisionDocumentText.Word word: paragraph.getWords()) {
            String wordText = word.getText();
            Float wordConfidence = word.getConfidence();
            List<RecognizedLanguage> wordRecognizedLanguages = word.getRecognizedLanguages();
            Rect wordFrame = word.getBoundingBox();
            for (FirebaseVisionDocumentText.Symbol symbol: word.getSymbols()) {
                String symbolText = symbol.getText();
                Float symbolConfidence = symbol.getConfidence();
                List<RecognizedLanguage> symbolRecognizedLanguages = symbol.getRecognizedLanguages();
                Rect symbolFrame = symbol.getBoundingBox();
            }
        }
    }
}


val resultText = result.text
for (block in result.blocks) {
    val blockText = block.text
    val blockConfidence = block.confidence
    val blockRecognizedLanguages = block.recognizedLanguages
    val blockFrame = block.boundingBox
    for (paragraph in block.paragraphs) {
        val paragraphText = paragraph.text
        val paragraphConfidence = paragraph.confidence
        val paragraphRecognizedLanguages = paragraph.recognizedLanguages
        val paragraphFrame = paragraph.boundingBox
        for (word in paragraph.words) {
            val wordText = word.text
            val wordConfidence = word.confidence
            val wordRecognizedLanguages = word.recognizedLanguages
            val wordFrame = word.boundingBox
            for (symbol in word.symbols) {
                val symbolText = symbol.text
                val symbolConfidence = symbol.confidence
                val symbolRecognizedLanguages = symbol.recognizedLanguages
                val symbolFrame = symbol.boundingBox
            }
        }
    }
}
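
Because getConfidence() returns a boxed Float in the code above, it is safest to treat the value as possibly absent before comparing it. The following Java sketch shows one way to keep only words whose confidence meets a threshold. The Word class here is a hypothetical stand-in holding just text and confidence, and the 0.8 threshold is an arbitrary example value, not a recommendation from ML Kit.

```java
import java.util.ArrayList;
import java.util.List;

/** Null-safe confidence filtering with a stand-in type; illustrative only. */
final class ConfidenceFilterSketch {
    /** Hypothetical stand-in for a recognized word: text plus a possibly null confidence. */
    static final class Word {
        final String text;
        final Float confidence;

        Word(String text, Float confidence) {
            this.text = text;
            this.confidence = confidence;
        }
    }

    /** Keeps the text of words whose confidence is present and at least the threshold. */
    static List<String> confidentWords(List<Word> words, float threshold) {
        List<String> kept = new ArrayList<>();
        for (Word word : words) {
            // Check for null first to avoid a NullPointerException on unboxing.
            if (word.confidence != null && word.confidence >= threshold) {
                kept.add(word.text);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Word> words = List.of(
                new Word("Total", 0.98f),
                new Word("arnount", 0.41f), // likely misread, low confidence
                new Word("due", null));     // confidence not reported
        System.out.println(confidentWords(words, 0.8f));
    }
}
```

Whether to discard or keep words with no reported confidence depends on your use case; this sketch discards them.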
