Detect and track objects with ML Kit on Android

You can use ML Kit to detect and track objects across frames of video.

When you pass ML Kit images, ML Kit returns, for each image, a list of up to five detected objects and their position in the image. When detecting objects in video streams, every object has an ID that you can use to track the object across images. You can also optionally enable coarse object classification, which labels objects with broad category descriptions.
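As a plain-Java sketch (not the ML Kit API itself) of how these tracking IDs can be used, the following accumulates per-object history across frames; the `Detection` class is a hypothetical stand-in for the tracking ID and label that a `FirebaseVisionObject` carries:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TrackingDemo {
    // Hypothetical stand-in for the per-object data ML Kit returns;
    // not part of the ML Kit API.
    static class Detection {
        final int trackingId;
        final String label;
        Detection(int trackingId, String label) {
            this.trackingId = trackingId;
            this.label = label;
        }
    }

    // Accumulate, per tracking ID, the labels seen across frames.
    static Map<Integer, List<String>> trackAcrossFrames(List<List<Detection>> frames) {
        Map<Integer, List<String>> history = new HashMap<>();
        for (List<Detection> frame : frames) {
            for (Detection d : frame) {
                history.computeIfAbsent(d.trackingId, k -> new ArrayList<>()).add(d.label);
            }
        }
        return history;
    }

    public static void main(String[] args) {
        List<List<Detection>> frames = List.of(
                List.of(new Detection(1, "FOOD")),
                List.of(new Detection(1, "FOOD"), new Detection(2, "PLANT")));
        Map<Integer, List<String>> history = trackAcrossFrames(frames);
        System.out.println(history.get(1).size()); // object 1 appeared in 2 frames
        System.out.println(history.get(2).size()); // object 2 appeared in 1 frame
    }
}
```

In a real app, the inner loop would run inside the `processImage()` success listener, keyed on `FirebaseVisionObject.getTrackingId()`.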

Before you begin

  1. If you haven't already, add Firebase to your Android project.
  2. Add the dependencies for the ML Kit Android libraries to your module (app-level) Gradle file (usually app/build.gradle):
    apply plugin: 'com.android.application'
    apply plugin: 'com.google.gms.google-services'
    
    dependencies {
      // ...
    
      implementation 'com.google.firebase:firebase-ml-vision:24.0.3'
      implementation 'com.google.firebase:firebase-ml-vision-object-detection-model:19.0.6'
    }
    

1. Configure the object detector

To start detecting and tracking objects, first create an instance of FirebaseVisionObjectDetector, optionally specifying any detector settings you want to change from the defaults.

  1. Configure the object detector for your use case with a FirebaseVisionObjectDetectorOptions object. You can change the following settings:

    Object detector settings
    Detection mode: STREAM_MODE (default) | SINGLE_IMAGE_MODE

    In STREAM_MODE (default), the object detector runs with low latency, but might produce incomplete results (such as unspecified bounding boxes or category labels) on the first few invocations of the detector. Also, in STREAM_MODE, the detector assigns tracking IDs to objects, which you can use to track objects across frames. Use this mode when you want to track objects, or when low latency is important, such as when processing video streams in real time.

    In SINGLE_IMAGE_MODE, the object detector waits until a detected object's bounding box and (if you enabled classification) category label are available before returning a result. As a consequence, detection latency is potentially higher. Also, in SINGLE_IMAGE_MODE, tracking IDs are not assigned. Use this mode if latency isn't critical and you don't want to deal with partial results.

    Detect and track multiple objects: false (default) | true

    Whether to detect and track up to five objects, or only the most prominent object (default).

    Classify objects: false (default) | true

    Whether or not to classify detected objects into coarse categories. When enabled, the object detector classifies objects into the following categories: fashion goods, food, home goods, places, plants, and unknown.

    The object detection and tracking API is optimized for these two core use cases:

    • Live detection and tracking of the most prominent object in the camera viewfinder
    • Detection of multiple objects from a static image

    To configure the API for these use cases:

    Java

    // Live detection and tracking
    FirebaseVisionObjectDetectorOptions options =
            new FirebaseVisionObjectDetectorOptions.Builder()
                    .setDetectorMode(FirebaseVisionObjectDetectorOptions.STREAM_MODE)
                    .enableClassification()  // Optional
                    .build();
    
    // Multiple object detection in static images
    FirebaseVisionObjectDetectorOptions options =
            new FirebaseVisionObjectDetectorOptions.Builder()
                    .setDetectorMode(FirebaseVisionObjectDetectorOptions.SINGLE_IMAGE_MODE)
                    .enableMultipleObjects()
                    .enableClassification()  // Optional
                    .build();
    

    Kotlin+KTX

    // Live detection and tracking
    val options = FirebaseVisionObjectDetectorOptions.Builder()
            .setDetectorMode(FirebaseVisionObjectDetectorOptions.STREAM_MODE)
            .enableClassification()  // Optional
            .build()
    
    // Multiple object detection in static images
    val options = FirebaseVisionObjectDetectorOptions.Builder()
            .setDetectorMode(FirebaseVisionObjectDetectorOptions.SINGLE_IMAGE_MODE)
            .enableMultipleObjects()
            .enableClassification()  // Optional
            .build()
    
  2. Get an instance of FirebaseVisionObjectDetector:

    Java

    FirebaseVisionObjectDetector objectDetector =
            FirebaseVision.getInstance().getOnDeviceObjectDetector();
    
    // Or, to change the default settings:
    FirebaseVisionObjectDetector objectDetector =
            FirebaseVision.getInstance().getOnDeviceObjectDetector(options);
    

    Kotlin+KTX

    val objectDetector = FirebaseVision.getInstance().getOnDeviceObjectDetector()
    
    // Or, to change the default settings:
    val objectDetector = FirebaseVision.getInstance().getOnDeviceObjectDetector(options)
    

2. Run the object detector

To detect and track objects, pass images to the FirebaseVisionObjectDetector instance's processImage() method.

For each frame of video or image in a sequence, do the following:

  1. Create a FirebaseVisionImage object from your image.

    • To create a FirebaseVisionImage object from a media.Image object, such as when capturing an image from a device's camera, pass the media.Image object and the image's rotation to FirebaseVisionImage.fromMediaImage().

      If you use the CameraX library, the OnImageCapturedListener and ImageAnalysis.Analyzer classes calculate the rotation value for you, so you just need to convert the rotation to one of ML Kit's ROTATION_ constants before calling FirebaseVisionImage.fromMediaImage():

      Java

      private class YourAnalyzer implements ImageAnalysis.Analyzer {
      
          private int degreesToFirebaseRotation(int degrees) {
              switch (degrees) {
                  case 0:
                      return FirebaseVisionImageMetadata.ROTATION_0;
                  case 90:
                      return FirebaseVisionImageMetadata.ROTATION_90;
                  case 180:
                      return FirebaseVisionImageMetadata.ROTATION_180;
                  case 270:
                      return FirebaseVisionImageMetadata.ROTATION_270;
                  default:
                      throw new IllegalArgumentException(
                              "Rotation must be 0, 90, 180, or 270.");
              }
          }
      
          @Override
          public void analyze(ImageProxy imageProxy, int degrees) {
              if (imageProxy == null || imageProxy.getImage() == null) {
                  return;
              }
              Image mediaImage = imageProxy.getImage();
              int rotation = degreesToFirebaseRotation(degrees);
              FirebaseVisionImage image =
                      FirebaseVisionImage.fromMediaImage(mediaImage, rotation);
              // Pass image to an ML Kit Vision API
              // ...
          }
      }
      

      Kotlin+KTX

      private class YourImageAnalyzer : ImageAnalysis.Analyzer {
          private fun degreesToFirebaseRotation(degrees: Int): Int = when(degrees) {
              0 -> FirebaseVisionImageMetadata.ROTATION_0
              90 -> FirebaseVisionImageMetadata.ROTATION_90
              180 -> FirebaseVisionImageMetadata.ROTATION_180
              270 -> FirebaseVisionImageMetadata.ROTATION_270
              else -> throw Exception("Rotation must be 0, 90, 180, or 270.")
          }
      
          override fun analyze(imageProxy: ImageProxy?, degrees: Int) {
              val mediaImage = imageProxy?.image
              val imageRotation = degreesToFirebaseRotation(degrees)
              if (mediaImage != null) {
                  val image = FirebaseVisionImage.fromMediaImage(mediaImage, imageRotation)
                  // Pass image to an ML Kit Vision API
                  // ...
              }
          }
      }
      

      If you don't use a camera library that gives you the image's rotation, you can calculate it from the device's rotation and the orientation of the camera sensor in the device:

      Java

      private static final SparseIntArray ORIENTATIONS = new SparseIntArray();
      static {
          ORIENTATIONS.append(Surface.ROTATION_0, 90);
          ORIENTATIONS.append(Surface.ROTATION_90, 0);
          ORIENTATIONS.append(Surface.ROTATION_180, 270);
          ORIENTATIONS.append(Surface.ROTATION_270, 180);
      }
      
      /**
       * Get the angle by which an image must be rotated given the device's current
       * orientation.
       */
      @RequiresApi(api = Build.VERSION_CODES.LOLLIPOP)
      private int getRotationCompensation(String cameraId, Activity activity, Context context)
              throws CameraAccessException {
          // Get the device's current rotation relative to its "native" orientation.
          // Then, from the ORIENTATIONS table, look up the angle the image must be
          // rotated to compensate for the device's rotation.
          int deviceRotation = activity.getWindowManager().getDefaultDisplay().getRotation();
          int rotationCompensation = ORIENTATIONS.get(deviceRotation);
      
          // On most devices, the sensor orientation is 90 degrees, but for some
          // devices it is 270 degrees. For devices with a sensor orientation of
          // 270, rotate the image an additional 180 ((270 + 270) % 360) degrees.
          CameraManager cameraManager = (CameraManager) context.getSystemService(CAMERA_SERVICE);
          int sensorOrientation = cameraManager
                  .getCameraCharacteristics(cameraId)
                  .get(CameraCharacteristics.SENSOR_ORIENTATION);
          rotationCompensation = (rotationCompensation + sensorOrientation + 270) % 360;
      
          // Return the corresponding FirebaseVisionImageMetadata rotation value.
          int result;
          switch (rotationCompensation) {
              case 0:
                  result = FirebaseVisionImageMetadata.ROTATION_0;
                  break;
              case 90:
                  result = FirebaseVisionImageMetadata.ROTATION_90;
                  break;
              case 180:
                  result = FirebaseVisionImageMetadata.ROTATION_180;
                  break;
              case 270:
                  result = FirebaseVisionImageMetadata.ROTATION_270;
                  break;
              default:
                  result = FirebaseVisionImageMetadata.ROTATION_0;
                  Log.e(TAG, "Bad rotation value: " + rotationCompensation);
          }
          return result;
      }

      Kotlin+KTX

      private val ORIENTATIONS = SparseIntArray()
      
      init {
          ORIENTATIONS.append(Surface.ROTATION_0, 90)
          ORIENTATIONS.append(Surface.ROTATION_90, 0)
          ORIENTATIONS.append(Surface.ROTATION_180, 270)
          ORIENTATIONS.append(Surface.ROTATION_270, 180)
      }
      /**
       * Get the angle by which an image must be rotated given the device's current
       * orientation.
       */
      @RequiresApi(api = Build.VERSION_CODES.LOLLIPOP)
      @Throws(CameraAccessException::class)
      private fun getRotationCompensation(cameraId: String, activity: Activity, context: Context): Int {
          // Get the device's current rotation relative to its "native" orientation.
          // Then, from the ORIENTATIONS table, look up the angle the image must be
          // rotated to compensate for the device's rotation.
          val deviceRotation = activity.windowManager.defaultDisplay.rotation
          var rotationCompensation = ORIENTATIONS.get(deviceRotation)
      
          // On most devices, the sensor orientation is 90 degrees, but for some
          // devices it is 270 degrees. For devices with a sensor orientation of
          // 270, rotate the image an additional 180 ((270 + 270) % 360) degrees.
          val cameraManager = context.getSystemService(CAMERA_SERVICE) as CameraManager
          val sensorOrientation = cameraManager
                  .getCameraCharacteristics(cameraId)
                  .get(CameraCharacteristics.SENSOR_ORIENTATION)!!
          rotationCompensation = (rotationCompensation + sensorOrientation + 270) % 360
      
          // Return the corresponding FirebaseVisionImageMetadata rotation value.
          val result: Int
          when (rotationCompensation) {
              0 -> result = FirebaseVisionImageMetadata.ROTATION_0
              90 -> result = FirebaseVisionImageMetadata.ROTATION_90
              180 -> result = FirebaseVisionImageMetadata.ROTATION_180
              270 -> result = FirebaseVisionImageMetadata.ROTATION_270
              else -> {
                  result = FirebaseVisionImageMetadata.ROTATION_0
                  Log.e(TAG, "Bad rotation value: $rotationCompensation")
              }
          }
          return result
      }

      Then, pass the media.Image object and the rotation value to FirebaseVisionImage.fromMediaImage():

      Java

      FirebaseVisionImage image = FirebaseVisionImage.fromMediaImage(mediaImage, rotation);

      Kotlin+KTX

      val image = FirebaseVisionImage.fromMediaImage(mediaImage, rotation)
    • To create a FirebaseVisionImage object from a file URI, pass the app context and file URI to FirebaseVisionImage.fromFilePath(). This is useful when you use an ACTION_GET_CONTENT intent to prompt the user to select an image from their gallery app.

      Java

      FirebaseVisionImage image;
      try {
          image = FirebaseVisionImage.fromFilePath(context, uri);
      } catch (IOException e) {
          e.printStackTrace();
      }

      Kotlin+KTX

      val image: FirebaseVisionImage
      try {
          image = FirebaseVisionImage.fromFilePath(context, uri)
      } catch (e: IOException) {
          e.printStackTrace()
      }
    • To create a FirebaseVisionImage object from a ByteBuffer or a byte array, first calculate the image rotation as described above for media.Image input.

      Then, create a FirebaseVisionImageMetadata object that contains the image's height, width, color encoding format, and rotation:

      Java

      FirebaseVisionImageMetadata metadata = new FirebaseVisionImageMetadata.Builder()
              .setWidth(480)   // 480x360 is typically sufficient for
              .setHeight(360)  // image recognition
              .setFormat(FirebaseVisionImageMetadata.IMAGE_FORMAT_NV21)
              .setRotation(rotation)
              .build();

      Kotlin+KTX

      val metadata = FirebaseVisionImageMetadata.Builder()
              .setWidth(480) // 480x360 is typically sufficient for
              .setHeight(360) // image recognition
              .setFormat(FirebaseVisionImageMetadata.IMAGE_FORMAT_NV21)
              .setRotation(rotation)
              .build()

      Use the buffer or array, and the metadata object, to create a FirebaseVisionImage object:

      Java

      FirebaseVisionImage image = FirebaseVisionImage.fromByteBuffer(buffer, metadata);
      // Or: FirebaseVisionImage image = FirebaseVisionImage.fromByteArray(byteArray, metadata);

      Kotlin+KTX

      val image = FirebaseVisionImage.fromByteBuffer(buffer, metadata)
      // Or: val image = FirebaseVisionImage.fromByteArray(byteArray, metadata)
    • To create a FirebaseVisionImage object from a Bitmap object:

      Java

      FirebaseVisionImage image = FirebaseVisionImage.fromBitmap(bitmap);

      Kotlin+KTX

      val image = FirebaseVisionImage.fromBitmap(bitmap)
      The image represented by the Bitmap object must be upright, with no additional rotation required.
  2. Pass the image to the processImage() method:

    Java

    objectDetector.processImage(image)
            .addOnSuccessListener(
                    new OnSuccessListener<List<FirebaseVisionObject>>() {
                        @Override
                        public void onSuccess(List<FirebaseVisionObject> detectedObjects) {
                            // Task completed successfully
                            // ...
                        }
                    })
            .addOnFailureListener(
                    new OnFailureListener() {
                        @Override
                        public void onFailure(@NonNull Exception e) {
                            // Task failed with an exception
                            // ...
                        }
                    });
    

    Kotlin+KTX

    objectDetector.processImage(image)
            .addOnSuccessListener { detectedObjects ->
                // Task completed successfully
                // ...
            }
            .addOnFailureListener { e ->
                // Task failed with an exception
                // ...
            }
    
  3. If the call to processImage() succeeds, a list of FirebaseVisionObjects is passed to the success listener.

    Each FirebaseVisionObject contains the following properties:

    Bounding box: A Rect indicating the position of the object in the image.
    Tracking ID: An integer that identifies the object across images. Null in SINGLE_IMAGE_MODE.
    Category: The object's coarse category. If the object detector doesn't have classification enabled, this is always FirebaseVisionObject.CATEGORY_UNKNOWN.
    Confidence: The confidence value of the object classification. If the object detector doesn't have classification enabled, or the object is classified as unknown, this is null.

    Java

    // The list of detected objects contains one item if multiple object detection wasn't enabled.
    for (FirebaseVisionObject obj : detectedObjects) {
        Integer id = obj.getTrackingId();
        Rect bounds = obj.getBoundingBox();
    
        // If classification was enabled:
        int category = obj.getClassificationCategory();
        Float confidence = obj.getClassificationConfidence();
    }
    

    Kotlin+KTX

    // The list of detected objects contains one item if multiple object detection wasn't enabled.
    for (obj in detectedObjects) {
        val id = obj.trackingId       // A number that identifies the object across images
        val bounds = obj.boundingBox  // The object's position in the image
    
        // If classification was enabled:
        val category = obj.classificationCategory
        val confidence = obj.classificationConfidence
    }
    

Improving usability and performance

For the best user experience, follow these guidelines in your app:

  • Successful object detection depends on the object's visual complexity. Objects with a small number of visual features might need to take up a larger part of the image to be detected. You should provide users with guidance on capturing input that works well with the kind of objects you want to detect.
  • When you use classification, if you want to detect objects that don't fall cleanly into the supported categories, implement special handling for unknown objects.
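A minimal sketch of such special handling, in plain Java with illustrative category constants (in the real API these values come from FirebaseVisionObject, e.g. CATEGORY_UNKNOWN); the fallback rule and threshold shown here are assumptions, not part of ML Kit:

```java
public class UnknownHandling {
    // Illustrative constants standing in for FirebaseVisionObject's
    // category values; the actual values are defined by the API.
    static final int CATEGORY_UNKNOWN = 0;
    static final int CATEGORY_FOOD = 3;

    // Choose the label to show the user: fall back to a generic label for
    // unknown or low-confidence classifications instead of showing nothing.
    static String displayLabel(int category, Float confidence, float minConfidence) {
        if (category == CATEGORY_UNKNOWN || confidence == null || confidence < minConfidence) {
            return "Object";  // generic fallback for unsupported categories
        }
        return category == CATEGORY_FOOD ? "Food" : "Item";
    }

    public static void main(String[] args) {
        System.out.println(displayLabel(CATEGORY_FOOD, 0.9f, 0.5f));    // Food
        System.out.println(displayLabel(CATEGORY_UNKNOWN, null, 0.5f)); // Object
    }
}
```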

Also, check out the [ML Kit Material Design showcase app][showcase-link]{: .external } and the Material Design patterns for machine learning-powered features collection.

When you use streaming mode in a real-time application, follow these guidelines to achieve the best frame rates:

  • Don't use multiple object detection in streaming mode, as most devices won't be able to produce adequate frame rates.

  • Disable classification if you don't need it.

  • Throttle calls to the detector. If a new video frame becomes available while the detector is running, drop the frame.
  • If you are using the output of the detector to overlay graphics on the input image, first get the result from ML Kit, then render the image and the overlay in a single step. By doing so, you render to the display surface only once for each input frame.
  • If you use the Camera2 API, capture images in ImageFormat.YUV_420_888 format.

    If you use the older Camera API, capture images in ImageFormat.NV21 format.
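The frame-throttling guideline above can be sketched in plain Java; here `busy` stands in for the state you would set before calling processImage() and clear in its success/failure listeners, and the frame IDs are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

public class FrameThrottler {
    private final AtomicBoolean busy = new AtomicBoolean(false);
    final List<Integer> processed = new ArrayList<>();

    // Returns true if the frame was handed to the detector, false if dropped.
    // In a real app, the body would kick off objectDetector.processImage(...).
    boolean offerFrame(int frameId) {
        if (!busy.compareAndSet(false, true)) {
            return false;  // detector still running: drop this frame
        }
        processed.add(frameId);  // stand-in for the async detector call
        return true;
    }

    // Call from the detector's completion (success or failure) listener.
    void finishProcessing() {
        busy.set(false);
    }

    public static void main(String[] args) {
        FrameThrottler t = new FrameThrottler();
        t.offerFrame(1);                   // accepted
        boolean second = t.offerFrame(2);  // dropped: frame 1 still in flight
        t.finishProcessing();
        t.offerFrame(3);                   // accepted
        System.out.println(t.processed);   // [1, 3]
        System.out.println(second);        // false
    }
}
```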