With an Object Detection model, you can identify objects of interest in an image or each frame of live video. Each prediction returns a set of objects, each with a label, bounding box, and confidence score.
If you just need to know the contents of an image – not the location of the objects – consider using Image Labeling instead.
Training Custom Models for Object Detection¶
You can train a custom model that is compatible with the Object Detection API by following Quickstart: Use Fritz AI Studio to Train a Custom Model.
Pre-trained Object Detection Model¶
The object detection model supports 90 labels from the COCO dataset.
1. Add the dependencies via Gradle
2. Create an ObjectDetectionOnDeviceModel
3. Create an Object Detection Predictor
4. Create a FritzVisionImage from an image or a video stream
5. Run prediction to detect objects in the image
6. Display the result
7. Use the record method on the predictor to collect data

Code sketches of these steps follow below.
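For step 1, the dependencies go in your app-level build.gradle; the exact artifact coordinates (for example, `ai.fritz:core` plus an object detection model artifact) come from your Fritz project setup, so treat any specific coordinates as placeholders. Steps 2 through 5 then look roughly like the sketch below. The class names and package paths follow the pattern used in the Fritz Vision SDK, but they are assumptions here; verify them against the SDK version you install.

```java
import java.util.List;

import android.graphics.Bitmap;

import ai.fritz.vision.FritzVision;
import ai.fritz.vision.FritzVisionImage;
import ai.fritz.vision.FritzVisionModels;
import ai.fritz.vision.FritzVisionObject;
import ai.fritz.vision.objectdetection.FritzVisionObjectPredictor;
import ai.fritz.vision.objectdetection.FritzVisionObjectResult;
import ai.fritz.vision.objectdetection.ObjectDetectionOnDeviceModel;

public class ObjectDetectionSketch {

    public List<FritzVisionObject> detect(Bitmap bitmap) {
        // Step 2: get the bundled on-device object detection model.
        ObjectDetectionOnDeviceModel onDeviceModel =
                FritzVisionModels.getObjectDetectionOnDeviceModel();

        // Step 3: create a predictor for that model.
        FritzVisionObjectPredictor predictor =
                FritzVision.ObjectDetection.getPredictor(onDeviceModel);

        // Step 4: wrap a Bitmap in a FritzVisionImage. Camera frames can be
        // wrapped with FritzVisionImage.fromMediaImage(image, rotation) instead.
        FritzVisionImage visionImage = FritzVisionImage.fromBitmap(bitmap);

        // Step 5: run prediction and read back the detected objects, each of
        // which carries a label, bounding box, and confidence score.
        FritzVisionObjectResult result = predictor.predict(visionImage);
        return result.getObjects();
    }
}
```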
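Continuing that sketch for step 6, the predicted boxes can be drawn back onto the source image for display. The `overlayBoundingBoxes` helper is assumed from the Fritz Vision API surface; if your SDK version exposes a different drawing method (for example, a per-object `drawOnCanvas`), substitute it here. `imageView` is a hypothetical view from your layout.

```java
// Step 6: draw the labeled bounding boxes onto the original image.
// overlayBoundingBoxes(...) is an assumed helper name; verify it in your SDK.
Bitmap annotated = visionImage.overlayBoundingBoxes(result.getObjects());
imageView.setImageBitmap(annotated); // imageView: hypothetical ImageView
```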
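For step 7, the predictor's record method sends the image and its annotations back to Fritz AI for dataset collection and retraining. The exact signature is version-dependent; the three-argument form sketched here (image, predicted objects, user-modified objects) is an assumption, so check the SDK reference before relying on it.

```java
// Step 7 (assumed signature): log the input image together with the model's
// predictions; pass user-corrected annotations as the third argument when you
// have them, or null when you do not.
predictor.record(visionImage, result.getObjects(), null);
```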
| Model variant | Format | Model size | Input | Output | Benchmarks |
| --- | --- | --- | --- | --- | --- |
| SSDLite + MobileNet V2 | Core ML (iOS), TensorFlow Lite (Android) | ~17 MB | 300x300-pixel image | Offsets for >2,000 candidate bounding boxes; a class label and confidence score for each box | 18 FPS on iPhone X, 8 FPS on Pixel 2 |
Custom Model Compatibility Checklist¶
If you have a custom model that was trained outside of Fritz AI, follow this checklist to make sure it will be compatible with the Object Detection API.
- Your model must be a single-shot multibox detector with boxes matching the default configuration found here.
- Your model must be in TensorFlow Lite (.tflite) or Core ML (.mlmodel) format.
- iOS Only: The input layer must be named `Preprocessor/sub:0`, and the two output layers must be named `boxPredictions` and `classPredictions`.
- Android Only: The single input layer (`Preprocessor/sub`) and the four output layers (`outputLocations`, `outputClasses`, `outputScores`, `numDetections`) should be defined in the TensorFlow Lite conversion tool.
- The input should have the following dimensions: 1 x 300 x 300 x 3 (batch_size x height x width x num_channels). Height and width are configurable.
- iOS Only: The output should have the following dimensions: 4 (box points) x num_anchor_boxes x 1 for `boxPredictions` and num_classes x 1 for `classPredictions`.
- Android Only: The output should have the following dimensions: 1 x num_anchor_boxes x 4 (box points) for `outputLocations`, num_classes x 1 for `outputClasses` & `outputScores`, and 1 for `numDetections`. A sketch of reading these outputs follows below.
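To sanity-check the Android shapes above, you can run a converted model directly with the plain TensorFlow Lite Java interpreter and pre-sized output arrays. This sketch assumes the conventional SSD output ordering (locations, classes, scores, number of detections); the anchor count is a placeholder, and `interpreter.getOutputTensor(i).shape()` will report the real shapes for your model.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

import org.tensorflow.lite.Interpreter;

public class SsdOutputCheck {
    // Placeholder anchor count; read the true value from the model's output shapes.
    private static final int NUM_ANCHORS = 1917;
    private static final int NUM_BOX_POINTS = 4;

    public static void run(Interpreter interpreter, ByteBuffer input) {
        float[][][] outputLocations = new float[1][NUM_ANCHORS][NUM_BOX_POINTS]; // box offsets
        float[][] outputClasses = new float[1][NUM_ANCHORS];  // class index per box
        float[][] outputScores = new float[1][NUM_ANCHORS];   // confidence per box
        float[] numDetections = new float[1];                 // count of valid boxes

        // Map each output tensor index to a pre-sized array; the ordering here
        // (locations, classes, scores, numDetections) is the conventional SSD
        // layout and should be confirmed for your converted model.
        Map<Integer, Object> outputs = new HashMap<>();
        outputs.put(0, outputLocations);
        outputs.put(1, outputClasses);
        outputs.put(2, outputScores);
        outputs.put(3, numDetections);

        interpreter.runForMultipleInputsOutputs(new Object[] {input}, outputs);
    }
}
```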