Android Custom Pose Estimation


If you haven’t set up the SDK yet, go through the setup directions first. You’ll need to add the Core library to the app before using a specific feature API or a custom model. Follow the iOS setup or Android setup directions.

Use a custom 2D pose model to estimate the position of object keypoints in images and video. If you want to track the position of people in an image or video, see the documentation for Human Pose Estimation.

To get started, make sure you have the model included in your app.

1. Add the dependencies via Gradle

Add our repository in order to download the Vision API:

repositories {
    maven { url "" }
}

Add RenderScript support and include the vision dependency in app/build.gradle. RenderScript is used to improve image processing performance. You’ll also need to specify aaptOptions to prevent TensorFlow Lite models from being compressed.

android {
    defaultConfig {
        renderscriptTargetApi 21
        renderscriptSupportModeEnabled true
    }

    // Don't compress included TensorFlow Lite models on build.
    aaptOptions {
        noCompress "tflite"
    }
}

dependencies {
    implementation 'ai.fritz:vision:+'
}

2. Add the model to your app as an asset

Add your TFLite model file to your app as an asset. In Android Studio, you can drag .tflite files directly into the file navigator.

3. Define a Skeleton

Extend the Skeleton class to reflect the keypoint outputs of your model. For example, if your pose estimation model predicts the location of each fingertip on a hand, you might define a skeleton as follows:

public class HandSkeleton extends Skeleton {

  public static String OBJECT_NAME = "hand";

  // Keypoint names, in the order your model outputs them.
  public static String[] FINGER_NAMES = {
      "thumb", "index", "middle", "ring", "pinky"
  };

  public HandSkeleton() {
    // The base Skeleton constructor signature is assumed here.
    super(OBJECT_NAME, FINGER_NAMES);
  }
}

4. Define a PoseOnDeviceModel

Register your model with the Fritz SDK by creating a new PoseOnDeviceModel. In addition to your model file, Fritz Model ID, and skeleton, you’ll need to know the output stride of your model as well.

PoseOnDeviceModel onDeviceModel = new PoseOnDeviceModel(
    "file:///android_asset/your_model.tflite",  // placeholder asset path
    "<your model id>", new HandSkeleton(),
    8);  // the output stride of your model

5. Get a Pose predictor

In order to use the predictor, the on-device model must first be loaded.

FritzVisionPosePredictor predictor = FritzVision.PoseEstimation.getPredictor(onDeviceModel);

6. Create a FritzVisionImage from an image or a video stream

To create a FritzVisionImage from a Bitmap:

FritzVisionImage visionImage = FritzVisionImage.fromBitmap(bitmap);
val visionImage = FritzVisionImage.fromBitmap(bitmap)

To create a FritzVisionImage from a media.Image object when capturing the result from a camera, first determine the orientation of the image. This will rotate the image to account for device rotation and the orientation of the camera sensor.

// Get the system service for the camera manager
final CameraManager manager = (CameraManager) getSystemService(Context.CAMERA_SERVICE);

// Get the first camera id
String cameraId = manager.getCameraIdList()[0];

// Determine the rotation for the FritzVisionImage from the camera orientation and the device rotation.
// "this" refers to the calling Context (Application, Activity, etc.)
ImageRotation imageRotationFromCamera = FritzVisionOrientation.getImageRotationFromCamera(this, cameraId);
// Get the system service for the camera manager
val manager = getSystemService(Context.CAMERA_SERVICE) as CameraManager

// Get the first camera id
val cameraId = manager.cameraIdList[0]

// Determine the rotation for the FritzVisionImage from the camera orientation and the device rotation.
// "this" refers to the calling Context (Application, Activity, etc.)
val imageRotationFromCamera = FritzVisionOrientation.getImageRotationFromCamera(this, cameraId)

Finally, create the FritzVisionImage object with the rotation:

FritzVisionImage visionImage = FritzVisionImage.fromMediaImage(image, imageRotationFromCamera);
val visionImage = FritzVisionImage.fromMediaImage(image, imageRotationFromCamera)

7. Run prediction

To detect object poses in a FritzVisionImage, run the following:

FritzVisionPoseResult poseResult = predictor.predict(visionImage);

The predict method returns a FritzVisionPoseResult object with the following methods:

FritzVisionPoseResult methods

getPoses()
    Gets a list of Pose objects.

getPosesByThreshold(float minConfidence)
    Gets a list of poses above a given threshold.
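As a plain-Java sketch (independent of the SDK), thresholding amounts to a simple confidence filter. ScoredPose and its fields here are hypothetical stand-ins for the SDK's Pose class:

```java
import java.util.ArrayList;
import java.util.List;

public class ThresholdDemo {
    // Hypothetical stand-in for the SDK's Pose: a label plus a confidence score.
    record ScoredPose(String label, float score) {}

    // Keeps only poses whose confidence meets the minimum, mirroring the
    // idea behind getPosesByThreshold(minConfidence).
    static List<ScoredPose> posesByThreshold(List<ScoredPose> poses, float minConfidence) {
        List<ScoredPose> kept = new ArrayList<>();
        for (ScoredPose pose : poses) {
            if (pose.score() >= minConfidence) {
                kept.add(pose);
            }
        }
        return kept;
    }
}
```

Tightening the threshold trades recall for precision: fewer, more confident poses are returned.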

8. Access the Pose Result

FritzVisionPoseResult contains several convenience methods to help draw the keypoints.

Get a bitmap of the pose on the original image

List<Pose> poses = poseResult.getPoses();
Bitmap posesOnImage = visionImage.overlaySkeletons(poses);

Draw the poses onto a Canvas

// Draw each pose to the canvas.
List<Pose> poses = poseResult.getPoses();
for (Pose pose : poses) {
    pose.draw(canvas);
}

Access the position of specific keypoints

To access each keypoint separately:

// Get the first pose
Pose pose = poseResult.getPoses().get(0);

// Get the object's keypoints
Keypoint[] keypoints = pose.getKeypoints();

// Get the name and position of the first keypoint
String partName = keypoints[0].getPartName();
PointF keypointPosition = keypoints[0].getPosition();

Advanced Options

Configuring the Predictor

You can configure the predictor with FritzVisionPosePredictorOptions to return specific results that match the options given:

FritzVisionPosePredictorOptions

minPartThreshold: 0.50 (default)
    Minimum confidence score a keypoint must have to be included in a pose.

minPoseThreshold: 0.20 (default)
    Minimum confidence score a pose must have to be included in the result.

maxPosesToDetect: 1 (default)
    Maximum number of poses to detect in the image.

nmsRadius: 20 (default)
    Non-maximum suppression (NMS) distance for Part instances. Two parts suppress each other if they are less than nmsRadius pixels apart.

smoothingOptions (PoseSmoothingMethod): null (default)
    Run pose smoothing between predictions.
  • To initialize the pose predictor with options:

    FritzVisionPosePredictorOptions options = new FritzVisionPosePredictorOptions();
    options.minPoseThreshold = .6f;
    predictor = FritzVision.PoseEstimation.getPredictor(onDeviceModel, options);
  • To help improve the stability of predictions between frames, set a PoseSmoothingMethod:

    FritzVisionPosePredictorOptions posePredictorOptions =
        new FritzVisionPosePredictorOptions();
    posePredictorOptions.smoothingOptions = new OneEuroFilterMethod();

    1-Euro Filter

    “The 1-Euro filter (“one Euro filter”) is a simple algorithm to filter noisy signals for high precision and responsiveness. It uses a first order low-pass filter with an adaptive cutoff frequency: at low speeds, a low cutoff stabilizes the signal by reducing jitter, but as speed increases, the cutoff is increased to reduce lag.”

    - 1-Euro Filter paper

    The 1-Euro filter runs in real time with the parameters minCutoff and beta, which control the amount of lag and jitter.


    minCutoff: 0.2 (default)

    Minimum frequency cutoff. Lower values will decrease jitter but increase lag.

    beta: 0.01 (default)

    Higher values of beta will help reduce lag, but may increase jitter.

    dCutoff: 0.3 (default)

    Max derivative value allowed; increasing it allows more sudden movements. (This is the derivative cutoff, dcutoff, in the 1-Euro paper's terminology.)

    To get a better understanding of how different parameter values affect the results, try out the 1-Euro Filter Demo.


    Pose smoothing is only applied to single pose estimation (maxPosesToDetect = 1).
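To make the smoothing behavior concrete, here is a minimal plain-Java sketch of the 1-Euro algorithm itself, based on the paper; it is an illustration, not the SDK's OneEuroFilterMethod implementation:

```java
public class OneEuroFilter {
    private final double minCutoff, beta, dCutoff;
    private Double prevX = null;   // previous filtered value
    private double prevDx = 0.0;   // previous filtered derivative

    public OneEuroFilter(double minCutoff, double beta, double dCutoff) {
        this.minCutoff = minCutoff;
        this.beta = beta;
        this.dCutoff = dCutoff;
    }

    // Smoothing factor of a first-order low-pass filter for a given
    // cutoff frequency (Hz) and sample interval dt (seconds).
    private static double alpha(double cutoff, double dt) {
        double tau = 1.0 / (2.0 * Math.PI * cutoff);
        return 1.0 / (1.0 + tau / dt);
    }

    // x: noisy sample (e.g. one keypoint coordinate); dt: seconds since the last sample.
    public double filter(double x, double dt) {
        if (prevX == null) {
            prevX = x;
            return x;
        }
        // Low-pass filter the derivative to estimate speed.
        double dx = (x - prevX) / dt;
        double aD = alpha(dCutoff, dt);
        double dxHat = aD * dx + (1.0 - aD) * prevDx;
        // Adaptive cutoff: fast motion raises the cutoff to reduce lag;
        // slow motion lowers it to reduce jitter.
        double cutoff = minCutoff + beta * Math.abs(dxHat);
        double a = alpha(cutoff, dt);
        double xHat = a * x + (1.0 - a) * prevX;
        prevX = xHat;
        prevDx = dxHat;
        return xHat;
    }
}
```

Applied per keypoint coordinate at the camera frame rate, a lower minCutoff trades lag for stability, while a higher beta restores responsiveness during fast motion.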

9. Use the record method on the predictor to collect data

The FritzVisionPosePredictor used to make predictions has a record method allowing you to send an image, a model-predicted annotation, and a user-generated annotation back to your Fritz AI account.

FritzVisionPoseResult predictedResults = visionPredictor.predict(visionImage);

// Implement your own custom UX for users to annotate an image and store
// the result as a FritzVisionPoseResult (modifiedResults below).
visionPredictor.record(visionImage, predictedResults.toAnnotations(), modifiedResults.toAnnotations());