Android Human Pose Estimation


If you haven’t set up the SDK yet, go through the Android setup directions first. You’ll need to add the Core library to the app before using a specific feature API or a custom model.

Use the PoseOnDeviceModel to detect human figures in images and video. The model estimates the locations of body parts and joints relative to a 2D image.

To get started, make sure you have the model included in your app.

1. Add the dependencies via Gradle

Add our repository in order to download the Vision API:

repositories {
    maven { url "" }
}

Add RenderScript support and include the vision dependency in app/build.gradle. RenderScript is used to improve image-processing performance. You’ll also need to specify aaptOptions to prevent TensorFlow Lite models from being compressed.

android {
    defaultConfig {
        renderscriptTargetApi 21
        renderscriptSupportModeEnabled true
    }

    // Don't compress included TensorFlow Lite models on build.
    aaptOptions {
        noCompress "tflite"
    }
}

dependencies {
    implementation 'ai.fritz:vision:+'
}

(Optional: include a model in your app) To bundle a Pose Estimation model with your build, add one of the dependencies shown below. Note: this includes the model in your app when you publish it to the Play Store and will increase your app size.


Behind the scenes, Pose Estimation uses a TensorFlow Lite model. In order to include this with your app, you’ll need to make sure that the model is not compressed in the APK by setting aaptOptions.

Accurate model:

  • Resolution: 513x385
  • Model Size: 13.3MB
dependencies {
  implementation 'ai.fritz:vision-pose-estimation-model-accurate:3.0.0'
}

Fast model:

  • Resolution: 353x257
  • Model Size: 2.3MB
dependencies {
  implementation 'ai.fritz:vision-pose-estimation-model-fast:3.0.0'
}

Small model:

  • Resolution: 353x257
  • Model Size: 614KB
dependencies {
  implementation 'ai.fritz:vision-pose-estimation-model-small:3.0.0'
}

Now you’re ready to transform images with the Pose Estimation API.

2. Get a Pose predictor

In order to use the predictor, the on-device model must first be loaded. If you followed the Optional step above and included the model dependency, you can get a predictor to use immediately:

// For fast
PoseOnDeviceModel onDeviceModel = FritzVisionModels.getHumanPoseEstimationOnDeviceModel(ModelVariant.FAST);

// For accurate
PoseOnDeviceModel onDeviceModel = FritzVisionModels.getHumanPoseEstimationOnDeviceModel(ModelVariant.ACCURATE);

// For small
PoseOnDeviceModel onDeviceModel = FritzVisionModels.getHumanPoseEstimationOnDeviceModel(ModelVariant.SMALL);

FritzVisionPosePredictor predictor = FritzVision.PoseEstimation.getPredictor(onDeviceModel);

If you did not include the on-device model with your app, you’ll have to load the model before you can get a predictor. To do that, use PoseManagedModel and call FritzVision.PoseEstimation.loadPredictor to start the model download.

FritzVisionPosePredictor predictor;

// For fast
PoseManagedModel managedModel = FritzVisionModels.getHumanPoseEstimationManagedModel(ModelVariant.FAST);

// For accurate
PoseManagedModel managedModel = FritzVisionModels.getHumanPoseEstimationManagedModel(ModelVariant.ACCURATE);

// For small
PoseManagedModel managedModel = FritzVisionModels.getHumanPoseEstimationManagedModel(ModelVariant.SMALL);

FritzVision.PoseEstimation.loadPredictor(managedModel, new PredictorStatusListener<FritzVisionPosePredictor>() {
    @Override
    public void onPredictorReady(FritzVisionPosePredictor posePredictor) {
        Log.d(TAG, "Pose estimation predictor is ready");
        predictor = posePredictor;
    }
});

3. Create FritzVisionImage from an image or a video stream

To create a FritzVisionImage from a Bitmap:

FritzVisionImage visionImage = FritzVisionImage.fromBitmap(bitmap);
var visionImage = FritzVisionImage.fromBitmap(bitmap)

To create a FritzVisionImage from a media.Image object when capturing the result from a camera, first determine the orientation of the image. This will rotate the image to account for device rotation and the orientation of the camera sensor.

// Get the system service for the camera manager
final CameraManager manager = (CameraManager) getSystemService(Context.CAMERA_SERVICE);

// Get the first camera id
String cameraId = manager.getCameraIdList()[0];

// Determine the rotation of the FritzVisionImage from the camera orientation and the device rotation.
// "this" refers to the calling Context (Application, Activity, etc.)
ImageRotation imageRotationFromCamera = FritzVisionOrientation.getImageRotationFromCamera(this, cameraId);
// Get the system service for the camera manager
val manager = getSystemService(Context.CAMERA_SERVICE) as CameraManager

// Get the first camera id
val cameraId = manager.cameraIdList[0]

// Determine the rotation of the FritzVisionImage from the camera orientation and the device rotation.
// "this" refers to the calling Context (Application, Activity, etc.)
val imageRotationFromCamera = FritzVisionOrientation.getImageRotationFromCamera(this, cameraId)
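Under the hood, a helper like getImageRotationFromCamera combines the camera sensor’s orientation with the device’s display rotation. The arithmetic can be sketched in plain Java using the standard Camera2/CameraX orientation formula; the method below is illustrative, not part of the SDK:

```java
class RotationMath {
    // Degrees a captured frame must be rotated to appear upright, per the
    // standard Camera2 orientation formula.
    // sensorOrientation: CameraCharacteristics.SENSOR_ORIENTATION (0/90/180/270)
    // deviceRotationDegrees: the display rotation in degrees (0/90/180/270)
    // frontFacing: true for LENS_FACING_FRONT cameras
    static int requiredRotation(int sensorOrientation, int deviceRotationDegrees, boolean frontFacing) {
        if (frontFacing) {
            // Front cameras are mirrored, so the rotations add.
            return (sensorOrientation + deviceRotationDegrees) % 360;
        }
        return (sensorOrientation - deviceRotationDegrees + 360) % 360;
    }
}
```

For example, a typical back camera mounted at 90° on a portrait device (display rotation 0) needs a 90° rotation, and none at all once the device turns to landscape.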

Finally, create the FritzVisionImage object with the rotation:

FritzVisionImage visionImage = FritzVisionImage.fromMediaImage(image, imageRotationFromCamera);
val visionImage = FritzVisionImage.fromMediaImage(image, imageRotationFromCamera)

4. Run prediction

To detect body poses in FritzVisionImage, run the following:

FritzVisionPoseResult poseResult = predictor.predict(visionImage);

The predict method returns a FritzVisionPoseResult object with the following methods:

FritzVisionPoseResult methods

getPoses()
Gets a list of Pose objects.

getPosesByThreshold(float minConfidence)
Gets a list of poses above a given confidence threshold.
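The thresholded variant behaves like a simple confidence filter. The following standalone sketch mirrors that semantics; the ScoredPose class is a hypothetical stand-in so the example runs without the SDK:

```java
import java.util.ArrayList;
import java.util.List;

class ThresholdFilter {
    // Stand-in for a pose with an overall confidence score (the real SDK's
    // Pose class carries keypoints and more state).
    static class ScoredPose {
        final String label;
        final float score;
        ScoredPose(String label, float score) { this.label = label; this.score = score; }
    }

    // Keep only poses whose score meets the minimum confidence, mirroring
    // what getPosesByThreshold(minConfidence) is documented to do.
    static List<ScoredPose> byThreshold(List<ScoredPose> poses, float minConfidence) {
        List<ScoredPose> result = new ArrayList<>();
        for (ScoredPose p : poses) {
            if (p.score >= minConfidence) result.add(p);
        }
        return result;
    }
}
```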

5. Access the Pose Result

FritzVisionPoseResult contains several convenience methods to help draw the keypoints and body position.

Get a bitmap of the pose on the original image

List<Pose> poses = poseResult.getPoses();
Bitmap posesOnImage = visionImage.overlaySkeletons(poses);

Draw the poses onto a Canvas

// Draw each pose to the canvas.
List<Pose> poses = poseResult.getPoses();
for (Pose pose : poses) {
    pose.draw(canvas);
}

Access the position of specific body keypoints

There are several body keypoints for each pose:

  • nose
  • left eye
  • right eye
  • left ear
  • right ear
  • left shoulder
  • right shoulder
  • left elbow
  • right elbow
  • left wrist
  • right wrist
  • left hip
  • right hip
  • left knee
  • right knee
  • left ankle
  • right ankle

To access each body keypoint separately:

// Get the first pose
Pose pose = poseResult.getPoses().get(0);

// Get the body keypoints
Keypoint[] keypoints = pose.getKeypoints();

// Get the name and position of the keypoint
String partName = keypoints[0].getPartName();
PointF keypointPosition = keypoints[0].getPosition();
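Because keypoints arrive as an array of named parts, finding a specific part is a linear scan over the part names listed above. A minimal, SDK-independent sketch (the Keypoint class below is a stand-in holding just a name and a 2D position):

```java
import java.util.Optional;

class KeypointLookup {
    // Minimal stand-in for the SDK's Keypoint so the sketch runs without
    // Android dependencies: a part name plus a 2D position.
    static class Keypoint {
        final String partName;
        final float x, y;
        Keypoint(String partName, float x, float y) {
            this.partName = partName;
            this.x = x;
            this.y = y;
        }
    }

    // Scan the keypoint array for a named body part, e.g. "nose".
    static Optional<Keypoint> findPart(Keypoint[] keypoints, String partName) {
        for (Keypoint k : keypoints) {
            if (k.partName.equals(partName)) return Optional.of(k);
        }
        return Optional.empty();
    }
}
```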

Advanced Options

Configuring the Predictor

You can configure the predictor with FritzVisionPosePredictorOptions to return specific results that match the options given:

FritzVisionPosePredictorOptions

minPartThreshold (default 0.50)
Minimum confidence score a keypoint must have to be included in a pose.

minPoseThreshold (default 0.20)
Minimum confidence score a pose must have to be included in the result.

maxPosesToDetect (default 1)
Maximum number of poses to detect in the image.

nmsRadius (default 20)
Non-maximum suppression (NMS) distance for Part instances. Two parts suppress each other if they are fewer than nmsRadius pixels apart.

PoseSmoothingMethod (default null)
Run pose smoothing between predictions.
  • To initialize the pose predictor with options.

    FritzVisionPosePredictorOptions options = new FritzVisionPosePredictorOptions();
    options.minPoseThreshold = .6f;
    predictor = FritzVision.PoseEstimation.getPredictor(onDeviceModel, options);
  • To help improve stability of predictions between frames, set the PoseSmoothingMethod.

    FritzVisionPosePredictorOptions posePredictorOptions = new FritzVisionPosePredictorOptions();
    posePredictorOptions.smoothingOptions = new OneEuroFilterMethod();

    1-Euro Filter

    “The 1-Euro filter (“one Euro filter”) is a simple algorithm to filter noisy signals for high precision and responsiveness. It uses a first order low-pass filter with an adaptive cutoff frequency: at low speeds, a low cutoff stabilizes the signal by reducing jitter, but as speed increases, the cutoff is increased to reduce lag.”

    - the 1-Euro filter paper

    The 1-Euro filter runs in real-time with parameters minCutoff and beta which control the amount of lag and jitter.


    minCutoff (default: .2)

    Minimum frequency cutoff. Lower values will decrease jitter but increase lag.

    beta (default: .01)

    Higher values of beta will help reduce lag, but may increase jitter.

    derivateCutoff (default: .3)

    Max derivative value allowed. Increasing it will allow more sudden movements.

    To get a better understanding of how different parameter values affect the results, try out the 1-Euro Filter Demo.


    Pose smoothing is only applied to single-pose estimation (maxPosesToDetect = 1).
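The adaptive-cutoff mechanism described above is small enough to sketch in full. Below is a minimal, self-contained 1-Euro filter in plain Java — an illustration of the algorithm, not the SDK's OneEuroFilterMethod implementation; the constructor parameters mirror minCutoff, beta, and the derivative cutoff discussed above:

```java
class OneEuroFilter {
    // Simple exponential low-pass: y = alpha * x + (1 - alpha) * yPrev.
    static class LowPass {
        private double prev;
        private boolean initialized = false;
        double filter(double x, double alpha) {
            if (!initialized) { initialized = true; prev = x; return x; }
            prev = alpha * x + (1 - alpha) * prev;
            return prev;
        }
    }

    private final double freq;       // sampling frequency in Hz (e.g. frame rate)
    private final double minCutoff;  // lower -> less jitter, more lag
    private final double beta;       // higher -> less lag, more jitter
    private final double dCutoff;    // cutoff for the derivative signal
    private final LowPass xFilter = new LowPass();
    private final LowPass dxFilter = new LowPass();
    private boolean hasPrev = false;
    private double prevX;

    OneEuroFilter(double freq, double minCutoff, double beta, double dCutoff) {
        this.freq = freq;
        this.minCutoff = minCutoff;
        this.beta = beta;
        this.dCutoff = dCutoff;
    }

    // Convert a cutoff frequency into a low-pass smoothing factor.
    private double alpha(double cutoff) {
        double tau = 1.0 / (2.0 * Math.PI * cutoff);
        double samplePeriod = 1.0 / freq;
        return 1.0 / (1.0 + tau / samplePeriod);
    }

    double filter(double x) {
        // Estimate the (smoothed) speed of the signal.
        double dx = hasPrev ? (x - prevX) * freq : 0.0;
        hasPrev = true;
        prevX = x;
        double edx = dxFilter.filter(dx, alpha(dCutoff));
        // Adapt the cutoff: fast movement -> higher cutoff -> less lag.
        double cutoff = minCutoff + beta * Math.abs(edx);
        return xFilter.filter(x, alpha(cutoff));
    }
}
```

Smoothing a noisy keypoint between frames then amounts to one filter per coordinate, called once per prediction with that frame's x (or y) value.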