Android Rigid Pose Estimation


If you haven’t set up the SDK yet, go through the setup directions first. You’ll need to add the Core library to the app before using a specific feature API or custom model. Follow the iOS setup or Android setup directions.


The 3D pose estimation described here is restricted to rigid bodies like a book or chair. It does not apply to soft or compound bodies like people.

3D Pose Estimation on Android combines neural networks, traditional computer vision techniques, and ARCore to estimate an object’s position in the real world.

For a full list of compatible devices, see here.

1. Add the dependencies via Gradle

Add our repository in order to install the SDK:

repositories {
    maven { url "" }
}

Add RenderScript support and include the vision dependency in app/build.gradle. RenderScript is used to improve image processing performance.

android {
    defaultConfig {
        renderscriptTargetApi 21
        renderscriptSupportModeEnabled true
    }
}

dependencies {
    implementation 'ai.fritz:vision-opencv:1.0.0'
}

2. Create a RigidPoseOnDeviceModel

In order to estimate the real-world coordinates of the object, you’ll need to create a custom Rigid Pose Model.

/**
 * A 3D Pose TensorFlow Lite model included in the assets folder of your app.
 * @param String modelPath: the path to your model file.
 * @param String modelId: the model id specified by Fritz AI for the included model.
 * @param int version: the version number specified by Fritz AI for the included model.
 * @param int inputHeight: the expected input height for the model.
 * @param int inputWidth: the expected input width for the model.
 * @param int outputHeight: the expected output height for the model.
 * @param int outputWidth: the expected output width for the model.
 * @param int numKeypoints: the number of output 2D keypoints for the model.
 * @param List<Point3> object3DPoints: the local, 3D coordinates of the rigid body. This is used to infer the 3D coordinates from the 2D keypoints.
 */
RigidPoseOnDeviceModel onDeviceModel = new RigidPoseOnDeviceModel(
        modelPath, modelId, version,
        inputHeight, inputWidth,
        outputHeight, outputWidth, numKeypoints,
        object3DPoints);
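
The object3DPoints list describes the keypoints in the object's own coordinate frame. As a rough illustration, for a box-shaped object such as a book these could be the eight corners of a cuboid centered at the origin. The sketch below uses plain double arrays and a hypothetical helper named boxCorners, which is not part of the SDK. The corner ordering, the units (meters here), and the conversion of each row to a Point3 are assumptions and must match how your model was trained.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of the Fritz SDK.
class BoxKeypoints {

    // Returns the 8 corners of a box centered at the origin as {x, y, z} rows.
    static List<double[]> boxCorners(double width, double height, double depth) {
        double hx = width / 2.0;
        double hy = height / 2.0;
        double hz = depth / 2.0;
        List<double[]> corners = new ArrayList<>();
        // Enumerate every sign combination (-1 or +1) along each axis.
        for (int sx = -1; sx <= 1; sx += 2) {
            for (int sy = -1; sy <= 1; sy += 2) {
                for (int sz = -1; sz <= 1; sz += 2) {
                    corners.add(new double[] {sx * hx, sy * hy, sz * hz});
                }
            }
        }
        return corners;
    }

    public static void main(String[] args) {
        // Illustrative dimensions for a book, in meters (assumed values).
        for (double[] c : boxCorners(0.15, 0.23, 0.03)) {
            System.out.printf("(%.3f, %.3f, %.3f)%n", c[0], c[1], c[2]);
        }
    }
}
```

With this layout, numKeypoints would be 8, assuming the model outputs one 2D keypoint per corner in the same order.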

2. Get a FritzVisionRigidPosePredictor

In order to use the predictor, the on-device model must first be loaded:

import ai.fritz.visionCV.rigidpose.FritzVisionRigidPosePredictor;

FritzVisionRigidPosePredictor posePredictor =
    FritzVisionCV.RigidPose.getPredictor(onDeviceModel);

If you did not include the on-device model, you’ll have to load the model before you can get a predictor. To do that, you’ll use a RigidPoseManagedModel object and call FritzVisionCV.RigidPose.loadPredictor to start the model download.

// Declare this as a field so the listener below can assign it.
FritzVisionRigidPosePredictor predictor;

RigidPoseManagedModel managedModel = new RigidPoseManagedModel(
        modelPath, modelId, version,
        inputHeight, inputWidth,
        outputHeight, outputWidth, numKeypoints,
        object3DPoints);
FritzVisionCV.RigidPose.loadPredictor(managedModel, new PredictorStatusListener<FritzVisionRigidPosePredictor>() {
    @Override
    public void onPredictorReady(FritzVisionRigidPosePredictor posePredictor) {
        predictor = posePredictor;
    }
});

3. Create a FritzCVImage from an image or a video stream

To create a FritzCVImage from a Bitmap:

FritzCVImage visionImage = FritzCVImage.fromBitmap(bitmap);

To create a FritzCVImage from a media.Image:

First determine the orientation of the image. This will rotate the image to account for device rotation and the orientation of the camera sensor.

// Get the system service for the camera manager
final CameraManager manager = (CameraManager) getSystemService(Context.CAMERA_SERVICE);

// Get the first camera id.
// getCameraIdList() returns a String[] and can throw CameraAccessException.
String cameraId = manager.getCameraIdList()[0];

// Determine the rotation of the FritzCVImage from the camera orientation and the device rotation.
// "this" refers to the calling Context (Application, Activity, etc.)
int imageRotationFromCamera = FritzVisionOrientation.getImageRotationFromCamera(this, cameraId);

Finally, create the FritzCVImage object with the rotation:

FritzCVImage visionImage =
    FritzCVImage.fromMediaImage(image, imageRotationFromCamera);
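
The rotation value combines the camera sensor orientation with the current device rotation. The sketch below shows the formula commonly used on Android for this; it is an illustration, not necessarily what FritzVisionOrientation.getImageRotationFromCamera does internally. The class and method names are hypothetical.

```java
// Hypothetical helper for illustration only; not part of the Fritz SDK.
class ImageRotation {

    // sensorOrientation: the camera's SENSOR_ORIENTATION value (0, 90, 180, or 270).
    // deviceRotationDegrees: the display rotation converted to degrees (0, 90, 180, or 270).
    static int rotationDegrees(int sensorOrientation, int deviceRotationDegrees, boolean frontFacing) {
        if (frontFacing) {
            // Front-facing cameras are mirrored, so the device rotation is added.
            return (sensorOrientation + deviceRotationDegrees) % 360;
        }
        // Back-facing cameras: subtract the device rotation, keeping the result in [0, 360).
        return (sensorOrientation - deviceRotationDegrees + 360) % 360;
    }

    public static void main(String[] args) {
        // Back camera with a typical 90-degree sensor, device in its natural orientation.
        System.out.println(rotationDegrees(90, 0, false));
        // Same camera with the device rotated by 90 degrees.
        System.out.println(rotationDegrees(90, 90, false));
    }
}
```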

To create a FritzCVImage from an OpenCV Mat object:

FritzCVImage visionImage =
    FritzCVImage.fromMatrix(image, imageRotationFromCamera);

4. Run prediction to get 2D Keypoints.

Next, pass the FritzCVImage into the predictor in order to get information on the 2D keypoints:

RigidPoseResult poseResult = posePredictor.predict(visionImage);

RigidPoseResult methods

Gets a list of 2D keypoints detected in the image. Coordinates are relative to the size of the model input (224x224).
Gets a list of confidence scores, one for each keypoint. Values range from 0 to 1, with 1 being 100% confidence.
drawKeypoints(Mat canvas, Scalar color): Draws the keypoints, labeled with their keypoint index, on the provided Mat canvas. Useful for debugging.
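
Because the keypoint coordinates are relative to the model input size, they usually need to be rescaled before being drawn on the original image. The sketch below shows that rescaling in plain Java. It assumes the image was stretched to the model input size without cropping or letterboxing, which may not match the SDK's preprocessing. The helper name is hypothetical.

```java
// Hypothetical helper for illustration only; not part of the Fritz SDK.
class KeypointScaling {

    // Maps a point from model-input coordinates to image pixel coordinates,
    // assuming a plain resize (no crop, no padding) was applied before inference.
    static double[] toImageCoordinates(double x, double y,
                                       int modelWidth, int modelHeight,
                                       int imageWidth, int imageHeight) {
        double scaleX = (double) imageWidth / modelWidth;
        double scaleY = (double) imageHeight / modelHeight;
        return new double[] {x * scaleX, y * scaleY};
    }

    public static void main(String[] args) {
        // A keypoint at the center of a 224x224 model input, mapped onto a 1280x720 frame.
        double[] p = toImageCoordinates(112, 112, 224, 224, 1280, 720);
        System.out.println(p[0] + ", " + p[1]);
    }
}
```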

5. Infer 3D Pose from RigidPoseResult

Use pose lifting to infer the 3D pose from the 2D keypoint coordinates.

// Use the ARCore camera to get the intrinsic matrix.
Camera camera = frame.getCamera();
Mat cameraMatrix = FritzVisionRigidPoseLifting.getCameraIntrinsicMatrix(camera);
MatOfDouble distortionMatrix = FritzVisionRigidPoseLifting.getDistortionMatrix();

// Create a pose lifting object to infer the 3D Pose
FritzVisionRigidPoseLifting poseLifting = new FritzVisionRigidPoseLifting(onDeviceModel, poseResult);
Pose objectPose = poseLifting.infer3DPose(cameraMatrix, distortionMatrix);

FritzVisionRigidPoseLifting methods

Mat getTvec(): Gets the OpenCV translation vector output.
Mat getRvec(): Gets the OpenCV rotation vector output.
Pose infer3DPose(Mat cameraMatrix, MatOfDouble distortionMatrix): Gets an ARCore Pose object to apply to a 3D model.
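
In OpenCV's convention, a rotation vector points along the rotation axis and its length is the rotation angle in radians. Together with the translation vector it maps a point from the object frame to the camera frame as R * p + t. The sketch below applies the Rodrigues formula in plain Java to show how such a pair can be interpreted. It assumes getRvec() and getTvec() follow this OpenCV convention, and the class and method names are hypothetical.

```java
// Hypothetical helper for illustration only; not part of the Fritz SDK or OpenCV.
class RodriguesSketch {

    // Converts an axis-angle rotation vector into a 3x3 rotation matrix.
    static double[][] toRotationMatrix(double rx, double ry, double rz) {
        double theta = Math.sqrt(rx * rx + ry * ry + rz * rz);
        if (theta < 1e-12) {
            // No rotation: return the identity matrix.
            return new double[][] {{1, 0, 0}, {0, 1, 0}, {0, 0, 1}};
        }
        double kx = rx / theta, ky = ry / theta, kz = rz / theta;
        double c = Math.cos(theta), s = Math.sin(theta), v = 1.0 - c;
        return new double[][] {
            {c + kx * kx * v,      kx * ky * v - kz * s, kx * kz * v + ky * s},
            {ky * kx * v + kz * s, c + ky * ky * v,      ky * kz * v - kx * s},
            {kz * kx * v - ky * s, kz * ky * v + kx * s, c + kz * kz * v}
        };
    }

    // Applies R * p + t to move a point from the object frame to the camera frame.
    static double[] transform(double[][] r, double[] t, double[] p) {
        double[] out = new double[3];
        for (int i = 0; i < 3; i++) {
            out[i] = r[i][0] * p[0] + r[i][1] * p[1] + r[i][2] * p[2] + t[i];
        }
        return out;
    }

    public static void main(String[] args) {
        // A quarter turn about the Z axis, followed by a shift of 2 units along Z.
        double[][] r = toRotationMatrix(0, 0, Math.PI / 2);
        double[] p = transform(r, new double[] {0, 0, 2}, new double[] {1, 0, 0});
        System.out.printf("%.3f %.3f %.3f%n", p[0], p[1], p[2]);
    }
}
```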

6. Place an AR Object / 3D model in the real world using the inferred Pose.

By composing the camera pose and the object’s pose, we get the real world coordinates for placing an AR object.

Pose finalPose = camera.getPose().compose(objectPose);
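
Composing two poses chains their transforms: a point is first transformed by the object pose (object frame to camera frame) and then by the camera pose (camera frame to world frame). The sketch below illustrates that with plain rotation matrices. ARCore's Pose stores rotation as a quaternion, so this is only a conceptual model, and the RigidTransform class is hypothetical.

```java
// Hypothetical class for illustration only; ARCore's Pose uses quaternions internally.
class RigidTransform {
    final double[][] rotation;   // 3x3 rotation matrix
    final double[] translation;  // 3-element translation vector

    RigidTransform(double[][] rotation, double[] translation) {
        this.rotation = rotation;
        this.translation = translation;
    }

    // Computes R * p + t.
    double[] transformPoint(double[] p) {
        double[] out = new double[3];
        for (int i = 0; i < 3; i++) {
            out[i] = translation[i];
            for (int j = 0; j < 3; j++) {
                out[i] += rotation[i][j] * p[j];
            }
        }
        return out;
    }

    // Returns a transform equivalent to applying rhs first and then this.
    RigidTransform compose(RigidTransform rhs) {
        double[][] r = new double[3][3];
        for (int i = 0; i < 3; i++) {
            for (int j = 0; j < 3; j++) {
                for (int k = 0; k < 3; k++) {
                    r[i][j] += rotation[i][k] * rhs.rotation[k][j];
                }
            }
        }
        // The combined translation is this.R * rhs.t + this.t.
        return new RigidTransform(r, transformPoint(rhs.translation));
    }

    public static void main(String[] args) {
        double[][] rotZ90 = {{0, -1, 0}, {1, 0, 0}, {0, 0, 1}};
        double[][] identity = {{1, 0, 0}, {0, 1, 0}, {0, 0, 1}};
        RigidTransform camera = new RigidTransform(rotZ90, new double[] {0, 0, 1});
        RigidTransform object = new RigidTransform(identity, new double[] {1, 0, 0});
        double[] origin = camera.compose(object).transformPoint(new double[] {0, 0, 0});
        System.out.println(origin[0] + ", " + origin[1] + ", " + origin[2]);
    }
}
```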

Advanced Options

Configure the FritzVisionRigidPosePredictor

You can configure the predictor with FritzVisionRigidPosePredictorOptions to return specific results.

// Create a predictor with specified options.
FritzVisionRigidPosePredictorOptions options = new FritzVisionRigidPosePredictorOptions();
options.confidenceThreshold = .6f;
options.numKeypointsAboveThreshold = 4;
posePredictor = FritzVisionCV.RigidPose.getPredictor(onDeviceModel, options);
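
One plausible reading of these two options is that a prediction is only considered usable when enough keypoints have a confidence score at or above the threshold. The sketch below expresses that check in plain Java. The exact semantics inside the SDK are an assumption here, and the helper names are hypothetical.

```java
// Hypothetical helper for illustration only; the SDK's actual filtering logic may differ.
class KeypointGate {

    // Counts the scores that are at or above the confidence threshold.
    static int countAboveThreshold(float[] scores, float threshold) {
        int count = 0;
        for (float score : scores) {
            if (score >= threshold) {
                count++;
            }
        }
        return count;
    }

    // True when at least minCount keypoints pass the confidence threshold.
    static boolean hasEnoughKeypoints(float[] scores, float threshold, int minCount) {
        return countAboveThreshold(scores, threshold) >= minCount;
    }

    public static void main(String[] args) {
        float[] scores = {0.9f, 0.7f, 0.5f, 0.65f, 0.61f};
        System.out.println(hasEnoughKeypoints(scores, 0.6f, 4));
    }
}
```

Requiring at least four confident keypoints is a sensible floor in general, since recovering a 3D pose from 2D-3D point correspondences typically needs four or more points.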

Pose and Keypoint Smoothing

To improve the stability of predictions between frames, use the RigidPoseSmoother class, which uses 1-Euro filters.

// Create a pose smoother with custom 1-Euro filter parameters
RigidPoseSmoother poseSmoother =
    new RigidPoseSmoother(onDeviceModel.getNumKeypoints(), minCutoff, beta, derivativeCutoff);

// Alternatively, create a pose smoother with the default parameters
RigidPoseSmoother defaultPoseSmoother =
    new RigidPoseSmoother(onDeviceModel.getNumKeypoints());

// Smooth 2D Keypoints
RigidPoseResult smoothedResult = poseSmoother.smooth2DKeypoints(poseResult);

// Smooth 3D Pose
Pose smoothedObjectPose = poseSmoother.smoothPose(objectPose);

Pose Smoother Methods

RigidPoseResult smooth2DKeypoints(RigidPoseResult poseResult): Smooths the result for all keypoints.
Pose smoothPose(Pose objectPose): Smooths the object pose (rotation and translation).

“The 1-Euro filter (“one Euro filter”) is a simple algorithm to filter noisy signals for high precision and responsiveness. It uses a first order low-pass filter with an adaptive cutoff frequency: at low speeds, a low cutoff stabilizes the signal by reducing jitter, but as speed increases, the cutoff is increased to reduce lag.”

1-Euro filter parameters

minCutoff (default: .8): Minimum frequency cutoff. Lower values will decrease jitter but increase lag.
beta (default: .01): Higher values of beta will help reduce lag, but may increase jitter.
derivativeCutoff (default: .1): Max derivative value allowed. Increasing it will allow more sudden movements.
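
To make the role of these parameters concrete, here is a minimal sketch of a 1-Euro filter for a single scalar signal, following the published algorithm. It is not the RigidPoseSmoother implementation: the class name, the use of timestamps in seconds, and the handling of out-of-order samples are assumptions made for illustration.

```java
// Minimal 1-Euro filter sketch for one scalar value; not the SDK's RigidPoseSmoother.
class OneEuroFilter {
    private final double minCutoff;
    private final double beta;
    private final double derivativeCutoff;

    private boolean initialized = false;
    private double prevValue;
    private double prevDerivative;
    private double prevTime;

    OneEuroFilter(double minCutoff, double beta, double derivativeCutoff) {
        this.minCutoff = minCutoff;
        this.beta = beta;
        this.derivativeCutoff = derivativeCutoff;
    }

    // Smoothing factor of a first-order low-pass filter for a given cutoff and time step.
    private static double alpha(double cutoff, double dt) {
        double tau = 1.0 / (2.0 * Math.PI * cutoff);
        return 1.0 / (1.0 + tau / dt);
    }

    // Filters one sample taken at timeSeconds and returns the smoothed value.
    double filter(double value, double timeSeconds) {
        if (!initialized) {
            initialized = true;
            prevValue = value;
            prevDerivative = 0.0;
            prevTime = timeSeconds;
            return value;
        }
        double dt = timeSeconds - prevTime;
        if (dt <= 0.0) {
            // Assumption: ignore samples that do not advance time.
            return prevValue;
        }
        // Estimate and smooth the rate of change of the signal.
        double rawDerivative = (value - prevValue) / dt;
        double aD = alpha(derivativeCutoff, dt);
        double derivative = aD * rawDerivative + (1.0 - aD) * prevDerivative;
        // Faster movement raises the cutoff, which reduces lag.
        double cutoff = minCutoff + beta * Math.abs(derivative);
        double a = alpha(cutoff, dt);
        double filtered = a * value + (1.0 - a) * prevValue;

        prevValue = filtered;
        prevDerivative = derivative;
        prevTime = timeSeconds;
        return filtered;
    }

    public static void main(String[] args) {
        OneEuroFilter filter = new OneEuroFilter(0.8, 0.01, 0.1);
        double[] samples = {1.0, 1.2, 0.9, 1.1, 3.0};
        for (int i = 0; i < samples.length; i++) {
            System.out.println(filter.filter(samples[i], i / 30.0));
        }
    }
}
```

Smoothing a set of 2D keypoints would use one such filter per coordinate, which is presumably why the smoother is constructed with the number of keypoints.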