Pose Estimation

Pose estimation is a computer vision technique that predicts and tracks the location of important features on a person or object. App developers can use Pose Estimation to build AI-powered coaches for sports and fitness, immersive AR experiences, and more.


(Left to right) Human Pose Estimation, Custom Pose Estimation, Rigid Pose Estimation

Custom Training Models for Pose Estimation

You can train a custom model that is compatible with the Pose Estimation API by following the guide Quickstart: Use Fritz AI Studio to Train a Custom Model.

Pose Estimation Concepts

Before diving into the API, take a few minutes to familiarize yourself with the following high-level concepts.


Keypoints are the specific features of an object that we want to locate in an image. For example, in the case of human pose estimation, keypoints are joints like shoulders, elbows, and knees.

On a car, keypoints might be each tire or a front headlight. 2D pose estimation models predict the (X, Y) coordinates of keypoints relative to the input image, while 3D pose estimation models predict the (X, Y, Z) coordinates of keypoints, providing depth as well.
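To make the 2D versus 3D distinction concrete, here is a minimal Python sketch of a keypoint record. The `Keypoint` class and its field names are hypothetical and are not part of the Fritz AI SDK; the example values are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Keypoint:
    """One predicted feature location (hypothetical type, not an SDK class)."""
    name: str
    x: float                    # horizontal position relative to the input image
    y: float                    # vertical position relative to the input image
    z: Optional[float] = None   # depth; only set for 3D pose estimation

    @property
    def is_3d(self) -> bool:
        # A keypoint is 3D only when a depth value was predicted.
        return self.z is not None


# A 2D model yields (X, Y) per keypoint; a 3D model adds Z.
headlight_2d = Keypoint("front_left_headlight", x=212.0, y=148.5)
headlight_3d = Keypoint("front_left_headlight", x=212.0, y=148.5, z=3.2)
```

Keeping `z` optional lets the same record type serve both kinds of model, which is one possible design choice rather than a requirement.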


Unless otherwise specified, the Pose Estimation APIs perform 2D pose estimation.


A Skeleton is defined by a group of keypoints and their connections. Skeletons help decode raw model outputs to make them useful for application logic, data organization, and visualization.

For example, the skeleton for the human pose estimation model contains 17 keypoints and the connections required to draw a stick figure on people in an image. Each pose estimation model must have an associated skeleton. If you have trained a custom pose estimation model, you’ll need to define this skeleton in your application code.
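As an illustration of what defining a skeleton might involve, the Python sketch below models a skeleton as a list of keypoint names plus the pairs that are connected. The `Skeleton` class, its methods, and the three-keypoint "arm" example are all hypothetical and do not reflect the Fritz AI SDK's actual types; the sketch also assumes the model emits one (x, y) pair per keypoint in the same order as the names.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class Skeleton:
    """Keypoint names plus the pairs of keypoints that are connected."""
    name: str
    keypoint_names: Tuple[str, ...]
    connections: Tuple[Tuple[str, str], ...]

    def __post_init__(self) -> None:
        # Every connection must reference a keypoint the skeleton defines.
        known = set(self.keypoint_names)
        for a, b in self.connections:
            if a not in known or b not in known:
                raise ValueError(f"unknown keypoint in connection ({a}, {b})")

    def decode(self, raw: Sequence[Point]) -> Dict[str, Point]:
        """Label raw model output, assumed to be ordered like keypoint_names."""
        if len(raw) != len(self.keypoint_names):
            raise ValueError("expected one (x, y) pair per keypoint")
        return dict(zip(self.keypoint_names, raw))

    def segments(self, raw: Sequence[Point]) -> List[Tuple[Point, Point]]:
        """Line segments to draw for visualization, one per connection."""
        named = self.decode(raw)
        return [(named[a], named[b]) for a, b in self.connections]


# Hypothetical three-keypoint skeleton for a custom "arm" model.
arm = Skeleton(
    name="arm",
    keypoint_names=("shoulder", "elbow", "wrist"),
    connections=(("shoulder", "elbow"), ("elbow", "wrist")),
)
raw_output = [(100.0, 50.0), (120.0, 90.0), (150.0, 110.0)]
named = arm.decode(raw_output)    # keypoints addressable by name
lines = arm.segments(raw_output)  # pairs of points forming the stick figure
```

Here `decode` covers the application-logic side (looking up a keypoint by name) and `segments` covers visualization (the lines of a stick figure).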


When using the Fritz Data Collection API to gather model predictions from real-world use, the skeleton associated with a model is automatically used to define annotation configurations for viewing and labeling images.

Human Pose Estimation

Human pose estimation is a specific use case of pose estimation that uses a model to predict the locations of people's body parts and joints in images.

Our pre-trained Human Pose Estimation models locate 17 body parts and joints for each person detected in an image.

Custom Pose Estimation

Custom pose estimation refers to pose estimation models trained using templates and notebooks provided by Fritz AI. Custom pose models are compatible with the Pose Estimation SDK, which handles pre- and post-processing as well as output visualization, data collection, and model management.

Rigid Pose Estimation

Rigid pose estimation is a method of performing 3D pose estimation by predicting an object’s position in two dimensions and then using information about the camera’s position and the object’s size to lift that 2D pose into 3D space.

This technique requires that the object be “rigid”, meaning it cannot bend or deform in any way. Rigid pose estimation is useful for augmented and virtual reality applications where physical objects can interact with virtual ones. Because the human body bends and deforms, rigid pose estimation cannot be used for 3D human pose estimation.
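The idea of lifting a 2D prediction into 3D using a known object size can be sketched with a simple pinhole camera model. This is a simplified illustration, not the method used by the Pose Estimation SDK: the function `lift_to_3d` and all of its parameters are hypothetical, it assumes the camera's focal length and principal point are known in pixels, and it assumes the object faces the camera so that its apparent width is not foreshortened. Full implementations more commonly solve a Perspective-n-Point problem over several keypoints to recover rotation as well as position.

```python
from typing import Tuple


def lift_to_3d(
    u: float, v: float,    # 2D keypoint position in pixels
    pixel_width: float,    # object's apparent width in the image, in pixels
    real_width: float,     # object's known physical width (e.g. in meters)
    focal_px: float,       # camera focal length in pixels (assumed known)
    cx: float, cy: float,  # principal point (optical center) in pixels
) -> Tuple[float, float, float]:
    """Estimate a camera-space (X, Y, Z) position with a pinhole model."""
    if pixel_width <= 0 or focal_px <= 0:
        raise ValueError("pixel_width and focal_px must be positive")
    # Similar triangles: pixel_width / focal_px == real_width / Z
    z = focal_px * real_width / pixel_width
    # Back-project the pixel through the camera center at depth z.
    x = (u - cx) * z / focal_px
    y = (v - cy) * z / focal_px
    return x, y, z


# Made-up numbers: a 0.5 m wide object that appears 200 px wide.
position = lift_to_3d(
    u=800.0, v=360.0,
    pixel_width=200.0, real_width=0.5,
    focal_px=1000.0, cx=640.0, cy=360.0,
)
```

The depth estimate depends on the object's real size staying fixed, which is why the object must be rigid: if it could bend or deform, the relationship between apparent size and distance would no longer hold.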