Example 2: Motion Tracking / Imitation (Unitree G1)#
The key difference from the velocity task is the command: instead of "sample a target velocity", we "load a reference motion" and advance a time index every step.
So the core of tracking is MotionCommand (a CommandTerm subclass), and rewards/terminations mainly measure “reference vs robot” errors.
Task skeleton: make_tracking_env_cfg (base cfg)#
Path: src/mjlab/tasks/tracking/tracking_env_cfg.py
The base cfg defines the full MDP, but leaves motion-specific details to robot overrides (e.g., motion_file, anchor_body_name, body_names):
# file: src/mjlab/tasks/tracking/tracking_env_cfg.py
commands = {
  "motion": MotionCommandCfg(
    entity_name="robot",
    resampling_time_range=(1e9, 1e9),
    motion_file="",  # override
    anchor_body_name="",  # override
    body_names=(),  # override
    sampling_mode="adaptive",
    debug_vis=True,
  )
}
MotionCommand: the “engine” of tracking#
Path: src/mjlab/tasks/tracking/mdp/commands.py
You can think of MotionCommand as a “reference motion generator” responsible for:
loading the motion npz (joint_pos/joint_vel/body_pos_w/body_quat_w/…)
selecting the current time_step (supports start/uniform/adaptive sampling)
generating a reference pose relative to an anchor (used for metrics and observations)
initializing the robot near the reference pose at reset (plus random perturbations for robustness)
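The responsibilities above can be sketched as a tiny reference generator. This is a hypothetical, NumPy-only illustration, not mjlab's actual `MotionCommand`: the real class also tracks body poses, anchor frames, and adaptive sampling; here we only show the "hold a trajectory, advance a per-env time index" core:

```python
# Minimal sketch of a MotionCommand-style reference generator (hypothetical,
# not the mjlab implementation). It holds a reference joint trajectory and
# advances a per-env time index each step, clamping at the last frame.
import numpy as np

class MotionReference:
  def __init__(self, joint_pos: np.ndarray, num_envs: int,
               sampling_mode: str = "uniform"):
    self.joint_pos = joint_pos          # (T, num_joints) reference trajectory
    self.num_frames = joint_pos.shape[0]
    self.sampling_mode = sampling_mode  # "start" or "uniform" here;
                                        # "adaptive" (bias toward hard
                                        # segments) is omitted in this sketch
    self.time_step = np.zeros(num_envs, dtype=np.int64)

  def reset(self, env_ids: np.ndarray) -> None:
    """Pick a starting frame for each reset env according to sampling_mode."""
    if self.sampling_mode == "start":
      self.time_step[env_ids] = 0
    else:  # "uniform": start each episode at a random frame
      self.time_step[env_ids] = np.random.randint(
        0, self.num_frames, size=len(env_ids))

  def step(self) -> np.ndarray:
    """Advance the time index and return the current reference joint positions."""
    self.time_step = np.minimum(self.time_step + 1, self.num_frames - 1)
    return self.joint_pos[self.time_step]
```

In the real task, the same index would also select `body_pos_w` / `body_quat_w` frames from the npz, and the reset pose (plus random perturbation) would be written into the simulator.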
Rewards & terminations: high-precision shadow imitation#
In tracking, rewards and terminations revolve around: “make the robot become a shadow of the reference motion.”
1) Rewards: multi-dimensional error penalties#
Pose errors:
joint_pos_tracking: track each joint angle
body_pos_tracking/body_quat_tracking: track hands/feet/torso positions and orientations in world frame (relative to the anchor)
Velocity errors:
joint_vel_tracking/body_lin_vel_tracking: encourage smooth motion, not just matching poses
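These error terms typically use an exponential kernel of the form exp(-err/std^2) (the shape referenced later in this doc). A minimal NumPy sketch of one such term, with an illustrative `std` rather than mjlab's actual default:

```python
# Sketch of an exponential tracking-reward term (illustrative, not mjlab's
# exact code): reward = exp(-err / std**2), where err is the summed squared
# tracking error. Smaller std -> sharper, more demanding reward.
import numpy as np

def joint_pos_tracking_reward(
  ref_joint_pos: np.ndarray,    # (num_envs, num_joints) reference angles
  robot_joint_pos: np.ndarray,  # (num_envs, num_joints) measured angles
  std: float = 0.5,             # illustrative value
) -> np.ndarray:
  """Per-env reward in (0, 1]: 1.0 at a perfect match, decaying with error."""
  err = np.sum((ref_joint_pos - robot_joint_pos) ** 2, axis=-1)
  return np.exp(-err / std**2)
```

The velocity terms (`joint_vel_tracking`, `body_lin_vel_tracking`) follow the same shape, just computed on velocities instead of positions.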
2) Terminations: strict accuracy control#
Unlike the velocity task, tracking often uses "fail-fast" training:
Tracking error termination:
if torso/feet deviate from reference beyond a threshold (e.g., 0.3–0.5m), the robot is likely unstable → terminate immediately
prevents wasting compute on hopeless trajectories
Self-collision:
for human-like or difficult motions, self-collision is often strictly disallowed (terminate on contact)
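The tracking-error termination can be sketched as a simple per-env check (hypothetical names and shapes, not mjlab's actual term):

```python
# Sketch of a fail-fast termination check (hypothetical, not mjlab's code):
# terminate any env in which a tracked body drifts beyond `threshold` meters
# from its reference position.
import numpy as np

def bad_tracking_termination(
  ref_body_pos: np.ndarray,    # (num_envs, num_bodies, 3) reference positions
  robot_body_pos: np.ndarray,  # (num_envs, num_bodies, 3) measured positions
  threshold: float = 0.5,      # illustrative; doc suggests ~0.3-0.5 m
) -> np.ndarray:
  """Boolean per-env flag: True if any body deviates beyond the threshold."""
  dist = np.linalg.norm(ref_body_pos - robot_body_pos, axis=-1)
  return (dist > threshold).any(axis=-1)
```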
Training entrypoint: how is motion_file injected?#
Path: src/mjlab/scripts/train.py
train.py detects tracking tasks by checking for commands["motion"] of type MotionCommandCfg.
It then requires --registry-name (a W&B artifact), downloads it, and writes the path into motion_cmd.motion_file:
# file: src/mjlab/scripts/train.py
is_tracking_task = (
  cfg.env.commands is not None
  and "motion" in cfg.env.commands
  and isinstance(cfg.env.commands["motion"], MotionCommandCfg)
)
if is_tracking_task:
  artifact = wandb.Api().artifact(registry_name)
  motion_cmd = cfg.env.commands["motion"]
  motion_cmd.motion_file = str(Path(artifact.download()) / "motion.npz")
The design intent: motion dataset management is decoupled from the training run (you don’t hardcode motion file paths in code).
G1 override: unitree_g1_flat_tracking_env_cfg#
Path: src/mjlab/tasks/tracking/config/g1/env_cfgs.py
This layer “grounds” the base cfg to G1:
bind the robot entity: get_g1_robot_cfg()
configure the self-collision sensor
set MotionCommandCfg.anchor_body_name and body_names (decide which bodies participate in tracking)
set termination end-effector bodies
provide a has_state_estimation=False branch: remove some obs terms to simulate missing state estimation
Example (has_state_estimation branch):
# file: src/mjlab/tasks/tracking/config/g1/env_cfgs.py
if not has_state_estimation:
  new_policy_terms = {
    k: v
    for k, v in cfg.observations["policy"].terms.items()
    if k not in ["motion_anchor_pos_b", "base_lin_vel"]
  }
  cfg.observations["policy"] = ObservationGroupCfg(
    terms=new_policy_terms,
    concatenate_terms=True,
    enable_corruption=True,
  )
Task registration#
Path: src/mjlab/tasks/tracking/config/g1/__init__.py
Besides the default task, a “No-State-Estimation” variant is registered via cfg variants:
# file: src/mjlab/tasks/tracking/config/g1/__init__.py
register_mjlab_task(
  task_id="Mjlab-Tracking-Flat-Unitree-G1-No-State-Estimation",
  env_cfg=unitree_g1_flat_tracking_env_cfg(has_state_estimation=False),
  play_env_cfg=unitree_g1_flat_tracking_env_cfg(has_state_estimation=False, play=True),
  rl_cfg=unitree_g1_tracking_ppo_runner_cfg(),
  runner_cls=MotionTrackingOnPolicyRunner,
)
If you want to modify tracking: where do you usually change things?#
Body list / anchor: most important (directly defines errors and difficulty)
Path: src/mjlab/tasks/tracking/config/g1/env_cfgs.py (motion_cmd.body_names / anchor_body_name)
sampling_mode: start/uniform/adaptive changes learning dynamics significantly
Path: MotionCommandCfg.sampling_mode (base cfg / play overrides)
Reward std: controls the shape of exp(-err/std^2) (signal strength)
Path: src/mjlab/tasks/tracking/tracking_env_cfg.py (rewards dict)
Termination thresholds: too strict → won't learn; too loose → drift
Path: src/mjlab/tasks/tracking/tracking_env_cfg.py + G1 override (fill body_names)
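A hedged sketch of what such an override might look like. The names follow the base cfg shown earlier, but the exact attribute paths (`cfg.rewards[...]`, `params["std"]`) are assumptions; check the actual cfg classes before copying:

```python
# Hypothetical override sketch -- attribute paths are assumptions, not
# verified against the mjlab source.
cfg = make_tracking_env_cfg()

motion_cmd = cfg.commands["motion"]
motion_cmd.sampling_mode = "uniform"        # vs "start" / "adaptive"
motion_cmd.anchor_body_name = "torso_link"  # illustrative body name

# Loosen a tracking reward: larger std -> broader but weaker learning signal.
cfg.rewards["joint_pos_tracking"].params["std"] = 0.3
```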