Quickstart: The Shortest Dev Loop (Train / Play / Modify Tasks)#
This chapter only answers three questions:
How do I run a baseline quickly?
How do I validate my MDP is not broken?
Where do I iterate fastest (rewards / obs / randomization)?
0) Run a working baseline first#
velocity (Unitree G1, flat):
# Train: common overrides (tyro can override dataclass fields directly)
uv run train Mjlab-Velocity-Flat-Unitree-G1 --env.scene.num-envs 4096
# Play: load latest checkpoint from W&B (or pass --checkpoint-file)
uv run play Mjlab-Velocity-Flat-Unitree-G1 --wandb-run-path your-org/mjlab/run-id
tracking (Unitree G1, motion imitation):
# Tracking requires a motion registry (W&B artifact). train.py injects motion_file into the command cfg.
uv run train Mjlab-Tracking-Flat-Unitree-G1 \
--registry-name your-org/motions/motion-name \
--env.scene.num-envs 4096
uv run play Mjlab-Tracking-Flat-Unitree-G1 --wandb-run-path your-org/mjlab/run-id
homie (Unitree H1, mixed velocity + squat + disturbances):
# H1 is heavier and the task is more complex; start with fewer envs first.
uv run train Mjlab-Homie-Unitree-H1 --env.scene.num-envs 2048
uv run play Mjlab-Homie-Unitree-H1 --wandb-run-path your-org/mjlab/run-id
Homie also provides an optional “with hands” variant (mounts a Robotiq 2F85 gripper and adds policy-free random gripper motion):
uv run train Mjlab-Homie-Unitree-H1-with_hands --env.scene.num-envs 2048
uv run play Mjlab-Homie-Unitree-H1-with_hands --wandb-run-path your-org/mjlab/run-id
1) Use a dummy agent for a quick “MDP health check” (strongly recommended)#
Before training, run the env for a few hundred steps with zero / random actions:
uv run play Mjlab-Velocity-Flat-Unitree-G1 --agent zero
uv run play Mjlab-Velocity-Flat-Unitree-G1 --agent random
Tracking dummy agents still need a motion registry (otherwise commands cannot load motions):
uv run play Mjlab-Tracking-Flat-Unitree-G1 \
--agent random \
--registry-name your-org/motions/motion-name
Homie dummy agents do not need extra arguments:
uv run play Mjlab-Homie-Unitree-H1 --agent zero
uv run play Mjlab-Homie-Unitree-H1 --agent random
# Same for the with_hands variant (gripper is policy-free, 0-dim action)
uv run play Mjlab-Homie-Unitree-H1-with_hands --agent random
What should you look for?
Stability: NaN/Inf, exploding values (enable NaN guard if needed, see Debugging & Performance: Stay Stable and Fast).
Obs/action shapes: mismatched dims usually fail early in ActionManager or when building gym spaces.
Reward signals: check the viewer / logger (metrics are available in extras).
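The checks above can be scripted. A minimal sketch of such a health check, assuming a gym-style env whose `step` returns observations and rewards as arrays (the `DummyEnv` here is a stand-in for illustration, not mjlab code):

```python
import numpy as np

class DummyEnv:
    """Stand-in for an mjlab env: 16 parallel envs, 48-dim obs, 12-dim actions."""
    num_envs, obs_dim, act_dim = 16, 48, 12

    def reset(self):
        return np.zeros((self.num_envs, self.obs_dim))

    def step(self, action):
        # A real env would fail here (or when building gym spaces) on dim mismatch.
        assert action.shape == (self.num_envs, self.act_dim), "action shape mismatch"
        obs = np.random.randn(self.num_envs, self.obs_dim)
        reward = np.random.randn(self.num_envs)
        return obs, reward

def health_check(env, num_steps=200, agent="zero"):
    """Roll out zero or random actions; flag NaN/Inf or exploding values."""
    obs = env.reset()
    for t in range(num_steps):
        if agent == "zero":
            action = np.zeros((env.num_envs, env.act_dim))
        else:
            action = np.random.uniform(-1, 1, size=(env.num_envs, env.act_dim))
        obs, reward = env.step(action)
        for name, x in (("obs", obs), ("reward", reward)):
            if not np.isfinite(x).all():
                raise RuntimeError(f"{name} has NaN/Inf at step {t}")
            if np.abs(x).max() > 1e4:
                raise RuntimeError(f"{name} exploded at step {t}")
    return True

health_check(DummyEnv(), agent="random")
```

This is what `--agent zero` / `--agent random` give you for free; scripting it only pays off when you want the checks in CI.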
2) Where do I modify things fastest?#
These tasks follow the pattern: base env cfg + robot-specific override.
base cfg (task definition):
velocity: src/mjlab/tasks/velocity/velocity_env_cfg.py::make_velocity_env_cfg
tracking: src/mjlab/tasks/tracking/tracking_env_cfg.py::make_tracking_env_cfg
g1 overrides (fill-in overrides):
velocity: src/mjlab/tasks/velocity/config/g1/env_cfgs.py
tracking: src/mjlab/tasks/tracking/config/g1/env_cfgs.py
Homie uses the same structure (base cfg + H1 override):
base cfg: src/mjlab/tasks/homie/homie_env_cfg.py::make_homie_env_cfg
h1 override: src/mjlab/tasks/homie/config/h1/env_cfgs.py::unitree_h1_homie_env_cfg
If you want to modify Homie, start with Example 3: Homie — Mixed Motion and Disturbances (Unitree H1). The most common knobs are:
Action split / scaling: policy controls legs only; upper body / gripper are policy-free actions.
Commands & env grouping: which envs walk vs. squat vs. stand (mdp.assign_homie_env_groups).
Disturbances / randomization: pushes, hand loads, friction randomization (events).
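The env-grouping idea can be illustrated with a self-contained sketch: partition the parallel env indices into fixed fractions of walkers, squatters, and standers. The function body and fractions below are illustrative only, not mjlab's actual `mdp.assign_homie_env_groups`:

```python
import numpy as np

def assign_env_groups(num_envs: int, fractions: dict[str, float], seed: int = 0):
    """Partition env indices into behavior groups by fixed fractions."""
    assert abs(sum(fractions.values()) - 1.0) < 1e-9, "fractions must sum to 1"
    rng = np.random.default_rng(seed)
    ids = rng.permutation(num_envs)  # shuffle so groups are spatially mixed
    groups, start = {}, 0
    names = list(fractions)
    for i, name in enumerate(names):
        count = round(fractions[name] * num_envs)
        end = num_envs if i == len(names) - 1 else start + count
        groups[name] = ids[start:end]  # last group absorbs rounding leftovers
        start = end
    return groups

groups = assign_env_groups(2048, {"walk": 0.5, "squat": 0.25, "stand": 0.25})
print({k: len(v) for k, v in groups.items()})
# → {'walk': 1024, 'squat': 512, 'stand': 512}
```

Each group then gets its own command sampler (velocity commands for walkers, height commands for squatters), which is the pattern the Homie task follows.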
Suggested iteration order (fastest feedback first):
Reward weights: tweak RewardTermCfg(weight=...) first.
Observation terms: add/remove terms in observations["policy"].terms.
Randomization: edit events (startup/reset/interval) and domain randomization fields.
Command distributions: adjust ranges and sampling modes in commands.
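As an illustration of the first knob, here is a sketch of tweaking a reward weight in a robot-specific override. The dataclasses and reward names below are simplified stand-ins that mimic the `RewardTermCfg(weight=...)` pattern, not mjlab's real config classes:

```python
from dataclasses import dataclass, field

@dataclass
class RewardTermCfg:
    func: str            # reward function name (simplified to a string here)
    weight: float

@dataclass
class EnvCfg:
    rewards: dict[str, RewardTermCfg] = field(default_factory=dict)

def make_base_env_cfg() -> EnvCfg:
    """Stand-in for a task's base cfg builder (e.g. make_velocity_env_cfg)."""
    return EnvCfg(rewards={
        "track_lin_vel": RewardTermCfg("track_lin_vel_xy_exp", weight=1.0),
        "action_rate": RewardTermCfg("action_rate_l2", weight=-0.01),
    })

def my_robot_env_cfg() -> EnvCfg:
    """Robot-specific override: start from the base cfg, then tweak weights."""
    cfg = make_base_env_cfg()
    cfg.rewards["action_rate"].weight = -0.05  # penalize jerky actions harder
    return cfg

cfg = my_robot_env_cfg()
print(cfg.rewards["action_rate"].weight)  # → -0.05
```

The point of the base-cfg + override split is exactly this: your robot file touches only the fields that differ, so a weight tweak is a one-line diff.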
3) How does CLI override configs?#
train.py / play.py use tyro to parse dataclasses, so you can override fields directly from CLI:
# Override num_envs, episode length, viewer resolution, and (example) a reward weight
uv run train Mjlab-Velocity-Flat-Unitree-G1 \
--env.scene.num-envs 2048 \
--env.episode-length-s 15 \
--env.viewer.width 1280 --env.viewer.height 720
Note
Tyro overrides work best for dataclass fields. For “deep” keys inside dicts (e.g., a specific reward term),
it is usually cleaner to edit the corresponding config/<robot>/env_cfgs.py in Python to keep CLI usage maintainable.
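The mechanics behind these flags can be sketched without tyro itself: each flag like `--env.scene.num-envs 2048` is a dot-path into nested dataclasses, with `-` normalized to `_`. The dataclasses below are stand-ins for illustration, not mjlab's real cfg classes:

```python
from dataclasses import dataclass, field

@dataclass
class SceneCfg:
    num_envs: int = 4096

@dataclass
class ViewerCfg:
    width: int = 1920
    height: int = 1080

@dataclass
class EnvCfg:
    scene: SceneCfg = field(default_factory=SceneCfg)
    viewer: ViewerCfg = field(default_factory=ViewerCfg)
    episode_length_s: float = 20.0

def apply_override(cfg, dotted_key: str, value: str):
    """Mimic a tyro-style CLI override: walk the dot-path, set the leaf field."""
    *parents, leaf = dotted_key.replace("-", "_").split(".")
    obj = cfg
    for name in parents:
        obj = getattr(obj, name)
    setattr(obj, leaf, type(getattr(obj, leaf))(value))  # cast to the field's type

cfg = EnvCfg()
apply_override(cfg, "scene.num-envs", "2048")
apply_override(cfg, "episode-length-s", "15")
print(cfg.scene.num_envs, cfg.episode_length_s)  # → 2048 15.0
```

This also shows why dict-valued fields are awkward from the CLI: `getattr` walks attributes, not dict keys, which is why the note above recommends editing env_cfgs.py for deep reward-term changes.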