Developer Walkthrough (Research & Development)
The goal of this walkthrough is to help new contributors build a reliable mental model of the codebase in 1–2 hours and start modifying or creating manager-based RL tasks with confidence.
What you will learn here
- Architecture overview: how ManagerBasedRlEnv = Scene + Simulation + Managers fits together (data flow + control flow).
- The manager-based API: design philosophy, extension points, and how rewards / terminations are configured.
- Task development workflow: how to build a task in mjlab (dict-based cfg), similar to Isaac Lab’s manager-based tasks.
- Task deep-dives (G1/H1): a guided tour of tasks/velocity, tasks/tracking, and tasks/homie (cfg / MDP / training entrypoints).
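As a rough preview of the control flow covered in the ManagerBasedRlEnv chapter, the toy below walks through the step order named in the contents (action → physics → done/reward → reset → obs) and the terminated vs truncated distinction. Everything in it is hypothetical: ToyEnv, its fields, and the reward formula are illustrative stand-ins and are not mjlab classes or APIs.

```python
from dataclasses import dataclass, field


@dataclass
class ToyEnv:
    """Hypothetical single-env stand-in; NOT mjlab's ManagerBasedRlEnv."""

    max_steps: int = 3          # episode length limit (time-out)
    height: float = 1.0         # toy "scene" state: base height
    step_count: int = 0
    trace: list = field(default_factory=list)  # records the order of phases

    def step(self, action: float):
        self.trace.clear()

        # 1. action: here the raw action is used directly as a height delta.
        self.trace.append("action")

        # 2. physics: advance the toy "simulation" by one step.
        self.height += action
        self.step_count += 1
        self.trace.append("physics")

        # 3. done/reward: computed on the post-physics, pre-reset state.
        terminated = self.height <= 0.0  # failure: the robot "fell"
        truncated = (not terminated) and self.step_count >= self.max_steps
        reward = -abs(self.height - 1.0)
        self.trace.append("done/reward")

        # 4. reset: finished episodes are reset before observations are built.
        if terminated or truncated:
            self.height, self.step_count = 1.0, 0
            self.trace.append("reset")

        # 5. obs: on a done step this already belongs to the new episode.
        obs = self.height
        self.trace.append("obs")
        return obs, reward, terminated, truncated


env = ToyEnv(max_steps=2)
print(env.step(0.0), env.trace)
print(env.step(0.0), env.trace)
```

In this toy, the second call hits the step limit, so it reports truncated rather than terminated, and the returned observation comes from the freshly reset state while the reward still reflects the state before the reset.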
Recommended reading order
If this is your first time reading the code, follow the order below (overview → env lifecycle → managers/terms → task examples → build your own).
If you already know Isaac Lab well, you can jump to managers_and_terms and then compare the task chapters with the code directly.
Contents
- Architecture Overview: mjlab in One Diagram
- Quickstart: The Shortest Dev Loop (Train / Play / Modify Tasks)
- Code Map: Where Should You Change Things?
- Key Class 1: ManagerBasedRlEnv (Lifecycle and Data Flow)
  - Config entrypoint: ManagerBasedRlEnvCfg
  - Three time scales to remember
  - Env construction: Scene + Simulation + Managers
  - Manager loading order (why it matters)
  - step(): action → physics → done/reward → reset → obs
  - Reset order is sensitive (why extras/log matter)
  - Finite horizon vs infinite horizon (terminated vs truncated)
- Key Class 2: Managers + Terms (IsaacLab-like Core Abstractions)
- Rewards and Terminations
- Example 1: Velocity Tracking (Unitree G1)
- Example 2: Motion Tracking / Imitation (Unitree G1)
  - Task skeleton: make_tracking_env_cfg (base cfg)
  - MotionCommand: the “engine” of tracking
  - Rewards & terminations: high-precision shadow imitation
  - Training entrypoint: how is motion_file injected?
  - G1 override: unitree_g1_flat_tracking_env_cfg
  - Task registration
  - If you want to modify tracking: where do you usually change things?
- Example 3: Homie — Mixed Motion and Disturbances (Unitree H1)
  - Task skeleton: make_homie_env_cfg (base cfg)
  - Env grouping: train three “subtasks” in one vectorized env
  - H1 override: unitree_h1_homie_env_cfg
  - Core feature: UpperBodyPoseAction (policy-free, 0-dim action)
  - Curriculum: gradually increase disturbance strength
  - Rewards & terminations: balancing mixed objectives
  - H1 override and H1 constants
  - with_hands: gripper variant (policy-free)
  - Why Homie is a good reference
- How to Add a New G1 RL Task (From Zero to Trainable)
- Debugging & Performance: Stay Stable and Fast