Example 3: Homie — Mixed Motion and Disturbances (Unitree H1)¶
Homie is a more “composite” task that mixes velocity tracking, squatting (height control), and upper-body random disturbances.
Two task ids are provided:
Mjlab-Homie-Unitree-H1: the default version.
Mjlab-Homie-Unitree-H1-with_hands: mounts Robotiq 2F85 grippers (and adds policy-free random gripper motion).
The core idea is: reduce the policy action space to the lower body, and treat the upper body (and optional grippers) as smooth, time-varying disturbances. This helps the policy keep robust leg locomotion under changing body poses.
Task registration¶
Path: src/mjlab_homierl/__init__.py
The external package registers both task ids through
mjlab.tasks.registry.register_mjlab_task. The registered play env uses the
same H1 override, but with play=True so the task can switch into a lighter
inference-oriented configuration.
Task skeleton: make_homie_env_cfg (base cfg)¶
Path: src/mjlab_homierl/homie_env_cfg.py
Homie provides two command generators, both supporting env-group gating:
twist (UniformVelocityCommand): target base linear (x/y) and yaw velocities.
height (RelativeHeightCommand): target pelvis height relative to feet (squat motion).
# file: src/mjlab_homierl/homie_env_cfg.py
commands = {
    "twist": UniformVelocityCommandCfg(
        ...,
        active_env_group="velocity",
        rel_standing_envs=1.0 / 6.0,
        avoid_consecutive_standing=True,
    ),
    "height": RelativeHeightCommandCfg(
        entity_name="robot",
        active_env_group="squat",
        # Smooth height-command transitions (avoid step changes at resampling).
        interp_rate=0.02,
        foot_site_names=(),  # filled by robot override
        ranges=RelativeHeightCommandCfg.Ranges(height=(0.6, 1.0)),
    ),
}
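To make the effect of interp_rate concrete, here is a minimal sketch of a smoothed height command. It assumes the command moves a fixed fraction of the remaining gap toward the resampled target on every step (first-order smoothing); the actual RelativeHeightCommand update rule may differ, and the helper names smooth_height_command and rollout are hypothetical.

```python
def smooth_height_command(current: float, target: float, interp_rate: float = 0.02) -> float:
    """Move the commanded height a fraction `interp_rate` of the way to `target`.

    Assumption: first-order smoothing; not taken from the Homie source.
    """
    return current + interp_rate * (target - current)


def rollout(start: float, target: float, steps: int, interp_rate: float = 0.02) -> list[float]:
    """Apply the smoothing rule repeatedly and record the commanded height."""
    height = start
    history = []
    for _ in range(steps):
        height = smooth_height_command(height, target, interp_rate)
        history.append(height)
    return history


# Resampling from a standing height (1.0) to a squat target (0.6):
# the command glides toward the target instead of jumping to it.
history = rollout(start=1.0, target=0.6, steps=200)
```

With a small interp_rate the first step changes the command by under one centimeter, which is the kind of step-free transition the config comment describes.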
Env grouping: train three “subtasks” in one vectorized env¶
Path: src/mjlab_homierl/mdp/curriculums.py::assign_homie_env_groups
Homie partitions the vectorized env into three env groups (masks), and uses the group names in commands / rewards / curriculum for gating:
squat: ~20% (set_x < 1/5), focuses on height commands (squatting).
standing: ~13.3% (1/5 <= set_x <= 1/3), focuses on “stand still” stability under disturbances.
velocity: ~66.7% (set_x > 1/3), focuses on velocity tracking (walking/running).
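The partition above can be sketched in a few lines. This is an illustration only: it assumes set_x is the env index divided by the number of envs, which the text does not state, and the function name assign_env_groups is hypothetical rather than the signature of assign_homie_env_groups.

```python
from fractions import Fraction


def assign_env_groups(num_envs: int) -> dict[str, list[bool]]:
    """Build one boolean mask per group using the thresholds 1/5 and 1/3.

    Assumption: set_x = env_id / num_envs. Exact fractions avoid
    floating-point surprises at the boundaries.
    """
    masks: dict[str, list[bool]] = {"squat": [], "standing": [], "velocity": []}
    for env_id in range(num_envs):
        set_x = Fraction(env_id, num_envs)
        in_squat = set_x < Fraction(1, 5)
        in_velocity = set_x > Fraction(1, 3)
        masks["squat"].append(in_squat)
        masks["velocity"].append(in_velocity)
        # Everything between the two thresholds (inclusive) stands still.
        masks["standing"].append(not in_squat and not in_velocity)
    return masks


masks = assign_env_groups(30)
counts = {name: sum(mask) for name, mask in masks.items()}
```

Under this assumption, 30 envs split into 6 squat, 5 standing, and 19 velocity envs, and every env belongs to exactly one group.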
Two key takeaways:
Command gating:
    twist has active_env_group="velocity": non-velocity envs are forced to twist=0 (standing).
    height has active_env_group="squat": non-squat envs are set to inactive_height (filled by robot override), avoiding height commands “confusing” walking envs.
Reward gating: many reward terms specify env_group=... to only activate on some groups (e.g., standing stabilization terms, squat-only geometric constraints).
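Command gating can be pictured as a per-env override applied after sampling. The sketch below is a simplified, list-based illustration; gate_commands is a hypothetical helper, and the real command terms presumably operate on batched tensors and group masks.

```python
Twist = tuple[float, float, float]  # (vx, vy, yaw rate)


def gate_commands(
    twist: list[Twist],
    height: list[float],
    groups: list[str],
    inactive_height: float = 0.98,
) -> tuple[list[Twist], list[float]]:
    """Keep each sampled command only in the env group it is active for."""
    gated_twist: list[Twist] = []
    gated_height: list[float] = []
    for env_twist, env_height, group in zip(twist, height, groups):
        # twist is active only in "velocity" envs; others are told to stand.
        gated_twist.append(env_twist if group == "velocity" else (0.0, 0.0, 0.0))
        # height is active only in "squat" envs; others hold the standing height.
        gated_height.append(env_height if group == "squat" else inactive_height)
    return gated_twist, gated_height


sampled_twist = [(0.5, 0.0, 0.1), (0.3, 0.1, 0.0), (1.0, 0.0, -0.2)]
sampled_height = [0.5, 0.7, 0.6]
env_groups = ["squat", "standing", "velocity"]
gated_twist, gated_height = gate_commands(sampled_twist, sampled_height, env_groups)
```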
H1 override: unitree_h1_homie_env_cfg¶
Path: src/mjlab_homierl/env_cfgs.py::unitree_h1_homie_env_cfg
Homie still follows base cfg + robot-specific override. The H1 override mainly:
Switches to plane terrain and disables terrain curriculum (remove terrain_levels).
Keeps MuJoCo CCD at the flat-task default: cfg.sim.mujoco.ccd_iterations = 50.
Splits actions: policy controls legs (hip/knee/ankle); upper-body motion is generated by a policy-free action (next section).
Binds commands to H1 foot geometry: fill foot_site_names for the height command, and set squat/standing ranges + inactive_height:
    height_cmd.ranges.height = (0.4, 0.98)
    height_cmd.inactive_height = 0.98 (keep a stable standing height outside squat envs)
Adds sensors and contact penalties: adds self_collision and hip_knee_ground_contact sensors, and wires the hip_knee_contact reward term.
Configures feet “parallel” rewards: fills H1 foot corner sites (*_foot_fi/fo/ri/ro) for feet_ground_parallel / feet_parallel and reorders right-foot sites to match left/right local frames.
Adds disturbances/randomization: step-scheduled external pushes, and a reset-time constant downward hand load (0–5kg equivalent, hand_load).
Optional with_hands version: when hands=True, mounts 2F85 and adds a policy-free gripper action with interval resampling (see below).
Play override: when play=True, the env removes critic observations, rewards, and curriculum, and disables push / hand-load disturbances so viewer playback stays responsive.
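As a rough picture of the hand_load disturbance listed above, the following sketch samples an equivalent payload mass once and converts it to a constant downward force. It assumes a z-up world frame and standard gravity; sample_hand_load is a hypothetical helper and not the event function used in the H1 override.

```python
import random

GRAVITY = 9.81  # m/s^2, standard gravity (assumed)


def sample_hand_load(
    rng: random.Random,
    mass_range_kg: tuple[float, float] = (0.0, 5.0),
) -> tuple[float, float, float]:
    """Sample a payload mass and return the force (N) it exerts on the hand.

    The force points along -z and would be held constant until the next reset.
    """
    mass = rng.uniform(*mass_range_kg)
    return (0.0, 0.0, -mass * GRAVITY)


rng = random.Random(0)
force = sample_hand_load(rng)
```

A 5 kg payload corresponds to roughly 49 N, so the sampled force always lies between 0 and about 49 N downward.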
Core feature: UpperBodyPoseAction (policy-free, 0-dim action)¶
Path: src/mjlab_homierl/env_cfgs.py
Besides the policy-controlled joint_pos action, H1 Homie adds upper_body_pose (policy action dim = 0):
0 policy dims: does not increase the neural network output size.
Smooth interpolation: maintains an internal pose target and moves toward it via torch.lerp each step.
Periodic resampling: an EventTermCfg periodically samples a new goal pose (default: every 2 seconds).
Optional rate limiting: max_speed_rad_s clamps per-step target changes to avoid overly abrupt motion.
# Upper-body action config (policy-free)
cfg.actions["upper_body_pose"] = UpperBodyPoseActionCfg(
    entity_name="robot",
    joint_names=upper_body_joint_expr,
    interp_rate=0.05,
    max_speed_rad_s=1.0,
    target_range=(-0.6, 0.6),
    initial_ratio=0.0,  # training starts at 0; play mode uses 1.0
    use_sampled_ratio=True,
)
# Interval event: resample goals (range is larger but clamped by joint limits + ratio)
cfg.events["upper_body_random_targets"] = EventTermCfg(
    func=_sample_upper_body_targets_with_curriculum,
    mode="interval",
    interval_range_s=(2.0, 2.0),
    params={
        "action_name": "upper_body_pose",
        "target_range": (-3.0, 1.0),
        "start_step": step_threshold,
    },
)
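The interplay of interpolation, rate limiting, and goal sampling can be sketched in plain Python. This is a guess at the mechanics rather than the UpperBodyPoseAction implementation: the control step dt = 0.02 s, the order of scaling and clamping, and the helper names step_pose and sample_goal are all assumptions.

```python
import random


def step_pose(
    current: list[float],
    goal: list[float],
    interp_rate: float = 0.05,
    max_speed_rad_s: float = 1.0,
    dt: float = 0.02,  # assumed control step in seconds
) -> list[float]:
    """Advance the internal pose target by one control step."""
    max_delta = max_speed_rad_s * dt  # largest allowed change per step (rad)
    new_pose = []
    for q, g in zip(current, goal):
        delta = interp_rate * (g - q)  # equals lerp(q, g, interp_rate) - q
        delta = max(-max_delta, min(max_delta, delta))  # rate limit
        new_pose.append(q + delta)
    return new_pose


def sample_goal(
    joint_limits: list[tuple[float, float]],
    target_range: tuple[float, float],
    ratio: float,
    rng: random.Random,
) -> list[float]:
    """Sample one goal per joint, scale it by the curriculum ratio, clamp to limits."""
    goal = []
    for lower, upper in joint_limits:
        raw = rng.uniform(*target_range) * ratio
        goal.append(max(lower, min(upper, raw)))
    return goal


rng = random.Random(0)
limits = [(-0.5, 0.5), (-1.0, 1.0)]  # hypothetical joint limits in rad
goal = sample_goal(limits, target_range=(-3.0, 1.0), ratio=1.0, rng=rng)
pose = step_pose([0.0, 0.0], goal)
```

With ratio=0.0 every sampled goal collapses to zero in this sketch, which matches the idea that training starts without upper-body motion and the disturbance grows as the ratio increases.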
Curriculum: gradually increase disturbance strength¶
Path: src/mjlab_homierl/mdp/curriculums.py
To avoid overwhelming early training, Homie uses upper_body_action_curriculum:
Performance-coupled: when the average track_linear_velocity reward exceeds a threshold (e.g., 0.8), increase disturbance amplitude.
Linear growth: the ratio increases from 0 to 1.0 in fixed increments (0.05 in the config that follows).
cfg.curriculum["upper_body_action"] = CurriculumTermCfg(
    func=mdp.upper_body_action_curriculum,
    params={
        "action_name": "upper_body_pose",
        "reward_name": "track_linear_velocity",
        "success_threshold": 0.8,
        "increment": 0.05,
        "max_ratio": 1.0,
        "start_step": step_threshold,
    },
)
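A minimal sketch of the update rule these parameters suggest is given here. It assumes the tracked reward is a normalized average in [0, 1] and that the ratio is bumped by one increment each time the curriculum is evaluated above the threshold; update_ratio is a hypothetical function, not the body of upper_body_action_curriculum.

```python
def update_ratio(
    ratio: float,
    mean_reward: float,
    step: int,
    *,
    success_threshold: float = 0.8,
    increment: float = 0.05,
    max_ratio: float = 1.0,
    start_step: int = 0,
) -> float:
    """Return the new disturbance ratio after one curriculum evaluation."""
    if step < start_step:
        return ratio  # curriculum not active yet
    if mean_reward > success_threshold:
        ratio = min(max_ratio, ratio + increment)  # grow, but never past the cap
    return ratio


# Tracking is good (0.9 > 0.8), so the disturbance ratio grows by one increment.
grown = update_ratio(0.0, mean_reward=0.9, step=1_000)
# Tracking is poor, so the ratio is left unchanged.
held = update_ratio(0.3, mean_reward=0.5, step=1_000)
```

Coupling the ratio to tracking performance means the upper body only starts moving aggressively once the legs can already follow velocity commands.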
Rewards & terminations: balancing mixed objectives¶
Homie needs to balance “walk” and “squat” objectives while being robust to upper-body disturbances.
1) Rewards: decouple objectives via env groups¶
Env-group gating:
    many reward terms use env_group=... so they only apply to some groups
    H1 override adds extra standing stabilization (track_*_standing) to reduce residual sway in standing
Regularizers for robustness:
    knee_deviation_reward: penalize knee lateral deviation during squat, encouraging reasonable squatting posture
    upright: keep the torso upright (critical for resisting upper-body disturbances)
    feet_ground_parallel / feet_parallel: constrain feet orientation vs ground / between feet (requires per-robot corner site config)
    hip_knee_contact / self_collisions: penalize “bad contacts” via rewards instead of terminating too early
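Reward gating by env group amounts to masking a term before it is weighted and summed. The sketch below uses plain lists and a hypothetical gated_reward helper; how the env_group argument is resolved to a mask inside the reward manager is not shown in the text and is not modeled here.

```python
def gated_reward(raw_rewards: list[float], group_mask: list[bool], weight: float) -> list[float]:
    """Apply a weighted reward term only in envs that belong to the gated group."""
    return [
        weight * reward if active else 0.0
        for reward, active in zip(raw_rewards, group_mask)
    ]


# A squat-only penalty: envs outside the squat group receive no signal from it.
knee_deviation = [0.2, 0.4, 0.1]  # hypothetical per-env raw values
squat_mask = [True, False, False]
penalty = gated_reward(knee_deviation, squat_mask, weight=-1.0)
```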
2) Terminations: looser coupling¶
Relaxed posture limits: H1 has larger motion ranges and disturbances; fell_over thresholds are typically less strict than for smaller robots.
Self-collision handling: Homie prefers to keep training signal via reward penalties rather than immediate termination for large-range motion.
H1 override and H1 constants¶
Path: src/mjlab_homierl/robots/unitree_h1/h1_constants.py
Homie uses H1-specific actuator parameters heavily:
Multiple actuator groups: H1 is split into HIP_KNEE, ANKLE_TORSO, and ARM groups with different stiffness/damping.
Automatic action scale: per-joint scaling computed from actuator effort_limit / stiffness.
# Compute action scale automatically
for a in H1_ARTICULATION.actuators:
    names = a.target_names_expr
    for n in names:
        H1_ACTION_SCALE[n] = 0.25 * a.effort_limit / a.stiffness
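A small worked example shows what the formula produces. For a position actuator, effort_limit / stiffness is the position error (in radians) at which a pure proportional term would saturate the torque limit, so the 0.25 factor maps a unit action to a quarter of that error. The ActuatorGroup class and all numeric values here are hypothetical stand-ins, not the real H1 constants.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActuatorGroup:
    """Hypothetical stand-in for an actuator config entry."""

    target_names_expr: tuple[str, ...]
    effort_limit: float  # N*m
    stiffness: float  # N*m/rad


# Illustrative values only.
groups = (
    ActuatorGroup((".*_hip_.*", ".*_knee"), effort_limit=200.0, stiffness=100.0),
    ActuatorGroup((".*_ankle",), effort_limit=40.0, stiffness=40.0),
)

action_scale: dict[str, float] = {}
for group in groups:
    for name in group.target_names_expr:
        # Unit action -> 25% of the position error that saturates the torque limit.
        action_scale[name] = 0.25 * group.effort_limit / group.stiffness
```

In this example the stiffer, stronger hip/knee group gets a scale of 0.5 rad per unit action, while the ankle group gets 0.25 rad.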
with_hands: gripper variant (policy-free)¶
Path: src/mjlab_homierl/env_cfgs.py and src/mjlab_homierl/robots/unitree_h1/h1_constants.py
If you choose Mjlab-Homie-Unitree-H1-with_hands:
The robot config mounts 2F85 via get_h1_robot_cfg(hands=...) (default mount config: _default_hands_cfg).
The env adds a policy-free gripper action (0-dim) and an interval event that resamples gripper targets periodically (similar in spirit to the upper-body action).
Hand collisions are disabled by default. Homie uses the grippers as disturbance attachments rather than manipulation contacts, which avoids saturating MuJoCo CCD on the locomotion task.
Play-time runner behavior¶
Path: src/mjlab_homierl/rl/runner.py
Official mjlab play builds a full runner first, but Homie adds a custom
inference-only path. When the play env omits the critic observation group,
HomieHimOnPolicyRunner builds an actor-only policy and loads only the actor
weights that match. This is why Homie play can drop the critic group without
breaking checkpoint loading.
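The idea of loading only matching actor weights can be illustrated with plain dictionaries. The key names (an "actor." prefix, a "critic." prefix, "std") and the helper select_actor_weights are assumptions made for this sketch; the checkpoint layout used by HomieHimOnPolicyRunner is not shown in the text.

```python
def select_actor_weights(
    checkpoint_state: dict[str, object],
    policy_state: dict[str, object],
    prefix: str = "actor.",
) -> dict[str, object]:
    """Keep checkpoint entries that belong to the actor and exist in the policy."""
    return {
        key: value
        for key, value in checkpoint_state.items()
        if key.startswith(prefix) and key in policy_state
    }


# A training checkpoint holds actor and critic tensors (lists stand in for tensors).
checkpoint = {
    "actor.0.weight": [[0.1, 0.2]],
    "actor.0.bias": [0.0],
    "critic.0.weight": [[0.3]],
    "std": [1.0],
}
# The actor-only play policy exposes just the actor parameters.
actor_only_policy = {"actor.0.weight": None, "actor.0.bias": None}

loaded = select_actor_weights(checkpoint, actor_only_policy)
```

Filtering before loading is what lets a checkpoint that contains critic weights be restored into a policy that has no critic at all.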
Why Homie is a good reference¶
Homie is a great reference if you want to build tasks with:
Mixed objectives: velocity tracking + height control in one setup.
Partial actuation: policy controls only part of the body; the rest follows scripted / random targets.
Curriculum beyond domain params: dynamically changing action behavior (not just friction/mass randomization).