Example 3: Homie — Mixed Motion and Disturbances (Unitree H1)#

Homie is a more “composite” task: it mixes velocity tracking, squatting (height control), and random upper-body disturbances.

Two task ids are provided:

  • Mjlab-Homie-Unitree-H1: the default version.

  • Mjlab-Homie-Unitree-H1-with_hands: mounts Robotiq 2F85 grippers (and adds policy-free random gripper motion).

The core idea: restrict the policy's action space to the lower body, and treat the upper body (and optional grippers) as smooth, time-varying disturbances. This trains the policy to maintain robust leg locomotion under changing body poses.

Task skeleton: make_homie_env_cfg (base cfg)#

Path: src/mjlab/tasks/homie/homie_env_cfg.py

Homie provides two command generators, both supporting env-group gating:

  1. twist (UniformVelocityCommand): target base linear (x/y) and yaw velocities.

  2. height (RelativeHeightCommand): target pelvis height relative to feet (squat motion).

# file: src/mjlab/tasks/homie/homie_env_cfg.py
commands = {
    "twist": UniformVelocityCommandCfg(
        ...,
        active_env_group="velocity",
        rel_standing_envs=1.0 / 6.0,
        avoid_consecutive_standing=True,
    ),
    "height": RelativeHeightCommandCfg(
        entity_name="robot",
        active_env_group="squat",
        # Smooth height-command transitions (avoid step changes at resampling).
        interp_rate=0.02,
        foot_site_names=(),  # filled by robot override
        ranges=RelativeHeightCommandCfg.Ranges(height=(0.6, 1.0)),
    ),
}

Env grouping: train three “subtasks” in one vectorized env#

Path: src/mjlab/tasks/homie/mdp/curriculums.py::assign_homie_env_groups

Homie partitions the vectorized env into three env groups (boolean masks) and references the group names in commands, rewards, and curriculum terms for gating:

  • squat: ~20% (set_x < 1/5), focuses on height commands (squatting).

  • standing: ~13.3% (1/5 <= set_x <= 1/3), focuses on “stand still” stability under disturbances.

  • velocity: ~66.7% (set_x > 1/3), focuses on velocity tracking (walking/running).
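The partition above can be sketched as follows (a minimal illustration with hypothetical names; the real logic lives in `assign_homie_env_groups`): each env draws a uniform value `set_x` and is bucketed by the fixed thresholds.

```python
import torch

def assign_groups(num_envs: int, seed: int = 0) -> dict[str, torch.Tensor]:
    """Bucket envs into three disjoint groups by a uniform draw."""
    g = torch.Generator().manual_seed(seed)
    set_x = torch.rand(num_envs, generator=g)
    return {
        "squat": set_x < 1.0 / 5.0,                              # ~20%
        "standing": (set_x >= 1.0 / 5.0) & (set_x <= 1.0 / 3.0),  # ~13.3%
        "velocity": set_x > 1.0 / 3.0,                            # ~66.7%
    }
```

Every env lands in exactly one group, so downstream terms can safely use the masks for mutually exclusive gating.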

Two key takeaways:

  • Command gating:

    • twist has active_env_group="velocity": non-velocity envs are forced to twist=0 (standing).

    • height has active_env_group="squat": non-squat envs are set to inactive_height (filled by robot override), avoiding height commands “confusing” walking envs.

  • Reward gating: many reward terms specify env_group=... to only activate on some groups (e.g., standing stabilization terms, squat-only geometric constraints).

H1 override: unitree_h1_homie_env_cfg#

Path: src/mjlab/tasks/homie/config/h1/env_cfgs.py::unitree_h1_homie_env_cfg

Homie still follows base cfg + robot-specific override. The H1 override mainly:

  • Switches to plane terrain and disables terrain curriculum (remove terrain_levels).

  • Splits actions: policy controls legs (hip/knee/ankle); upper-body motion is generated by a policy-free action (next section).

  • Binds commands to H1 foot geometry: fill foot_site_names for the height command, and set squat/standing ranges + inactive_height:

    • height_cmd.ranges.height = (0.4, 0.98)

    • height_cmd.inactive_height = 0.98 (keep a stable standing height outside squat envs)

  • Adds sensors and contact penalties: adds self_collision and hip_knee_ground_contact sensors, and wires the hip_knee_contact reward term.

  • Configures feet “parallel” rewards: fills H1 foot corner sites (*_foot_fi/fo/ri/ro) for feet_ground_parallel / feet_parallel and reorders right-foot sites to match left/right local frames.

  • Adds disturbances/randomization: step-scheduled external pushes, and a constant downward hand load applied at reset (0–5 kg equivalent, hand_load).

  • Optional with_hands version: when hands=True, mounts 2F85 and adds a policy-free gripper action with interval resampling (see below).

Core feature: UpperBodyPoseAction (policy-free, 0-dim action)#

Path: src/mjlab/tasks/homie/config/h1/env_cfgs.py

Besides the policy-controlled joint_pos action, H1 Homie adds upper_body_pose (policy action dim = 0):

  • 0 policy dims: does not increase the neural network output size.

  • Smooth interpolation: maintains an internal pose target and moves toward it via torch.lerp each step.

  • Periodic resampling: an EventTermCfg periodically samples a new goal pose (default: every 2 seconds).

  • Optional rate limiting: max_speed_rad_s clamps per-step target changes to avoid overly abrupt motion.

# Upper-body action config (policy-free)
cfg.actions["upper_body_pose"] = UpperBodyPoseActionCfg(
    entity_name="robot",
    joint_names=upper_body_joint_expr,
    interp_rate=0.05,
    max_speed_rad_s=1.0,
    target_range=(-0.6, 0.6),
    initial_ratio=0.0,  # training starts at 0; play mode uses 1.0
    use_sampled_ratio=True,
)

# Interval event: resample goals (range is larger but clamped by joint limits + ratio)
cfg.events["upper_body_random_targets"] = EventTermCfg(
    func=_sample_upper_body_targets_with_curriculum,
    mode="interval",
    interval_range_s=(2.0, 2.0),
    params={
        "action_name": "upper_body_pose",
        "target_range": (-3.0, 1.0),
        "start_step": step_threshold,
    },
)
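The smoothing behavior described above can be sketched as a single update step (assumed semantics; the real implementation lives in the action class): torch.lerp blends toward the goal, and max_speed_rad_s caps the per-step change.

```python
import torch

def step_upper_body_target(current, goal, interp_rate=0.05,
                           max_speed_rad_s=1.0, dt=0.02):
    """One smoothing step: lerp toward the goal, then rate-limit the change."""
    new = torch.lerp(current, goal, interp_rate)
    max_delta = max_speed_rad_s * dt  # max radians of change per step
    return current + (new - current).clamp(-max_delta, max_delta)
```

With a distant goal, the rate limit dominates: the target moves at most `max_speed_rad_s * dt` radians per step, regardless of how aggressive the lerp step would be.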

Curriculum: gradually increase disturbance strength#

Path: src/mjlab/tasks/homie/mdp/curriculums.py

To avoid overwhelming early training, Homie uses upper_body_action_curriculum:

  • Performance-coupled: when the average track_linear_velocity reward exceeds a threshold (e.g., 0.8), increase disturbance amplitude.

  • Linear growth: ratio increases from 0 to 1.0.

cfg.curriculum["upper_body_action"] = CurriculumTermCfg(
    func=mdp.upper_body_action_curriculum,
    params={
        "action_name": "upper_body_pose",
        "reward_name": "track_linear_velocity",
        "success_threshold": 0.8,
        "increment": 0.05,
        "max_ratio": 1.0,
        "start_step": step_threshold,
    },
)
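The curriculum logic reduces to a small gated update (a hypothetical sketch; parameter names mirror the config above, but the real function operates on env buffers):

```python
def update_ratio(ratio, mean_tracking_reward, step, *,
                 success_threshold=0.8, increment=0.05,
                 max_ratio=1.0, start_step=0):
    """Raise the disturbance ratio only after start_step and only on success."""
    if step < start_step:
        return ratio
    if mean_tracking_reward > success_threshold:
        ratio = min(ratio + increment, max_ratio)
    return ratio
```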

Rewards & terminations: balancing mixed objectives#

Homie needs to balance “walk” and “squat” objectives while being robust to upper-body disturbances.

1) Rewards: decouple objectives via env groups#

  • Env-group gating:

    • many reward terms use env_group=... so they only apply to some groups

    • H1 override adds extra standing stabilization (track_*_standing) to reduce residual sway in standing

  • Regularizers for robustness:

    • knee_deviation_reward: penalize knee lateral deviation during squat, encouraging reasonable squatting posture

    • upright: keep the torso upright (critical for resisting upper-body disturbances)

    • feet_ground_parallel / feet_parallel: constrain feet orientation vs ground / between feet (requires per-robot corner site config)

    • hip_knee_contact / self_collisions: penalize “bad contacts” via rewards instead of terminating too early
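Putting env-group gating and regularizers together: each term is evaluated for all envs, then zeroed outside its assigned group before weighting. A hypothetical helper (term names below are illustrative):

```python
import torch

def gated_sum(terms, masks):
    """terms: name -> (per-env value, weight, group name or None for all envs)."""
    total = None
    for value, weight, group in terms.values():
        r = weight * value
        if group is not None:
            r = r * masks[group].float()  # zero the term outside its group
        total = r if total is None else total + r
    return total
```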

2) Terminations: looser coupling#

  • Relaxed posture limits: H1 has larger motion ranges and stronger disturbances; fell_over thresholds are typically less strict than for smaller robots.

  • Self-collision handling: for large-range motion, Homie keeps the training signal flowing via reward penalties rather than terminating immediately on self-collision.

H1 override and H1 constants#

Path: src/mjlab/asset_zoo/robots/unitree_h1/h1_constants.py

Homie uses H1-specific actuator parameters heavily:

  • Multiple actuator groups: H1 is split into HIP_KNEE, ANKLE_TORSO, and ARM groups with different stiffness/damping.

  • Automatic action scale: per-joint scaling computed from actuator effort_limit / stiffness.

# Compute per-joint action scales automatically from actuator parameters.
for a in H1_ARTICULATION.actuators:
    for n in a.target_names_expr:
        # Stronger actuators (relative to their stiffness) get a larger scale.
        H1_ACTION_SCALE[n] = 0.25 * a.effort_limit / a.stiffness
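A worked example with illustrative numbers (not the real H1 values): an actuator with a 300 N·m effort limit and stiffness 150 yields a per-joint scale of 0.5 rad, so a unit policy action maps to a 0.5 rad position offset.

```python
# Illustrative values only; real numbers come from h1_constants.py.
effort_limit = 300.0  # N*m
stiffness = 150.0     # N*m/rad
scale = 0.25 * effort_limit / stiffness  # 0.5 rad per unit action
```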

with_hands: gripper variant (policy-free)#

Path: src/mjlab/tasks/homie/config/h1/__init__.py and src/mjlab/tasks/homie/config/h1/env_cfgs.py

If you choose Mjlab-Homie-Unitree-H1-with_hands:

  • The robot config mounts 2F85 via get_h1_robot_cfg(hands=...) (default mount config: _default_hands_cfg).

  • The env adds a policy-free gripper action (0-dim) and an interval event that resamples gripper targets periodically (similar spirit to the upper-body action).

Why Homie is a good reference#

Homie is a great reference if you want to build tasks with:

  1. Mixed objectives: velocity tracking + height control in one setup.

  2. Partial actuation: policy controls only part of the body; the rest follows scripted / random targets.

  3. Curriculum beyond domain params: dynamically changing action behavior (not just friction/mass randomization).