.. _walkthrough-en-task-homie-h1:

Example 3: Homie — Mixed Motion and Disturbances (Unitree H1)
====================================================================

Homie is a more “composite” task that mixes **velocity tracking**, **squatting (height control)**, and **upper-body random disturbances**. Two task ids are provided:

- ``Mjlab-Homie-Unitree-H1``: the default version.
- ``Mjlab-Homie-Unitree-H1-with_hands``: mounts Robotiq 2F85 grippers (and adds policy-free random gripper motion).

The core idea is: **reduce the policy action space to the lower body, and treat the upper body (and optional grippers) as smooth, time-varying disturbances.** This helps the policy keep robust leg locomotion under changing body poses.

Task registration
-----------------

Path: ``src/mjlab_homierl/__init__.py``

The external package registers both task ids through ``mjlab.tasks.registry.register_mjlab_task``. The registered play env uses the same H1 override, but with ``play=True`` so the task can switch into a lighter inference-oriented configuration.

Task skeleton: make_homie_env_cfg (base cfg)
--------------------------------------------

Path: ``src/mjlab_homierl/homie_env_cfg.py``

Homie provides two command generators, both supporting env-group gating:

1. **twist** (``UniformVelocityCommand``): target base linear (x/y) and yaw velocities.
2. **height** (``RelativeHeightCommand``): target pelvis height relative to feet (squat motion).

.. code-block:: python

   # file: src/mjlab_homierl/homie_env_cfg.py
   commands = {
       "twist": UniformVelocityCommandCfg(
           ...,
           active_env_group="velocity",
           rel_standing_envs=1.0 / 6.0,
           avoid_consecutive_standing=True,
       ),
       "height": RelativeHeightCommandCfg(
           entity_name="robot",
           active_env_group="squat",
           # Smooth height-command transitions (avoid step changes at resampling).
           interp_rate=0.02,
           foot_site_names=(),  # filled by robot override
           ranges=RelativeHeightCommandCfg.Ranges(height=(0.6, 1.0)),
       ),
   }

Env grouping: train three “subtasks” in one vectorized env
--------------------------------------------------------------------

Path: ``src/mjlab_homierl/mdp/curriculums.py::assign_homie_env_groups``

Homie partitions the vectorized env into three env groups (masks), and uses the group names in ``commands`` / ``rewards`` / ``curriculum`` for gating:

- ``squat``: ~20% (``set_x < 1/5``), focuses on height commands (squatting).
- ``standing``: ~13.3% (``1/5 <= set_x <= 1/3``), focuses on “stand still” stability under disturbances.
- ``velocity``: ~66.7% (``set_x > 1/3``), focuses on velocity tracking (walking/running).

Two key takeaways:

- **Command gating**:

  - ``twist`` has ``active_env_group="velocity"``: non-velocity envs are forced to ``twist=0`` (standing).
  - ``height`` has ``active_env_group="squat"``: non-squat envs are set to ``inactive_height`` (filled by robot override), avoiding height commands “confusing” walking envs.

- **Reward gating**: many reward terms specify ``env_group=...`` to only activate on some groups (e.g., standing stabilization terms, squat-only geometric constraints).

H1 override: unitree_h1_homie_env_cfg
-------------------------------------

Path: ``src/mjlab_homierl/env_cfgs.py::unitree_h1_homie_env_cfg``

Homie still follows **base cfg + robot-specific override**. The H1 override mainly:

- **Switches to plane terrain and disables terrain curriculum** (remove ``terrain_levels``).
- **Keeps MuJoCo CCD at the flat-task default**: ``cfg.sim.mujoco.ccd_iterations = 50``.
- **Splits actions**: policy controls legs (hip/knee/ankle); upper-body motion is generated by a policy-free action (next section).
- **Binds commands to H1 foot geometry**: fill ``foot_site_names`` for the height command, and set squat/standing ranges + ``inactive_height``:

  - ``height_cmd.ranges.height = (0.4, 0.98)``
  - ``height_cmd.inactive_height = 0.98`` (keep a stable standing height outside squat envs)

- **Adds sensors and contact penalties**: adds ``self_collision`` and ``hip_knee_ground_contact`` sensors, and wires the ``hip_knee_contact`` reward term.
- **Configures feet “parallel” rewards**: fills H1 foot corner sites (``*_foot_fi/fo/ri/ro``) for ``feet_ground_parallel`` / ``feet_parallel`` and reorders right-foot sites to match left/right local frames.
- **Adds disturbances/randomization**: step-scheduled external pushes, and a reset-time constant downward hand load (0–5 kg equivalent, ``hand_load``).
- **Optional with_hands version**: when ``hands=True``, mounts 2F85 and adds a policy-free ``gripper`` action with interval resampling (see below).
- **Play override**: when ``play=True``, the env removes critic observations, rewards, and curriculum, and disables push / hand-load disturbances so viewer playback stays responsive.

Core feature: UpperBodyPoseAction (policy-free, 0-dim action)
--------------------------------------------------------------------

Path: ``src/mjlab_homierl/env_cfgs.py``

Besides the policy-controlled ``joint_pos`` action, H1 Homie adds ``upper_body_pose`` (policy action dim = 0):

- **0 policy dims**: does not increase the neural network output size.
- **Smooth interpolation**: maintains an internal pose target and moves toward it via ``torch.lerp`` each step.
- **Periodic resampling**: an ``EventTermCfg`` periodically samples a new goal pose (default: every 2 seconds).
- **Optional rate limiting**: ``max_speed_rad_s`` clamps per-step target changes to avoid overly abrupt motion.

.. code-block:: python

   # Upper-body action config (policy-free)
   cfg.actions["upper_body_pose"] = UpperBodyPoseActionCfg(
       entity_name="robot",
       joint_names=upper_body_joint_expr,
       interp_rate=0.05,
       max_speed_rad_s=1.0,
       target_range=(-0.6, 0.6),
       initial_ratio=0.0,  # training starts at 0; play mode uses 1.0
       use_sampled_ratio=True,
   )

   # Interval event: resample goals (range is larger but clamped by joint limits + ratio)
   cfg.events["upper_body_random_targets"] = EventTermCfg(
       func=_sample_upper_body_targets_with_curriculum,
       mode="interval",
       interval_range_s=(2.0, 2.0),
       params={
           "action_name": "upper_body_pose",
           "target_range": (-3.0, 1.0),
           "start_step": step_threshold,
       },
   )

Curriculum: gradually increase disturbance strength
--------------------------------------------------------------------

Path: ``src/mjlab_homierl/mdp/curriculums.py``

To avoid overwhelming early training, Homie uses ``upper_body_action_curriculum``:

- **Performance-coupled**: when the average ``track_linear_velocity`` reward exceeds a threshold (e.g., 0.8), increase disturbance amplitude.
- **Linear growth**: ratio increases from 0 to 1.0.

.. code-block:: python

   cfg.curriculum["upper_body_action"] = CurriculumTermCfg(
       func=mdp.upper_body_action_curriculum,
       params={
           "action_name": "upper_body_pose",
           "reward_name": "track_linear_velocity",
           "success_threshold": 0.8,
           "increment": 0.05,
           "max_ratio": 1.0,
           "start_step": step_threshold,
       },
   )

Rewards & terminations: balancing mixed objectives
--------------------------------------------------------------------

Homie needs to balance “walk” and “squat” objectives while being robust to upper-body disturbances.

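
One way to picture how a single batch of envs can serve both objectives is a per-env group label that zeroes a reward term outside its group. The sketch below is plain Python and is not the mjlab or Homie implementation: ``assign_groups`` and ``gate`` are hypothetical helpers, and only the ``1/5`` and ``1/3`` thresholds come from the env-grouping description on this page.

.. code-block:: python

   def assign_groups(set_x):
       """Map per-env values in [0, 1) to group names (thresholds from this page)."""
       groups = []
       for x in set_x:
           if x < 1 / 5:
               groups.append("squat")
           elif x <= 1 / 3:
               groups.append("standing")
           else:
               groups.append("velocity")
       return groups


   def gate(raw_rewards, groups, env_group):
       """Keep a reward term only for envs in env_group; zero it elsewhere."""
       return [r if g == env_group else 0.0 for r, g in zip(raw_rewards, groups)]


   # Hypothetical batch: 15 envs with evenly spaced set_x values (bin midpoints).
   num_envs = 15
   set_x = [(i + 0.5) / num_envs for i in range(num_envs)]
   groups = assign_groups(set_x)

   # 3 squat, 2 standing, 10 velocity (20% / 13.3% / 66.7%).
   counts = {name: groups.count(name) for name in ("squat", "standing", "velocity")}
   print(counts)

   # A squat-only term contributes nothing in standing or velocity envs.
   squat_only = gate([1.0] * num_envs, groups, "squat")
   print(squat_only)

Because gating multiplies by a mask rather than removing envs, every env still produces a full reward vector, which keeps the batch shape fixed across groups.
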

1) Rewards: decouple objectives via env groups
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

- **Env-group gating**:

  - many reward terms use ``env_group=...`` so they only apply to some groups
  - H1 override adds extra standing stabilization (``track_*_standing``) to reduce residual sway in ``standing``

- **Regularizers for robustness**:

  - ``knee_deviation_reward``: penalize knee lateral deviation during squat, encouraging reasonable squatting posture
  - ``upright``: keep the torso upright (critical for resisting upper-body disturbances)
  - ``feet_ground_parallel`` / ``feet_parallel``: constrain feet orientation vs ground / between feet (requires per-robot corner site config)
  - ``hip_knee_contact`` / ``self_collisions``: penalize “bad contacts” via rewards instead of terminating too early

2) Terminations: looser coupling
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

- **Relaxed posture limits**: H1 has larger motion ranges and disturbances; ``fell_over`` thresholds are typically less strict than for smaller robots.
- **Self-collision handling**: Homie prefers to keep training signal via reward penalties rather than immediate termination for large-range motion.

H1 override and H1 constants
----------------------------

Path: ``src/mjlab_homierl/robots/unitree_h1/h1_constants.py``

Homie uses H1-specific actuator parameters heavily:

- **Multiple actuator groups**: H1 is split into ``HIP_KNEE``, ``ANKLE_TORSO``, and ``ARM`` groups with different stiffness/damping.
- **Automatic action scale**: per-joint scaling computed from actuator ``effort_limit / stiffness``.

.. code-block:: python

   # Compute action scale automatically
   for a in H1_ARTICULATION.actuators:
       names = a.target_names_expr
       for n in names:
           H1_ACTION_SCALE[n] = 0.25 * a.effort_limit / a.stiffness

with_hands: gripper variant (policy-free)
--------------------------------------------------------------------

Path: ``src/mjlab_homierl/env_cfgs.py`` and ``src/mjlab_homierl/robots/unitree_h1/h1_constants.py``

If you choose ``Mjlab-Homie-Unitree-H1-with_hands``:

- The robot config mounts 2F85 via ``get_h1_robot_cfg(hands=...)`` (default mount config: ``_default_hands_cfg``).
- The env adds a policy-free ``gripper`` action (0-dim) and an interval event that resamples gripper targets periodically (similar spirit to the upper-body action).
- Hand collisions are disabled by default. HOMIE uses the grippers as disturbance attachments rather than manipulation contacts, which avoids saturating MuJoCo CCD on the locomotion task.

Play-time runner behavior
-------------------------

Path: ``src/mjlab_homierl/rl/runner.py``

Official ``mjlab`` play builds a full runner first, but HOMIE adds a custom inference-only path. When the play env omits the ``critic`` observation group, ``HomieHimOnPolicyRunner`` builds an actor-only policy and loads only the actor weights that match. This is why HOMIE play can drop the critic group without breaking checkpoint loading.

Why Homie is a good reference
-----------------------------

Homie is a great reference if you want to build tasks with:

1. **Mixed objectives**: velocity tracking + height control in one setup.
2. **Partial actuation**: policy controls only part of the body; the rest follows scripted / random targets.
3. **Curriculum beyond domain params**: dynamically changing action behavior (not just friction/mass randomization).
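
As a closing illustration of the partial-actuation pattern (item 2), the following plain-Python sketch moves a scripted pose toward a sampled goal with a lerp-style step and a per-step clamp. It is a minimal sketch under stated assumptions, not the ``UpperBodyPoseAction`` source: ``smooth_step`` is a hypothetical helper, ``step_dt = 0.02`` is an assumed control period, and the real action may order interpolation and clamping differently. The ``interp_rate`` and ``max_speed_rad_s`` values are taken from the config shown earlier.

.. code-block:: python

   def smooth_step(current, goal, interp_rate, max_delta):
       """One step toward goal: lerp-style update, then clamp the per-step change."""
       new_pose = []
       for c, g in zip(current, goal):
           delta = interp_rate * (g - c)  # same form as lerp: c + w * (g - c)
           delta = max(-max_delta, min(max_delta, delta))
           new_pose.append(c + delta)
       return new_pose


   step_dt = 0.02             # assumed control period in seconds (not from the source)
   max_speed_rad_s = 1.0      # value from the config on this page
   max_delta = max_speed_rad_s * step_dt

   pose = [0.0, 0.0]          # hypothetical two-joint upper body
   goal = [0.6, -0.4]         # a sampled goal inside target_range=(-0.6, 0.6)

   largest_change = 0.0
   for _ in range(200):
       new_pose = smooth_step(pose, goal, interp_rate=0.05, max_delta=max_delta)
       step_change = max(abs(n - p) for n, p in zip(new_pose, pose))
       largest_change = max(largest_change, step_change)
       pose = new_pose

   print(pose, largest_change)

The policy never sees these joints as outputs; from its point of view the upper body is a slowly varying disturbance whose per-step change stays bounded by ``max_delta``.
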