.. _walkthrough-en-task-homie-h1:

Example 3: Homie — Mixed Motion and Disturbances (Unitree H1)
====================================================================

Homie is a composite task that combines **velocity tracking**, **squatting (height control)**, and **random upper-body disturbances**.

Two task ids are provided:

- ``Mjlab-Homie-Unitree-H1``: the default version.
- ``Mjlab-Homie-Unitree-H1-with_hands``: mounts Robotiq 2F85 grippers (and adds policy-free random gripper motion).

The core idea is:
**reduce the policy action space to the lower body, and treat the upper body (and optional grippers) as smooth, time-varying disturbances.**
This helps the policy keep robust leg locomotion under changing body poses.

Task registration
-----------------

Path: ``src/mjlab_homierl/__init__.py``

The external package registers both task ids through
``mjlab.tasks.registry.register_mjlab_task``. The registered play env uses the
same H1 override, but with ``play=True`` so the task can switch into a lighter
inference-oriented configuration.

Task skeleton: make_homie_env_cfg (base cfg)
--------------------------------------------

Path: ``src/mjlab_homierl/homie_env_cfg.py``

Homie provides two command generators, both supporting env-group gating:

1. **twist** (``UniformVelocityCommand``): target base linear (x/y) and yaw velocities.
2. **height** (``RelativeHeightCommand``): target pelvis height relative to feet (squat motion).

.. code-block:: python

   # file: src/mjlab_homierl/homie_env_cfg.py
   commands = {
       "twist": UniformVelocityCommandCfg(
           ...,
           active_env_group="velocity",
           rel_standing_envs=1.0 / 6.0,
           avoid_consecutive_standing=True,
       ),
       "height": RelativeHeightCommandCfg(
           entity_name="robot",
           active_env_group="squat",
           # Smooth height-command transitions (avoid step changes at resampling).
           interp_rate=0.02,
           foot_site_names=(), # filled by robot override
           ranges=RelativeHeightCommandCfg.Ranges(height=(0.6, 1.0)),
       ),
   }
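
To build intuition for ``interp_rate=0.02``, the sketch below assumes the height command is smoothed with a first-order update, ``h += interp_rate * (target - h)``, applied once per step. That update rule and the helper names ``smooth_height`` and ``steps_to_settle`` are assumptions for illustration, not the actual ``RelativeHeightCommand`` implementation.

.. code-block:: python

   import math


   def smooth_height(current: float, target: float, interp_rate: float) -> float:
       """One assumed smoothing step: close a fixed fraction of the remaining gap."""
       return current + interp_rate * (target - current)


   def steps_to_settle(interp_rate: float, tolerance: float = 0.05) -> int:
       """Steps until the remaining gap drops below `tolerance` of the initial gap."""
       # After n steps the remaining fraction of the gap is (1 - interp_rate) ** n.
       return math.ceil(math.log(tolerance) / math.log(1.0 - interp_rate))


   # Resample from a standing height of 1.0 to a squat height of 0.6.
   h = 1.0
   n_steps = steps_to_settle(0.02)
   for _ in range(n_steps):
       h = smooth_height(h, 0.6, 0.02)

Under this assumed rule, a resampled height target is approached over roughly 150 steps rather than in a single jump, which is the kind of smoothing the ``interp_rate`` comment describes.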

Env grouping: train three “subtasks” in one vectorized env
--------------------------------------------------------------------

Path: ``src/mjlab_homierl/mdp/curriculums.py::assign_homie_env_groups``

Homie partitions the vectorized env into three env groups (masks), and uses the group names in ``commands`` / ``rewards`` / ``curriculum`` for gating:

- ``squat``: ~20% (``set_x < 1/5``), focuses on height commands (squatting).
- ``standing``: ~13.3% (``1/5 <= set_x <= 1/3``), focuses on “stand still” stability under disturbances.
- ``velocity``: ~66.7% (``set_x > 1/3``), focuses on velocity tracking (walking/running).

Two key takeaways:

- **Command gating**:

  - ``twist`` has ``active_env_group="velocity"``: non-velocity envs are forced to ``twist=0`` (standing).
  - ``height`` has ``active_env_group="squat"``: non-squat envs are set to ``inactive_height`` (filled by the robot override), so height commands do not interfere with non-squat envs.

- **Reward gating**: many reward terms specify ``env_group=...`` to only activate on some groups (e.g., standing stabilization terms, squat-only geometric constraints).
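
The partition and command gating described above can be sketched in plain Python. The helper names ``assign_groups`` and ``gate_command``, and the use of one uniform sample ``set_x`` in [0, 1) per env, are assumptions for illustration; the real ``assign_homie_env_groups`` works on batched tensors inside the env.

.. code-block:: python

   import random


   def assign_groups(num_envs: int, seed: int = 0) -> dict[str, list[bool]]:
       """Hypothetical helper: split envs into squat/standing/velocity masks."""
       rng = random.Random(seed)
       set_x = [rng.random() for _ in range(num_envs)]
       return {
           "squat": [x < 1 / 5 for x in set_x],
           "standing": [1 / 5 <= x <= 1 / 3 for x in set_x],
           "velocity": [x > 1 / 3 for x in set_x],
       }


   def gate_command(values: list[float], mask: list[bool], inactive: float) -> list[float]:
       """Keep sampled commands on active envs and use `inactive` everywhere else."""
       return [v if m else inactive for v, m in zip(values, mask)]


   groups = assign_groups(6000)
   fractions = {name: sum(mask) / 6000 for name, mask in groups.items()}

   # Height command gating: only the first env is in the squat group here.
   gated = gate_command([0.5, 0.7], [True, False], inactive=0.98)

Because the three conditions cover the whole interval without overlapping, every env lands in exactly one group, and the measured fractions approach 20%, 13.3%, and 66.7% as the number of envs grows.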

H1 override: unitree_h1_homie_env_cfg
-------------------------------------

Path: ``src/mjlab_homierl/env_cfgs.py::unitree_h1_homie_env_cfg``

Homie still follows **base cfg + robot-specific override**. The H1 override mainly:

- **Switches to plane terrain and disables terrain curriculum** (remove ``terrain_levels``).
- **Keeps MuJoCo CCD at the flat-task default**: ``cfg.sim.mujoco.ccd_iterations = 50``.
- **Splits actions**: policy controls legs (hip/knee/ankle); upper-body motion is generated by a policy-free action (next section).
- **Binds commands to H1 foot geometry**: fill ``foot_site_names`` for the height command, and set squat/standing ranges + ``inactive_height``:

  - ``height_cmd.ranges.height = (0.4, 0.98)``
  - ``height_cmd.inactive_height = 0.98`` (keep a stable standing height outside squat envs)

- **Adds sensors and contact penalties**: adds ``self_collision`` and ``hip_knee_ground_contact`` sensors, and wires the ``hip_knee_contact`` reward term.
- **Configures feet “parallel” rewards**: fills H1 foot corner sites (``*_foot_fi/fo/ri/ro``) for ``feet_ground_parallel`` / ``feet_parallel`` and reorders right-foot sites to match left/right local frames.
- **Adds disturbances/randomization**: step-scheduled external pushes, and a constant downward hand load sampled at reset (0–5 kg equivalent, ``hand_load``).
- **Optional with_hands version**: when ``hands=True``, mounts 2F85 and adds a
  policy-free ``gripper`` action with interval resampling (see below).
- **Play override**: when ``play=True``, the env removes critic observations,
  rewards, and curriculum, and disables push / hand-load disturbances so viewer
  playback stays responsive.

Core feature: UpperBodyPoseAction (policy-free, 0-dim action)
--------------------------------------------------------------------

Path: ``src/mjlab_homierl/env_cfgs.py``

Besides the policy-controlled ``joint_pos`` action, H1 Homie adds ``upper_body_pose`` (policy action dim = 0):

- **0 policy dims**: does not increase the neural network output size.
- **Smooth interpolation**: maintains an internal pose target and moves toward it via ``torch.lerp`` each step.
- **Periodic resampling**: an ``EventTermCfg`` periodically samples a new goal pose (default: every 2 seconds).
- **Optional rate limiting**: ``max_speed_rad_s`` clamps per-step target changes to avoid overly abrupt motion.

.. code-block:: python

   # Upper-body action config (policy-free)
   cfg.actions["upper_body_pose"] = UpperBodyPoseActionCfg(
       entity_name="robot",
       joint_names=upper_body_joint_expr,
       interp_rate=0.05,
       max_speed_rad_s=1.0,
       target_range=(-0.6, 0.6),
       initial_ratio=0.0,  # training starts at 0; play mode uses 1.0
       use_sampled_ratio=True,
   )

   # Interval event: resample goals (range is larger but clamped by joint limits + ratio)
   cfg.events["upper_body_random_targets"] = EventTermCfg(
       func=_sample_upper_body_targets_with_curriculum,
       mode="interval",
       interval_range_s=(2.0, 2.0),
       params={
           "action_name": "upper_body_pose",
           "target_range": (-3.0, 1.0),
           "start_step": step_threshold,
       },
   )
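
The interplay between ``interp_rate`` and ``max_speed_rad_s`` can be sketched for a single joint. This is a scalar illustration, not the actual ``UpperBodyPoseAction`` code: the function name ``step_toward``, the order of operations (interpolate, then clamp), and the step duration ``dt = 0.02`` s are all assumptions.

.. code-block:: python

   def step_toward(current: float, goal: float, interp_rate: float,
                   max_speed_rad_s: float, dt: float) -> float:
       """One assumed update: interpolate toward the goal, then cap the change."""
       # Equivalent to lerp(current, goal, interp_rate) - current.
       delta = interp_rate * (goal - current)
       # Rate limit: never move more than max_speed_rad_s * dt in one step.
       max_delta = max_speed_rad_s * dt
       delta = max(-max_delta, min(max_delta, delta))
       return current + delta


   pose = 0.0
   trajectory = []
   for _ in range(100):  # 100 steps = 2 s at the assumed dt of 0.02 s
       pose = step_toward(pose, goal=0.6, interp_rate=0.05,
                          max_speed_rad_s=1.0, dt=0.02)
       trajectory.append(pose)

With these numbers the rate limit dominates while the goal is far away (the pose advances by the capped 0.02 rad per step), and the interpolation takes over once the remaining gap is small, so the joint eases into the goal instead of stopping abruptly.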

Curriculum: gradually increase disturbance strength
--------------------------------------------------------------------

Path: ``src/mjlab_homierl/mdp/curriculums.py``

To avoid overwhelming early training, Homie uses ``upper_body_action_curriculum``:

- **Performance-coupled**: when the average ``track_linear_velocity`` reward exceeds a threshold (e.g., 0.8), increase disturbance amplitude.
- **Incremental growth**: the ratio starts at 0 and rises by ``increment`` (0.05 in the config below) until it reaches ``max_ratio`` (1.0).

.. code-block:: python

   cfg.curriculum["upper_body_action"] = CurriculumTermCfg(
       func=mdp.upper_body_action_curriculum,
       params={
           "action_name": "upper_body_pose",
           "reward_name": "track_linear_velocity",
           "success_threshold": 0.8,
           "increment": 0.05,
           "max_ratio": 1.0,
           "start_step": step_threshold,
       },
   )
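
A minimal sketch of how such a performance-coupled update might behave is given here. The function ``update_ratio`` is hypothetical; in particular, it assumes the ratio never decreases and that the reward is compared directly against ``success_threshold``, neither of which is confirmed by the config above.

.. code-block:: python

   def update_ratio(ratio: float, mean_reward: float, step: int, *,
                    success_threshold: float = 0.8, increment: float = 0.05,
                    max_ratio: float = 1.0, start_step: int = 0) -> float:
       """Assumed rule: raise the ratio only when tracking is good enough."""
       if step < start_step or mean_reward <= success_threshold:
           return ratio  # too early, or tracking not yet good enough
       return min(max_ratio, ratio + increment)


   ratio = 0.0
   history = []
   for step, mean_reward in enumerate([0.5, 0.85, 0.9, 0.7, 0.95]):
       ratio = update_ratio(ratio, mean_reward, step)
       history.append(ratio)

In this toy run the ratio rises on the three evaluations where the reward exceeds 0.8 and holds its value otherwise, so disturbance amplitude only grows while velocity tracking stays above the threshold.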

Rewards & terminations: balancing mixed objectives
--------------------------------------------------------------------

Homie needs to balance “walk” and “squat” objectives while being robust to upper-body disturbances.

1) Rewards: decouple objectives via env groups
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

- **Env-group gating**:

  - many reward terms use ``env_group=...`` so they only apply to some groups
  - H1 override adds extra standing stabilization (``track_*_standing``) to reduce residual sway in ``standing``

- **Regularizers for robustness**:

  - ``knee_deviation_reward``: penalize knee lateral deviation during squat, encouraging reasonable squatting posture
  - ``upright``: keep the torso upright (critical for resisting upper-body disturbances)
  - ``feet_ground_parallel`` / ``feet_parallel``: constrain feet orientation vs ground / between feet (requires per-robot corner site config)
  - ``hip_knee_contact`` / ``self_collisions``: penalize “bad contacts” via rewards instead of terminating too early

2) Terminations: looser coupling
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

- **Relaxed posture limits**: H1 has larger motion ranges and stronger disturbances, so its ``fell_over`` thresholds are typically less strict than those used for smaller robots.
- **Self-collision handling**: Homie prefers reward penalties over immediate termination, so episodes with large-range motion still provide a training signal.

H1 override and H1 constants
----------------------------

Path: ``src/mjlab_homierl/robots/unitree_h1/h1_constants.py``

Homie relies heavily on H1-specific actuator parameters:

- **Multiple actuator groups**: H1 is split into ``HIP_KNEE``, ``ANKLE_TORSO``, and ``ARM`` groups with different stiffness/damping.
- **Automatic action scale**: per-joint scaling computed from actuator ``effort_limit / stiffness``.

.. code-block:: python

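   # Compute the per-joint action scale automatically from actuator parameters.
   H1_ACTION_SCALE: dict[str, float] = {}
   for a in H1_ARTICULATION.actuators:
       # Every joint pattern in an actuator group shares that group's scale.
       for n in a.target_names_expr:
           H1_ACTION_SCALE[n] = 0.25 * a.effort_limit / a.stiffness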
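
To see what the formula means, note that for a PD-style position actuator ``effort_limit / stiffness`` is the position error at which the commanded torque saturates, so a factor of 0.25 maps a unit action to a quarter of that error. The sketch below uses placeholder numbers and joint patterns, not the real H1 values; the ``Actuator`` class and the ``joint_target`` mapping (default pose plus scaled action) are assumptions for illustration.

.. code-block:: python

   from dataclasses import dataclass


   @dataclass(frozen=True)
   class Actuator:
       """Hypothetical stand-in for one actuator group."""
       target_names_expr: tuple[str, ...]
       effort_limit: float  # N*m (placeholder value)
       stiffness: float     # N*m/rad (placeholder value)


   actuators = (
       Actuator((".*_hip_pitch", ".*_knee"), effort_limit=200.0, stiffness=100.0),
       Actuator((".*_ankle",), effort_limit=40.0, stiffness=40.0),
   )

   action_scale = {
       name: 0.25 * a.effort_limit / a.stiffness
       for a in actuators
       for name in a.target_names_expr
   }


   def joint_target(default_pos: float, raw_action: float, scale: float) -> float:
       """Assumed mapping from a raw policy output to a joint position target."""
       return default_pos + scale * raw_action


   knee_target = joint_target(0.1, 1.0, action_scale[".*_knee"])

Stiffer groups end up with smaller scales, so the same raw action produces comparable torque demands across joints with very different gains.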

with_hands: gripper variant (policy-free)
--------------------------------------------------------------------

Path: ``src/mjlab_homierl/env_cfgs.py`` and ``src/mjlab_homierl/robots/unitree_h1/h1_constants.py``

If you choose ``Mjlab-Homie-Unitree-H1-with_hands``:

- The robot config mounts 2F85 via ``get_h1_robot_cfg(hands=...)`` (default mount config: ``_default_hands_cfg``).
- The env adds a policy-free ``gripper`` action (0-dim) and an interval event that resamples gripper targets periodically (similar spirit to the upper-body action).
- Hand collisions are disabled by default. Homie uses the grippers as
  disturbance attachments rather than manipulation contacts, which avoids
  saturating MuJoCo CCD on the locomotion task.

Play-time runner behavior
-------------------------

Path: ``src/mjlab_homierl/rl/runner.py``

The official ``mjlab`` play path builds a full runner first, but Homie adds a
custom inference-only path. When the play env omits the ``critic`` observation
group, ``HomieHimOnPolicyRunner`` builds an actor-only policy and loads only the
actor weights that match. This is why Homie play can drop the critic group
without breaking checkpoint loading.
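
The idea of loading only the matching actor weights can be illustrated with plain dictionaries. This is not the ``HomieHimOnPolicyRunner`` code: the function ``select_actor_weights``, the ``actor.`` / ``critic.`` key prefixes, and the toy checkpoint contents are all hypothetical.

.. code-block:: python

   def select_actor_weights(checkpoint: dict[str, object],
                            policy_keys: set[str],
                            prefix: str = "actor.") -> dict[str, object]:
       """Keep entries that belong to the actor and exist in the play policy."""
       return {
           key: value
           for key, value in checkpoint.items()
           if key.startswith(prefix) and key in policy_keys
       }


   # Toy checkpoint with hypothetical key names; lists stand in for tensors.
   checkpoint = {
       "actor.0.weight": [0.1, 0.2],
       "actor.0.bias": [0.0],
       "critic.0.weight": [0.3, 0.4],
   }
   policy_keys = {"actor.0.weight", "actor.0.bias"}
   actor_only = select_actor_weights(checkpoint, policy_keys)

With PyTorch, a filtered dictionary like ``actor_only`` is the kind of object one would pass to ``load_state_dict`` on an actor-only module, since the critic entries that the play policy lacks have already been dropped.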

Why Homie is a good reference
-----------------------------

Homie is a great reference if you want to build tasks with:

1. **Mixed objectives**: velocity tracking + height control in one setup.
2. **Partial actuation**: policy controls only part of the body; the rest follows scripted / random targets.
3. **Curriculum beyond domain params**: dynamically changing action behavior (not just friction/mass randomization).