mjlab.envs#

RL environment classes.

Classes:

ManagerBasedRlEnv

Manager-based RL environment.

ManagerBasedRlEnvCfg

Configuration for a manager-based RL environment.

class mjlab.envs.ManagerBasedRlEnv[source]#

Bases: object

Manager-based RL environment.

Attributes:

is_vector_env

metadata

cfg

num_envs

Number of parallel environments.

physics_dt

Physics simulation step size.

step_dt

Environment step size (physics_dt * decimation).

device

Device for computation.

max_episode_length_s

Maximum episode length in seconds.

max_episode_length

Maximum episode length in steps.

unwrapped

Get the unwrapped environment (base case for wrapper chains).

Methods:

__init__(cfg, device[, render_mode])

setup_manager_visualizers()

load_managers()

Load and initialize all managers.

reset(*[, seed, env_ids, options])

step(action)

render()

close()

seed([seed])

update_visualizers(visualizer)

set_env_group_mask(group_name, mask)

Register or update a boolean mask identifying an environment group.

get_env_group_mask(group_name)

Return boolean mask (num_envs,) for the requested group.

filter_env_ids_by_group(env_ids, group_name)

Filter env_ids to those in the named environment group.

is_vector_env = True#
metadata = {'mujoco_version': '3.4.1', 'render_modes': [None, 'rgb_array'], 'warp_version': warp.config.version}#
__init__(cfg: ManagerBasedRlEnvCfg, device: str, render_mode: str | None = None, **kwargs) → None[source]#
cfg: ManagerBasedRlEnvCfg#
env_group_masks: dict[str, Tensor]#
property num_envs: int#

Number of parallel environments.

property physics_dt: float#

Physics simulation step size.

property step_dt: float#

Environment step size (physics_dt * decimation).

property device: str#

Device for computation.

property max_episode_length_s: float#

Maximum episode length in seconds.

property max_episode_length: int#

Maximum episode length in steps.

property unwrapped: ManagerBasedRlEnv#

Get the unwrapped environment (base case for wrapper chains).

setup_manager_visualizers() → None[source]#
load_managers() → None[source]#

Load and initialize all managers.

Order is important! Event and command managers must be loaded first, then action and observation managers, then other RL managers.

reset(*, seed: int | None = None, env_ids: Tensor | None = None, options: dict[str, Any] | None = None) → tuple[Dict[str, Tensor | Dict[str, Tensor]], dict][source]#
step(action: Tensor) → tuple[Dict[str, Tensor | Dict[str, Tensor]], Tensor, Tensor, Tensor, dict][source]#
render() → ndarray | None[source]#
close() → None[source]#
static seed(seed: int = -1) → int[source]#
update_visualizers(visualizer: DebugVisualizer) → None[source]#
set_env_group_mask(group_name: str, mask: Tensor) → None[source]#

Register or update a boolean mask identifying an environment group.

get_env_group_mask(group_name: str) → Tensor[source]#

Return boolean mask (num_envs,) for the requested group.

filter_env_ids_by_group(env_ids: Tensor | slice, group_name: str) → Tensor[source]#

Filter env_ids to those in the named environment group.
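The three group-mask methods implement plain boolean indexing over environment indices. A minimal pure-Python sketch of the semantics (mjlab itself stores the masks as boolean torch Tensors of shape (num_envs,); the list-based helpers below are illustrative stand-ins, not the real implementation):

```python
# Sketch of the env-group mask semantics using plain Python lists.
num_envs = 8
group_masks = {}  # group_name -> list[bool], analogous to env_group_masks

def set_env_group_mask(group_name, mask):
    """Register or update a boolean mask identifying an environment group."""
    assert len(mask) == num_envs
    group_masks[group_name] = mask

def get_env_group_mask(group_name):
    """Return the boolean mask for the requested group."""
    return group_masks[group_name]

def filter_env_ids_by_group(env_ids, group_name):
    """Keep only the env ids whose mask entry is True."""
    mask = group_masks[group_name]
    return [i for i in env_ids if mask[i]]

set_env_group_mask("rough_terrain", [True, False, True, True, False, False, True, False])
print(filter_env_ids_by_group([0, 1, 2, 5, 6], "rough_terrain"))  # [0, 2, 6]
```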

class mjlab.envs.ManagerBasedRlEnvCfg[source]#

Bases: object

Configuration for a manager-based RL environment.

This config defines all aspects of an RL environment: the physical scene, observations, actions, rewards, terminations, and optional features like commands and curriculum learning.

The environment step size is sim.mujoco.timestep * decimation. For example, with a 2ms physics timestep and decimation=10, the environment runs at 50Hz.
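The 50 Hz example above works out as:

```python
# Environment step size: 2 ms physics timestep with decimation=10
# gives a 20 ms environment step, i.e. a 50 Hz control rate.
physics_dt = 0.002               # sim.mujoco.timestep, in seconds
decimation = 10
step_dt = physics_dt * decimation  # environment step size
control_hz = 1.0 / step_dt
print(step_dt, control_hz)
```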

Attributes:

decimation

Number of physics simulation steps per environment step.

scene

Scene configuration defining terrain, entities, and sensors.

observations

Observation groups configuration.

actions

Action terms configuration.

events

Event terms for domain randomization and state resets.

seed

Random seed for reproducibility.

sim

Simulation configuration including physics timestep, solver iterations, contact parameters, and NaN guarding.

viewer

Viewer configuration for rendering (camera position, resolution, etc.).

episode_length_s

Duration of an episode (in seconds).

rewards

Reward terms configuration.

terminations

Termination terms configuration.

commands

Command generator terms (e.g., velocity targets).

curriculum

Curriculum terms for adaptive difficulty.

is_finite_horizon

Whether the task has a finite or infinite horizon.

scale_rewards_by_dt

Whether to multiply rewards by the environment step duration (dt).

Methods:

__init__(*, decimation, scene[, ...])

decimation: int#

Number of physics simulation steps per environment step. Higher values mean coarser control frequency. Environment step duration = physics_dt * decimation.

scene: SceneCfg#

Scene configuration defining terrain, entities, and sensors. The scene specifies num_envs, the number of parallel environments.

observations: dict[str, ObservationGroupCfg]#

Observation groups configuration. Each group (e.g., “policy”, “critic”) contains observation terms that are concatenated. Groups can have different settings for noise, history, and delay.

actions: dict[str, ActionTermCfg]#

Action terms configuration. Each term controls a specific entity/aspect (e.g., joint positions). Action dimensions are concatenated across terms.

events: dict[str, EventTermCfg]#

Event terms for domain randomization and state resets. The default includes reset_scene_to_default, which resets entities to their initial state. Set this to an empty dict to disable all events, including the default reset.

seed: int | None = None#

Random seed for reproducibility. If None, a random seed is used. The actual seed used is stored back into this field after initialization.

sim: SimulationCfg#

Simulation configuration including physics timestep, solver iterations, contact parameters, and NaN guarding.

viewer: ViewerConfig#

Viewer configuration for rendering (camera position, resolution, etc.).

episode_length_s: float = 0.0#

Duration of an episode (in seconds).

Episode length in steps is computed as:

ceil(episode_length_s / (sim.mujoco.timestep * decimation))
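For example, applying the formula above to a 20-second episode at a 2 ms physics timestep with decimation=10:

```python
import math

# Episode length in steps: 20 s episode, 2 ms physics timestep, decimation=10.
episode_length_s = 20.0
timestep = 0.002   # sim.mujoco.timestep
decimation = 10
max_episode_length = math.ceil(episode_length_s / (timestep * decimation))
print(max_episode_length)  # 1000
```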

rewards: dict[str, RewardTermCfg]#

Reward terms configuration.

terminations: dict[str, TerminationTermCfg]#

Termination terms configuration. If empty, episodes never reset. Use mdp.time_out with time_out=True for episode time limits.
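A hypothetical sketch of a time-limit termination entry, following the mdp.time_out usage mentioned above. The exact TerminationTermCfg field names (e.g. func) and import paths are assumptions here, not taken from this reference:

```python
# Hypothetical sketch; field names other than time_out are assumptions.
terminations = {
    # Truncate episodes at the configured episode_length_s.
    "time_out": TerminationTermCfg(func=mdp.time_out, time_out=True),
}
```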

commands: dict[str, CommandTermCfg]#

Command generator terms (e.g., velocity targets).

curriculum: dict[str, CurriculumTermCfg]#

Curriculum terms for adaptive difficulty.

is_finite_horizon: bool = False#

Whether the task has a finite or infinite horizon. Defaults to False (infinite).

  • Finite horizon (True): The time limit defines the task boundary. When reached, no future value exists beyond it, so the agent receives a terminal done signal.

  • Infinite horizon (False): The time limit is an artificial cutoff. The agent receives a truncated done signal to bootstrap the value of continuing beyond the limit.
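The two bullets above map directly onto a (terminated, truncated) pair of done signals, as in Gymnasium-style APIs. A minimal sketch of that decision logic, assuming the time limit has just been reached:

```python
def time_limit_done_signals(is_finite_horizon: bool) -> tuple[bool, bool]:
    """Done signals (terminated, truncated) when the time limit is hit.

    Finite horizon: the limit is a true task boundary, so the episode
    terminates and no future value is bootstrapped.
    Infinite horizon: the limit is an artificial cutoff, so the episode
    is truncated and the value of continuing can still be bootstrapped.
    """
    if is_finite_horizon:
        return True, False   # terminal done: no value beyond the limit
    return False, True       # truncation: bootstrap value beyond the limit

print(time_limit_done_signals(False))  # (False, True)
```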

scale_rewards_by_dt: bool = True#

Whether to multiply rewards by the environment step duration (dt).

When True (default), reward values are scaled by step_dt to normalize cumulative episodic rewards across different simulation frequencies. Set to False for algorithms that expect unscaled reward signals (e.g., HER, static reward scaling).
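The normalization effect can be seen with a constant per-step reward at two control frequencies: scaling by step_dt keeps the episodic return comparable, while unscaled rewards grow with the step rate. A toy illustration (not mjlab code):

```python
# Constant reward of 1.0 per step over a 10-second episode,
# evaluated at two different control frequencies.
def episodic_return(step_dt, episode_length_s, scale_by_dt=True):
    num_steps = int(episode_length_s / step_dt)
    per_step = 1.0 * step_dt if scale_by_dt else 1.0
    return per_step * num_steps

# With dt scaling, both frequencies yield the same return (10.0).
print(episodic_return(0.02, 10.0))   # 50 Hz
print(episodic_return(0.01, 10.0))   # 100 Hz
# Without scaling, the return doubles at the higher frequency.
print(episodic_return(0.02, 10.0, scale_by_dt=False))  # 500.0
print(episodic_return(0.01, 10.0, scale_by_dt=False))  # 1000.0
```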

__init__(*, decimation: int, scene: ~mjlab.scene.scene.SceneCfg, observations: dict[str, ~mjlab.managers.observation_manager.ObservationGroupCfg] = <factory>, actions: dict[str, ~mjlab.managers.action_manager.ActionTermCfg] = <factory>, events: dict[str, ~mjlab.managers.event_manager.EventTermCfg] = <factory>, seed: int | None = None, sim: ~mjlab.sim.sim.SimulationCfg = <factory>, viewer: ~mjlab.viewer.viewer_config.ViewerConfig = <factory>, episode_length_s: float = 0.0, rewards: dict[str, ~mjlab.managers.reward_manager.RewardTermCfg] = <factory>, terminations: dict[str, ~mjlab.managers.termination_manager.TerminationTermCfg] = <factory>, commands: dict[str, ~mjlab.managers.command_manager.CommandTermCfg] = <factory>, curriculum: dict[str, ~mjlab.managers.curriculum_manager.CurriculumTermCfg] = <factory>, is_finite_horizon: bool = False, scale_rewards_by_dt: bool = True) → None#