mjlab.envs#

RL environment classes.

Classes:

ManagerBasedRlEnv

Manager-based RL environment.

ManagerBasedRlEnvCfg

Configuration for a manager-based RL environment.

class mjlab.envs.ManagerBasedRlEnv[source]#

Bases: object

Manager-based RL environment.

Attributes:

is_vector_env

metadata

cfg

num_envs

Number of parallel environments.

physics_dt

Physics simulation step size.

step_dt

Environment step size (physics_dt * decimation).

device

Device for computation.

max_episode_length_s

Maximum episode length in seconds.

max_episode_length

Maximum episode length in steps.

unwrapped

Get the unwrapped environment (base case for wrapper chains).

Methods:

__init__(cfg, device[, render_mode])

setup_manager_visualizers()

load_managers()

Load and initialize all managers.

reset(*[, seed, env_ids, options])

step(action)

render()

close()

seed([seed])

update_visualizers(visualizer)

set_env_group_mask(group_name, mask)

Register or update a boolean mask identifying an environment group.

get_env_group_mask(group_name)

Return boolean mask (num_envs,) for the requested group.

filter_env_ids_by_group(env_ids, group_name)

Filter env_ids to those in the named environment group.

is_vector_env = True#
metadata = {'mujoco_version': '3.4.1', 'render_modes': [None, 'rgb_array'], 'warp_version': warp.config.version}#
__init__(cfg: ManagerBasedRlEnvCfg, device: str, render_mode: str | None = None, **kwargs) → None[source]#
cfg: ManagerBasedRlEnvCfg#
env_group_masks: dict[str, Tensor]#
property num_envs: int#

Number of parallel environments.

property physics_dt: float#

Physics simulation step size.

property step_dt: float#

Environment step size (physics_dt * decimation).

property device: str#

Device for computation.

property max_episode_length_s: float#

Maximum episode length in seconds.

property max_episode_length: int#

Maximum episode length in steps.

property unwrapped: ManagerBasedRlEnv#

Get the unwrapped environment (base case for wrapper chains).

setup_manager_visualizers() → None[source]#
load_managers() → None[source]#

Load and initialize all managers.

Order is important! Event and command managers must be loaded first, then action and observation managers, then other RL managers.

reset(*, seed: int | None = None, env_ids: Tensor | None = None, options: dict[str, Any] | None = None) → tuple[Dict[str, Tensor | Dict[str, Tensor]], dict][source]#
step(action: Tensor) → tuple[Dict[str, Tensor | Dict[str, Tensor]], Tensor, Tensor, Tensor, dict][source]#
render() → ndarray | None[source]#
close() → None[source]#
static seed(seed: int = -1) → int[source]#
update_visualizers(visualizer: DebugVisualizer) → None[source]#
set_env_group_mask(group_name: str, mask: Tensor) → None[source]#

Register or update a boolean mask identifying an environment group.

get_env_group_mask(group_name: str) → Tensor[source]#

Return boolean mask (num_envs,) for the requested group.

filter_env_ids_by_group(env_ids: Tensor | slice, group_name: str) → Tensor[source]#

Filter env_ids to those in the named environment group.
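The three group-mask methods implement plain boolean indexing over environment indices. A minimal pure-Python sketch of the semantics (mjlab itself stores the masks as boolean torch Tensors of shape (num_envs,); the list-based helpers below are illustrative stand-ins, not the real implementation):

```python
# Sketch of the env-group mask semantics using plain Python lists.
num_envs = 8
group_masks = {}  # group_name -> list[bool], analogous to env_group_masks

def set_env_group_mask(group_name, mask):
    """Register or update a boolean mask identifying an environment group."""
    assert len(mask) == num_envs
    group_masks[group_name] = mask

def get_env_group_mask(group_name):
    """Return the boolean mask for the requested group."""
    return group_masks[group_name]

def filter_env_ids_by_group(env_ids, group_name):
    """Keep only the env ids whose mask entry is True."""
    mask = group_masks[group_name]
    return [i for i in env_ids if mask[i]]

set_env_group_mask("rough_terrain", [True, False, True, True, False, False, True, False])
print(filter_env_ids_by_group([0, 1, 2, 5, 6], "rough_terrain"))  # [0, 2, 6]
```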

class mjlab.envs.ManagerBasedRlEnvCfg[source]#

Bases: object

Configuration for a manager-based RL environment.

This config defines all aspects of an RL environment: the physical scene, observations, actions, rewards, terminations, and optional features like commands and curriculum learning.

The environment step size is sim.mujoco.timestep * decimation. For example, with a 2ms physics timestep and decimation=10, the environment runs at 50Hz.
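The 50 Hz example above works out as:

```python
# Environment step size: 2 ms physics timestep with decimation=10
# gives a 20 ms environment step, i.e. a 50 Hz control rate.
physics_dt = 0.002               # sim.mujoco.timestep, in seconds
decimation = 10
step_dt = physics_dt * decimation  # environment step size
control_hz = 1.0 / step_dt
print(step_dt, control_hz)
```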

Attributes:

decimation

Number of physics simulation steps per environment step.

scene

Scene configuration defining terrain, entities, and sensors.

observations

Observation groups configuration.

actions

Action terms configuration.

events

Event terms for domain randomization and state resets.

seed

Random seed for reproducibility.

sim

Simulation configuration including physics timestep, solver iterations, contact parameters, and NaN guarding.

viewer

Viewer configuration for rendering (camera position, resolution, etc.).

episode_length_s

Duration of an episode (in seconds).

rewards

Reward terms configuration.

terminations

Termination terms configuration.

commands

Command generator terms (e.g., velocity targets).

curriculum

Curriculum terms for adaptive difficulty.

is_finite_horizon

Whether the task has a finite or infinite horizon.

scale_rewards_by_dt

Whether to multiply rewards by the environment step duration (dt).

Methods:

__init__(*, decimation, scene[, ...])

decimation: int#

Number of physics simulation steps per environment step. Higher values mean coarser control frequency. Environment step duration = physics_dt * decimation.

scene: SceneCfg#

Scene configuration defining terrain, entities, and sensors. The scene specifies num_envs, the number of parallel environments.

observations: dict[str, ObservationGroupCfg]#

Observation groups configuration. Each group (e.g., “policy”, “critic”) contains observation terms that are concatenated. Groups can have different settings for noise, history, and delay.

actions: dict[str, ActionTermCfg]#

Action terms configuration. Each term controls a specific entity/aspect (e.g., joint positions). Action dimensions are concatenated across terms.

events: dict[str, EventTermCfg]#

Event terms for domain randomization and state resets. The default includes reset_scene_to_default, which resets entities to their initial state. Set this to an empty dict to disable all events, including the default reset.

seed: int | None = None#

Random seed for reproducibility. If None, a random seed is used. The actual seed used is stored back into this field after initialization.

sim: SimulationCfg#

Simulation configuration including physics timestep, solver iterations, contact parameters, and NaN guarding.

viewer: ViewerConfig#

Viewer configuration for rendering (camera position, resolution, etc.).

episode_length_s: float = 0.0#

Duration of an episode (in seconds).

Episode length in steps is computed as:

ceil(episode_length_s / (sim.mujoco.timestep * decimation))
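For example, applying the formula above to a 20-second episode at a 2 ms physics timestep with decimation=10:

```python
import math

# Episode length in steps: 20 s episode, 2 ms physics timestep, decimation=10.
episode_length_s = 20.0
timestep = 0.002   # sim.mujoco.timestep
decimation = 10
max_episode_length = math.ceil(episode_length_s / (timestep * decimation))
print(max_episode_length)  # 1000
```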

rewards: dict[str, RewardTermCfg]#

Reward terms configuration.

terminations: dict[str, TerminationTermCfg]#

Termination terms configuration. If empty, episodes never reset. Use mdp.time_out with time_out=True for episode time limits.
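A hypothetical sketch of a time-limit termination entry, following the mdp.time_out usage mentioned above. The exact TerminationTermCfg field names (e.g. func) and import paths are assumptions here, not taken from this reference:

```python
# Hypothetical sketch; field names other than time_out are assumptions.
terminations = {
    # Truncate episodes at the configured episode_length_s.
    "time_out": TerminationTermCfg(func=mdp.time_out, time_out=True),
}
```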

commands: dict[str, CommandTermCfg]#

Command generator terms (e.g., velocity targets).

curriculum: dict[str, CurriculumTermCfg]#

Curriculum terms for adaptive difficulty.

is_finite_horizon: bool = False#

Whether the task has a finite or infinite horizon. Defaults to False (infinite).

  • Finite horizon (True): The time limit defines the task boundary. When reached, no future value exists beyond it, so the agent receives a terminal done signal.

  • Infinite horizon (False): The time limit is an artificial cutoff. The agent receives a truncated done signal to bootstrap the value of continuing beyond the limit.
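The two bullets above map directly onto a (terminated, truncated) pair of done signals, as in Gymnasium-style APIs. A minimal sketch of that decision logic, assuming the time limit has just been reached:

```python
def time_limit_done_signals(is_finite_horizon: bool) -> tuple[bool, bool]:
    """Done signals (terminated, truncated) when the time limit is hit.

    Finite horizon: the limit is a true task boundary, so the episode
    terminates and no future value is bootstrapped.
    Infinite horizon: the limit is an artificial cutoff, so the episode
    is truncated and the value of continuing can still be bootstrapped.
    """
    if is_finite_horizon:
        return True, False   # terminal done: no value beyond the limit
    return False, True       # truncation: bootstrap value beyond the limit

print(time_limit_done_signals(False))  # (False, True)
```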

scale_rewards_by_dt: bool = True#

Whether to multiply rewards by the environment step duration (dt).

When True (default), reward values are scaled by step_dt to normalize cumulative episodic rewards across different simulation frequencies. Set to False for algorithms that expect unscaled reward signals (e.g., HER, static reward scaling).
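The normalization effect can be seen with a constant per-step reward at two control frequencies: scaling by step_dt keeps the episodic return comparable, while unscaled rewards grow with the step rate. A toy illustration (not mjlab code):

```python
# Constant reward of 1.0 per step over a 10-second episode,
# evaluated at two different control frequencies.
def episodic_return(step_dt, episode_length_s, scale_by_dt=True):
    num_steps = int(episode_length_s / step_dt)
    per_step = 1.0 * step_dt if scale_by_dt else 1.0
    return per_step * num_steps

# With dt scaling, both frequencies yield the same return (10.0).
print(episodic_return(0.02, 10.0))   # 50 Hz
print(episodic_return(0.01, 10.0))   # 100 Hz
# Without scaling, the return doubles at the higher frequency.
print(episodic_return(0.02, 10.0, scale_by_dt=False))  # 500.0
print(episodic_return(0.01, 10.0, scale_by_dt=False))  # 1000.0
```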

__init__(*, decimation: int, scene: ~mjlab.scene.scene.SceneCfg, observations: dict[str, ~mjlab.managers.observation_manager.ObservationGroupCfg] = <factory>, actions: dict[str, ~mjlab.managers.action_manager.ActionTermCfg] = <factory>, events: dict[str, ~mjlab.managers.event_manager.EventTermCfg] = <factory>, seed: int | None = None, sim: ~mjlab.sim.sim.SimulationCfg = <factory>, viewer: ~mjlab.viewer.viewer_config.ViewerConfig = <factory>, episode_length_s: float = 0.0, rewards: dict[str, ~mjlab.managers.reward_manager.RewardTermCfg] = <factory>, terminations: dict[str, ~mjlab.managers.termination_manager.TerminationTermCfg] = <factory>, commands: dict[str, ~mjlab.managers.command_manager.CommandTermCfg] = <factory>, curriculum: dict[str, ~mjlab.managers.curriculum_manager.CurriculumTermCfg] = <factory>, is_finite_horizon: bool = False, scale_rewards_by_dt: bool = True) → None#