Update Cartpole Task - Bump rsl-rl by StafaH · Pull Request #6294 · isaac-sim/IsaacLab

StafaH · 2026-06-29T19:01:05Z

Description

This PR updates and aligns the direct and manager-based Cartpole tasks.

The RL workflow changes:

Bump rsl-rl-lib to 5.4.1 and use its native image-only CNNModel, removing the Isaac Lab override that is no longer required.
Add task-aware latest and best checkpoint selectors to the unified training and play commands for RL-Games, RSL-RL, SB3, and SKRL. Try it out with

uv run play --rl_library rsl_rl --task xxxxxx --checkpoint latest

Write a run.json manifest for new training runs so checkpoint selection is constrained by RL library, task, and agent metadata. Historical runs without a manifest remain available through explicit checkpoint paths but are intentionally excluded from selector-based discovery.
Document checkpoint selector usage in the reinforcement-learning workflow documentation.

The Cartpole changes:

Align direct and manager-based state observations, resets, termination bounds, episode horizon, reward terms, camera frame stacking, and lighting.
Use 64 by 64 camera observations and a shared RSL-RL camera PPO configuration with a camera actor and privileged state critic.
Add the ovphysx physics preset to manager-based Cartpole, enabling the full physics and renderer backend matrix for camera tasks.
Prevent camera type annotations from importing pxr before SimulationApp starts.

Existing policies trained against the previous Cartpole MDP must be retrained.

Dependencies:

rsl-rl-lib==5.4.1

Fixes # (not provided)

Type of change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (existing functionality will not work without user modification)
Documentation update

Testing

uv run --with pytest pytest scripts/reinforcement_learning/test/test_checkpoint_manifest.py -q (5 passed)
uv run --with pytest pytest source/isaaclab_tasks/test/core/test_env_cfg_no_forbidden_imports.py -k Isaac-Cartpole -q (6 passed)
uv run train --rl_library rsl_rl --task Isaac-Cartpole --num_envs 16 --max_iterations 1
One-iteration RSL-RL smoke training for both Isaac-Cartpole-Camera and Isaac-Cartpole-Camera-Direct with physics=newton_mjwarp renderer=newton_renderer
Verified that manager and direct camera runner configurations are identical except for experiment_name
./isaaclab.sh -f

Screenshots

Not applicable. This PR does not change a graphical user interface.

Checklist

I have read and understood the contribution guidelines
I have run the pre-commit checks with ./isaaclab.sh --format
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
I have added a changelog fragment under source/<pkg>/changelog.d/ for every touched package (do not edit CHANGELOG.rst or bump extension.toml -- CI handles that)
I have added my name to the CONTRIBUTORS.md or my name already exists there

greptile-apps · 2026-06-29T19:12:21Z

Greptile Summary

This PR bumps rsl-rl-lib to 5.4.1, removes the Isaac Lab CNNModel override (now upstream handles image-only observations), and adds latest/best checkpoint selectors to all four RL library training and play scripts. On the Cartpole side it aligns the direct and manager-based tasks (observations, resets, rewards, camera resolution, lighting) and introduces an asymmetric actor-critic setup where the actor uses 64×64 camera images and the critic uses the privileged 4-dim joint state.

Checkpoint selector system (common.py): writes a run.json manifest on each new training run and resolves --checkpoint latest/best by scanning manifests, filtering by library/task/metadata, and natural-sorting checkpoint filenames. Historical runs without a manifest are intentionally excluded.
Cartpole MDP alignment: observation order changed to [cart_pos, pole_pos, cart_vel, pole_vel] (relative to defaults), reward multiplied by step_dt, cart-only termination (pole-angle termination removed), and initial cart position/velocity randomisation added — existing trained policies must be retrained.
CameraImageStack manager term (mdp/observations.py): new reusable term shared between direct and manager-based camera envs, handling frame stacking, deferred RGB normalisation, and depth inf→0 substitution.

Confidence Score: 4/5

Safe to merge; the intentional breaking changes are well-documented in the changelog and the core checkpoint selection logic is clean and tested.

The skrl checkpoint pattern r".*" could silently select a non-checkpoint file if the checkpoints directory ever contains auxiliary files, and the depth sensor buffer is mutated in-place in CameraImageStack, which could affect other consumers of that buffer within the same step. Both are quality concerns rather than showstoppers, but worth addressing before the feature is used broadly.

scripts/reinforcement_learning/skrl/play_skrl.py and train_skrl.py (broad checkpoint pattern); source/isaaclab_tasks/isaaclab_tasks/core/cartpole/mdp/observations.py (in-place depth buffer mutation).

Important Files Changed

Filename	Overview
scripts/reinforcement_learning/common.py	Adds the manifest-based checkpoint selector system: `write_run_manifest` (atomic write via temp-file rename), `resolve_checkpoint_selector` (scans manifests, filters by library/task/metadata, natural-sorts checkpoints), and helpers `_normalize_task_name`/`_natural_sort_key`. Logic is solid and well-tested.
scripts/reinforcement_learning/skrl/play_skrl.py	Adds checkpoint selector support; the `checkpoint_pattern=r".*"` is overly broad and could match non-checkpoint files in the checkpoints directory.
scripts/reinforcement_learning/skrl/train_skrl.py	Same broad `r".*"` checkpoint pattern issue as play_skrl.py; otherwise correctly adds manifest writing and selector resolution before training starts.
source/isaaclab_tasks/isaaclab_tasks/core/cartpole/mdp/observations.py	New shared `CameraImageStack` manager term and `resolve_camera_frame_stack` helper; depth in-place mutation at line 93 modifies the camera's internal buffer, which could affect other consumers reading the same sensor output in the same step.
source/isaaclab_tasks/isaaclab_tasks/core/cartpole/cartpole_direct_env.py	MDP alignment changes: observation reordered to [cart_pos, pole_pos, cart_vel, pole_vel] relative to defaults, reward scaled by step_dt, pole-angle termination removed, initial cart+pole randomisation added. All are intentional breaking changes documented in the changelog.
source/isaaclab_tasks/isaaclab_tasks/core/cartpole/cartpole_direct_camera_env.py	Refactors frame-stack resolution to call shared `resolve_camera_frame_stack`, adds asymmetric critic observation (`{"policy": image_obs, "critic": state_obs}`), and correctly updates `observation_space` for frame stacking in `__init__`.
source/isaaclab_rl/isaaclab_rl/rsl_rl/models.py	Deleted — the Isaac Lab `CNNModel` override is no longer needed as `rsl-rl-lib>=5.4.1` natively supports image-only observations. The `class_name` default in `RslRlCNNModelCfg` is updated to use the upstream `"CNNModel"` directly.
scripts/reinforcement_learning/test/test_checkpoint_manifest.py	Good coverage of the manifest system: tests normalization, latest/best selection, newer-run-without-checkpoint skip, and unmanifested historical run rejection. All five tests exercise distinct behaviours.
source/isaaclab_tasks/isaaclab_tasks/core/cartpole/agents/rsl_rl_ppo_cfg.py	Camera PPO config updated for asymmetric actor-critic: actor uses CNN (64×64, smaller kernels/strides), critic uses a compact MLP on privileged state, shared CNN encoders disabled, and `obs_groups` split into actor/critic groups.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["--checkpoint &lt;arg&gt;"] --> B{Is arg in CHECKPOINT_SELECTORS?}
    B -- yes --> C["resolve_checkpoint_selector()"]
    B -- no --> D{Is arg None?}
    D -- no --> E["retrieve_file_path(arg)\nUse explicit path"]
    D -- yes --> F["Use library default\nor pretrained checkpoint"]

    C --> G["Scan log_root.iterdir()"]
    G --> H{run.json manifest exists?}
    H -- no --> I["Skip run\n(historical run)"]
    H -- yes --> J{version / library / task / metadata match?}
    J -- no --> I
    J -- yes --> K["Collect run with created_at timestamp"]

    K --> L["Sort runs by created_at DESC"]
    L --> M{Any checkpoints in run dir?}
    M -- no --> N["Try next run"]
    M -- yes --> O{selector == 'best' AND preferred_pattern set?}
    O -- yes --> P{Any preferred checkpoints?}
    P -- yes --> Q["Use preferred subset"]
    P -- no --> R["Fall back to all checkpoints"]
    O -- no --> R
    Q --> S["Natural-sort, return last"]
    R --> S
    N --> T{More runs?}
    T -- yes --> M
    T -- no --> U["Raise ValueError\n(no compatible run found)"]
    S --> V["Absolute checkpoint path"]

%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
flowchart TD
    A["--checkpoint &lt;arg&gt;"] --> B{Is arg in CHECKPOINT_SELECTORS?}
    B -- yes --> C["resolve_checkpoint_selector()"]
    B -- no --> D{Is arg None?}
    D -- no --> E["retrieve_file_path(arg)\nUse explicit path"]
    D -- yes --> F["Use library default\nor pretrained checkpoint"]

    C --> G["Scan log_root.iterdir()"]
    G --> H{run.json manifest exists?}
    H -- no --> I["Skip run\n(historical run)"]
    H -- yes --> J{version / library / task / metadata match?}
    J -- no --> I
    J -- yes --> K["Collect run with created_at timestamp"]

    K --> L["Sort runs by created_at DESC"]
    L --> M{Any checkpoints in run dir?}
    M -- no --> N["Try next run"]
    M -- yes --> O{selector == 'best' AND preferred_pattern set?}
    O -- yes --> P{Any preferred checkpoints?}
    P -- yes --> Q["Use preferred subset"]
    P -- no --> R["Fall back to all checkpoints"]
    O -- no --> R
    Q --> S["Natural-sort, return last"]
    R --> S
    N --> T{More runs?}
    T -- yes --> M
    T -- no --> U["Raise ValueError\n(no compatible run found)"]
    S --> V["Absolute checkpoint path"]

_{Reviews (1): Last reviewed commit: "Make MDP match between direct and manage..." | Re-trigger Greptile}

greptile-apps · 2026-06-29T19:12:24Z

+            resume_path = resolve_checkpoint_selector(
+                log_root_path,
+                args_cli.checkpoint,
+                library="skrl",
+                task=train_task_name,
+                checkpoint_pattern=r".*",
+                other_dirs=["checkpoints"],
+                metadata={
+                    "agent": agent_cfg_entry_point,
+                    "algorithm": algorithm,
+                    "ml_framework": args_cli.ml_framework,
+                },
+            )


The checkpoint pattern r".*" matches every file in the checkpoints directory — including hidden files, stats dumps, or any other auxiliary file skrl might write there. The natural-sort key would then rank non-checkpoint files arbitrarily against real checkpoints, and the "last" entry returned could be a non-checkpoint file. Other libraries use extension-specific patterns (r".*\.pth", r"model_.*\.pt") for exactly this reason. A tighter pattern like r".*\.pt" (or whatever extension skrl uses) would prevent accidental matches.

Suggested change

resume_path = resolve_checkpoint_selector(

log_root_path,

args_cli.checkpoint,

library="skrl",

task=train_task_name,

checkpoint_pattern=r".*",

other_dirs=["checkpoints"],

metadata={

"agent": agent_cfg_entry_point,

"algorithm": algorithm,

"ml_framework": args_cli.ml_framework,

},

)

resume_path = resolve_checkpoint_selector(

log_root_path,

args_cli.checkpoint,

library="skrl",

task=train_task_name,

checkpoint_pattern=r".*\.pt",

other_dirs=["checkpoints"],

metadata={

"agent": agent_cfg_entry_point,

"algorithm": algorithm,

"ml_framework": args_cli.ml_framework,

},

)

greptile-apps · 2026-06-29T19:12:25Z

+            resume_path = resolve_checkpoint_selector(
+                log_root_path,
+                args_cli.checkpoint,
+                library="skrl",
+                task=args_cli.task,
+                checkpoint_pattern=r".*",
+                other_dirs=["checkpoints"],
+                metadata={
+                    "agent": agent_cfg_entry_point,
+                    "algorithm": algorithm,
+                    "ml_framework": args_cli.ml_framework,
+                },
+            )


Same overly broad r".*" checkpoint pattern as in the play script. If skrl ever writes a non-.pt file into the checkpoints directory (e.g. a training-stats JSON or a config pickle), _natural_sort_key could rank it above or below actual checkpoint files unpredictably, causing the selector to return the wrong file.

Suggested change

resume_path = resolve_checkpoint_selector(

log_root_path,

args_cli.checkpoint,

library="skrl",

task=args_cli.task,

checkpoint_pattern=r".*",

other_dirs=["checkpoints"],

metadata={

"agent": agent_cfg_entry_point,

"algorithm": algorithm,

"ml_framework": args_cli.ml_framework,

},

)

resume_path = resolve_checkpoint_selector(

log_root_path,

args_cli.checkpoint,

library="skrl",

task=args_cli.task,

checkpoint_pattern=r".*\.pt",

other_dirs=["checkpoints"],

metadata={

"agent": agent_cfg_entry_point,

"algorithm": algorithm,

"ml_framework": args_cli.ml_framework,

},

)

greptile-apps · 2026-06-29T19:12:26Z

+        if rgb_like and not defer_normalize:
+            camera_data = normalize_camera_image(camera_data, data_type)
+        elif data_type == "depth":
+            camera_data[camera_data == float("inf")] = 0


In-place mutation of the shared camera buffer

camera.data.output["depth"] is likely a view into the sensor's internal storage, not a private copy. Writing zeros over the inf pixels here mutates that storage for the current step. Any other observation term, reward term, or debug consumer that reads camera.data.output["depth"] afterward in the same step will see the modified tensor (zeros where inf was) rather than the raw sensor output. The same pattern exists in cartpole_direct_camera_env.py line 114. A safe fix is to clone or mask without mutating: camera_data = camera_data.clone(); camera_data[camera_data == float("inf")] = 0 or use torch.where.

AntoineRichard

I feel like this PR is bundeling multiple things and maybe it should be cut into two PRs.

on the shared checkpoint resolution mechanism
on the cartpole task unification

Both look fine, but they are distinct work.

About the cartpole PR work, I took a quick look at the golden images and some of them looked weird I don't know if it's github showing the diffs of images, or if they are genuinely weird looking. Might be good to check locally before merging.

On the cartpole side the unified framework looks nice, I left a couple comments here and here, but nothing blocking. This is in good shape.

AntoineRichard · 2026-06-30T12:03:37Z

-
+    initial_cart_position_range = (-1.0, 1.0)  # [m]
+    initial_cart_velocity_range = (-0.5, 0.5)  # [m/s]
+    initial_pole_angle_range = [-0.25, 0.25]  # range as multiples of pi [rad]


This is a bit confusing. Why not directly pass this as radiants?

AntoineRichard · 2026-06-30T12:04:01Z

            self.cfg.initial_pole_angle_range[0] * math.pi,
            self.cfg.initial_pole_angle_range[1] * math.pi,


Why do the math.pi multiplication here?

AntoineRichard · 2026-06-30T12:05:51Z

@@ -74,9 +72,9 @@ class BaseCartpoleCameraEnvCfg(CartpoleEnvCfg):
        initial_pole_angle_range = [-0.125, 0.125]


Should we do this in radians here as well?

AntoineRichard · 2026-06-30T12:10:48Z

+        expected_policy_terms = ["joint_pos_rel", "joint_vel_rel"]

-    expected_subtask_terms = ["grasp_1", "stack_1", "grasp_2"]
+        # Before the fix, only the last term would be present.


Do we need this comment here?

AntoineRichard · 2026-06-30T14:32:18Z

Would it make sense to move this to isaaclab_rl package? It's always weird to have code here in the scripts, these are not part of the extensions meaning users would not get them by installating isaaclab.

Agreed. I'll do a follow up PR and refactor some of these train/play into isaaclab_rl

AntoineRichard · 2026-06-30T14:34:22Z

+def _normalize_task_name(task: str) -> str:
+    """Normalize training and play variants to the same task name."""
+    return task.split(":")[-1].removesuffix("-Play")


This is a bit hacky. There are also tasks that have inference in the name. I'm not saying I have a better solution, just pointing it out.

I'll keep this in mind. I think we want to remove the -Play suffix by loading the play config when play.py is used, I'll think about the best solution and update it in a follow-up PR

StafaH added 3 commits June 28, 2026 12:22

Bump rsl-rl and remove custom cnn

b9b51eb

Add utilit for selecting latest checkpoing

b61e0aa

Make MDP match between direct and manager

8b86405

StafaH requested review from ClemensSchwarke, Mayankm96, jtigue-bdai, kellyguo11 and ooctipus as code owners June 29, 2026 19:01

github-actions Bot added documentation Improvements or additions to documentation isaac-lab Related to Isaac Lab team infrastructure labels Jun 29, 2026

greptile-apps Bot reviewed Jun 29, 2026

View reviewed changes

StafaH added 3 commits June 29, 2026 17:24

Use frame stack everywhere, distill cnn

1b1d9eb

Precommit

0eb7c96

Fix failing tests

1b39011

AntoineRichard approved these changes Jun 30, 2026

View reviewed changes

AntoineRichard reviewed Jun 30, 2026

View reviewed changes

StafaH and others added 7 commits June 30, 2026 19:42

Fix cartpole angle values

7037a06

Merge branch 'develop' into mh/rslrl_bump

ebeb880

Remove resolution change

f832ea9

Remove constants.py

16def0b

Revert artifacts from resolution change

428bfd6

Fix doc error

629aecc

Merge branch 'develop' into mh/rslrl_bump

9961d6c

		self.cfg.initial_pole_angle_range[0] * math.pi,
		self.cfg.initial_pole_angle_range[1] * math.pi,

		@@ -74,9 +72,9 @@ class BaseCartpoleCameraEnvCfg(CartpoleEnvCfg):
		initial_pole_angle_range = [-0.125, 0.125]

Uh oh!

Conversation

StafaH commented Jun 29, 2026

Description

Type of change

Testing

Screenshots

Checklist

Uh oh!

greptile-apps Bot commented Jun 29, 2026

Greptile Summary

Confidence Score: 4/5

Important Files Changed

Flowchart

Uh oh!

greptile-apps Bot Jun 29, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot Jun 29, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot Jun 29, 2026

Choose a reason for hiding this comment

Uh oh!

AntoineRichard left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants