Skip to content

Update Cartpole Task - Bump rsl-rl#6294

Open
StafaH wants to merge 13 commits into
isaac-sim:developfrom
StafaH:mh/rslrl_bump
Open

Update Cartpole Task - Bump rsl-rl#6294
StafaH wants to merge 13 commits into
isaac-sim:developfrom
StafaH:mh/rslrl_bump

Conversation

@StafaH

@StafaH StafaH commented Jun 29, 2026

Copy link
Copy Markdown
Contributor

Description

This PR updates and aligns the direct and manager-based Cartpole tasks.

The RL workflow changes:

  • Bump rsl-rl-lib to 5.4.1 and use its native image-only CNNModel, removing the Isaac Lab override that is no longer required.
  • Add task-aware latest and best checkpoint selectors to the unified training and play commands for RL-Games, RSL-RL, SB3, and SKRL. Try it out with

uv run play --rl_library rsl_rl --task xxxxxx --checkpoint latest

  • Write a run.json manifest for new training runs so checkpoint selection is constrained by RL library, task, and agent metadata. Historical runs without a manifest remain available through explicit checkpoint paths but are intentionally excluded from selector-based discovery.
  • Document checkpoint selector usage in the reinforcement-learning workflow documentation.

The Cartpole changes:

  • Align direct and manager-based state observations, resets, termination bounds, episode horizon, reward terms, camera frame stacking, and lighting.
  • Use 64 by 64 camera observations and a shared RSL-RL camera PPO configuration with a camera actor and privileged state critic.
  • Add the ovphysx physics preset to manager-based Cartpole, enabling the full physics and renderer backend matrix for camera tasks.
  • Prevent camera type annotations from importing pxr before SimulationApp starts.

Existing policies trained against the previous Cartpole MDP must be retrained.

Dependencies:

  • rsl-rl-lib==5.4.1

Fixes # (not provided)

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (existing functionality will not work without user modification)
  • Documentation update

Testing

  • uv run --with pytest pytest scripts/reinforcement_learning/test/test_checkpoint_manifest.py -q (5 passed)
  • uv run --with pytest pytest source/isaaclab_tasks/test/core/test_env_cfg_no_forbidden_imports.py -k Isaac-Cartpole -q (6 passed)
  • uv run train --rl_library rsl_rl --task Isaac-Cartpole --num_envs 16 --max_iterations 1
  • One-iteration RSL-RL smoke training for both Isaac-Cartpole-Camera and Isaac-Cartpole-Camera-Direct with physics=newton_mjwarp renderer=newton_renderer
  • Verified that manager and direct camera runner configurations are identical except for experiment_name
  • ./isaaclab.sh -f

Screenshots

Not applicable. This PR does not change a graphical user interface.

Checklist

  • I have read and understood the contribution guidelines
  • I have run the pre-commit checks with ./isaaclab.sh --format
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • I have added a changelog fragment under source/<pkg>/changelog.d/ for every touched package (do not edit CHANGELOG.rst or bump extension.toml -- CI handles that)
  • I have added my name to the CONTRIBUTORS.md or my name already exists there

@github-actions github-actions Bot added documentation Improvements or additions to documentation isaac-lab Related to Isaac Lab team infrastructure labels Jun 29, 2026
@greptile-apps

greptile-apps Bot commented Jun 29, 2026

Copy link
Copy Markdown
Contributor

Greptile Summary

This PR bumps rsl-rl-lib to 5.4.1, removes the Isaac Lab CNNModel override (now upstream handles image-only observations), and adds latest/best checkpoint selectors to all four RL library training and play scripts. On the Cartpole side it aligns the direct and manager-based tasks (observations, resets, rewards, camera resolution, lighting) and introduces an asymmetric actor-critic setup where the actor uses 64×64 camera images and the critic uses the privileged 4-dim joint state.

  • Checkpoint selector system (common.py): writes a run.json manifest on each new training run and resolves --checkpoint latest/best by scanning manifests, filtering by library/task/metadata, and natural-sorting checkpoint filenames. Historical runs without a manifest are intentionally excluded.
  • Cartpole MDP alignment: observation order changed to [cart_pos, pole_pos, cart_vel, pole_vel] (relative to defaults), reward multiplied by step_dt, cart-only termination (pole-angle termination removed), and initial cart position/velocity randomisation added — existing trained policies must be retrained.
  • CameraImageStack manager term (mdp/observations.py): new reusable term shared between direct and manager-based camera envs, handling frame stacking, deferred RGB normalisation, and depth inf→0 substitution.

Confidence Score: 4/5

Safe to merge; the intentional breaking changes are well-documented in the changelog and the core checkpoint selection logic is clean and tested.

The skrl checkpoint pattern r".*" could silently select a non-checkpoint file if the checkpoints directory ever contains auxiliary files, and the depth sensor buffer is mutated in-place in CameraImageStack, which could affect other consumers of that buffer within the same step. Both are quality concerns rather than showstoppers, but worth addressing before the feature is used broadly.

scripts/reinforcement_learning/skrl/play_skrl.py and train_skrl.py (broad checkpoint pattern); source/isaaclab_tasks/isaaclab_tasks/core/cartpole/mdp/observations.py (in-place depth buffer mutation).

Important Files Changed

Filename Overview
scripts/reinforcement_learning/common.py Adds the manifest-based checkpoint selector system: write_run_manifest (atomic write via temp-file rename), resolve_checkpoint_selector (scans manifests, filters by library/task/metadata, natural-sorts checkpoints), and helpers _normalize_task_name/_natural_sort_key. Logic is solid and well-tested.
scripts/reinforcement_learning/skrl/play_skrl.py Adds checkpoint selector support; the checkpoint_pattern=r".*" is overly broad and could match non-checkpoint files in the checkpoints directory.
scripts/reinforcement_learning/skrl/train_skrl.py Same broad r".*" checkpoint pattern issue as play_skrl.py; otherwise correctly adds manifest writing and selector resolution before training starts.
source/isaaclab_tasks/isaaclab_tasks/core/cartpole/mdp/observations.py New shared CameraImageStack manager term and resolve_camera_frame_stack helper; depth in-place mutation at line 93 modifies the camera's internal buffer, which could affect other consumers reading the same sensor output in the same step.
source/isaaclab_tasks/isaaclab_tasks/core/cartpole/cartpole_direct_env.py MDP alignment changes: observation reordered to [cart_pos, pole_pos, cart_vel, pole_vel] relative to defaults, reward scaled by step_dt, pole-angle termination removed, initial cart+pole randomisation added. All are intentional breaking changes documented in the changelog.
source/isaaclab_tasks/isaaclab_tasks/core/cartpole/cartpole_direct_camera_env.py Refactors frame-stack resolution to call shared resolve_camera_frame_stack, adds asymmetric critic observation ({"policy": image_obs, "critic": state_obs}), and correctly updates observation_space for frame stacking in __init__.
source/isaaclab_rl/isaaclab_rl/rsl_rl/models.py Deleted — the Isaac Lab CNNModel override is no longer needed as rsl-rl-lib>=5.4.1 natively supports image-only observations. The class_name default in RslRlCNNModelCfg is updated to use the upstream "CNNModel" directly.
scripts/reinforcement_learning/test/test_checkpoint_manifest.py Good coverage of the manifest system: tests normalization, latest/best selection, newer-run-without-checkpoint skip, and unmanifested historical run rejection. All five tests exercise distinct behaviours.
source/isaaclab_tasks/isaaclab_tasks/core/cartpole/agents/rsl_rl_ppo_cfg.py Camera PPO config updated for asymmetric actor-critic: actor uses CNN (64×64, smaller kernels/strides), critic uses a compact MLP on privileged state, shared CNN encoders disabled, and obs_groups split into actor/critic groups.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["--checkpoint &lt;arg&gt;"] --> B{Is arg in CHECKPOINT_SELECTORS?}
    B -- yes --> C["resolve_checkpoint_selector()"]
    B -- no --> D{Is arg None?}
    D -- no --> E["retrieve_file_path(arg)\nUse explicit path"]
    D -- yes --> F["Use library default\nor pretrained checkpoint"]

    C --> G["Scan log_root.iterdir()"]
    G --> H{run.json manifest exists?}
    H -- no --> I["Skip run\n(historical run)"]
    H -- yes --> J{version / library / task / metadata match?}
    J -- no --> I
    J -- yes --> K["Collect run with created_at timestamp"]

    K --> L["Sort runs by created_at DESC"]
    L --> M{Any checkpoints in run dir?}
    M -- no --> N["Try next run"]
    M -- yes --> O{selector == 'best' AND preferred_pattern set?}
    O -- yes --> P{Any preferred checkpoints?}
    P -- yes --> Q["Use preferred subset"]
    P -- no --> R["Fall back to all checkpoints"]
    O -- no --> R
    Q --> S["Natural-sort, return last"]
    R --> S
    N --> T{More runs?}
    T -- yes --> M
    T -- no --> U["Raise ValueError\n(no compatible run found)"]
    S --> V["Absolute checkpoint path"]
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
flowchart TD
    A["--checkpoint &lt;arg&gt;"] --> B{Is arg in CHECKPOINT_SELECTORS?}
    B -- yes --> C["resolve_checkpoint_selector()"]
    B -- no --> D{Is arg None?}
    D -- no --> E["retrieve_file_path(arg)\nUse explicit path"]
    D -- yes --> F["Use library default\nor pretrained checkpoint"]

    C --> G["Scan log_root.iterdir()"]
    G --> H{run.json manifest exists?}
    H -- no --> I["Skip run\n(historical run)"]
    H -- yes --> J{version / library / task / metadata match?}
    J -- no --> I
    J -- yes --> K["Collect run with created_at timestamp"]

    K --> L["Sort runs by created_at DESC"]
    L --> M{Any checkpoints in run dir?}
    M -- no --> N["Try next run"]
    M -- yes --> O{selector == 'best' AND preferred_pattern set?}
    O -- yes --> P{Any preferred checkpoints?}
    P -- yes --> Q["Use preferred subset"]
    P -- no --> R["Fall back to all checkpoints"]
    O -- no --> R
    Q --> S["Natural-sort, return last"]
    R --> S
    N --> T{More runs?}
    T -- yes --> M
    T -- no --> U["Raise ValueError\n(no compatible run found)"]
    S --> V["Absolute checkpoint path"]
Loading

Reviews (1): Last reviewed commit: "Make MDP match between direct and manage..." | Re-trigger Greptile

Comment on lines +152 to +164
resume_path = resolve_checkpoint_selector(
log_root_path,
args_cli.checkpoint,
library="skrl",
task=train_task_name,
checkpoint_pattern=r".*",
other_dirs=["checkpoints"],
metadata={
"agent": agent_cfg_entry_point,
"algorithm": algorithm,
"ml_framework": args_cli.ml_framework,
},
)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 The checkpoint pattern r".*" matches every file in the checkpoints directory — including hidden files, stats dumps, or any other auxiliary file skrl might write there. The natural-sort key would then rank non-checkpoint files arbitrarily against real checkpoints, and the "last" entry returned could be a non-checkpoint file. Other libraries use extension-specific patterns (r".*\.pth", r"model_.*\.pt") for exactly this reason. A tighter pattern like r".*\.pt" (or whatever extension skrl uses) would prevent accidental matches.

Suggested change
resume_path = resolve_checkpoint_selector(
log_root_path,
args_cli.checkpoint,
library="skrl",
task=train_task_name,
checkpoint_pattern=r".*",
other_dirs=["checkpoints"],
metadata={
"agent": agent_cfg_entry_point,
"algorithm": algorithm,
"ml_framework": args_cli.ml_framework,
},
)
resume_path = resolve_checkpoint_selector(
log_root_path,
args_cli.checkpoint,
library="skrl",
task=train_task_name,
checkpoint_pattern=r".*\.pt",
other_dirs=["checkpoints"],
metadata={
"agent": agent_cfg_entry_point,
"algorithm": algorithm,
"ml_framework": args_cli.ml_framework,
},
)

Comment on lines +175 to +187
resume_path = resolve_checkpoint_selector(
log_root_path,
args_cli.checkpoint,
library="skrl",
task=args_cli.task,
checkpoint_pattern=r".*",
other_dirs=["checkpoints"],
metadata={
"agent": agent_cfg_entry_point,
"algorithm": algorithm,
"ml_framework": args_cli.ml_framework,
},
)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Same overly broad r".*" checkpoint pattern as in the play script. If skrl ever writes a non-.pt file into the checkpoints directory (e.g. a training-stats JSON or a config pickle), _natural_sort_key could rank it above or below actual checkpoint files unpredictably, causing the selector to return the wrong file.

Suggested change
resume_path = resolve_checkpoint_selector(
log_root_path,
args_cli.checkpoint,
library="skrl",
task=args_cli.task,
checkpoint_pattern=r".*",
other_dirs=["checkpoints"],
metadata={
"agent": agent_cfg_entry_point,
"algorithm": algorithm,
"ml_framework": args_cli.ml_framework,
},
)
resume_path = resolve_checkpoint_selector(
log_root_path,
args_cli.checkpoint,
library="skrl",
task=args_cli.task,
checkpoint_pattern=r".*\.pt",
other_dirs=["checkpoints"],
metadata={
"agent": agent_cfg_entry_point,
"algorithm": algorithm,
"ml_framework": args_cli.ml_framework,
},
)

if rgb_like and not defer_normalize:
camera_data = normalize_camera_image(camera_data, data_type)
elif data_type == "depth":
camera_data[camera_data == float("inf")] = 0

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 In-place mutation of the shared camera buffer

camera.data.output["depth"] is likely a view into the sensor's internal storage, not a private copy. Writing zeros over the inf pixels here mutates that storage for the current step. Any other observation term, reward term, or debug consumer that reads camera.data.output["depth"] afterward in the same step will see the modified tensor (zeros where inf was) rather than the raw sensor output. The same pattern exists in cartpole_direct_camera_env.py line 114. A safe fix is to clone or mask without mutating: camera_data = camera_data.clone(); camera_data[camera_data == float("inf")] = 0 or use torch.where.

@AntoineRichard AntoineRichard left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this PR is bundeling multiple things and maybe it should be cut into two PRs.

  1. on the shared checkpoint resolution mechanism
  2. on the cartpole task unification

Both look fine, but they are distinct work.

About the cartpole PR work, I took a quick look at the golden images and some of them looked weird I don't know if it's github showing the diffs of images, or if they are genuinely weird looking. Might be good to check locally before merging.

On the cartpole side the unified framework looks nice, I left a couple comments here and here, but nothing blocking. This is in good shape.


initial_cart_position_range = (-1.0, 1.0) # [m]
initial_cart_velocity_range = (-0.5, 0.5) # [m/s]
initial_pole_angle_range = [-0.25, 0.25] # range as multiples of pi [rad]

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit confusing. Why not directly pass this as radiants?

Comment on lines 127 to 128
self.cfg.initial_pole_angle_range[0] * math.pi,
self.cfg.initial_pole_angle_range[1] * math.pi,

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do the math.pi multiplication here?

@@ -74,9 +72,9 @@ class BaseCartpoleCameraEnvCfg(CartpoleEnvCfg):
initial_pole_angle_range = [-0.125, 0.125]

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we do this in radians here as well?

expected_policy_terms = ["joint_pos_rel", "joint_vel_rel"]

expected_subtask_terms = ["grasp_1", "stack_1", "grasp_2"]
# Before the fix, only the last term would be present.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this comment here?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to move this to isaaclab_rl package? It's always weird to have code here in the scripts, these are not part of the extensions meaning users would not get them by installating isaaclab.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I'll do a follow up PR and refactor some of these train/play into isaaclab_rl

Comment on lines +405 to +407
def _normalize_task_name(task: str) -> str:
"""Normalize training and play variants to the same task name."""
return task.split(":")[-1].removesuffix("-Play")

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit hacky. There are also tasks that have inference in the name. I'm not saying I have a better solution, just pointing it out.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll keep this in mind. I think we want to remove the -Play suffix by loading the play config when play.py is used, I'll think about the best solution and update it in a follow-up PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation infrastructure isaac-lab Related to Isaac Lab team

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants