Update Cartpole Task - Bump rsl-rl#6294
Conversation
Greptile SummaryThis PR bumps
Confidence Score: 4/5Safe to merge; the intentional breaking changes are well-documented in the changelog and the core checkpoint selection logic is clean and tested. The skrl checkpoint pattern
Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A["--checkpoint <arg>"] --> B{Is arg in CHECKPOINT_SELECTORS?}
B -- yes --> C["resolve_checkpoint_selector()"]
B -- no --> D{Is arg None?}
D -- no --> E["retrieve_file_path(arg)\nUse explicit path"]
D -- yes --> F["Use library default\nor pretrained checkpoint"]
C --> G["Scan log_root.iterdir()"]
G --> H{run.json manifest exists?}
H -- no --> I["Skip run\n(historical run)"]
H -- yes --> J{version / library / task / metadata match?}
J -- no --> I
J -- yes --> K["Collect run with created_at timestamp"]
K --> L["Sort runs by created_at DESC"]
L --> M{Any checkpoints in run dir?}
M -- no --> N["Try next run"]
M -- yes --> O{selector == 'best' AND preferred_pattern set?}
O -- yes --> P{Any preferred checkpoints?}
P -- yes --> Q["Use preferred subset"]
P -- no --> R["Fall back to all checkpoints"]
O -- no --> R
Q --> S["Natural-sort, return last"]
R --> S
N --> T{More runs?}
T -- yes --> M
T -- no --> U["Raise ValueError\n(no compatible run found)"]
S --> V["Absolute checkpoint path"]
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
flowchart TD
A["--checkpoint <arg>"] --> B{Is arg in CHECKPOINT_SELECTORS?}
B -- yes --> C["resolve_checkpoint_selector()"]
B -- no --> D{Is arg None?}
D -- no --> E["retrieve_file_path(arg)\nUse explicit path"]
D -- yes --> F["Use library default\nor pretrained checkpoint"]
C --> G["Scan log_root.iterdir()"]
G --> H{run.json manifest exists?}
H -- no --> I["Skip run\n(historical run)"]
H -- yes --> J{version / library / task / metadata match?}
J -- no --> I
J -- yes --> K["Collect run with created_at timestamp"]
K --> L["Sort runs by created_at DESC"]
L --> M{Any checkpoints in run dir?}
M -- no --> N["Try next run"]
M -- yes --> O{selector == 'best' AND preferred_pattern set?}
O -- yes --> P{Any preferred checkpoints?}
P -- yes --> Q["Use preferred subset"]
P -- no --> R["Fall back to all checkpoints"]
O -- no --> R
Q --> S["Natural-sort, return last"]
R --> S
N --> T{More runs?}
T -- yes --> M
T -- no --> U["Raise ValueError\n(no compatible run found)"]
S --> V["Absolute checkpoint path"]
Reviews (1): Last reviewed commit: "Make MDP match between direct and manage..." | Re-trigger Greptile |
| resume_path = resolve_checkpoint_selector( | ||
| log_root_path, | ||
| args_cli.checkpoint, | ||
| library="skrl", | ||
| task=train_task_name, | ||
| checkpoint_pattern=r".*", | ||
| other_dirs=["checkpoints"], | ||
| metadata={ | ||
| "agent": agent_cfg_entry_point, | ||
| "algorithm": algorithm, | ||
| "ml_framework": args_cli.ml_framework, | ||
| }, | ||
| ) |
There was a problem hiding this comment.
The checkpoint pattern
r".*" matches every file in the checkpoints directory — including hidden files, stats dumps, or any other auxiliary file skrl might write there. The natural-sort key would then rank non-checkpoint files arbitrarily against real checkpoints, and the "last" entry returned could be a non-checkpoint file. Other libraries use extension-specific patterns (r".*\.pth", r"model_.*\.pt") for exactly this reason. A tighter pattern like r".*\.pt" (or whatever extension skrl uses) would prevent accidental matches.
| resume_path = resolve_checkpoint_selector( | |
| log_root_path, | |
| args_cli.checkpoint, | |
| library="skrl", | |
| task=train_task_name, | |
| checkpoint_pattern=r".*", | |
| other_dirs=["checkpoints"], | |
| metadata={ | |
| "agent": agent_cfg_entry_point, | |
| "algorithm": algorithm, | |
| "ml_framework": args_cli.ml_framework, | |
| }, | |
| ) | |
| resume_path = resolve_checkpoint_selector( | |
| log_root_path, | |
| args_cli.checkpoint, | |
| library="skrl", | |
| task=train_task_name, | |
| checkpoint_pattern=r".*\.pt", | |
| other_dirs=["checkpoints"], | |
| metadata={ | |
| "agent": agent_cfg_entry_point, | |
| "algorithm": algorithm, | |
| "ml_framework": args_cli.ml_framework, | |
| }, | |
| ) |
| resume_path = resolve_checkpoint_selector( | ||
| log_root_path, | ||
| args_cli.checkpoint, | ||
| library="skrl", | ||
| task=args_cli.task, | ||
| checkpoint_pattern=r".*", | ||
| other_dirs=["checkpoints"], | ||
| metadata={ | ||
| "agent": agent_cfg_entry_point, | ||
| "algorithm": algorithm, | ||
| "ml_framework": args_cli.ml_framework, | ||
| }, | ||
| ) |
There was a problem hiding this comment.
Same overly broad
r".*" checkpoint pattern as in the play script. If skrl ever writes a non-.pt file into the checkpoints directory (e.g. a training-stats JSON or a config pickle), _natural_sort_key could rank it above or below actual checkpoint files unpredictably, causing the selector to return the wrong file.
| resume_path = resolve_checkpoint_selector( | |
| log_root_path, | |
| args_cli.checkpoint, | |
| library="skrl", | |
| task=args_cli.task, | |
| checkpoint_pattern=r".*", | |
| other_dirs=["checkpoints"], | |
| metadata={ | |
| "agent": agent_cfg_entry_point, | |
| "algorithm": algorithm, | |
| "ml_framework": args_cli.ml_framework, | |
| }, | |
| ) | |
| resume_path = resolve_checkpoint_selector( | |
| log_root_path, | |
| args_cli.checkpoint, | |
| library="skrl", | |
| task=args_cli.task, | |
| checkpoint_pattern=r".*\.pt", | |
| other_dirs=["checkpoints"], | |
| metadata={ | |
| "agent": agent_cfg_entry_point, | |
| "algorithm": algorithm, | |
| "ml_framework": args_cli.ml_framework, | |
| }, | |
| ) |
| if rgb_like and not defer_normalize: | ||
| camera_data = normalize_camera_image(camera_data, data_type) | ||
| elif data_type == "depth": | ||
| camera_data[camera_data == float("inf")] = 0 |
There was a problem hiding this comment.
In-place mutation of the shared camera buffer
camera.data.output["depth"] is likely a view into the sensor's internal storage, not a private copy. Writing zeros over the inf pixels here mutates that storage for the current step. Any other observation term, reward term, or debug consumer that reads camera.data.output["depth"] afterward in the same step will see the modified tensor (zeros where inf was) rather than the raw sensor output. The same pattern exists in cartpole_direct_camera_env.py line 114. A safe fix is to clone or mask without mutating: camera_data = camera_data.clone(); camera_data[camera_data == float("inf")] = 0 or use torch.where.
AntoineRichard
left a comment
There was a problem hiding this comment.
I feel like this PR is bundeling multiple things and maybe it should be cut into two PRs.
- on the shared checkpoint resolution mechanism
- on the cartpole task unification
Both look fine, but they are distinct work.
About the cartpole PR work, I took a quick look at the golden images and some of them looked weird I don't know if it's github showing the diffs of images, or if they are genuinely weird looking. Might be good to check locally before merging.
On the cartpole side the unified framework looks nice, I left a couple comments here and here, but nothing blocking. This is in good shape.
|
|
||
| initial_cart_position_range = (-1.0, 1.0) # [m] | ||
| initial_cart_velocity_range = (-0.5, 0.5) # [m/s] | ||
| initial_pole_angle_range = [-0.25, 0.25] # range as multiples of pi [rad] |
There was a problem hiding this comment.
This is a bit confusing. Why not directly pass this as radiants?
| self.cfg.initial_pole_angle_range[0] * math.pi, | ||
| self.cfg.initial_pole_angle_range[1] * math.pi, |
There was a problem hiding this comment.
Why do the math.pi multiplication here?
| @@ -74,9 +72,9 @@ class BaseCartpoleCameraEnvCfg(CartpoleEnvCfg): | |||
| initial_pole_angle_range = [-0.125, 0.125] | |||
There was a problem hiding this comment.
Should we do this in radians here as well?
| expected_policy_terms = ["joint_pos_rel", "joint_vel_rel"] | ||
|
|
||
| expected_subtask_terms = ["grasp_1", "stack_1", "grasp_2"] | ||
| # Before the fix, only the last term would be present. |
There was a problem hiding this comment.
Do we need this comment here?
There was a problem hiding this comment.
Would it make sense to move this to isaaclab_rl package? It's always weird to have code here in the scripts, these are not part of the extensions meaning users would not get them by installating isaaclab.
There was a problem hiding this comment.
Agreed. I'll do a follow up PR and refactor some of these train/play into isaaclab_rl
| def _normalize_task_name(task: str) -> str: | ||
| """Normalize training and play variants to the same task name.""" | ||
| return task.split(":")[-1].removesuffix("-Play") |
There was a problem hiding this comment.
This is a bit hacky. There are also tasks that have inference in the name. I'm not saying I have a better solution, just pointing it out.
There was a problem hiding this comment.
I'll keep this in mind. I think we want to remove the -Play suffix by loading the play config when play.py is used, I'll think about the best solution and update it in a follow-up PR
Description
This PR updates and aligns the direct and manager-based Cartpole tasks.
The RL workflow changes:
rsl-rl-libto 5.4.1 and use its native image-onlyCNNModel, removing the Isaac Lab override that is no longer required.latestandbestcheckpoint selectors to the unified training and play commands for RL-Games, RSL-RL, SB3, and SKRL. Try it out withuv run play --rl_library rsl_rl --task xxxxxx --checkpoint latestrun.jsonmanifest for new training runs so checkpoint selection is constrained by RL library, task, and agent metadata. Historical runs without a manifest remain available through explicit checkpoint paths but are intentionally excluded from selector-based discovery.The Cartpole changes:
ovphysxphysics preset to manager-based Cartpole, enabling the full physics and renderer backend matrix for camera tasks.pxrbeforeSimulationAppstarts.Existing policies trained against the previous Cartpole MDP must be retrained.
Dependencies:
rsl-rl-lib==5.4.1Fixes # (not provided)
Type of change
Testing
uv run --with pytest pytest scripts/reinforcement_learning/test/test_checkpoint_manifest.py -q(5 passed)uv run --with pytest pytest source/isaaclab_tasks/test/core/test_env_cfg_no_forbidden_imports.py -k Isaac-Cartpole -q(6 passed)uv run train --rl_library rsl_rl --task Isaac-Cartpole --num_envs 16 --max_iterations 1Isaac-Cartpole-CameraandIsaac-Cartpole-Camera-Directwithphysics=newton_mjwarp renderer=newton_rendererexperiment_name./isaaclab.sh -fScreenshots
Not applicable. This PR does not change a graphical user interface.
Checklist
pre-commitchecks with./isaaclab.sh --formatsource/<pkg>/changelog.d/for every touched package (do not editCHANGELOG.rstor bumpextension.toml-- CI handles that)CONTRIBUTORS.mdor my name already exists there