Skip to content

Add baremetal RISC-V smoke tests (rv32, rv64)#19917

Open
luhenry wants to merge 4 commits into
pytorch:mainfrom
riseproject-dev:riscv-testing-baremetal
Open

Add baremetal RISC-V smoke tests (rv32, rv64)#19917
luhenry wants to merge 4 commits into
pytorch:mainfrom
riseproject-dev:riscv-testing-baremetal

Conversation

@luhenry
Copy link
Copy Markdown
Contributor

@luhenry luhenry commented Jun 1, 2026

Summary

This adds testing on CI and with local script of the portable backend on baremetal for RISC-V. It cross-compiles with riscv64-unknown-elf + picolibc, embeds the .bpte into the ELF, and runs under qemu-system-riscv{32,64} with semihosting. We use the same bundled-IO PASS criterion as the existing linux runs for pass/fail.

Test plan

The CI is the testing plan as we add configurations to CI. I've validated it works locally with examples/riscv/test-matrix.sh.

cc @psiddh @AdrianLundell @digantdesai @rascani

@pytorch-bot
Copy link
Copy Markdown

pytorch-bot Bot commented Jun 1, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19917

Note: Links to docs will display an error until the docs builds have been completed.

⚠️ 18 Awaiting Approval

As of commit 66edf4e with merge base 4469d84 (image):

AWAITING APPROVAL - The following workflows need approval before CI can run:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla Bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jun 1, 2026
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 1, 2026

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@luhenry
Copy link
Copy Markdown
Contributor Author

luhenry commented Jun 1, 2026

cc @rascani @mergennachin @willChuai

Cross-compiles with riscv64-unknown-elf + picolibc, embeds the .bpte into
the ELF, and runs under qemu-system-riscv{32,64} -machine virt with
semihosting carrying stdout and exit status. Same bundled-IO PASS criterion
as the existing linux runs.
@luhenry luhenry force-pushed the riscv-testing-baremetal branch from 7cc42fe to 00d0173 Compare June 1, 2026 16:59
@rascani
Copy link
Copy Markdown
Contributor

rascani commented Jun 1, 2026

@claude Review this PR.

@claude
Copy link
Copy Markdown

claude Bot commented Jun 1, 2026

Claude finished @rascani's task in 2m 51s —— View job


Review: Add baremetal RISC-V smoke tests (rv32, rv64)

  • Gather context and read changed files
  • Review CI scripts and workflow files
  • Review baremetal source code (CMakeLists, runner, linker script, startup)
  • Review toolchain files and setup scripts
  • Review test matrix and run scripts
  • Post comprehensive review feedback

Overall this is a well-structured PR that cleanly extends the existing RISC-V linux testing to baremetal (rv32 + rv64) using QEMU system mode with semihosting. The architecture mirrors examples/arm/executor_runner/standalone/ which is good for consistency. A few issues to address:


Bugs

1. Stale preset path in examples/riscv/baremetal/CMakeLists.txt:49

The default EXECUTORCH_BUILD_PRESET_FILE references riscv64_baremetal.cmake but the actual file is riscv_baremetal.cmake:

set(EXECUTORCH_BUILD_PRESET_FILE
    "${EXECUTORCH_ROOT}/tools/cmake/preset/riscv64_baremetal.cmake"  # ← doesn't exist

This won't break CI (because run.sh:164 passes -DEXECUTORCH_BUILD_PRESET_FILE=...riscv_baremetal.cmake explicitly), but it will break anyone who runs cmake on the standalone project without the override. Fix this →

2. Stale path filter in .github/workflows/riscv64.yml:15

- tools/cmake/preset/riscv64_*.cmake

The preset files were renamed to riscv_linux.cmake and riscv_baremetal.cmake (no 64 in the name). This glob will never match, so changes to the preset files won't trigger CI on PRs. Should be tools/cmake/preset/riscv_*.cmake. Fix this →

3. Stale reference to setup.sh in examples/riscv/run.sh:196

echo "[run.sh] ERROR: ${qemu} not found on PATH; install with examples/riscv/setup.sh" >&2

setup.sh was renamed to setup-linux.sh in this PR. This error message will confuse users. Should say setup-linux.sh or setup-baremetal.sh depending on the ${os} variable. Fix this →


Minor Issues

4. README --arch docs are stale (examples/riscv/README.md:23)

The table shows only rv64 as the valid value for --arch, but rv32 is now supported for baremetal. The "Values" column should show rv32, rv64.

5. README link to setup.sh (examples/riscv/README.md:36)

see [setup.sh](setup.sh)

This is a broken link since the file was renamed.


Design Observations (non-blocking)

  • The std::random_device asm trampoline stubs (lines 92-108 of the baremetal runner) are a pragmatic solution for the linker issue, but they're tied to the GCC/libstdc++ Itanium mangling. If the toolchain upgrades and the mangled names change, this will silently fail to link. A comment noting the GCC version these names were captured from would help future debuggers. (The existing comment is good but doesn't note the version.)

  • The 64 MiB planned_memory_pool + 4 MiB temp_allocator_pool + 8 MiB method_allocator_pool in .bss means the ELF requires ~76 MiB minimum even for the add model. This is fine for QEMU with -m 512M but worth noting for anyone who tries to reduce the RAM allocation.

  • The loop termination strategy (lines 245-258) of using InvalidArgument from load_bundled_input past the last testset is clever but fragile — if InvalidArgument is returned for a different reason on testset 0, the test would silently succeed with rc = 0 (since testset_idx > 0 check wouldn't apply). The current code does handle testset 0 failure specifically (lines 250-256), so this is fine in practice.


Summary

Three bugs that should be fixed before merge (stale file references after the setup.shsetup-linux.sh rename and riscv64_*.cmakeriscv_*.cmake rename). The architecture, linker script, semihosting implementation, and CI matrix are all solid.


@nil-is-all nil-is-all added the module: microcontrollers For embedded MCUs like Cortex-M, or RTOS like Zephyr, does not track NPU backend like Arm Ethos. label Jun 1, 2026
@rascani
Copy link
Copy Markdown
Contributor

rascani commented Jun 1, 2026

Thanks for putting this together @luhenry!

sentencepiece fails to compile on GCC 15 due to missing #include <cstdint>
@willChuai
Copy link
Copy Markdown

@luhenry Thanks for putting this together. From the MCU / bare-metal angle we discussed in #19666, this looks like the right RV32/RV64 portable-kernel CI baseline to me.

One question: examples/riscv/test-matrix.sh says it mirrors riscv64.yml, but portable seems to get extra local coverage compared with CI: CI uses only ["v=false"], while the local matrix also runs portable with zba/zbb/zbs,v=false and one RVV variant (v=true,vlen=128). Is that intended as an extended local sweep, or should it match CI exactly?

@luhenry
Copy link
Copy Markdown
Contributor Author

luhenry commented Jun 2, 2026

@luhenry Thanks for putting this together. From the MCU / bare-metal angle we discussed in #19666, this looks like the right RV32/RV64 portable-kernel CI baseline to me.

Yes, Portable Kernels is the baseline. A following contribution could be to add a rv64gc scalar and a rv64gc+RVV backend so that we can have that as a baseline for future RISC-V contributions.

One question: examples/riscv/test-matrix.sh says it mirrors riscv64.yml, but portable seems to get extra local coverage compared with CI: CI uses only ["v=false"], while the local matrix also runs portable with zba/zbb/zbs,v=false and one RVV variant (v=true,vlen=128). Is that intended as an extended local sweep, or should it match CI exactly?

examples/riscv/test-matrix.sh Is not meant to be a 1-to-1 mapping of what’s on CI but a local tool to verify most of the configurations. You’re right that we are not testing RVV on portable backend today on CI, let me add that. However, it will not be used in the sense that the code won’t be compiled with RVV enabled, except if there is run-time detection of extensions and using RVV-enabled kernels then.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. module: microcontrollers For embedded MCUs like Cortex-M, or RTOS like Zephyr, does not track NPU backend like Arm Ethos.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants