Add AMD ROCm (gfx942) support for the image→3D generation stack#72

Merged
HochCC merged 2 commits into HorizonRobotics:master from ZJLi2013:amd_support
Jun 30, 2026
@ZJLi2013

Contributor

Summary

Enable EmbodiedGen's image→3D generation to run on AMD GPUs (ROCm/HIP), by swapping the
CUDA-only libraries for verified ROCm builds plus two small runtime shims. All changes are
additive (new files under docker/); no existing CUDA code path is modified.

Verified end-to-end on an AMD Instinct MI300X: python -m embodied_gen.models.sam3d
(SAM3D backend, no GPT, no texture-bake) produces outputs/splat.ply (6.5 MB 3D Gaussian
Splat) from the bundled sample_00.jpg.

Changes (all new files)

  • docker/install_rocm.sh — one-shot ROCm install: requirements minus CUDA libs, numpy<2
    pin, the ROCm dependency swaps (table below), deploys the two shims as sitecustomize,
    and runs an import smoke (PASS/FAIL map).
  • docker/Dockerfile.rocm — full-generation ROCm image (rocm/pytorch:rocm6.4.3...2.6.0)
    that runs install_rocm.sh.
  • docker/spconv_rocm_compat.py — converts spconv KRSC checkpoints to the Native layout at
    load time (see Related issue).
  • docker/kaolin_stub.py — sitecustomize bypass for the CUDA-only kaolin (used only in
    the texture-backprojection / mesh-IO stage; the core geometry path only calls
    kaolin.utils.testing.check_tensor).
  • docker/README.rocm.md — user-facing run-through.
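
The kaolin bypass described above can be pictured with a minimal sketch. This is not the contents of docker/kaolin_stub.py; it only illustrates the general technique of registering placeholder modules in `sys.modules` so that `import kaolin` succeeds. The function name `install_kaolin_stub` is hypothetical, and the `check_tensor` signature is an assumption modeled on kaolin's public helper.

```python
import sys
import types


def install_kaolin_stub():
    """Register a placeholder `kaolin` package if the real one cannot be imported.

    Returns True when the stub was installed, False when real kaolin is present.
    """
    try:
        import kaolin  # noqa: F401  (real package works: leave it alone)
        return False
    except Exception:
        # Missing package or a CUDA-only build that fails to load.
        pass

    def check_tensor(tensor, shape=None, dtype=None, device=None, throw=True):
        # Assumption: accepting every tensor is acceptable for the geometry path.
        return True

    # Build the dotted chain kaolin -> kaolin.utils -> kaolin.utils.testing.
    pkg = types.ModuleType("kaolin")
    utils = types.ModuleType("kaolin.utils")
    testing = types.ModuleType("kaolin.utils.testing")
    pkg.__path__ = []    # mark as packages; unknown submodules still fail loudly
    utils.__path__ = []
    testing.check_tensor = check_tensor
    pkg.utils = utils
    utils.testing = testing

    sys.modules["kaolin"] = pkg
    sys.modules["kaolin.utils"] = utils
    sys.modules["kaolin.utils.testing"] = testing
    return True


install_kaolin_stub()
```

Placing such a call in a `sitecustomize` module means it runs at interpreter startup, before any project code imports kaolin. Anything beyond `kaolin.utils.testing.check_tensor` still raises on import, which keeps the texture stage from silently producing wrong results.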

CUDA → ROCm dependency map

| Upstream (CUDA) | ROCm replacement |
| --- | --- |
| spconv-cu120/121 | ZJLi2013/spconv_rocm (2.3.8+rocm1, source) |
| nvdiffrast | ZJLi2013/nvdiffrast@rocm |
| gsplat | amd_gsplat (pypi.amd.com/rocm-6.4.3; import name stays gsplat) |
| pytorch3d | ROCm 6.4 / py3.12 prebuilt wheel |
| flash-attn | FA2-Triton (FLASH_ATTENTION_TRITON_AMD_ENABLE=TRUE at install + runtime) |
| xformers | not needed — SAM3D attention auto-selects sdpa |
| numpy (base = 2.x) | pinned <2 (diffusers/transformers requirement) |
| kaolin (no ROCm wheel) | sitecustomize stub (docker/kaolin_stub.py) |
| diff-gaussian-rasterization | optional ('inria' GS backend); gsplat is the default |
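
An import smoke test with a PASS/FAIL map, as mentioned for install_rocm.sh, could look roughly like the sketch below. It is an illustration rather than the script's actual logic; the helper name `import_smoke` is hypothetical, and any import names passed to it (for example `spconv`, `gsplat`, `pytorch3d`) are assumptions derived from the dependency map.

```python
import importlib


def import_smoke(module_names):
    """Try to import each module and return a {name: "PASS" | "FAIL: ..."} map."""
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            results[name] = "PASS"
        except Exception as exc:
            # Broken native builds can raise more than ImportError (e.g. OSError).
            results[name] = f"FAIL: {type(exc).__name__}: {exc}"
    return results
```

Calling `import_smoke(["torch", "spconv", "gsplat"])` inside the ROCm container would then report, per module, whether the swapped-in build loads at all. It does not check that kernels actually run on the GPU.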

Tested on

  • GPU: AMD Instinct MI300X (gfx942)
  • ROCm: 6.4.3
  • PyTorch: 2.6.0 (+HIP 6.4)
  • Docker: rocm/pytorch:rocm6.4.3_ubuntu24.04_py3.12_pytorch_release_2.6.0

Results

  • outputs/splat.ply (6.5 MB) from apps/assets/example_image/sample_00.jpg
  • Running cost 28.9 s, Max VRAM 9.74 GB; attention on AOTriton SDPA

Notes / scope

  • Backward-compatible: only adds files under docker/; CUDA users are unaffected.

  • Out of scope (documented gaps, not regressions): texture-backprojection (kaolin is
    CUDA-only and stubbed), GPT quality-checkers (need an API key). Core image→3D
    (segmentation → SAM3D geometry + gaussian + mesh export) runs without them.

  • Optional follow-up (happy to include if desired): make the kaolin imports in
    embodied_gen/data/utils.py lazy/optional so the stub isn't needed.

  • Depends on / related: spconv KRSC checkpoint loading on ROCm — ZJLi2013/spconv_rocm#<pr>.
    Until merged, docker/spconv_rocm_compat.py provides the equivalent fix consumer-side.

  • License: this PR is for study/research purposes only and adds ROCm build/integration
    scripts; it ships no model weights. Any models used (e.g. SAM-3D-Objects, TRELLIS, Kolors,
    SD3.5, etc.) remain governed by their own respective licenses — please refer to each model's
    license before use.
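
For the spconv note above, the KRSC to Native conversion can be thought of as an axis permutation on each convolution weight. The sketch below is not taken from docker/spconv_rocm_compat.py. It assumes, without confirmation from this PR, that KRSC means `[out_channels, *kernel_size, in_channels]` and Native means `[*kernel_size, in_channels, out_channels]`; both helper names are hypothetical.

```python
def krsc_to_native_axes(ndim):
    """Axis order that moves the leading out-channel axis (K) to the end.

    Assumed layouts (not confirmed by the PR):
      KRSC   = [out_channels, *kernel_size, in_channels]
      Native = [*kernel_size, in_channels, out_channels]
    """
    if ndim < 3:
        raise ValueError("expected a conv weight with at least 3 dimensions")
    return tuple(range(1, ndim)) + (0,)


def permute_shape(shape, axes):
    """Apply an axis permutation to a shape tuple (stand-in for tensor.permute)."""
    return tuple(shape[a] for a in axes)


# A 3x3x3 sparse-conv weight with 16 input and 32 output channels.
krsc_shape = (32, 3, 3, 3, 16)
axes = krsc_to_native_axes(len(krsc_shape))
native_shape = permute_shape(krsc_shape, axes)
```

With PyTorch, the same axes could be applied to a checkpoint tensor via `weight.permute(*axes).contiguous()` before the state dict is loaded. Which keys need converting depends on the model and is left out here.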

Swap the CUDA-only generation stack for verified ROCm builds (spconv_rocm, nvdiffrast@rocm, amd_gsplat, pytorch3d ROCm wheel, FA2-Triton) plus two runtime shims: a kaolin sitecustomize bypass (texture-stage only) and a spconv KRSC->Native checkpoint-load bridge. All additive under docker/; CUDA paths unchanged. Verified e2e on AMD Instinct MI300X / ROCm 6.4.3 / torch 2.6: SAM3D image->3D produces splat.ply (28.9s, 9.74GB VRAM).
@HochCC

HochCC commented Jun 23, 2026

Collaborator

Hi @ZJLi2013 ,
Thanks for contributing the AMD version docker! I will take time to review it in detail this week. Also, do you have anything else to update regarding this PR?

@ZJLi2013

Contributor Author

Hi, thanks for replying. I'd love to support more of HR's work on AMD GPUs in the future, at both the functional and the performance level, if there is interest.

@HochCC

HochCC commented Jun 23, 2026

Collaborator

Sure, you are more than welcome. Currently, our main focus remains on NVIDIA GPUs (CUDA), and AMD GPU support is undoubtedly an important complement.

Comment thread docker/amd_rocm/install_rocm.sh
Comment thread docker/amd_rocm/Dockerfile
@HochCC HochCC merged commit e9fc2ef into HorizonRobotics:master Jun 30, 2026
2 checks passed
@HochCC

HochCC commented Jun 30, 2026

Collaborator

hi @ZJLi2013 ,
thanks for the contribution, this PR has been merged.
