Adding CUDA Support for stable-diffusion.cpp #7

Merged: kenvandine merged 24 commits into lemonade-sdk:lemonade from Phqen1x:lemonade (Jun 1, 2026)

Conversation

Phqen1x commented Jun 1, 2026

This pull request expands the CI build matrix to add CUDA builds for both Ubuntu and Windows, targeting multiple CUDA compute architectures. It introduces new build jobs that automate the setup, compilation, and artifact packaging for CUDA-enabled environments, and ensures these new CUDA artifacts are included in release uploads.

New CUDA build jobs:

  • Added ubuntu-latest-cuda and windows-latest-cuda jobs to .github/workflows/build.yml, each building for several CUDA compute architectures (sm_75 through sm_120). These jobs handle CUDA toolkit installation, build configuration, and artifact packaging for both platforms.

Workflow integration:

  • Updated the release job dependencies to include the new CUDA build jobs, ensuring CUDA artifacts are part of the release process.

Release artifact handling:

  • Modified the release upload logic to include .tar.xz files (used for Ubuntu CUDA builds) in addition to .zip files, so all relevant CUDA build artifacts are uploaded in releases.
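
The upload filter described above can be illustrated with a minimal shell sketch. This is not the workflow's actual script: the function name `is_release_asset` is hypothetical, and the accepted extensions are only the two named in this description (.zip and .tar.xz).

```shell
#!/bin/sh
# Hypothetical helper (not taken from the PR): succeed only for archive
# types that the release job should upload.
is_release_asset() {
    case $1 in
        *.zip | *.tar.xz) return 0 ;;
        *) return 1 ;;
    esac
}

# Example: print only the files that would be uploaded.
for f in sd-master-148f69f-ubuntu-cuda-sm_89-x64.tar.xz notes.txt; do
    if is_release_asset "$f"; then
        echo "upload: $f"
    fi
done
```

Matching on the file suffix with `case` keeps the allowlist in one place, so supporting another archive format would be a one-pattern change.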

Phqen1x and others added 14 commits May 31, 2026 11:55
ci: add Linux and Windows CUDA builds per compute capability

Adds ubuntu-latest-cuda and windows-latest-cuda jobs covering sm_75
through sm_120, modeled after the llama.cpp CUDA CI. Linux installs
CUDA 12.8 via apt, bundles cublas runtime libs, and patches RPATHs.
Windows uses Jimver/cuda-toolkit@v0.2.22 with MSVC + Ninja Multi-Config.
Both jobs are wired into the release job.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

ci: fix CUDA artifact names to match Lemonade SDK expectations

Linux CUDA artifacts are now named ubuntu-cuda-sm_XX-x64.tar.xz and
Windows CUDA artifacts are named windows-cuda-sm_XX-x64.zip, matching
the filenames Lemonade SDK constructs when downloading from releases.

Also removes the sd-* pattern filter from the release download step so
CUDA artifacts (which don't carry the sd- prefix) are included, and
adds .tar.xz support to the release upload script.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

ci: use sd-{branch}-{hash}-ubuntu/windows-cuda-{sm}-x64 artifact names

Renames CUDA artifacts to match the format Lemonade SDK expects:
  sd-master-148f69f-ubuntu-cuda-sm_89-x64.tar.xz
  sd-master-148f69f-windows-cuda-sm_89-x64.zip

Also restores the sd-* download pattern in the release job and keeps
.tar.xz support in the upload script.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

ci: revert CUDA artifact names to original sd-*-bin-* format

Restores the filenames introduced in the initial CUDA PR:
  sd-master-{hash}-bin-Linux-Ubuntu-22.04-x86_64-cuda-{sm}.zip
  sd-master-{hash}-bin-win-cuda-{sm}-x64.zip

Reverts the intermediate ubuntu-cuda-* and windows-cuda-* name
experiments, and drops the .tar.xz branch from the release upload script.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

ci: standardize CUDA artifact names and compression formats

Linux:   sd-master-{hash}-ubuntu-cuda-sm_XX-x64.tar.xz
Windows: sd-master-{hash}-windows-cuda-sm_XX-x64.zip

Also adds .tar.xz support to the release upload script.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

ci: package Windows CUDA artifacts as .7z

Lemonade's sd_server.cpp builds the Windows CUDA download URL with a
.7z extension. The CI was producing .zip, causing a 404 on download
which presented as an extraction failure.

Also adds .7z to the release upload script allowlist.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

ci: revert Windows CUDA artifacts back to .zip

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
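
The artifact naming that these commits converge on (the sd-{branch}-{hash}-{os}-cuda-{sm}-x64 pattern, with .tar.xz on Ubuntu and .zip on Windows) can be sketched as a small shell function. This is only an illustration under stated assumptions: `cuda_artifact_name` is a hypothetical name, and neither the workflow nor Lemonade SDK is known to build the string this way.

```shell
#!/bin/sh
# Hypothetical helper (not taken from the PR): build a CUDA artifact name.
# Arguments: branch, short commit hash, os (ubuntu|windows), arch (e.g. sm_89).
cuda_artifact_name() {
    branch=$1
    hash=$2
    os=$3
    sm=$4
    # The extension depends on the platform: .tar.xz on Ubuntu, .zip on Windows.
    case $os in
        ubuntu)  ext=tar.xz ;;
        windows) ext=zip ;;
        *) echo "unsupported os: $os" >&2; return 1 ;;
    esac
    printf 'sd-%s-%s-%s-cuda-%s-x64.%s\n' "$branch" "$hash" "$os" "$sm" "$ext"
}

cuda_artifact_name master 148f69f ubuntu sm_89
# prints: sd-master-148f69f-ubuntu-cuda-sm_89-x64.tar.xz
```

Because the consumer constructs the download URL from the same pieces, any mismatch in prefix or extension surfaces as a failed download, which is the failure mode the .7z commit describes.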

kenvandine (Member) commented:

Please update the title to be more accurate, as this is adding CUDA, not fixing it.

Phqen1x changed the title from "Standardize and fix CUDA artifact naming and packaging" to "Adding CUDA Support for stable-diffusion.cpp" Jun 1, 2026
kenvandine requested a review from Copilot June 1, 2026 00:27
Copilot AI left a comment

Pull request overview

This PR expands the GitHub Actions release workflow to build CUDA-enabled stable-diffusion.cpp binaries for Ubuntu and Windows across multiple NVIDIA compute architectures, and updates release upload handling for the new archive format.

Changes:

  • Adds Ubuntu and Windows CUDA matrix jobs for sm_75 through sm_120.
  • Packages CUDA build outputs as .tar.xz on Ubuntu and .zip on Windows.
  • Updates release dependencies and upload filtering to include CUDA artifacts.

Comment thread .github/workflows/build.yml Outdated
Comment on lines +163 to +165
cp -v ${cuda_lib}/libcublas.so* build/bin/ 2>/dev/null || true
cp -v ${cuda_lib}/libcublasLt.so* build/bin/ 2>/dev/null || true
cp -v ${cuda_lib}/libcurand.so* build/bin/ 2>/dev/null || true
Author

Fixed. Added libcudart.so* to the bundle step alongside the existing cublas/cublasLt/curand copies. Without it, the binary would fail to start on any system that doesn't have the matching CUDA runtime installed, since the RPATH is set to $ORIGIN only.

Comment thread .github/workflows/build.yml

Phqen1x and others added 3 commits May 31, 2026 21:55

ci: bundle CUDA runtime libs for portability on both platforms

Linux: add libcudart.so* to the bundled runtime libraries so binaries
can run on systems without a matching CUDA toolkit installed.

Windows: add id to cuda-toolkit step and robocopy cudart64/cublas64/
cublasLt64 DLLs into the zip so the build runs on machines without a
pre-installed CUDA runtime.

Addresses Copilot review comments on lemonade-sdk#7.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

ci: match lemonade-sdk/llama.cpp CUDA library bundling

Linux:
- Add libcudart.so*, libnvJitLink.so* to bundled runtime libs
- Use cp -av (preserves symlinks) instead of cp -v
- RPATH step now patches all ELF files via file(1) instead of glob

Windows:
- Add curand and nvjitlink to CUDA toolkit sub-packages
- Copy cudart64, cublas64, cublasLt64, curand64, nvJitLink DLLs
  from CUDA_PATH\bin before zipping

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
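
The Linux bundling steps listed above (copy the runtime libraries with symlinks preserved, then set the RPATH of every ELF file to $ORIGIN) could look roughly like the following sketch. It is not the workflow's actual step: the function names `bundle_cuda_libs` and `patch_rpaths` are hypothetical, and the use of patchelf for the RPATH change is an assumption, since the commit message only mentions file(1).

```shell
#!/bin/sh
# Hypothetical helper: copy CUDA runtime libraries next to the binaries.
# Arguments: $1 = CUDA library directory, $2 = destination (e.g. build/bin).
bundle_cuda_libs() {
    cuda_lib=$1
    dest=$2
    for pattern in 'libcudart.so*' 'libcublas.so*' 'libcublasLt.so*' \
                   'libcurand.so*' 'libnvJitLink.so*'; do
        # $pattern is left unquoted on purpose so the shell expands the glob.
        for lib in "$cuda_lib"/$pattern; do
            # An unmatched glob stays literal; skip it instead of failing.
            [ -e "$lib" ] || [ -L "$lib" ] || continue
            # -a keeps versioned symlinks as symlinks rather than full copies.
            cp -av "$lib" "$dest"/
        done
    done
}

# Hypothetical helper: point every ELF file in a directory at its own folder.
# Assumes patchelf is installed; the source does not name the tool used.
patch_rpaths() {
    dir=$1
    for f in "$dir"/*; do
        # Only regular files; symlinks are skipped so targets are patched once.
        [ -f "$f" ] && [ ! -L "$f" ] || continue
        if file -b "$f" | grep -q '^ELF'; then
            # Single quotes keep $ORIGIN literal for the dynamic loader.
            patchelf --set-rpath '$ORIGIN' "$f"
        fi
    done
}
```

A typical call would be `bundle_cuda_libs "$cuda_lib" build/bin` followed by `patch_rpaths build/bin`. Detecting ELF files by content rather than by filename glob means extensionless executables and versioned shared objects are both covered.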

Phqen1x requested a review from Copilot June 1, 2026 02:15
kenvandine requested review from Copilot and removed request for Copilot June 1, 2026 02:31
Copilot AI left a comment

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.

Comment thread .github/workflows/build.yml Outdated
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Copilot AI left a comment

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated no new comments.

Phqen1x marked this pull request as ready for review June 1, 2026 02:40
kenvandine self-requested a review June 1, 2026 02:41
Phqen1x and others added 5 commits May 31, 2026 22:53
ci: pin CUDA clone steps to ref: master

Without an explicit ref, actions/checkout makes a GitHub API call to
determine the default branch of leejet/stable-diffusion.cpp. With 14
parallel CUDA jobs all doing this simultaneously this call can fail,
presenting as a checkout error. Pinning to master skips the API call.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

ci: fix PowerShell parser error in Windows CUDA pack step

Use multi-line array literal format for $runtimeDllPatterns and remove
the trailing backslash from the Copy-Item destination path, which was
causing an 'Unexpected token }' parser error at the foreach closing brace.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

kenvandine (Member) left a comment

Looks good, thanks!

kenvandine merged commit da0c3b3 into lemonade-sdk:lemonade Jun 1, 2026