Skip to content

Bugfix - gpu_stream: remove ROCm build support, require CUDA with NVML#789

Open
polarG wants to merge 1 commit intomainfrom
dev/hongtaozhang/exclude-gpu-stream-from-rocm
Open

Bugfix - gpu_stream: remove ROCm build support, require CUDA with NVML#789
polarG wants to merge 1 commit intomainfrom
dev/hongtaozhang/exclude-gpu-stream-from-rocm

Conversation

@polarG
Copy link
Contributor

@polarG polarG commented Mar 14, 2026

Summary

The gpu_stream benchmark has NVIDIA-specific dependencies that prevent it from compiling on ROCm 6.3+. This change makes it CUDA-only, gracefully skipping the build with a warning on non-NVIDIA
environments.

Problem

The gpu_stream benchmark fails to compile on ROCm 6.3+ due to multiple NVIDIA-specific dependencies:

  1. nvml.h — NVIDIA Management Library header, used for querying actual memory clock rates. No HIP equivalent. Referenced in gpu_stream.cu and gpu_stream_utils.hpp.
  2. cuda.h in headers — Three .hpp files (gpu_stream.hpp, gpu_stream_kernels.hpp, gpu_stream_utils.hpp) directly include <cuda.h> and <cuda_runtime.h>. These headers are not processed by hipify-perl (only
    .cu source files are), so they fail to resolve on ROCm.
  3. Deprecated hipDeviceProp_t struct fields — The code accesses memoryBusWidth, memoryClockRate, and ECCEnabled from the device properties struct. These fields were removed from hipDeviceProp_t in ROCm
    6.3, causing compilation errors after hipification.

The existing ROCm path was marked as incomplete (# TODO: test for ROC) and was never fully functional on recent ROCm versions.

Changes

  • Removed the non-functional ROCm/HIP build path from gpu_stream/CMakeLists.txt
  • When CUDA is not found, prints a warning and returns gracefully instead of attempting a broken hipify build or raising FATAL_ERROR
  • No changes to the NVIDIA/CUDA build path — it continues to work as before

Impact

  • NVIDIA builds: No change — gpu_stream builds and installs normally
  • ROCm builds: gpu_stream is skipped with a warning message. Previously it would fail the entire make cppbuild step, blocking the Docker image build
  • Other benchmarks: Unaffected — build.sh continues to the next benchmark after gpu_stream returns

gpu_stream depends on NVML (nvidia-ml) for GPU monitoring which is
NVIDIA-specific and has no ROCm equivalent. Remove the ROCm/HIP build
path and skip the build cleanly when CUDA is not found.
@polarG polarG requested a review from a team as a code owner March 14, 2026 02:16
Copilot AI review requested due to automatic review settings March 14, 2026 02:16
@polarG polarG added bug Something isn't working CI/CD Continuous integration or deployment labels Mar 14, 2026
@polarG polarG self-assigned this Mar 14, 2026
@polarG polarG requested review from WenqingLan1 and guoshzhao March 14, 2026 02:16
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Removes the broken ROCm/HIP build path for the gpu_stream benchmark, making it CUDA-only with a graceful warning when CUDA is not found.

Changes:

  • Replaced the non-functional ROCm/HIP build path and FATAL_ERROR fallback with a WARNING message and return()
  • No changes to the CUDA build path

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@codecov
Copy link

codecov bot commented Mar 14, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 85.70%. Comparing base (6b8e810) to head (2bf019a).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #789   +/-   ##
=======================================
  Coverage   85.70%   85.70%           
=======================================
  Files         102      102           
  Lines        7703     7703           
=======================================
  Hits         6602     6602           
  Misses       1101     1101           
Flag Coverage Δ
cpu-python3.10-unit-test 70.96% <ø> (ø)
cpu-python3.12-unit-test 70.96% <ø> (ø)
cpu-python3.7-unit-test 70.43% <ø> (ø)
cuda-unit-test 83.59% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working CI/CD Continuous integration or deployment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants