@Sa4dUs
Contributor

@Sa4dUs Sa4dUs commented Dec 8, 2025

This PR moves the shared LLVM global variables logic out of the offload intrinsic codegen and generates kernel-specific variables only on the first call of the intrinsic.
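For readers unfamiliar with the "generate only on the first call" pattern, here is a minimal sketch of the idea. It is not the code in this PR: `KernelCx`, `KernelGlobals`, `kernel_globals`, and the generated names are hypothetical stand-ins, and plain strings take the place of LLVM values.

```rust
use std::cell::{Cell, RefCell};
use std::collections::HashMap;

/// Hypothetical stand-in for the globals emitted for one kernel.
#[derive(Clone, Debug, PartialEq)]
struct KernelGlobals {
    region_id: String,
    entry_name: String,
}

/// Hypothetical stand-in for a codegen context that owns the per-kernel cache.
struct KernelCx {
    per_kernel: RefCell<HashMap<String, KernelGlobals>>,
    /// Counts how often globals were actually generated (for illustration only).
    generated: Cell<usize>,
}

impl KernelCx {
    fn new() -> Self {
        KernelCx {
            per_kernel: RefCell::new(HashMap::new()),
            generated: Cell::new(0),
        }
    }

    /// Returns the globals for `kernel`, generating them only on the first call.
    fn kernel_globals(&self, kernel: &str) -> KernelGlobals {
        let mut cache = self.per_kernel.borrow_mut();
        let globals = cache
            .entry(kernel.to_string())
            .or_insert_with(|| {
                // Runs only when `kernel` has no cache entry yet.
                self.generated.set(self.generated.get() + 1);
                KernelGlobals {
                    region_id: format!("{kernel}.region_id"),
                    entry_name: format!("{kernel}.offload_entry"),
                }
            })
            .clone();
        globals
    }
}

/// Simulates three intrinsic calls that touch two distinct kernels.
fn demo_per_kernel() -> (usize, bool) {
    let cx = KernelCx::new();
    let first = cx.kernel_globals("kernel_a");
    let second = cx.kernel_globals("kernel_a");
    let _other = cx.kernel_globals("kernel_b");
    (cx.generated.get(), first == second)
}
```

In this sketch, repeated calls for the same kernel reuse the cached entry, so the generation step runs once per kernel rather than once per call.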

r? @ZuseZ4

@rustbot rustbot added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Dec 8, 2025
    if cx.sess().opts.unstable_opts.offload.contains(&Offload::Enable)
        && !cx.sess().target.is_like_gpu
    {
        cx.offload_globals.replace(Some(OffloadGlobals::declare(&cx)));
Contributor Author

I'm a bit unsure about this location. We could also cache these globals and generate them on the first intrinsic call, but that felt like overloading intrinsic codegen a bit too much.

I don't have a strong opinion though, so happy to go with whatever you think is best.
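To make the two options in this comment concrete, here is a rough sketch that contrasts declaring the shared globals eagerly when the context is created with declaring them lazily on first use. Everything in it is an assumption for illustration: `SharedCx`, `SharedGlobals`, `new_eager`, and `shared_lazy` are made-up names, not rustc APIs, and the guard only mirrors the shape of the condition in the diff above.

```rust
use std::cell::{Cell, OnceCell};

/// Hypothetical stand-in for the shared (not per-kernel) globals.
struct SharedGlobals {
    ident_name: &'static str,
}

/// Hypothetical stand-in for the codegen context.
struct SharedCx {
    offload_enabled: bool,
    target_is_gpu: bool,
    shared: OnceCell<SharedGlobals>,
    /// Counts calls to `declare` (for illustration only).
    declare_calls: Cell<u32>,
}

impl SharedCx {
    /// Builds a context without declaring anything.
    fn bare(offload_enabled: bool, target_is_gpu: bool) -> Self {
        SharedCx {
            offload_enabled,
            target_is_gpu,
            shared: OnceCell::new(),
            declare_calls: Cell::new(0),
        }
    }

    fn declare(&self) -> SharedGlobals {
        self.declare_calls.set(self.declare_calls.get() + 1);
        SharedGlobals { ident_name: "shared.ident" }
    }

    /// Option A: declare eagerly at construction, behind a guard.
    fn new_eager(offload_enabled: bool, target_is_gpu: bool) -> Self {
        let cx = SharedCx::bare(offload_enabled, target_is_gpu);
        if cx.offload_enabled && !cx.target_is_gpu {
            // The cell is empty here, so `set` cannot fail.
            let _ = cx.shared.set(cx.declare());
        }
        cx
    }

    /// Option B: declare lazily, the first time an intrinsic needs the globals.
    fn shared_lazy(&self) -> &SharedGlobals {
        self.shared.get_or_init(|| self.declare())
    }
}

/// Returns (declare calls after eager construction,
///          declare calls before any lazy use,
///          declare calls after two lazy uses).
fn demo_shared() -> (u32, u32, u32) {
    let eager = SharedCx::new_eager(true, false);
    let eager_calls = eager.declare_calls.get();

    let lazy = SharedCx::bare(true, false);
    let before = lazy.declare_calls.get();
    let name = lazy.shared_lazy().ident_name;
    assert_eq!(name, lazy.shared_lazy().ident_name);
    (eager_calls, before, lazy.declare_calls.get())
}
```

Both variants declare the globals once. The difference is where the responsibility sits: the eager form keeps the intrinsic codegen path free of initialization logic, while the lazy form avoids touching the context setup at the cost of putting a cache lookup into the intrinsic path.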

@ZuseZ4 ZuseZ4 added the F-gpu_offload `#![feature(gpu_offload)]` label Dec 10, 2025
@ZuseZ4 ZuseZ4 mentioned this pull request Dec 10, 2025
5 tasks
@ZuseZ4
Member

ZuseZ4 commented Dec 18, 2025

Thanks for the IR and code cleanup, the struct makes it much nicer.
Since this adds some entries to the `FullCx`, I'll run a perf run:

@bors try @rust-timer queue

I don't think it will have any impact, but then again, we thought the same when accidentally causing a regression with autodiff, so let's see.

rust-bors bot added a commit that referenced this pull request Dec 18, 2025
Move shared offload globals and define per-kernel globals once
@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Dec 18, 2025
@rust-bors

rust-bors bot commented Dec 18, 2025

☀️ Try build successful (CI)
Build commit: b120fe9 (b120fe9e9f5cbf9dd3fe1c38c97b3b2d44dd94ee, parent: ed0006a7ba2dc8fabab8ea94d6f843886311b3c7)

@rust-timer
Collaborator

Finished benchmarking commit (b120fe9): comparison URL.

Overall result: ❌ regressions - no action needed

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 1.1% | [1.1%, 1.1%] | 2 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | - | - | 0 |

Max RSS (memory usage)

This benchmark run did not return any relevant results for this metric.

Cycles

This benchmark run did not return any relevant results for this metric.

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 481.444s -> 482.856s (0.29%)
Artifact size: 390.60 MiB -> 390.60 MiB (-0.00%)

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Dec 18, 2025
Labels

A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. F-gpu_offload `#![feature(gpu_offload)]` S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
