Move shared offload globals and define per-kernel globals once #149788

Sa4dUs · 2025-12-08T21:57:26Z

This PR moves the logic for shared LLVM global variables out of the offload intrinsic codegen and generates kernel-specific variables only on the first call of the intrinsic.

r? @ZuseZ4
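
A minimal sketch of the "define per-kernel globals once" idea, assuming a cache keyed by kernel symbol name. `Cx`, `KernelGlobals`, `kernel_globals` and the generated strings are hypothetical stand-ins for illustration, not the types or names used in `rustc_codegen_llvm`:

```rust
use std::cell::{Cell, RefCell};
use std::collections::HashMap;

/// Hypothetical stand-in for the LLVM values that belong to one kernel.
#[derive(Clone, Debug, PartialEq)]
struct KernelGlobals {
    region_id: String,
    entry: String,
}

/// Hypothetical codegen context holding the per-kernel cache.
struct Cx {
    kernels: RefCell<HashMap<String, KernelGlobals>>,
    /// Counts how many times globals were actually defined.
    definitions: Cell<usize>,
}

impl Cx {
    fn new() -> Self {
        Cx { kernels: RefCell::new(HashMap::new()), definitions: Cell::new(0) }
    }

    /// Returns the globals for `symbol`, defining them only on the first call.
    fn kernel_globals(&self, symbol: &str) -> KernelGlobals {
        // Fast path: a previous intrinsic call already defined them.
        if let Some(found) = self.kernels.borrow().get(symbol) {
            return found.clone();
        }
        // Slow path: define once and remember the result.
        self.definitions.set(self.definitions.get() + 1);
        let globals = KernelGlobals {
            region_id: format!("{symbol}.region_id"), // placeholder name
            entry: format!("{symbol}.entry"),         // placeholder name
        };
        self.kernels.borrow_mut().insert(symbol.to_string(), globals.clone());
        globals
    }
}
```

With this shape, calling `kernel_globals("foo")` twice returns equal values while the definition counter stays at one; a second kernel name triggers exactly one more definition.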

Sa4dUs · 2025-12-08T22:18:33Z

compiler/rustc_codegen_llvm/src/base.rs

+            if cx.sess().opts.unstable_opts.offload.contains(&Offload::Enable)
+                && !cx.sess().target.is_like_gpu
+            {
+                cx.offload_globals.replace(Some(OffloadGlobals::declare(&cx)));


I'm a bit unsure about this location. We could also cache these globals and generate them on the first intrinsic call, but that felt like overloading intrinsic codegen a bit too much.

I don't have a strong opinion though, so I'm happy to go with whatever you think is best.
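
To make the two options concrete, here is a rough sketch under simplified assumptions. The `Cx` struct, its boolean fields, and the placeholder global names are hypothetical; only the `offload_globals.replace(Some(...))` shape is taken from the diff above:

```rust
use std::cell::RefCell;

/// Hypothetical stand-in for the shared runtime globals.
#[derive(Clone, Debug, PartialEq)]
struct OffloadGlobals {
    names: Vec<&'static str>,
}

impl OffloadGlobals {
    /// Pretends to declare the shared globals in the module.
    fn declare() -> Self {
        OffloadGlobals { names: vec!["shared_global_a", "shared_global_b"] }
    }
}

/// Hypothetical, heavily simplified codegen context.
struct Cx {
    offload_enabled: bool,
    target_is_gpu: bool,
    offload_globals: RefCell<Option<OffloadGlobals>>,
}

impl Cx {
    /// Option A (the location chosen in the diff): declare eagerly during
    /// module setup, only for host compilation with offload enabled.
    fn declare_eagerly(&self) {
        if self.offload_enabled && !self.target_is_gpu {
            self.offload_globals.replace(Some(OffloadGlobals::declare()));
        }
    }

    /// Option B (the alternative mentioned): declare lazily on the first
    /// intrinsic call and reuse the cached value afterwards.
    fn globals_lazily(&self) -> OffloadGlobals {
        let mut slot = self.offload_globals.borrow_mut();
        slot.get_or_insert_with(OffloadGlobals::declare).clone()
    }
}
```

Option A keeps the intrinsic codegen free of setup logic at the cost of always declaring the globals when offload is enabled; Option B declares nothing until a kernel is actually lowered but pushes the caching into the intrinsic path.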

ZuseZ4 · 2025-12-18T12:46:10Z

Thanks for the IR and code cleanup, the struct makes it much nicer.
Since this adds some entries to the `FullCx`, I'll run a perf check:

@bors try @rust-timer queue

I don't think it will have any impact, but then again, we thought the same when accidentally causing a regression with autodiff, so let's see.

rust-bors · 2025-12-18T15:03:34Z

☀️ Try build successful (CI)
Build commit: b120fe9 (b120fe9e9f5cbf9dd3fe1c38c97b3b2d44dd94ee, parent: ed0006a7ba2dc8fabab8ea94d6f843886311b3c7)

rust-timer · 2025-12-18T15:44:49Z

Finished benchmarking commit (b120fe9): comparison URL.

Overall result: ❌ regressions - no action needed

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 1.1% | [1.1%, 1.1%] | 2 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | - | - | 0 |

Max RSS (memory usage)

This benchmark run did not return any relevant results for this metric.

Cycles

This benchmark run did not return any relevant results for this metric.

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 481.444s -> 482.856s (0.29%)
Artifact size: 390.60 MiB -> 390.60 MiB (-0.00%)

rustbot assigned ZuseZ4 Dec 8, 2025

Sa4dUs added 2 commits December 8, 2025 23:00

Split runtime global logic and cache kernel specific one

c9cf689

Remove outdated comment

b357e0c

Sa4dUs force-pushed the offload-cleanup branch from 9de91f2 to b357e0c Compare December 8, 2025 22:00

ZuseZ4 added the F-gpu_offload `#![feature(gpu_offload)]` label Dec 10, 2025

ZuseZ4 mentioned this pull request Dec 10, 2025

Tracking Issue for GPU-offload #131513

Open

5 tasks

Remove region_id unnamed attr

37df9e8

rust-bors bot added a commit that referenced this pull request Dec 18, 2025

Auto merge of #149788 - Sa4dUs:offload-cleanup, r=<try>

b120fe9

Move shared offload globals and define per-kernel globals once

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Dec 18, 2025

rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Dec 18, 2025
