-
-
Notifications
You must be signed in to change notification settings - Fork 14.2k
Move shared offload globals and define per-kernel globals once #149788
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
9de91f2 to
b357e0c
Compare
| if cx.sess().opts.unstable_opts.offload.contains(&Offload::Enable) | ||
| && !cx.sess().target.is_like_gpu | ||
| { | ||
| cx.offload_globals.replace(Some(OffloadGlobals::declare(&cx))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm a bit unsure about this location. we could also cache these globals and generate them on the first intrinsic call, but that felt like overloading intrinsic codegen a bit too much
i don't have a strong opinion though, so happy to go with whatever u think is best
|
Thanks for the IR and code cleanup, the struct makes it much nicer. @bors try @rust-timer queue I don't think it will have any impact, but then again, we thought the same when accidentally causing a regression with autodiff, so let's see. |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
Move shared offload globals and define per-kernel globals once
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
|
Finished benchmarking commit (b120fe9): comparison URL. Overall result: ❌ regressions - no action neededBenchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf. @bors rollup=never Instruction countOur most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage)This benchmark run did not return any relevant results for this metric. CyclesThis benchmark run did not return any relevant results for this metric. Binary sizeThis benchmark run did not return any relevant results for this metric. Bootstrap: 481.444s -> 482.856s (0.29%) |
This PR moves the shared LLVM global variables logic out of the
offloadintrinsic codegen and generates kernel-specific variables only ont he first call of the intrinsic.r? @ZuseZ4