Add generate_identity_sequences helper and replace lambdas with named functors #3628
Pull request overview
This PR aims to reduce C++ template instantiations (and improve build times) by introducing reusable helpers for common sequence/tuple metaprogramming patterns and by replacing per-call-site lambdas with named functors.
Changes:
- Added `generate_identity_sequences<N>()` helper to generate `Tuple<Sequence<0>, ..., Sequence<N-1>>` without lambdas.
- Added named sequence utilities (`merge_sequences_functor`, `unpack_and_merge_sequences`) and replaced lambdas in `transform_tensor_descriptor`/`TensorDescriptor` logic.
- Updated multiple call sites to use the new helper(s) and added unit tests.
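The instantiation argument for replacing lambdas can be shown with a small, self-contained C++17 sketch. `apply_to_three` and `add_one` are hypothetical stand-ins, not CK code. They represent a template that receives a callable and a named functor. The point is that two textually identical lambdas still have distinct closure types, while every use of a named functor names one type and therefore one specialization.

```cpp
#include <type_traits>

// Hypothetical stand-in for a CK template that receives a callable
// (e.g. generate_tuple). Each distinct F yields a separate instantiation.
template <typename F>
constexpr int apply_to_three(F f)
{
    return f(3);
}

// Named functor: every call site that uses it names the same type,
// so all of them share the single specialization apply_to_three<add_one>.
struct add_one
{
    constexpr int operator()(int x) const { return x + 1; }
};

// Two textually identical lambdas still have two distinct closure types,
// so apply_to_three is instantiated once for each of them.
inline constexpr auto lambda_a = [](int x) { return x + 1; };
inline constexpr auto lambda_b = [](int x) { return x + 1; };
static_assert(!std::is_same_v<decltype(lambda_a), decltype(lambda_b)>);

// Behaviour is identical either way; only the number of instantiations differs.
static_assert(apply_to_three(lambda_a) == 4);
static_assert(apply_to_three(add_one{}) == 4);
```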
Reviewed changes
Copilot reviewed 18 out of 18 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| test/util/unit_sequence_helper.cpp | Adds unit tests for generate_identity_sequences and unpack_and_merge_sequences. |
| test/util/CMakeLists.txt | Adds a new gtest executable target for the new unit tests. |
| include/ck/wrapper/utils/tensor_partition.hpp | Switches identity-dimension tuple generation to generate_identity_sequences. |
| include/ck/wrapper/utils/layout_utils.hpp | Switches identity-dimension tuple generation to generate_identity_sequences. |
| include/ck/wrapper/tensor.hpp | Switches identity-dimension tuple generation to generate_identity_sequences. |
| include/ck/wrapper/operations/gemm.hpp | Switches identity-dimension tuple generation to generate_identity_sequences. |
| include/ck/wrapper/layout.hpp | Switches identity-dimension tuple generation to generate_identity_sequences. |
| include/ck/utility/tuple_helper.hpp | Introduces generate_identity_sequences helper implementation. |
| include/ck/utility/sequence_helper.hpp | Introduces named functors and unpack_and_merge_sequences. |
| include/ck/tensor_operation/gpu/thread/threadwise_tensor_slice_transfer_v7r3_scatter.hpp | Replaces identity sequence generation lambda with generate_identity_sequences. |
| include/ck/tensor_operation/gpu/thread/threadwise_tensor_slice_transfer_v7r3.hpp | Replaces identity sequence generation lambda with generate_identity_sequences. |
| include/ck/tensor_operation/gpu/thread/threadwise_tensor_slice_transfer_v7r2.hpp | Replaces identity sequence generation lambda with generate_identity_sequences. |
| include/ck/tensor_operation/gpu/thread/threadwise_tensor_slice_transfer_v3r2.hpp | Replaces identity sequence generation lambda with generate_identity_sequences. |
| include/ck/tensor_operation/gpu/thread/threadwise_tensor_slice_transfer_v3r1_gather.hpp | Replaces identity sequence generation lambda with generate_identity_sequences. |
| include/ck/tensor_operation/gpu/thread/threadwise_tensor_slice_transfer_v3r1_dequant.hpp | Replaces identity sequence generation lambda with generate_identity_sequences. |
| include/ck/tensor_operation/gpu/thread/threadwise_tensor_slice_transfer_v3r1.hpp | Replaces identity sequence generation lambda with generate_identity_sequences. |
| include/ck/tensor_operation/gpu/device/matrix_padder.hpp | Replaces identity sequence generation lambda with generate_identity_sequences. |
| include/ck/tensor_description/tensor_descriptor.hpp | Replaces lambdas with named functors and uses unpack_and_merge_sequences. |
Replace inline lambdas with named functor structs in transform_tensor_descriptor to reduce template instantiation overhead and improve compile times.

Changes:
- Add three named functors in tensor_descriptor.hpp:
  - convert_visible_to_hidden_id: maps visible dimension ID to hidden ID
  - convert_visible_ids_to_hidden_ids: maps sequence of visible IDs to hidden IDs
  - generate_arithmetic_sequence_from_scan: generates consecutive hidden dim ID ranges
- Add utility functions in sequence_helper.hpp and tuple_helper.hpp:
  - unpack_and_merge_sequences(): unpacks tuple of sequences and merges them
  - generate_identity_sequences(): creates Tuple<Sequence<0>, Sequence<1>, ...>
- Update 14 call sites across threadwise transfer, wrapper, and device files to use generate_identity_sequences() instead of generate_tuple with lambdas
- Add comprehensive unit tests:
  - unit_sequence_helper.cpp: tests for new utility functions
  - unit_tensor_descriptor_functors.cpp: tests for new functors

Co-Authored-By: Claude <noreply@anthropic.com>
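The following is a minimal C++17 sketch of what `unpack_and_merge_sequences()` does. It assumes simplified stand-ins for `ck::Sequence`, `ck::Tuple` and `ck::merge_sequences`, and the real CK definitions carry more machinery. A tuple of sequences is expanded into a single call of a named merge functor, so the call site needs no lambda.

```cpp
#include <type_traits>

// Simplified stand-ins for the CK types (assumption, not the CK definitions).
template <int... Is> struct Sequence {};
template <typename... Ts> struct Tuple {};

// Stand-in for merge_sequences: concatenates any number of sequences.
constexpr Sequence<> merge_sequences() { return {}; }

template <int... Is>
constexpr Sequence<Is...> merge_sequences(Sequence<Is...>) { return {}; }

template <int... Is, int... Js, typename... Rest>
constexpr auto merge_sequences(Sequence<Is...>, Sequence<Js...>, Rest... rest)
{
    // Fold the first two sequences together, then recurse on the remainder.
    return merge_sequences(Sequence<Is..., Js...>{}, rest...);
}

// Functor wrapper: one named type shared by every call site.
struct merge_sequences_functor
{
    template <typename... Seqs>
    constexpr auto operator()(Seqs... seqs) const
    {
        return merge_sequences(seqs...);
    }
};

// Expand Tuple<Seqs...> into the functor's argument list.
template <typename... Seqs>
constexpr auto unpack_and_merge_sequences(Tuple<Seqs...>)
{
    return merge_sequences_functor{}(Seqs{}...);
}

// Usage: Tuple<Sequence<0, 1>, Sequence<2>> merges to Sequence<0, 1, 2>.
static_assert(std::is_same_v<decltype(unpack_and_merge_sequences(Tuple<Sequence<0, 1>, Sequence<2>>{})),
                             Sequence<0, 1, 2>>);
```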
95c6c4b to bce6ec1
    }

    // Functor wrapper for merge_sequences to enable reuse across call sites
    struct merge_sequences_functor
One thing to consider: are these new helper functors an implementation detail that should stay out of the header's public interface?
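One way to act on this, sketched under the assumption that nothing outside the header needs to name the functor type, is to move the functor into a `detail` namespace and expose only the function template. The namespace `sketch` and the stand-in `Sequence`/`Tuple` types below are hypothetical.

```cpp
#include <type_traits>

// Simplified stand-ins for the CK types (assumption).
template <int... Is> struct Sequence {};
template <typename... Ts> struct Tuple {};

namespace sketch {
namespace detail {
// Implementation detail: callers are not expected to name this type.
struct merge_sequences_functor
{
    constexpr Sequence<> operator()() const { return {}; }

    template <int... Is>
    constexpr Sequence<Is...> operator()(Sequence<Is...>) const { return {}; }

    // Concatenate the first two sequences, then recurse on the rest.
    template <int... Is, int... Js, typename... Rest>
    constexpr auto operator()(Sequence<Is...>, Sequence<Js...>, Rest... rest) const
    {
        return (*this)(Sequence<Is..., Js...>{}, rest...);
    }
};
} // namespace detail

// Public entry point: the only name the header advertises.
template <typename... Seqs>
constexpr auto unpack_and_merge_sequences(Tuple<Seqs...>)
{
    return detail::merge_sequences_functor{}(Seqs{}...);
}
} // namespace sketch

static_assert(std::is_same_v<decltype(sketch::unpack_and_merge_sequences(Tuple<Sequence<1>, Sequence<2, 3>>{})),
                             Sequence<1, 2, 3>>);
```

The functor remains a single named type, so moving it into `detail` does not reintroduce per-call-site instantiations. It only changes what the header presents as public API.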
Imported to ROCm/rocm-libraries
Summary
- Add `generate_identity_sequences<N>()` helper that returns `Tuple<Sequence<0>, Sequence<1>, ..., Sequence<N-1>>`
- Replace lambdas with named functors in `transform_tensor_descriptor`
- Add `unpack_and_merge_sequences` helper functor
- Reduce `transform_tensor_descriptor` instantiations from 388 to 32 (92% reduction)

Motivation
Multiple call sites use the `generate_tuple([](auto i) { return Sequence<i>{}; }, Number<N>{})` pattern. A named helper reduces lambda instantiations.

Additionally, each lambda in `transform_tensor_descriptor` creates a unique closure type, causing the function to be instantiated separately for every call site. Named functors share a single type, so the compiler reuses the same instantiation.

Changes
Part 1: generate_identity_sequences helper
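Below is a minimal C++17 sketch of such a helper. It assumes simplified stand-ins for `ck::Sequence`, `ck::Tuple`, `ck::Number` and `ck::generate_tuple`, which differ in detail from the real CK definitions. The helper expands `std::make_integer_sequence` directly instead of calling a lambda. A `static_assert` then checks that it yields the same type as the lambda-based pattern it replaces.

```cpp
#include <type_traits>
#include <utility>

// Simplified stand-ins for the CK types (assumption, not the CK definitions).
template <int... Is> struct Sequence {};
template <typename... Ts> struct Tuple {};
template <int N> using Number = std::integral_constant<int, N>;

// Simplified generate_tuple: Tuple<decltype(f(Number<0>{})), ..., decltype(f(Number<N-1>{}))>.
template <typename F, int... Is>
constexpr auto generate_tuple_impl(F f, std::integer_sequence<int, Is...>)
{
    return Tuple<decltype(f(Number<Is>{}))...>{};
}

template <typename F, int N>
constexpr auto generate_tuple(F f, Number<N>)
{
    return generate_tuple_impl(f, std::make_integer_sequence<int, N>{});
}

// Hypothetical sketch of generate_identity_sequences<N>(): no lambda involved.
template <int... Is>
constexpr Tuple<Sequence<Is>...> make_identity_sequences(std::integer_sequence<int, Is...>)
{
    return {};
}

template <int N>
constexpr auto generate_identity_sequences()
{
    return make_identity_sequences(std::make_integer_sequence<int, N>{});
}

// The lambda-based call-site pattern, kept only for comparison.
inline constexpr auto via_lambda =
    generate_tuple([](auto i) { return Sequence<i>{}; }, Number<4>{});

// Same resulting type, but no per-call-site closure type.
static_assert(std::is_same_v<std::remove_const_t<decltype(via_lambda)>,
                             decltype(generate_identity_sequences<4>())>);
```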
Part 2: Named functors in transform_tensor_descriptor
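The next example is a hypothetical C++17 sketch of the three functors named in the commit message: `convert_visible_to_hidden_id`, `convert_visible_ids_to_hidden_ids` and `generate_arithmetic_sequence_from_scan`. Their template parameters and call signatures are assumptions, and so is the stand-in `Sequence` with an `at()` accessor. The sketch models only the described behaviour. It looks up a visible dimension's hidden id, and it carves consecutive hidden-id ranges out of a 0-based prefix sum ("scan") of per-transform upper-dimension counts.

```cpp
#include <type_traits>
#include <utility>

// Stand-in for ck::Sequence with positional access (assumption).
template <int... Is>
struct Sequence
{
    static constexpr int at(int i)
    {
        constexpr int values[] = {Is..., 0}; // trailing 0 keeps the array non-empty
        return values[i];
    }
};

// Hypothetical: map one visible dimension id of the old descriptor to its hidden id.
// VisibleDimensionIds plays the role of the descriptor's visible-to-hidden id list.
template <typename VisibleDimensionIds>
struct convert_visible_to_hidden_id
{
    constexpr int operator()(int visible_id) const { return VisibleDimensionIds::at(visible_id); }
};

// Hypothetical: apply the mapping above to a whole sequence of visible ids.
template <typename VisibleDimensionIds>
struct convert_visible_ids_to_hidden_ids
{
    template <int... VisibleIds>
    constexpr auto operator()(Sequence<VisibleIds...>) const
    {
        return Sequence<convert_visible_to_hidden_id<VisibleDimensionIds>{}(VisibleIds)...>{};
    }
};

// Sequence<Begin + 0, Begin + 1, ...> built from an index pack.
template <int Begin, int... Offsets>
constexpr Sequence<(Begin + Offsets)...> offset_sequence(std::integer_sequence<int, Offsets...>)
{
    return {};
}

// Hypothetical: for transform I, produce the consecutive hidden ids
// [OldHiddenDimNumber + Scan[I], OldHiddenDimNumber + Scan[I + 1]).
template <int OldHiddenDimNumber, typename UpDimNumbersScan>
struct generate_arithmetic_sequence_from_scan
{
    template <int I>
    constexpr auto operator()(std::integral_constant<int, I>) const
    {
        constexpr int begin = OldHiddenDimNumber + UpDimNumbersScan::at(I);
        constexpr int end   = OldHiddenDimNumber + UpDimNumbersScan::at(I + 1);
        return offset_sequence<begin>(std::make_integer_sequence<int, end - begin>{});
    }
};

// Usage: visible dims 0, 1, 2 of the old descriptor live at hidden ids 0, 3, 4.
static_assert(convert_visible_to_hidden_id<Sequence<0, 3, 4>>{}(1) == 3);
// Two transforms adding 2 and 1 upper dims after 5 existing hidden dims: scan = <0, 2, 3>.
static_assert(std::is_same_v<decltype(generate_arithmetic_sequence_from_scan<5, Sequence<0, 2, 3>>{}(
                                 std::integral_constant<int, 0>{})),
                             Sequence<5, 6>>);
```

Each functor is a single type per set of template arguments, so every call site that passes the same descriptor types reuses one instantiation.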
- Add `unpack_and_merge_sequences` helper to replace the lambda in `GetNumOfHiddenDimension`
- Use `generate_identity_sequences` in `matrix_padder.hpp`

Test Plan
- Unit tests for `generate_identity_sequences`
- Unit tests for `unpack_and_merge_sequences`

Related PRs
This PR merges the functionality from:
Part of PR stack for issue #3575 (Reduce CK/CKTile Build Times)
Note: This PR supersedes #3588 and #3589, which can be closed once this is merged.