WMMA grouped conv fwd large tensor extra flavors #3582
Conversation
...eration/gpu/device/impl/device_grouped_conv_fwd_multiple_d_wmma_cshuffle_v3_large_tensor.hpp
Looks good! I have a couple comments:
Looks good from my side. No comments.
Hi @krithalith
...tance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_wmma_cshufflev3_large_tensor_instance.hpp
Ok great, the new generic instances look good, and thanks for checking the coverage. Yes, I agree that in this case another Large test is not necessary. All good on my end; I had one micro-nit about the instance list headers but approved.
bartekxk left a comment:
lgtm please rebase
Pull request overview
This PR adds WMMA large tensor support for grouped convolution forward operations with clamp and bias+clamp element-wise operations for FP16 and BF16 data types. The changes enable additional operation flavors that were previously commented out.
Changes:
- Added device instance files for grouped_conv2d_fwd_clamp and grouped_conv3d_fwd_clamp with large tensor support (FP16/BF16, both regular and generic variants)
- Added device instance files for grouped_conv2d_fwd_bias_clamp and grouped_conv3d_fwd_bias_clamp with large tensor support (FP16/BF16, both regular and generic variants)
- Modified the device implementation to properly handle operations with and without D tensors (bias), using `if constexpr` guards and replacing manual struct initialization with the `Emplace` method
- Added an `Emplace` utility method to the Array class for in-place construction with designated initializers
- Uncommented and enabled previously disabled function declarations and calls for large tensor instances
Reviewed changes
Copilot reviewed 27 out of 27 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd_clamp/wmma/large_tensor/*.cpp | 4 new instance files for 3D convolution forward with clamp (F16/BF16, regular/generic) |
| library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd_bias_clamp/wmma/large_tensor/*.cpp | 4 new instance files for 3D convolution forward with bias+clamp (F16/BF16, regular/generic) |
| library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd_clamp/wmma/large_tensor/*.cpp | 4 new instance files for 2D convolution forward with clamp (F16/BF16, regular/generic) |
| library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd_bias_clamp/wmma/large_tensor/*.cpp | 4 new instance files for 2D convolution forward with bias+clamp (F16/BF16, regular/generic) |
| library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd_clamp/CMakeLists.txt | Registers new 3D clamp instance files in build system |
| library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd_bias_clamp/CMakeLists.txt | Registers new 3D bias+clamp instance files in build system |
| library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd_clamp/CMakeLists.txt | Registers new 2D clamp instance files in build system |
| library/src/tensor_operation_instance/gpu/grouped_conv2d_fwd_bias_clamp/CMakeLists.txt | Registers new 2D bias+clamp instance files in build system |
| library/include/ck/library/tensor_operation_instance/gpu/grouped_convolution_forward_clamp_wmma_cshufflev3.inc | Uncomments function declarations for large tensor clamp instances |
| library/include/ck/library/tensor_operation_instance/gpu/grouped_convolution_forward_clamp.hpp | Enables calls to large tensor clamp instance functions |
| library/include/ck/library/tensor_operation_instance/gpu/grouped_convolution_forward_bias_clamp_wmma_cshufflev3.inc | Uncomments function declarations for large tensor bias+clamp instances |
| library/include/ck/library/tensor_operation_instance/gpu/grouped_convolution_forward_bias_clamp.hpp | Enables calls to large tensor bias+clamp instance functions |
| library/include/ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_wmma_cshufflev3_large_tensor_instance.hpp | Adds generic instance templates and element-wise operation type aliases |
| include/ck/utility/array.hpp | Adds Emplace method for in-place construction with proper object lifetime management |
| include/ck/tensor_operation/gpu/device/impl/device_grouped_conv_fwd_multiple_d_wmma_cshuffle_v3_large_tensor.hpp | Refactors initialization to use Emplace with designated initializers and adds conditional compilation guards for D tensor operations |
library/src/tensor_operation_instance/gpu/grouped_conv3d_fwd_clamp/CMakeLists.txt
- added F16/BF16 clamp operation
- added F16/BF16 bias_clamp operation
- small modification to the device code to accommodate extra tensors
Added additional flavors for WMMA conv fwd large tensor.
The following operations are added for the FP16/BF16 data types and the NHWGCxGKYXC layout, plus a small modification to the device code to accommodate the extra tensors.