Add hipdnn convolution support #3049
Conversation
Jenkins build for 2cea5f50e97a741b501c0fce1667b713966152db commit finished as FAILURE
Jenkins build for 168eeb1b729889b93ad504773a9b8c3c5be09592 commit finished as FAILURE
Jenkins build for b0ca170b13923c1d321e2aed5f3b2daef7d5dd09 commit finished as NOT_BUILT
Jenkins build for b0ca170b13923c1d321e2aed5f3b2daef7d5dd09 commit finished as FAILURE. Detected error during PyTorch building.
Force-pushed 2befc15 to f6a72d1
// ---------------------------------------------------------------------------
// Cache key: captures everything that determines graph topology
// ---------------------------------------------------------------------------
constexpr int hipdnn_max_dim = 3;
MIOPEN_DIM_MAX = 5 is already defined and used
I'd like to leave this separate, since I don't want to use any of these named constants from other source files.
E.g., hipdnn might have other backends which support more dim combinations than miopen.
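To make the cache-key idea above concrete, here is a hypothetical sketch of what a params struct sized by hipdnn_max_dim might look like. The field layout is illustrative, not the PR's actual definition (the PR does define a HipdnnConvParams key, per a later commit); fixed-size arrays keep the struct trivially hashable/comparable.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative only: a fixed-size, POD-style cache key so it can be hashed
// and compared bytewise. hipdnn_max_dim = 3 bounds the spatial dims.
constexpr int hipdnn_max_dim = 3;

struct HipdnnConvParams {
  int8_t dtype = 0;
  int8_t memory_format = 0;
  int64_t input_size[hipdnn_max_dim + 2] = {};   // N, C, spatial...
  int64_t weight_size[hipdnn_max_dim + 2] = {};
  int64_t padding[hipdnn_max_dim] = {};
  int64_t stride[hipdnn_max_dim] = {};
  int64_t dilation[hipdnn_max_dim] = {};
  int64_t groups = 1;
};
```

Anything that changes graph topology (shapes, strides, dilation, groups, dtype, layout) must live in the key; runtime-only data (device pointers) must not.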
I think I need a rebase, one sec.
Force-pushed b0ca170 to 184c4c3
Jenkins build for 184c4c3223e16aa7318a8d4f9d82bcad635636c0 commit finished as SUCCESS
Jenkins build for a2cb42c0f7117d294a9cbd6ffad6ad27341fc523 commit finished as NOT_BUILT
Jenkins build for a2cb42c0f7117d294a9cbd6ffad6ad27341fc523 commit finished as FAILURE
mousdahl-amd left a comment
Nothing major jumped out at me.
if (!at::globalContext().userEnabledHipdnn()) return false;
if (!detail::getCUDAHooks().compiledWithHipDNN()) return false;
if (!input.is_cuda()) return false;
auto dtype = input.scalar_type();
These datatype / dimension checks are interesting to me. I understand wanting a fast path to skip if hipDNN doesn't support certain datatypes / tensor dimensions, but this could very easily change going forward. I wonder if there's a way we can check in with hipDNN to see if it's got applicable engines instead. I just want to avoid a maintenance headache if possible.
The problem with that approach is it may be heavier-weight than you want.
Implement hipDNN-based forward and backward convolution (2D and 3D) with a thread-local LRU graph cache to amortize the expensive graph->build() cost. Supports contiguous and channels-last memory formats, grouped/depthwise configurations, and transposed convolution. The graph cache follows the cuDNN v8 pattern from Conv_v8.cpp with configurable size via TORCH_HIPDNN_CONV_LRU_CACHE_LIMIT env var. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
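The thread-local LRU cache described above can be sketched as follows. This is a minimal illustration, not the PR's actual code: the real cache is declared thread_local, is keyed by a conv-params struct (so K must be hashable), and treats a limit of 0 as "unlimited".

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <unordered_map>
#include <utility>

// Minimal LRU cache sketch: a recency list plus a hash map whose values
// carry an iterator back into the list for O(1) promotion and eviction.
template <typename K, typename V>
class ParamsLRUCache {
 public:
  explicit ParamsLRUCache(std::size_t limit) : limit_(limit) {}

  // Returns a pointer to the cached value (promoting it to most-recently
  // used), or nullptr on a miss.
  V* find(const K& key) {
    auto it = map_.find(key);
    if (it == map_.end()) return nullptr;
    order_.splice(order_.begin(), order_, it->second.second);
    return &it->second.first;
  }

  void insert(const K& key, V value) {
    auto it = map_.find(key);
    if (it != map_.end()) {
      it->second.first = std::move(value);
      order_.splice(order_.begin(), order_, it->second.second);
      return;
    }
    // limit_ == 0 disables eviction entirely ("unlimited").
    if (limit_ > 0 && map_.size() >= limit_) {
      map_.erase(order_.back());
      order_.pop_back();
    }
    order_.push_front(key);
    map_.emplace(key, std::make_pair(std::move(value), order_.begin()));
  }

  std::size_t size() const { return map_.size(); }

 private:
  std::size_t limit_;
  std::list<K> order_;  // front = most recently used
  std::unordered_map<K, std::pair<V, typename std::list<K>::iterator>> map_;
};
```

Caching the built graph under the params key is what amortizes the expensive graph->build() call across repeated convolutions with the same topology.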
Add Hipdnn and HipdnnTranspose to ConvBackend enum and wire them through the full dispatch path: backend selection (use_hipdnn check inserted before use_miopen), memory format selection, forward switch, backward switch, and Python enum exposure. hipDNN takes priority over MIOpen when torch.backends.hipdnn.enabled is True. The hipdnn_conv_suggest_memory_format() supports NHWC/NDHWC unconditionally since hipDNN handles strides natively. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The functions used SymIntArrayRef/SymInt parameter types but the dispatch registration in native_functions.yaml maps to the non-symint overload, which expects IntArrayRef/int64_t. This caused undefined reference linker errors in libtorch_hip.so. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix cache limit env var: replace check_env() (returns bool) with get_env() + stoi() so custom limits are actually parsed
- Remove cudnn_enabled gate from use_hipdnn(); userEnabledHipdnn() already handles the enable/disable check independently
- Fuse bias into forward conv graph via pointwise(ADD), eliminating the separate output.add_(reshape_bias(...)) call
- Remove unused #include <mutex> (cache is thread-local)
- Unify namespace style to `namespace at::native`
- Add clarifying comment on implicit group count via tensor shapes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
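The env-var bug above is worth spelling out: a predicate like check_env() can only tell you the variable is set, so the custom value never reached the cache. A plain-C++ sketch of the fix (function names here are illustrative stand-ins, not PyTorch's utilities):

```cpp
#include <cassert>
#include <cstdlib>
#include <optional>
#include <string>

// Stand-in for a get_env() that returns the value, not just set/unset.
std::optional<std::string> get_env(const char* name) {
  const char* v = std::getenv(name);
  if (v == nullptr) return std::nullopt;
  return std::string(v);
}

// Parse the cache limit, falling back to a default when unset.
// Note: std::stoi throws on a non-numeric value; real code would want to
// validate or catch that.
int parse_cache_limit(const char* name, int default_limit) {
  if (auto v = get_env(name)) {
    return std::stoi(*v);  // the custom limit is actually parsed now
  }
  return default_limit;
}
```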
Add test_hipdnn_conv.py with forward+backward tests for hipDNN conv. Working: fp32 basic conv, dilation. Xfail: bias (plugin needs a conv+bias+activ 3-node graph), bf16, grouped/strided backward, transposed conv. Skip: depthwise (GPU fault). Fix buildConvFpropGraph to set intermediate_data_type on the graph and compute_data_type on pointwise attributes, matching the hipDNN sample pattern for fused conv+bias. Bias fusion is still blocked by the MIOpen legacy plugin only supporting 3-node (conv+bias+activ) graphs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
hipDNN infers group count from tensor dimensions. For wgrad, both graph inputs (dy and x) have full channel counts, so without explicitly setting the output weight shape, hipDNN cannot determine the group count and defaults to groups=1. This causes out-of-bounds GPU memory access for grouped/depthwise convolutions. Set output dims on both dgrad and wgrad graphs using the known input_size and weight_size respectively. Authored with Claude. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
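The arithmetic behind the wgrad failure above: for fprop the group count falls out of the channel dims (the weight's input-channel dim is C_in / groups), but for wgrad the weight is the graph *output*, and both inputs carry full channel counts, so there is nothing to divide. A small illustrative helper (hypothetical name):

```cpp
#include <cassert>
#include <cstdint>

// For fprop, groups is implied by x's channel count divided by the weight's
// per-group input-channel count (weight dim 1 in NCHW). For wgrad the weight
// shape is the output, so this inference is impossible without setting the
// output dims explicitly -- which is what the fix above does.
int64_t infer_groups(int64_t x_channels, int64_t w_channels_per_group) {
  assert(w_channels_per_group > 0);
  assert(x_channels % w_channels_per_group == 0);
  return x_channels / w_channels_per_group;
}
```

With groups wrongly defaulting to 1, the kernel indexes as if every filter saw all input channels, hence the out-of-bounds access for grouped/depthwise cases.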
…loat16 numerics. Signed-off-by: zjgarvey <zjgarvey@gmail.com>
Replace the custom HipdnnGraphCache with a shared ParamsLRUCache<K,V> template, remove stored UIDs from the cached graph struct in favor of enum constants with semantic aliases, and move contiguous() calls into the hipDNN entry points so the dispatch site doesn't need them. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Include the graph output dimensions in HipdnnConvParams so that dgrad graphs built for different output_padding values (e.g. transposed conv with output_padding=1 vs 0) are cached separately. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
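The reason output dims must be in the key is visible in the standard transposed-convolution output-size formula (as documented for PyTorch's ConvTranspose modules): two calls identical except for output_padding produce different output shapes, hence different graphs. A sketch:

```cpp
#include <cassert>
#include <cstdint>

// Output size of a transposed convolution along one spatial dim
// (standard formula; matches PyTorch's ConvTranspose documentation).
int64_t conv_transpose_out_size(int64_t in, int64_t stride, int64_t padding,
                                int64_t dilation, int64_t kernel,
                                int64_t output_padding) {
  return (in - 1) * stride - 2 * padding + dilation * (kernel - 1)
         + output_padding + 1;
}
```

Without output dims in HipdnnConvParams, the output_padding=1 call would hit the cached output_padding=0 graph and run with the wrong output shape.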
Wire bias through buildConvDgradGraph and runHipdnnConvDgrad so dgrad+pointwise(ADD) fusion is ready when hipDNN backend plugins support it. For now the transposed conv entry point still applies bias separately since no plugin handles this pattern yet. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fix cache_limit == 0 to mean "unlimited" (no eviction, no LRU tracking) instead of the previous behavior where entries were still added to cache_order unnecessarily. Add TORCH_INTERNAL_ASSERT on eviction erase to catch cache corruption early. Plumb benchmark and deterministic flags through to the cache key so that different flag combinations produce separate cache entries, preparing for when HipDNN supports algorithm/engine selection based on these flags. Authored with Claude. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
deterministic=true is a correctness contract that hipDNN cannot honor yet (no engine-level determinism filtering), so raise an error rather than silently producing non-deterministic results. benchmark=true is a performance hint (algorithm search), safe to ignore but users should know it has no effect — emit TORCH_WARN_ONCE. Also removes both flags from the cache key since they do not affect graph construction. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
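The distinction above (hard error for a correctness contract, one-time warning for a performance hint) can be sketched in plain C++. This is a stand-in for the TORCH_CHECK / TORCH_WARN_ONCE macros, not the PR's actual code, and the function name is hypothetical:

```cpp
#include <iostream>
#include <stdexcept>

// deterministic=true is a correctness contract hipDNN cannot honor yet:
// fail loudly. benchmark=true is a safe-to-ignore performance hint: warn
// once per process and continue.
void check_conv_flags(bool deterministic, bool benchmark) {
  if (deterministic) {
    throw std::runtime_error(
        "hipDNN convolution does not support deterministic mode");
  }
  if (benchmark) {
    static bool warned = false;  // TORCH_WARN_ONCE stand-in
    if (!warned) {
      warned = true;
      std::cerr << "benchmark=true has no effect for hipDNN convolution\n";
    }
  }
}
```

Since neither flag changes graph construction anymore, dropping both from the cache key is consistent: identical topologies share one cached graph regardless of flags.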
Instead of applying bias separately after the dgrad call, pass it through to runHipdnnConvDgrad so it can be fused when the backend supports it. Signed-off-by: Zach Garvey <zachary.garvey@amd.com> Assisted-By: Claude Opus 4.6 <noreply@anthropic.com>
Force-pushed a2cb42c to cd74159
Jenkins build for cd7415905e38d51aec57a82df23904df59fd1bb7 commit finished as FAILURE
Force-pushed 5e55e83 to a5d098f
Per-backend forward ops (hipdnn_convolution, hipdnn_convolution_transpose) are invisible to the compiler -- AOT autograd captures aten.convolution as an opaque node, never per-backend ops. Removing them from native_functions.yaml and using dispatch stubs (the same mechanism backward has used since Joel Schlosser's 2021 refactor) eliminates derivatives.yaml entries, FC allowlist entries, HasDecomp entries, and trace_rules entries. hipDNN is the first convolution backend to use dispatch stubs for forward, proving the pattern works for a potential broader cleanup of cuDNN/MIOpen. Assisted-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: zjgarvey <zjgarvey@gmail.com>
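The dispatch-stub mechanism referenced above boils down to calling through a registered function pointer instead of a per-backend entry in native_functions.yaml. A minimal plain-C++ illustration of the pattern (not PyTorch's actual DECLARE_DISPATCH/REGISTER_DISPATCH machinery):

```cpp
#include <cassert>

// The generic op calls through a stub; a backend fills it in at
// static-initialization time, so the compiler/tracer only ever sees the
// generic op, never a backend-specific one.
using conv_fn = int (*)(int);

struct ConvStub {
  conv_fn fn = nullptr;
  int operator()(int x) const {
    assert(fn != nullptr && "no backend registered");
    return fn(x);
  }
};

ConvStub conv_stub;  // the single dispatch point the generic op uses

// Toy "hipDNN" kernel; real registration happens per device type.
int hipdnn_conv_impl(int x) { return x * 2; }

// REGISTER_DISPATCH stand-in: runs before main().
struct RegisterHipdnn {
  RegisterHipdnn() { conv_stub.fn = hipdnn_conv_impl; }
} register_hipdnn;
```

Because AOT autograd captures aten.convolution as one opaque node, this indirection is invisible to the compiler, which is exactly why the per-backend yaml entries (and their derivatives.yaml / allowlist / trace_rules baggage) could be deleted.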
Jenkins build for 5e5cd8a61c1d5ee5381f87ee5a44f69e9e4a6620 commit finished as FAILURE
Implements forward and backward convolution (2D/3D) through the hipDNN frontend graph API, providing an alternative to the MIOpen backend on ROCm.
- Moved createTensorAttributes() into ATen/hipdnn/Utils.h for reuse across hipDNN ops (BatchNorm, Conv)
- Thread-local LRU graph cache (ParamsLRUCache<K,V>) to amortize graph->build() cost
- ConvBackend::Hipdnn and ConvBackend::HipdnnTranspose variants wired through backend selection, memory format selection, forward/backward switches, and Python enum exposure. hipDNN takes priority over MIOpen when torch.backends.hipdnn.enabled is True
- Bias fused into the forward conv graph via pointwise(ADD), eliminating the separate output.add_() call