[PyTorch][Core][JAX] Expand troubleshooting docs #2602
Signed-off-by: Jeremy Berchtold <jberchtold@nvidia.com>
Greptile Summary
Confidence Score: 4/5
Important Files Changed
Sequence Diagram

```mermaid
sequenceDiagram
    participant User
    participant UV as uv/venv
    participant Pip as pip/uv pip
    participant Build as TE build (PEP517)
    participant TE as transformer_engine
    participant JAX
    User->>UV: Activate virtual environment
    User->>Pip: Install TE (uv pip install --no-build-isolation ...)
    Pip->>Build: Build TE without isolation
    Build-->>Pip: Wheel / install artifacts
    Pip-->>TE: Importable package in venv
    User->>TE: Run workload
    alt cuDNN sublibrary loading failed
        TE->>TE: dlopen cuDNN libs
        TE-->>User: CUDNN_STATUS_SUBLIBRARY_LOADING_FAILED
        User->>UV: Ensure venv cuDNN packages used
        User->>Build: Set CUDNN_PATH/CUDNN_HOME/LD_LIBRARY_PATH
    end
    alt JAX FFI not registered
        TE->>JAX: Register custom calls during init
        JAX-->>User: No registered implementation for custom call (CUDA)
        User->>Pip: Reinstall/build with --no-build-isolation
    end
```
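The cuDNN branch of the diagram ends with setting `LD_LIBRARY_PATH`. As a rough sketch of that step (not taken from the PR), the helper below prepends a directory once and skips directories that do not exist. The function name `prepend_ld_library_path` is hypothetical, and the `"$CUDNN_PATH/lib"` layout is an assumption about how the pip cuDNN package is laid out, so adjust it to whatever the venv actually contains.

```bash
# Hypothetical helper (not part of Transformer Engine): prepend a directory
# to LD_LIBRARY_PATH exactly once, and only if the directory exists.
prepend_ld_library_path() {
    local dir="$1"
    [ -d "$dir" ] || { echo "skipping missing directory: $dir" >&2; return 1; }
    case ":${LD_LIBRARY_PATH:-}:" in
        *":${dir}:"*) ;;  # already on the path, nothing to do
        *) export LD_LIBRARY_PATH="${dir}${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}" ;;
    esac
}

# Assumption: the cuDNN shared libraries live under "$CUDNN_PATH/lib".
if [ -n "${CUDNN_PATH:-}" ]; then
    prepend_ld_library_path "${CUDNN_PATH}/lib" || true
fi
```

Calling the helper twice with the same directory leaves `LD_LIBRARY_PATH` unchanged the second time, which keeps repeated shell sourcing from growing the variable.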
1 file reviewed, 2 comments
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by: jberchtold-nvidia <158520091+jberchtold-nvidia@users.noreply.github.com>
.. code-block:: bash

    export CUDNN_PATH=$(pwd)/.venv/lib/python3.12/site-packages/nvidia/cudnn
style: The hardcoded Python version (`python3.12`) will not match every user's environment. Consider a generic placeholder such as `pythonX.Y`, or note that users should adjust the path to their own Python version.
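One way to avoid the hardcoded version altogether is to ask the interpreter for its own `site-packages` directory. This is only a sketch, not the wording used in the PR: it assumes the target virtual environment is already activated so that `python3` resolves to the venv's interpreter, and the `SITE_PACKAGES` variable name is introduced here purely for illustration.

```bash
# Assumes the virtual environment is active, so python3 is the venv's interpreter.
# sysconfig reports the site-packages ("purelib") directory for that interpreter,
# so no pythonX.Y component needs to be typed by hand.
SITE_PACKAGES="$(python3 -c 'import sysconfig; print(sysconfig.get_paths()["purelib"])')"
export CUDNN_PATH="${SITE_PACKAGES}/nvidia/cudnn"

# Warn instead of failing if the cuDNN package is not installed in this environment.
if [ ! -d "${CUDNN_PATH}" ]; then
    echo "warning: ${CUDNN_PATH} not found; is the cuDNN pip package installed here?" >&2
fi
```

The trade-off is that the snippet depends on the right interpreter being first on `PATH`, whereas the placeholder approach leaves the substitution to the reader.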
style: extra blank line - RST should have only one blank line before code blocks (see lines 305-306 for consistent formatting)
Signed-off-by: jberchtold-nvidia <158520091+jberchtold-nvidia@users.noreply.github.com>
/te-ci
/te-ci
@@ -315,6 +315,37 @@ Troubleshooting
cd transformer_engine
pip install -v -v -v --no-build-isolation .
RST list nesting broken
The new troubleshooting section uses `1.`/`2.` numbered items with nested bullet lines (`* **Symptoms:**` / `* **Solution:**`), but no blank line separates each list item from its nested bullet list. In reStructuredText this often breaks nesting: the `*` bullets can be treated as literal text or start a new top-level list. Add a blank line after each numbered item title (e.g., after `1. **Import Error:**`) and before the indented `*` bullets, and do the same in the JAX section.
The same pattern also appears at README.rst:325, README.rst:338, and README.rst:346.
.. code-block:: bash

    export CUDNN_PATH=$(pwd)/.venv/lib/python3.12/site-packages/nvidia/cudnn
Extra blank lines
There are two blank lines before the `.. code-block:: bash` directive. In RST, extra blank lines inside list items can cause the directive to detach from the list item or render with unexpected spacing. Reduce this to a single blank line before the directive so it stays correctly nested under the `Solution:` bullet.
tdophung left a comment
LGTM
Description
Expand the troubleshooting installation docs with a few recently debugged issues.

Type of change
Changes
`uv` venvs and JAX-specific issue symptoms.
Checklist: