Fix Github workflows issues #2636
Signed-off-by: Pawel Gadzinski <[email protected]>
Greptile Overview
Greptile Summary
Fixed critical CI/CD infrastructure issues including OOM failures in build jobs and incompatible package versions in docs deployment.
Build workflow fixes:
Docs deployment fixes:
Confidence Score: 4/5
Important Files Changed
Sequence Diagram
sequenceDiagram
participant GH as GitHub Event
participant Build as Build Workflow
participant Docker as Docker Container
participant Disk as Disk Space
participant Docs as Docs Workflow
participant Pages as GitHub Pages
Note over GH,Build: build.yml workflow
GH->>Build: PR or workflow_dispatch
Build->>Disk: Free up disk space (rm boost, tools, swift, etc.)
Build->>Disk: Maximize build space (reduce reserves)
Build->>Docker: Start container (ghcr.io/nvidia/jax:jax)
Docker->>Docker: Install dependencies (cmake, torch, etc.)
Docker->>Docker: pip cache purge
Build->>Docker: docker exec -e MAX_JOBS=1 -e NVTE_FRAMEWORK=pytorch
Docker->>Docker: Build TransformerEngine
Docker->>Docker: Run sanity checks
Note over GH,Pages: deploy_nightly_docs.yml workflow
GH->>Docs: Push to main or workflow_dispatch
Docs->>Docs: Build documentation
Docs->>Pages: Upload artifact (v4)
Pages->>Pages: Prepare pages artifact (v3)
Pages->>Pages: Deploy to GitHub Pages (v4)
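The build step in the diagram passes MAX_JOBS=1 and NVTE_FRAMEWORK=pytorch into the container through docker exec. A minimal sketch of that step follows, assuming the container is named builder as in the diff. The pip install command, the run helper, and the DRY_RUN switch are hypothetical additions for illustration and are not taken from the workflow file. By default the script only prints the command it would run.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumption: the container name matches the --name used by the workflow.
CONTAINER="${CONTAINER:-builder}"
# Hypothetical switch: 1 prints commands, 0 executes them.
DRY_RUN="${DRY_RUN:-1}"

run() {
  # Print the command in dry-run mode, otherwise execute it.
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

build_in_container() {
  # MAX_JOBS=1 asks the build to compile one unit at a time, which is the
  # usual way to lower peak memory use on a small runner.
  # NVTE_FRAMEWORK=pytorch selects the PyTorch build.
  # The install command below is an assumed placeholder, not the real step.
  run docker exec -e MAX_JOBS=1 -e NVTE_FRAMEWORK=pytorch "$CONTAINER" \
    bash -c 'pip install --no-build-isolation -v .'
}

build_in_container
```

Setting DRY_RUN=0 would execute the command against a running container. Serial compilation trades build time for a lower memory footprint, which fits the OOM failures the summary mentions.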
1 file reviewed, no comments
2 files reviewed, 1 comment
  - name: Start named container
    run: |
-     docker run -v $(pwd):$(pwd) -w $(pwd) --name builder -d nvcr.io/nvidia/cuda:12.8.0-devel-ubuntu22.04 sleep infinity
+     docker run -v $(pwd):$(pwd) -w $(pwd) --name builder -d ghcr.io/nvidia/jax:jax sleep infinity
Switched the base image from nvcr.io/nvidia/cuda:12.8.0-devel-ubuntu22.04 to ghcr.io/nvidia/jax:jax.
Description
This PR fixes the following issues:
Fixes # (issue)
Type of change
Checklist: