Streamline KD & QAD transformers Trainers #708
base: main
Conversation
Codecov Report: ✅ All modified and coverable lines are covered by tests.

@@            Coverage Diff             @@
##             main     #708      +/-   ##
==========================================
- Coverage   74.69%   74.68%   -0.02%
==========================================
  Files         192      192
  Lines       18946    18950       +4
==========================================
+ Hits        14152    14153       +1
- Misses       4794     4797       +3
examples/llm_qat/README.md (outdated)

```diff
  > **_NOTE:_** `launch.sh` defaults to using `LlamaDecoderLayer` as the transformer layer class. If your model uses a different class, you need to pass `--fsdp_transformer_layer_cls_to_wrap <your_layer_class>` to the `launch.sh` script. For example, for `Qwen/Qwen3-8B`, specify `--fsdp_transformer_layer_cls_to_wrap Qwen3DecoderLayer` as an additional argument.
- > **_NOTE:_** The script defaults to using FSDP1. To use FSDP2, pass "--use_fsdp2 True" to the `launch.sh` script. Note that FSDP2 is currently less stable than FSDP1. Use it with caution.
+ > **_NOTE:_** The script defaults to using FSDP1. To use FSDP2, pass "--backend=fsdp2" to the `launch.sh` script. Note that FSDP2 is currently less stable than FSDP1. Use it with caution.
```
Is this statement still valid? Note that FSDP2 is less stable than FSDP1 currently. Use it with caution.
I doubt it, but I don't have proof. I don't have proof that it is less stable either.
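For concreteness, the flag change under discussion would look like this on the command line. This is a hypothetical sketch: only the two flags shown in the diff above come from this PR, and any other arguments `launch.sh` may require are omitted here.

```shell
# Old interface (pre-PR): FSDP2 enabled via a boolean flag
./launch.sh --fsdp_transformer_layer_cls_to_wrap Qwen3DecoderLayer --use_fsdp2 True

# New interface (this PR): backend selected by name
./launch.sh --fsdp_transformer_layer_cls_to_wrap Qwen3DecoderLayer --backend=fsdp2
```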
Signed-off-by: Asha Anoosheh <[email protected]>
Force-pushed from bde5788 to 190e4d2
Force-pushed from c4c0d19 to 9f3b0f8
Force-pushed from 9f3b0f8 to 50cb0f4
Force-pushed from f6e1196 to 796b023
```python
    model = self.accelerator.unwrap_model(self.model)
    with model.hide_teacher_model(), model.hide_loss_modules(enable=not _internal_call):
        return QATTrainer.save_model(self, output_dir, _internal_call, *args, **kwargs)
else:
    return KDTrainer.save_model(self, output_dir, _internal_call, *args, **kwargs)
```
Do we need a `def save_model(` override for QADTrainer at all?
The MRO for `QADTrainer.save_model` is `QATTrainer.save_model -> KDTrainer.save_model -> HF Trainer.save_model`, right? What in particular is being achieved here now?
If we don't respect this MRO, the fix in https://github.com/NVIDIA/Model-Optimizer/pull/546/files won't be applied to QAD checkpoints.
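The MRO point can be illustrated with a minimal sketch. These stand-in classes only borrow the names from the PR (the real trainers obviously do more than record their names); the key behavior is that `super()` follows the *instance's* MRO even when an ancestor method is invoked explicitly, so calling `QATTrainer.save_model(self, ...)` still reaches `KDTrainer.save_model`, whereas jumping straight to `KDTrainer.save_model` would skip QAT-specific logic.

```python
# Hypothetical stand-ins mirroring the trainer hierarchy discussed above.
class HFTrainer:
    def save_model(self):
        return ["HFTrainer"]

class KDTrainer(HFTrainer):
    def save_model(self):
        # e.g. a KD-specific checkpoint fix would live here
        return ["KDTrainer"] + super().save_model()

class QATTrainer(KDTrainer):
    def save_model(self):
        return ["QATTrainer"] + super().save_model()

class QADTrainer(QATTrainer):
    pass

trainer = QADTrainer()

# Normal cooperative dispatch walks the full MRO:
print(trainer.save_model())            # ['QATTrainer', 'KDTrainer', 'HFTrainer']

# Explicitly invoking an ancestor's method: super() inside it still follows
# type(trainer).__mro__, so KDTrainer is NOT skipped:
print(QATTrainer.save_model(trainer))  # ['QATTrainer', 'KDTrainer', 'HFTrainer']

# But starting the chain lower skips QATTrainer's logic entirely:
print(KDTrainer.save_model(trainer))   # ['KDTrainer', 'HFTrainer']
```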
What does this PR do?
Type of change: Refactor and stabilization
Overview:
Usage
# Add a code snippet demonstrating how to use this
Testing
Before your PR is "Ready for review"
Additional Information