Support complicated use cases with TiedLayerSpec #7208
loadams merged 2 commits into deepspeedai:master
Conversation
Extend the builtin `getattr` to a recursive version `PipelineModule._recursive_getattr` for nested tied weights, e.g., "linear.weight". Meanwhile, sort tie_keys in `PipelineModule._index_tied_modules` to avoid hanging. Signed-off-by: Mingjie Li <limingjie@chinamobile.com>
Thank you @limjcst for the contribution! This is a significant improvement.
nv-accelerate-v100 failed, raising "invalid command 'bdist_wheel'". However, this job succeeded in another run. Note that the failed job used "cached wheel-0.46.1-py3-none-any.whl.metadata".
Thanks @limjcst - I saw this failure on another PR and will take a look and merge the fixes into your PR when ready.
@limjcst - it looks like the
Nevertheless, upgrading to
@agronholm - yes, we have a PR for one here which we will prioritize merging as we know this is needed. |
I want to reuse a composed module in the pipeline. For example, the
following `MyModule` has a member `linear`, which is also a module.
```python
import torch

class MyModule(torch.nn.Module):
    def __init__(self, n_in: int, n_out: int):
        super().__init__()
        self.linear = torch.nn.Linear(n_in, n_out)
        self.layer_norm = torch.nn.LayerNorm(n_out)

    def forward(self, data: torch.Tensor) -> torch.Tensor:
        hidden = self.linear(data)
        hidden = self.layer_norm(hidden)
        return hidden
```
`MyModule.linear.weight` should be synchronized among related ranks. As
a result, I add `linear.weight` to `TiedLayerSpec.tied_weight_attr`.
As an aside, I generate the whole `tied_weight_attr` list with the following
expression.
```python
tied_weight_attr = [name for name, p in layer.named_parameters() if p.numel() > 1]
```
However, the builtin `getattr` used by `PipelineModule` fails to find a
nested attribute like `linear.weight`.
Hence, this PR first extends the builtin `getattr` to a recursive
version, `PipelineModule._recursive_getattr`, which resolves each dotted
attribute segment in turn.
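The core idea of such a recursive lookup can be sketched in a few lines (a minimal illustration, not the exact helper merged in the PR; the stand-in classes are hypothetical and avoid a torch dependency):

```python
from functools import reduce

def recursive_getattr(obj, attr_path):
    """Resolve a dotted attribute path such as 'linear.weight'."""
    return reduce(getattr, attr_path.split("."), obj)

# Tiny stand-in objects mimicking a nested module structure.
class Inner:
    weight = "tied-tensor"

class Outer:
    linear = Inner()

print(recursive_getattr(Outer(), "linear.weight"))  # tied-tensor
```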
Meanwhile, the order in which tied weights are synchronized matters: if
ranks iterate over tied groups in different orders, their collective
calls can mismatch and hang. This PR therefore sorts tie_keys in
`PipelineModule._index_tied_modules` to guarantee a deterministic order.
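To see why sorting helps, note that Python set iteration order is not guaranteed to match across processes (string hashing is randomized per process), while `sorted` yields the same sequence everywhere. A minimal illustration with hypothetical tied-group names:

```python
# Hypothetical tied-group keys; set iteration order may differ per process.
tie_keys = {"word_embed", "lm_head"}

# Sorting produces one deterministic order, identical on every rank,
# so all ranks issue their synchronization steps in the same sequence.
ordered = sorted(tie_keys)
print(ordered)  # ['lm_head', 'word_embed']
```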
Signed-off-by: Mingjie Li <limingjie@chinamobile.com>
Co-authored-by: Mingjie Li <limingjie@chinamobile.com>
Co-authored-by: Masahiro Tanaka <81312776+tohtana@users.noreply.github.com>
Signed-off-by: yisheng <yi.sheng@intel.com>
Signed-off-by: Max Kovalenko <mkovalenko@habana.ai>