Quantization support for GroupedTensor: FP8 per-tensor #2449

@ptrendx

Description

Implement FP8 per-tensor quantization support for the GroupedTensor type.
Modifications needed to the existing kernel (see the sketch after this list):

  • handle multiple amax values, one per member tensor
  • ignore the padding in the allocation
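
Below is a minimal CUDA sketch of the kind of kernel changes described above. The metadata layout (`offsets`, `lengths`), kernel names, and the two-pass structure are assumptions for illustration only, not TransformerEngine's actual API: each member tensor gets its own amax slot, and elements beyond the tensor's valid length (the padding in the allocation) are skipped both during amax reduction and during quantization.

```cuda
// Hypothetical sketch: FP8 (E4M3) per-tensor quantization over a packed
// "grouped" buffer. Member tensor t starts at offsets[t]; only its first
// lengths[t] elements are valid, the rest is allocation padding.
#include <cuda_runtime.h>
#include <cuda_fp8.h>
#include <cstdint>

// atomicMax for non-negative floats via the int-bit-pattern trick.
__device__ void atomic_max_float(float *addr, float val) {
  atomicMax(reinterpret_cast<int *>(addr), __float_as_int(val));
}

// Pass 1: one block per member tensor. Each block reduces an abs-max over
// that tensor's valid region only and accumulates into its own amax slot.
// amax must be zero-initialized before launch.
__global__ void grouped_amax_kernel(const float *in, const int64_t *offsets,
                                    const int64_t *lengths, float *amax) {
  const int t = blockIdx.x;                 // which member tensor
  const float *src = in + offsets[t];
  const int64_t n = lengths[t];             // valid (non-padded) length

  float local_max = 0.0f;
  for (int64_t i = threadIdx.x; i < n; i += blockDim.x)
    local_max = fmaxf(local_max, fabsf(src[i]));

  // Per-tensor amax rather than a single amax for the whole group.
  atomic_max_float(&amax[t], local_max);
}

// Pass 2: quantize each member tensor with its own scale = FP8_MAX / amax[t].
__global__ void grouped_quantize_kernel(const float *in, __nv_fp8_e4m3 *out,
                                        const int64_t *offsets,
                                        const int64_t *lengths,
                                        const float *amax, float *scale_inv) {
  const float FP8_MAX = 448.0f;             // largest finite E4M3 value
  const int t = blockIdx.x;
  const float a = amax[t];
  const float scale = (a > 0.0f) ? FP8_MAX / a : 1.0f;
  if (threadIdx.x == 0) scale_inv[t] = 1.0f / scale;  // kept for dequantization

  const float *src = in + offsets[t];
  __nv_fp8_e4m3 *dst = out + offsets[t];
  for (int64_t i = threadIdx.x; i < lengths[t]; i += blockDim.x)
    dst[i] = __nv_fp8_e4m3(src[i] * scale); // padding beyond lengths[t] untouched
}
```

Both kernels would be launched with `num_tensors` blocks (one block per member tensor) after zero-initializing `amax`; a real implementation would additionally fold the two passes together or reuse the existing single-tensor quantization path with per-tensor metadata.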
