Support MOSS-TTSD v0.7 fused with codec #6
Merged
CloudRipple merged 1 commit into main on Mar 12, 2026
Motivation
This pull request introduces support for multi-channel audio generation models, specifically adding configuration and runtime changes for Moss-TTSD-With-Codec. Key enhancements include new model configuration, improved handling of multi-channel input/output, and sampler logic updates to support multi-channel generation. The changes are grouped below by theme:
Modifications
Multi-channel audio model support:
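The config changes in this group revolve around reading a channel count from either `channels` or `n_vq`. The snippet below is a minimal, hypothetical sketch of that kind of normalization: `resolve_num_channels` is not part of the PR. It assumes that `channels` takes precedence over `n_vq` and that either key gives the channel count directly. The real `_init_channels` in `ModelConfig` may combine them differently, for example by reserving an extra text channel.

```python
from typing import Any, Mapping


def resolve_num_channels(hf_config: Mapping[str, Any]) -> int:
    """Return the channel count declared by a model config.

    Hypothetical helper: assumes `channels` wins over `n_vq`, and that a
    config declaring neither key describes an ordinary single-channel LM.
    """
    for key in ("channels", "n_vq"):
        value = hf_config.get(key)
        if value is None:
            continue
        # Coerce integer-like values (e.g. "8" or 8.0) to a plain int.
        num = int(value)
        if num < 1:
            raise ValueError(f"{key} must be >= 1, got {value!r}")
        return num
    return 1
```

Under this assumption, a config that declares neither key falls back to one channel, so ordinary text models keep their existing behavior.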
- Added `MossTTSDWithCodecConfig` in `python/sglang/srt/configs/moss_ttsd_with_codec.py` and registered it in `python/sglang/srt/configs/__init__.py` to support Moss-TTSD-With-Codec audio generation models. [1] [2] [3]
- Added an `_init_channels` method and related logic in `ModelConfig` to normalize and handle multi-channel metadata from model configs (`channels` or `n_vq`). [1] [2]
- Added an `is_audio_gen_model` utility and detection logic for audio generation models, updating model type checks and health endpoints. [1] [2] [3]

Input/output handling for multi-channel models:
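On its own, a nested `input_ids` list is ambiguous: it could be a batch of single-channel prompts or one multi-channel prompt, which is presumably why a `multi_channel` flag travels with the request. The hypothetical helper below (`prepare_input_ids` is not from the PR) shows one way to validate a single prompt. It assumes a time-major layout, with one inner list per position and one token per channel. The actual layout expected by `engine.py` may be transposed.

```python
from typing import List, Sequence, Tuple, Union

# A flat prompt (single channel) or a nested prompt (one row per position).
Prompt = Union[Sequence[int], Sequence[Sequence[int]]]


def prepare_input_ids(input_ids: Prompt, multi_channel: bool) -> Tuple[List[List[int]], int]:
    """Normalize one request's prompt to a 2-D grid and report its channel count.

    Hypothetical helper: assumes rows are positions and columns are channels.
    """
    if not multi_channel:
        if any(isinstance(tok, (list, tuple)) for tok in input_ids):
            raise TypeError("nested input_ids require multi_channel=True")
        # A flat prompt becomes a grid with exactly one channel per position.
        return [[int(tok)] for tok in input_ids], 1

    if not all(isinstance(row, (list, tuple)) for row in input_ids):
        raise TypeError("multi_channel=True expects one list of channel tokens per position")
    rows = [[int(tok) for tok in row] for row in input_ids]
    if not rows:
        raise ValueError("a multi-channel prompt needs at least one position")
    num_channels = len(rows[0])
    # Every position must carry the same number of channel tokens.
    for pos, row in enumerate(rows):
        if len(row) != num_channels:
            raise ValueError(f"position {pos} has {len(row)} channels, expected {num_channels}")
    return rows, num_channels
```

In this sketch, returning the channel count with the grid lets a caller check it against the count derived from the model config before scheduling the request.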
- Updated the `generate` API in `engine.py` to accept nested lists for `input_ids` and propagate the `multi_channel` flag. [1] [2]

Sampler logic improvements:
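The sampler and `LogitsProcessor` changes in this group fit together. Logits for several channels, each with its own vocabulary size, must be split apart before each channel is sampled independently. The pure-Python sketch below assumes that a position's logits arrive as one flat list concatenated in channel order. The names `split_logits`, `sample_channel`, and `sample_multi_channel` are illustrative only and do not mirror the PR's `MultiChannelSampler`, which presumably operates on batched tensors.

```python
import math
import random
from itertools import accumulate
from typing import List, Optional, Sequence


def split_logits(flat: Sequence[float], vocab_sizes: Sequence[int]) -> List[List[float]]:
    """Split one position's concatenated logits into per-channel slices."""
    if len(flat) != sum(vocab_sizes):
        raise ValueError("logit length does not match the sum of vocab sizes")
    ends = list(accumulate(vocab_sizes))
    starts = [0] + ends[:-1]
    return [list(flat[s:e]) for s, e in zip(starts, ends)]


def sample_channel(logits: Sequence[float], temperature: float, rng: random.Random) -> int:
    """Pick a channel-local token id: greedy if temperature <= 0, else softmax sampling."""
    if temperature <= 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max so exp() cannot overflow
    weights = [math.exp(x - peak) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


def sample_multi_channel(
    flat: Sequence[float],
    vocab_sizes: Sequence[int],
    temperatures: Sequence[float],
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Sample one token per channel; returned ids are local to each channel's vocab."""
    rng = rng or random.Random()
    per_channel = split_logits(flat, vocab_sizes)
    if len(temperatures) != len(per_channel):
        raise ValueError("need one temperature per channel")
    return [sample_channel(chan_logits, t, rng) for chan_logits, t in zip(per_channel, temperatures)]
```

For example, `sample_multi_channel([0.1, 2.0, -1.0, 5.0, 0.0], [3, 2], [0.0, 0.0])` treats the first three logits as channel 0 and the last two as channel 1, and greedily returns `[1, 0]`. Keeping ids channel-local, rather than offsetting them into one shared id space, is a choice made for this sketch only.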
- Added a `MultiChannelSampler` class in `sampler.py` to handle sampling from multi-channel logits, and updated `create_sampler` to select the appropriate sampler based on the `multi_channel` flag. [1] [2]
- Updated `LogitsProcessor` and its buffer copying logic to handle per-channel vocab sizes and output slicing for multi-channel models. [1] [2]

Miscellaneous:
- Updates to `sampler.py`, and a new `tempfile` import in `detokenizer_manager.py`. [1] [2]

Accuracy Tests
Checklist