[Docs] Update DSK-V3.1 Docs #6347
Open
Linboyan-trc wants to merge 4 commits into PaddlePaddle:develop from
Thanks for your contribution!
chang-wenbin
reviewed
Feb 5, 2026
| #### 2.2.3 Chunked Prefill | ||
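| **Principle:** Chunked Prefill splits each prefill-stage request into smaller sub-tasks and batches them together with decode requests. This better balances compute-bound (prefill) and memory-bound (decode) operations, improves GPU utilization, and reduces the computation and GPU memory used by a single prefill step, which lowers peak GPU memory and helps avoid out-of-memory errors. For details, see [Chunked Prefill](../features/chunked_prefill.md). | ||
| - **Parameter:** `--enable-chunked-prefill` |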
Collaborator
This flag is no longer valid. Please verify the documentation content and make sure every parameter listed is actually usable.
Collaborator
DeepSeek needs this feature turned off. Please find out how it is currently disabled and state that in the documentation.
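| - **Related configuration:** | ||
| `--max-num-batched-tokens`: limits the maximum number of tokens per chunk. In multimodal scenarios each chunk is rounded up to keep images intact, so the actual total number of tokens per inference step can exceed this value. The recommended setting is 384. |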
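As a rough illustration of the chunking described above, the sketch below splits a prompt of a given length into chunks of at most `max_num_batched_tokens` tokens. The function name `split_prefill_chunks` is hypothetical and this is not FastDeploy's scheduler: it ignores the multimodal round-up and the mixing of chunks with decode requests.

```python
from typing import List


def split_prefill_chunks(prompt_len: int, max_num_batched_tokens: int = 384) -> List[int]:
    """Return the token count of each prefill chunk (illustrative only).

    Assumption: plain text tokens, so no chunk is rounded up to keep an image whole.
    """
    if prompt_len < 0:
        raise ValueError("prompt_len must be non-negative")
    if max_num_batched_tokens <= 0:
        raise ValueError("max_num_batched_tokens must be positive")

    chunks: List[int] = []
    remaining = prompt_len
    while remaining > 0:
        # Each chunk takes at most max_num_batched_tokens tokens.
        size = min(remaining, max_num_batched_tokens)
        chunks.append(size)
        remaining -= size
    return chunks


if __name__ == "__main__":
    # A hypothetical 1000-token prompt with the recommended chunk size of 384.
    print(split_prefill_chunks(1000, 384))
```

A smaller chunk size yields more, shorter prefill steps, which is the trade-off the parameter controls.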
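| **How to enable:** | ||
| Starting with version 2.2 (including the develop branch), Prefix Caching is enabled by default. | ||
| For version 2.1 and earlier, it must be enabled manually. `--enable-prefix-caching` enables prefix caching, and `--swap-space` enables a CPU cache in addition to the GPU cache; its size is given in GB and should be adjusted to the machine's actual resources. The recommended value is `(total machine memory - model size) * 20%`. If the service fails to start, for example because other programs are occupying memory, try reducing the value of `--swap-space`. |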
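The recommended `--swap-space` value can be worked out with a few lines of arithmetic. The sketch below applies the `(total machine memory - model size) * 20%` rule; the helper name `recommended_swap_space_gb`, the rounding down to whole gigabytes, and the example figures are assumptions made for illustration, not part of FastDeploy.

```python
import math


def recommended_swap_space_gb(total_memory_gb: float, model_size_gb: float,
                              fraction: float = 0.2) -> int:
    """Apply (total machine memory - model size) * 20%, rounded down to whole GB.

    Rounding down is an assumption of this sketch; it errs on the side of a smaller cache.
    """
    if model_size_gb >= total_memory_gb:
        raise ValueError("model size must be smaller than total machine memory")
    return math.floor((total_memory_gb - model_size_gb) * fraction)


if __name__ == "__main__":
    # Hypothetical machine: 512 GB of host memory, 160 GB of model weights.
    print(recommended_swap_space_gb(512, 160))
```

If the service still fails to start with the computed value, lower it further, as the text above suggests.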
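| > **Maximum number of sequences** | ||
| - **Parameter:** `--max-num-seqs` | ||
| - **Description:** Controls the maximum number of sequences the service can process; supported values are 1 to 256. |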
| --quantization wint4 & | ||
| ``` | ||
| Where: | ||
| - `--quantization`: the quantization strategy. Options: |
| ### 2.1 Basics: Starting the Service | ||
| Start the service with the following command: | ||
| ```bash | ||
| python -m fastdeploy.entrypoints.openai.api_server \ |
Collaborator
DeepSeek requires some additional environment variables; please list them.
The purpose of best_practices is to let users deploy a model simply and quickly while still getting good performance and accuracy.