Release v0.4.0 · modelscope/Trinity-RFT

Overview

⭐️ Highlights

Add a Tinker backend so that users without GPUs can use Trinity-RFT. See the example for more details.

Explorer

Add Tinker SamplingClient backend for users without GPUs.
Support vLLM v0.12.0 (v0.10.2 to v0.11.0 remain supported).
Add a Tinker-compatible sample API to the vLLM backend.
Enhance serve mode for online RL.
Fix several bugs in the vLLM OpenAI API.

Trainer

Add Tinker TrainingClient backend for users without GPUs.
Add a switch in PPOPolicyLossFn to ignore explorer-generated logprobs.
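The switch in PPOPolicyLossFn decides which "old" log-probabilities enter the PPO importance ratio: the ones recorded by the explorer at generation time, or ones supplied by the trainer instead. The following is a minimal, framework-free sketch of a token-mean PPO clipped loss under that assumption. The function name `ppo_clip_loss`, the flag `ignore_explorer_logprobs`, and the `recomputed_logprobs` argument are hypothetical illustrations, not Trinity-RFT's actual API.

```python
import math
from typing import Optional, Sequence


def ppo_clip_loss(
    logprobs: Sequence[float],
    advantages: Sequence[float],
    explorer_logprobs: Sequence[float],
    recomputed_logprobs: Optional[Sequence[float]] = None,
    ignore_explorer_logprobs: bool = False,  # hypothetical switch name
    clip_eps: float = 0.2,
) -> float:
    """Token-mean PPO clipped surrogate loss (a value to minimize)."""
    # Choose the "old" log-probabilities used in the importance ratio.
    if ignore_explorer_logprobs:
        if recomputed_logprobs is None:
            raise ValueError("recomputed_logprobs is required when ignoring explorer logprobs")
        old_logprobs = recomputed_logprobs
    else:
        old_logprobs = explorer_logprobs

    if not logprobs or not (len(logprobs) == len(old_logprobs) == len(advantages)):
        raise ValueError("inputs must be non-empty and of equal length")

    losses = []
    for logp, old_logp, adv in zip(logprobs, old_logprobs, advantages):
        ratio = math.exp(logp - old_logp)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic (min) objective, negated so that lower is better.
        losses.append(-min(ratio * adv, clipped * adv))
    return sum(losses) / len(losses)
```

When the explorer's numerics differ from the trainer's (for example, different inference kernels), ratios computed against explorer logprobs can drift away from 1 even for identical weights; trainer-side logprobs avoid that mismatch at the cost of an extra forward pass.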

Buffer

Support staleness control, which mitigates the negative effects of excessively off-policy data.
Add a Streamlit viewer to visualize the experience data.
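Staleness control bounds how far the policy that generated a sample may lag behind the trainer's current version. The following is a minimal sketch, assuming each experience records the model version that produced it. The `Experience` fields, `filter_stale`, and `max_staleness` are hypothetical, and the actual buffer may enforce the bound differently (for example, by pausing the explorer instead of discarding data).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Experience:
    # Hypothetical record: the policy version that generated this sample.
    model_version: int
    reward: float


def filter_stale(
    experiences: List[Experience], current_version: int, max_staleness: int
) -> List[Experience]:
    """Keep experiences at most `max_staleness` versions behind the trainer."""
    if max_staleness < 0:
        raise ValueError("max_staleness must be non-negative")
    return [
        e for e in experiences
        if current_version - e.model_version <= max_staleness
    ]
```

With `max_staleness=0` only strictly on-policy samples survive; larger values trade off-policy bias for higher explorer throughput.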

Others

Add benchmark comparisons with veRL and rLLM.
Refactor the registration system to avoid loading all modules during initialization.
Add algorithms: SAPO, on-policy distillation.
Enhance debug mode; add --module viewer to visualize experience data generated during debugging.
Add SwanLab monitor.
Add tutorial on aligning configuration with veRL.
Add tutorial on choosing model context length based on GPU and model size.
Optimize README and Sphinx docs.
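The registry refactor avoids importing every module at startup. One common way to achieve this, shown here as a minimal sketch, is to map each registered name to a "module:attribute" string and resolve it with importlib on first lookup. The `LazyRegistry` class and its entry format are illustrative assumptions, not Trinity-RFT's actual registry; the Developer Guide describes the real mechanism.

```python
import importlib
from typing import Any, Dict


class LazyRegistry:
    """Maps names to 'module.path:attr' strings and imports them on first use."""

    def __init__(self, entries: Dict[str, str]) -> None:
        self._entries = dict(entries)
        self._cache: Dict[str, Any] = {}

    def is_loaded(self, name: str) -> bool:
        return name in self._cache

    def get(self, name: str) -> Any:
        # Raises KeyError for unknown names; imports the module only once.
        if name not in self._cache:
            module_path, _, attr = self._entries[name].partition(":")
            module = importlib.import_module(module_path)
            self._cache[name] = getattr(module, attr)
        return self._cache[name]


registry = LazyRegistry({
    "dumps": "json:dumps",
    "ordered_dict": "collections:OrderedDict",
})
```

Standard-library targets are used only so the sketch runs anywhere; in a real project the strings would point at workflow or algorithm modules whose import cost is worth deferring.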

🚨 Breaking Changes

The schema of the SQL experience buffer has changed. Experience data saved by previous versions cannot be used.
The registration system has been refactored. Developers no longer need to use @REGISTRY.register_module to register modules. See Developer Guide for details.
Tinker requires Python >= 3.11. (Users who do not use Tinker can still use Python 3.10.)
vLLM 0.12.0 requires CUDA >= 12.9. (Users of vLLM 0.11.0 or lower can still use CUDA 12.8.)
Refactor SampleStrategy: add kwargs to its inputs and change its output type from Experiences to List[Experience].
Experiences (not Experience) will be deprecated.
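The Python and CUDA constraints above can be checked before launching a run. The following helper is a hypothetical sketch: `check_requirements` and `parse_version` are illustrative names, it encodes only the two requirements stated in these notes, and it assumes plain numeric version strings such as "0.12.0" or "12.9" (pre-release suffixes like "rc1" are not handled).

```python
import sys
from typing import List, Optional, Tuple

Version = Tuple[int, ...]


def parse_version(text: str) -> Version:
    # Assumes purely numeric components, e.g. "v0.12.0" -> (0, 12, 0).
    return tuple(int(part) for part in text.lstrip("v").split(".")[:3])


def check_requirements(
    python: Version,
    use_tinker: bool,
    vllm: Optional[str] = None,
    cuda: Optional[str] = None,
) -> List[str]:
    """Return the v0.4.0 requirements (from these notes) that are not met."""
    problems = []
    if use_tinker and python < (3, 11):
        problems.append("Tinker requires Python >= 3.11")
    if vllm is not None and cuda is not None:
        if parse_version(vllm) >= (0, 12, 0) and parse_version(cuda) < (12, 9):
            problems.append("vLLM 0.12.0 requires CUDA >= 12.9")
    return problems


if __name__ == "__main__":
    for problem in check_requirements(tuple(sys.version_info[:3]), use_tinker=True):
        print("warning:", problem)
```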

What's Changed

Fix docker build action by @pan-x-c in #415
Add benchmark scripts for Guru-Math by @chenyushuo in #417
Add bench results for frozenlake and alfworld by @hiyuchang in #416
Add corrected kl with importance sampling by @garyzhang99 in #419
Add SAPO algorithm by @garyzhang99 in #422
Enhance debug mode by @pan-x-c in #421
Update README with supported algorithm and news by @hiyuchang in #424
Fix openai api history by @pan-x-c in #428
Add Experience Viewer by @pan-x-c in #427
Add sequence mask for grpo by @garyzhang99 in #420
Fix the mismatch between vLLM OpenAI API and vLLM generate by @pan-x-c in #431
Add Guru-Math report. by @chenyushuo in #432
Refactor and Merge several compute_score functions by @hiyuchang in #430
Add scripts to search context length capacity on given settings. by @chenyushuo in #423
[Doc] Align with verl by @hiyuchang in #433
[Doc] add a new page with example list from the dataset perspective by @HYLcool in #434
General Multi-turn FrozenLake by @pan-x-c in #429
Fix sft warmup yaml by @hiyuchang in #435
Update Benchmark doc and FAQ by @pan-x-c in #436
Fix typo in doc and check selector by @hiyuchang in #437
Add report for gsm8k alignment experiment. by @chenyushuo in #439
Alfworld Concatenated Multi-turn RFT SFT format AND settings. by @kokolerk in #442
BOTS reference evaluation results collection by @ShenQianli in #440
add fallback_to_policy_gradient option by @binary-husky in #443
Support vLLM v0.12.0 by @pan-x-c in #438
Optimize registry by @hiyuchang in #441
Add staleness control by @chenyushuo in #445
Implement On-Policy Distillation by @garyzhang99 in #444
Fix some typos by @hiyuchang in #447
Explorer API collects feedback from agent applications by @pan-x-c in #295
Check Explorer GPU Number by @hiyuchang in #453
impl swanlab monitor by @binary-husky in #450
Pre release 0.4.0 by @pan-x-c in #455
improve ray stat by @binary-husky in #454
Implement Tinker compatible sample API by @pan-x-c in #456
Enhance AgentScope Workflow Adapter by @pan-x-c in #457
Add GSPO-style REC variant by @yanxi-chen in #380
Add tinker backend. by @chenyushuo in #448
Release v0.4.0 by @pan-x-c in #459

New Contributors

@kokolerk made their first contribution in #442
@binary-husky made their first contribution in #443

Full Changelog: v0.3.3...v0.4.0
