Overview
⭐️ Highlights
Add a Tinker backend so that users without GPUs can leverage Trinity-RFT. See the example for more details.
Explorer
- Add a Tinker `SamplingClient` backend for users without GPUs.
- Support vLLM v0.12.0 (v0.10.2 through v0.11.0 are still supported).
- Add a Tinker-compatible `sample` API to the vLLM backend.
- Enhance `serve` mode for online RL.
- Fix several bugs in the vLLM OpenAI API.
Trainer
- Add a Tinker `TrainingClient` backend for users without GPUs.
- Add a switch in `PPOPolicyLossFn` to ignore explorer-generated logprobs.
Buffer
- Support staleness control, which mitigates the negative effects of excessively off-policy data.
- Add a Streamlit viewer to visualize the experience data.
Others
- Add benchmark comparisons with veRL and rLLM.
- Refactor the registration system to avoid loading all modules during initialization.
- Add algorithms: SAPO, on-policy distillation.
- Enhance debug mode; add `--module viewer` to visualize experience data generated during debugging.
- Add SwanLab monitor.
- Add a tutorial on aligning configurations with veRL.
- Add a tutorial on choosing the model context length based on GPU and model size.
- Optimize README and Sphinx docs.
🚨 Breaking Changes
- The schema of the SQL experience buffer has changed. Experience data saved by previous versions cannot be used.
- The registration system has been refactored. Developers no longer need to use `@REGISTRY.register_module` to register modules. See the Developer Guide for details.
- Tinker requires Python >= 3.11. (Users who do not use Tinker can still use Python 3.10.)
- vLLM 0.12.0 requires CUDA >= 12.9. (Users of vLLM 0.11.0 or lower can still use CUDA 12.8.)
- Refactor `SampleStrategy`: add `kwargs` to its inputs and change the output type from `Experiences` to `List[Experience]`. `Experiences` (not `Experience`) will be deprecated.
What's Changed
- Fix docker build action by @pan-x-c in #415
- Add benchmark scripts for Guru-Math by @chenyushuo in #417
- Add bench results for frozenlake and alfworld by @hiyuchang in #416
- Add corrected kl with importance sampling by @garyzhang99 in #419
- Add SAPO algorithm by @garyzhang99 in #422
- Enhance debug mode by @pan-x-c in #421
- Update README with supported algorithm and news by @hiyuchang in #424
- Fix openai api history by @pan-x-c in #428
- Add Experience Viewer by @pan-x-c in #427
- Add sequence mask for grpo by @garyzhang99 in #420
- Fix the mismatch between vLLM OpenAI API and vLLM `generate` by @pan-x-c in #431
- Add Guru-Math report. by @chenyushuo in #432
- Refactor and Merge several `compute_score` functions by @hiyuchang in #430
- Add scripts to search context length capacity on given settings. by @chenyushuo in #423
- [Doc] Align with verl by @hiyuchang in #433
- [Doc] add a new page with example list from the dataset perspective by @HYLcool in #434
- General Multi-turn FrozenLake by @pan-x-c in #429
- Fix sft warmup yaml by @hiyuchang in #435
- Update Benchmark doc and FAQ by @pan-x-c in #436
- Fix typo in doc and check selector by @hiyuchang in #437
- Add report for gsm8k alignment experiment. by @chenyushuo in #439
- Alfworld Concatenated Multi-turn RFT SFT format AND settings. by @kokolerk in #442
- BOTS reference evaluation results collection by @ShenQianli in #440
- add fallback_to_policy_gradient option by @binary-husky in #443
- Support vLLM v0.12.0 by @pan-x-c in #438
- Optimize registry by @hiyuchang in #441
- Add staleness control by @chenyushuo in #445
- Implement On-Policy Distillation by @garyzhang99 in #444
- Fix some typos by @hiyuchang in #447
- Explorer API collects feedback from agent applications by @pan-x-c in #295
- Check Explorer GPU Number by @hiyuchang in #453
- impl swanlab monitor by @binary-husky in #450
- Pre release 0.4.0 by @pan-x-c in #455
- improve ray stat by @binary-husky in #454
- Implement Tinker compatible sample API by @pan-x-c in #456
- Enhance AgentScope Workflow Adapter by @pan-x-c in #457
- Add GSPO-style REC variant by @yanxi-chen in #380
- Add tinker backend. by @chenyushuo in #448
- Release v0.4.0 by @pan-x-c in #459
New Contributors
- @kokolerk made their first contribution in #442
- @binary-husky made their first contribution in #443
Full Changelog: v0.3.3...v0.4.0