- A high-throughput and memory-efficient inference and serving engine for LLMs (Python, 66.1k stars, 12.2k forks); a usage sketch follows this list
- A Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM (Python, 2.5k stars, 335 forks); a quantization sketch follows this list
- Common recipes for running vLLM (Jupyter Notebook, 303 stars, 111 forks)
- A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM (Python, 174 stars, 22 forks)
- An intelligent router for Mixture-of-Models serving (Go, 2.6k stars, 363 forks)
- A community-maintained hardware plugin for running vLLM on Ascend
- A framework for efficient model inference with omni-modality models
- A tool to evaluate and enhance LLM deployments for real-world inference needs
- vLLM XPU kernels for Intel GPUs
- TPU inference for vLLM, with unified JAX and PyTorch support
- Code for the vLLM CI and performance benchmark infrastructure
- A community-maintained hardware plugin for running vLLM on Intel Gaudi
- Daily summaries of merged vLLM PRs
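
As a quick orientation to the main engine, here is a minimal sketch of offline batched inference with vLLM's Python API. The prompts and the model checkpoint (`facebook/opt-125m`) are only examples; all other settings are left at their defaults.

```python
from vllm import LLM, SamplingParams

# Example prompts; any list of strings works.
prompts = [
    "The capital of France is",
    "High-throughput LLM serving requires",
]

# Sampling settings; tune temperature/top_p/max_tokens for your use case.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# Load the model once; vLLM manages KV-cache memory internally.
# "facebook/opt-125m" is just a small example checkpoint.
llm = LLM(model="facebook/opt-125m")

# Generate completions for the whole batch in one call.
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(output.prompt, "->", output.outputs[0].text)
```

For online serving, the same engine is exposed through an OpenAI-compatible HTTP server, e.g. `vllm serve facebook/opt-125m`.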
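For the compression library, here is a sketch of a one-shot weight-quantization flow in the style of the project's GPTQ examples. The import paths (`llmcompressor.transformers.oneshot`, `GPTQModifier`), the `oneshot` arguments, and the model/dataset names are assumptions that may differ across library versions; treat this as an outline rather than a definitive recipe.

```python
from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.transformers import oneshot  # assumed import path; may vary by version

# Assumed recipe: 4-bit weights, 16-bit activations (W4A16) GPTQ applied to
# all Linear layers, skipping the output head.
recipe = GPTQModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"])

# One-shot post-training quantization with a calibration dataset;
# the model and dataset names here are illustrative.
oneshot(
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    dataset="open_platypus",
    recipe=recipe,
    output_dir="TinyLlama-1.1B-Chat-v1.0-W4A16",
    max_seq_length=2048,
    num_calibration_samples=512,
)
```

The saved output directory is intended to load directly into vLLM for optimized deployment, e.g. `LLM(model="TinyLlama-1.1B-Chat-v1.0-W4A16")`.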