vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
TL;DR · 30-second scan
vllm (Python) — A high-throughput and memory-efficient inference and serving engine for LLMs
Use it if you're running open-source LLMs in production and need real throughput.
AI & ML
Production LLM serving. Delivers 10-24× higher throughput than naive Hugging Face Transformers for batched inference, largely thanks to PagedAttention (paged KV-cache memory management) and continuous batching of incoming requests. Used by Mistral, Together AI, and many enterprise teams running open models in production. ~30k stars. The standard for self-hosted LLM inference at scale in 2026.
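Much of that batched-throughput gap comes from scheduling. vLLM uses continuous batching: instead of waiting for a whole batch to finish, a slot freed by a completed request goes to a waiting request at the next decode step. The toy model below is a minimal sketch under simplifying assumptions (each request costs one decode step per output token, prefill is ignored, and batch size is a fixed number of slots). The function names and the workload are hypothetical, and this is not vLLM's actual scheduler, which also manages KV-cache memory through PagedAttention.

```python
from collections import deque


def static_batching_steps(lengths, batch_size):
    """Decode steps when each fixed batch runs until its longest request ends."""
    steps = 0
    for start in range(0, len(lengths), batch_size):
        steps += max(lengths[start:start + batch_size])
    return steps


def continuous_batching_steps(lengths, batch_size):
    """Decode steps when a finished request's slot is refilled right away."""
    waiting = deque(lengths)  # output lengths (in tokens, >= 1) of queued requests
    running = []              # tokens still to generate for each active request
    steps = 0
    while waiting or running:
        # Fill any free slots from the queue before the next decode step.
        while waiting and len(running) < batch_size:
            running.append(waiting.popleft())
        # One decode step: every active request emits one token;
        # requests that just emitted their last token leave the batch.
        steps += 1
        running = [r - 1 for r in running if r > 1]
    return steps


if __name__ == "__main__":
    # Hypothetical workload: two long generations mixed with short ones.
    lengths = [100, 10, 10, 10, 100, 10, 10, 10]
    print("static:", static_batching_steps(lengths, batch_size=4))
    print("continuous:", continuous_batching_steps(lengths, batch_size=4))
```

On that hypothetical workload the script prints 200 steps for static batching and 110 for continuous batching, because short requests no longer sit idle behind the 100-token generations. Real speedups depend on the model, hardware, and request mix.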
Skip it if you're running a personal LLM on a laptop; Ollama is simpler.