stackpicks.dev
vllm-project/vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

80.7k stars · 17.1k forks · 550 watchers · 2003 open issues · Python · Apache-2.0

TL;DR · 30-second scan

What it is

vllm (Python): A high-throughput and memory-efficient inference and serving engine for LLMs

What it does for you

You're running open-source LLMs in production and need real throughput.

Best for

AI & ML

80.7k GitHub stars · License: Apache-2.0 · Last updated 1 hour ago
EDITOR'S DEEP TAKE

Production LLM serving. Reported at 10-24× the throughput of naive Hugging Face Transformers for batched inference. Used by Mistral, Together AI, and many enterprise LLM teams running open models in production. 80.7k stars. The standard for self-hosted LLM inference at scale in 2026.
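For a sense of what "batched inference" looks like in practice, here is a minimal sketch of vLLM's offline Python API, which accepts a whole list of prompts in one `generate` call. The model name `facebook/opt-125m`, the sampling values, and the `build_prompts` helper with its prompt template are assumptions chosen for illustration, not recommendations from this listing.

```python
from typing import List


def build_prompts(questions: List[str]) -> List[str]:
    """Wrap each question in a simple prompt template (hypothetical, for illustration)."""
    return [f"Question: {q}\nAnswer:" for q in questions]


def run_batch(prompts: List[str], model: str = "facebook/opt-125m") -> List[str]:
    """Generate one completion per prompt with a single batched call.

    The model name is an assumption; substitute any model vLLM supports.
    """
    # Imported lazily so this file can be loaded on a machine without vLLM installed.
    from vllm import LLM, SamplingParams

    # Sampling values are illustrative, not tuned.
    sampling = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)
    llm = LLM(model=model)

    # All prompts are handed over at once; the engine schedules them together.
    outputs = llm.generate(prompts, sampling)
    return [out.outputs[0].text for out in outputs]


if __name__ == "__main__":
    prompts = build_prompts(["What is batched inference?", "Why serve models on GPUs?"])
    for prompt, completion in zip(prompts, run_batch(prompts)):
        print(repr(prompt), "->", repr(completion))
```

Running the script requires vLLM installed and hardware it supports. The point of the sketch is the shape of the call: the whole prompt list goes in together rather than one request at a time.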

Use this if

You're running open-source LLMs in production and need real throughput.

Skip if

You're running a personal LLM on a laptop — Ollama is simpler.

Categories
Topics
gpt, llm, pytorch, model-serving, transformer, llm-serving, inference, llama, amd, cuda, tpu, deepseek, qwen, blackwell, deepseek-v3, gpt-oss, kimi, moe, openai, qwen3

Created 09 Feb 2023
Last push 1 hour ago
Stats refreshed 1 hour ago