vllm-project

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

81.9k stars17.7k forks564 watchers1992 open issuesPythonApache-2.0

View on GitHub Visit homepage

TL;DR · 30-second scan

What it is

vllm (Python) — A high-throughput and memory-efficient inference and serving engine for LLMs

What it does for you

You're running open-source LLMs in production and need real throughput.

Best for

AI & ML

81.9k GitHub starsLicense: Apache-2.0Last updated 1 month ago

EDITOR'S DEEP TAKE

Production LLM serving. 10-24× faster than naive Hugging Face Transformers for batched inference. Used by Mistral, Together AI, and most enterprise LLM teams running open models in production. ~30k stars. The standard for self-hosted LLM inference at scale in 2026.

Use this if

You're running open-source LLMs in production and need real throughput.

Skip if

You're running a personal LLM on a laptop — Ollama is simpler.

Categories

AI & ML

Topics

gptllmpytorchmodel-servingtransformerllm-servinginferencellamaamdcudatpudeepseekqwenblackwelldeepseek-v3gpt-osskimimoeopenaiqwen3

Maintainer? Embed our badge

Add this badge to your README to show your project is curated on StackPicks. Free, lightweight (180×28 SVG), and gives your visitors a one-click way to see honest take + alternatives.

Preview

Markdown (for GitHub README)

[![Featured on StackPicks](https://stackpicks.dev/api/badge/vllm-project-vllm)](https://stackpicks.dev/repo/vllm-project-vllm)

HTML (for blogs / docs)

<a href="https://stackpicks.dev/repo/vllm-project-vllm"><img src="https://stackpicks.dev/api/badge/vllm-project-vllm" alt="Featured on StackPicks" width="180" height="28" /></a>

Are you the maintainer of vllm-project/vllm? Add the badge and we'll feature your project in the weekly curator newsletter.

Created 09 Feb 2023

Last push 1 month ago

Stats refreshed 1 month ago