LiteLLM vs vLLM
Self-host pick: both replace the OpenAI API (LLM inference API).
Both LiteLLM and vLLM self-host as a replacement for the OpenAI API (LLM inference API). Pick LiteLLM if you want the lighter footprint: a 15-minute docker-compose setup (proxy + Postgres for usage logs) on a $5 VPS, where the VPS only covers the proxy itself; the underlying model server (Ollama / vLLM / OpenAI passthrough) is the real cost line. Pick vLLM if you need production inference at scale: its continuous batching is what you want when 10+ concurrent users hit the endpoint. Expect a 30-minute docker run with --gpus and $200-1500/mo depending on GPU class; an A100 80GB runs Llama 3.1 70B in 4-bit quantization.
| | LiteLLM (open source) | vLLM (open source) |
|---|---|---|
| License | MIT | Apache-2.0 |
| Setup time | 15min docker-compose (proxy + Postgres for usage logs) | 30min docker run with --gpus |
| Monthly cost | $5 VPS for the proxy itself; the underlying model server (Ollama / vLLM / OpenAI passthrough) is the real cost line. | $200-1500/mo depending on GPU class; an A100 80GB runs a 4-bit-quantized Llama 3.1 70B with PagedAttention batching (unquantized 70B needs multiple GPUs). |
| GitHub | BerriAI/litellm | vllm-project/vllm |
| Replaces | OpenAI API | OpenAI API |
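To make the setup rows concrete, here is a minimal docker-compose sketch for the LiteLLM proxy with Postgres usage logging. The image tag, port, and credentials are assumptions; check LiteLLM's docs for the current values.

```yaml
# docker-compose.yml: LiteLLM proxy + Postgres for usage logs (illustrative).
services:
  litellm:
    image: ghcr.io/berriai/litellm:main-latest   # assumed current image tag
    ports:
      - "4000:4000"
    environment:
      DATABASE_URL: postgresql://llm:llm@db:5432/litellm  # enables usage logging
      LITELLM_MASTER_KEY: sk-change-me                    # admin key; change it
    volumes:
      - ./config.yaml:/app/config.yaml
    command: ["--config", "/app/config.yaml", "--port", "4000"]
    depends_on:
      - db
  db:
    image: postgres:16
    environment:
      POSTGRES_USER: llm
      POSTGRES_PASSWORD: llm
      POSTGRES_DB: litellm
    volumes:
      - pgdata:/var/lib/postgresql/data
volumes:
  pgdata:
```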
Good fit for
LiteLLM
Teams that want one OpenAI-shaped endpoint in front of many backends (a mix of self-hosted models, hosted Anthropic, and hosted OpenAI for fallback); see the config sketch below.
Weak at: not a model server itself; you still need Ollama, vLLM, or cloud APIs behind it. LiteLLM is glue, not GPU.
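As a sketch of that shape, here is an illustrative LiteLLM config.yaml fronting three backends with a fallback route. The model names, the Ollama address, and the fallback pairing are all assumptions, not prescribed values.

```yaml
# config.yaml: one OpenAI-shaped endpoint, three backends (illustrative).
model_list:
  - model_name: local-llama              # self-hosted via Ollama
    litellm_params:
      model: ollama/llama3
      api_base: http://ollama:11434
  - model_name: claude                   # hosted Anthropic
    litellm_params:
      model: anthropic/claude-3-5-sonnet-20240620
      api_key: os.environ/ANTHROPIC_API_KEY
  - model_name: gpt-4o                   # hosted OpenAI, used as fallback
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY

router_settings:
  fallbacks:
    - local-llama: ["gpt-4o"]            # fail over to OpenAI if the local box is down
```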
vLLM
Production inference at scale: vLLM's continuous batching is what you want when 10+ concurrent users hit the endpoint. A docker run sketch follows below.
Weak at: single-GPU model fit; large models (70B+) need multi-GPU tensor parallelism and careful VRAM budgeting.
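A minimal sketch of that docker run, assuming the official vllm/vllm-openai image and a 4-GPU box; the model id and --tensor-parallel-size are illustrative and have to be sized to your VRAM.

```bash
# Serve Llama 3.1 70B across 4 GPUs with vLLM's OpenAI-compatible server (illustrative).
# Gated models also need a Hugging Face token, e.g. -e HUGGING_FACE_HUB_TOKEN=...
# --tensor-parallel-size 4 splits the weights over 4 GPUs; fp16 70B will not fit on one.
docker run --gpus all --ipc=host -p 8000:8000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  vllm/vllm-openai:latest \
  --model meta-llama/Llama-3.1-70B-Instruct \
  --tensor-parallel-size 4
```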
In a terminal? `npx -y github:SolvoHQ/os-alt-cli openai-api` prints OpenAI API's self-host options, including both.
FAQ
Which is easier to self-host, LiteLLM or vLLM?
LiteLLM: a 15-minute docker-compose (proxy + Postgres for usage logs). vLLM: a 30-minute docker run with --gpus.
What does each cost to run?
LiteLLM: a $5 VPS for the proxy itself; the underlying model server (Ollama / vLLM / OpenAI passthrough) is the real cost line. vLLM: $200-1500/mo depending on GPU class; an A100 80GB runs a quantized Llama 3.1 70B. Both projects are free and open source.
Do LiteLLM and vLLM replace the same SaaS?
Yes — both are open-source alternatives to OpenAI API.
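Because both speak the OpenAI wire format, the same request works against either server; only the base URL and model name change. The port and key below carry over from the sketches above, so treat them as assumptions.

```bash
# Same OpenAI-shaped call; point it at LiteLLM (:4000) or vLLM (:8000).
# For vLLM, set "model" to the Hugging Face model id it was launched with.
curl http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer sk-change-me" \
  -H "Content-Type: application/json" \
  -d '{"model": "local-llama", "messages": [{"role": "user", "content": "hello"}]}'
```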