LiteLLM

What is LiteLLM?

LiteLLM is an AI gateway built by Berri AI, backed by Y Combinator, that provides model access, fallbacks, and spend tracking across 100+ LLMs. It uses the OpenAI-compatible format, so developers can switch between providers without rewriting code. The platform has served over 1 billion requests and logged 240 million Docker pulls, with 1,005+ contributors. It simplifies how platform teams give developers access to LLMs like OpenAI, Azure, Gemini, Bedrock, and Anthropic.

Application scenarios

Multi-provider LLM access
Give developers access to OpenAI, Azure, Gemini, Bedrock, and Anthropic models through a single gateway.
Cost tracking and chargebacks
Accurately charge teams for their LLM usage by attributing cost to keys, users, teams, or orgs.
Budget and rate limit management
Set budgets and rate limits (RPM/TPM) to control spending and prevent overuse.
LLM fallbacks
Automatically route requests to alternative models if the primary provider fails or is overloaded.
Observability and logging
Log spend to S3, GCS, or other storage, and integrate with observability tools like Langfuse, Arize Phoenix, Langsmith, and OpenTelemetry.
Prompt management
Manage and format prompts, including support for Hugging Face models.
Enterprise access control
Use JWT auth, SSO, and audit logs for secure, governed LLM access in large organizations.

Core Features

Spend tracking
Attribute cost to key/user/team/org with automatic tracking across OpenAI, Azure, Bedrock, GCP, and other providers, plus tag-based spend tracking.
Budgets and rate limits
Set per-key or per-team budgets and enforce RPM/TPM limits to control usage.
OpenAI-compatible API
All requests use the OpenAI format, so developers don't need to transform inputs or outputs across providers.
LLM fallbacks
Configure automatic fallbacks to alternative models if the primary provider is unavailable.
Virtual keys and teams
Create virtual API keys, manage teams, and assign budgets at scale.
LLM guardrails
Apply guardrails to filter or modify LLM outputs for safety and compliance.
Batch API support
Process multiple requests in batch for efficiency.
Pass-through endpoints
Forward requests directly to underlying providers when needed.
Prompt management
Format prompts for different models, including Hugging Face models, without manual transformation.
S3 logging
Log all spend and usage data to S3, GCS, or other cloud storage for auditing.

Target users

Platform teams and engineering leaders who need to give developers secure, cost-controlled access to multiple LLMs. Ideal for organizations scaling from a few developers to hundreds, especially those using Netflix, Lemonade, or similar high-volume environments. Also useful for DevOps, MLOps, and AI infrastructure engineers managing LLM governance.

How to use LiteLLM?

Deploy LiteLLM on-premises or use the cloud-hosted version. Developers interact with it via the OpenAI-compatible API, so they can call any supported model using familiar code. For self-hosted setups, follow the deployment docs on the official site. The platform includes a demo video to walk through setup and key features.

Pricing and free trial

The Open Source plan is free ($0) and includes 100+ LLM provider integrations, virtual keys, budgets, teams, load balancing, RPM/TPM limits, and LLM guardrails. The Enterprise plan offers cloud or self-hosted deployment, enterprise support with custom SLAs, JWT auth, SSO, and audit logs. Pricing for Enterprise is available via request, with a 30-day trial.

Effect review

LiteLLM is a practical, battle-tested gateway for teams juggling multiple LLM providers. The 1 billion+ requests served and positive testimonials from Netflix and Lemonade confirm it handles real production loads. The OpenAI-compatible format eliminates the friction of switching models, while granular cost tracking and budget controls give platform teams the visibility they need. For organizations already using multiple LLMs, LiteLLM removes a lot of operational overhead. The open-source tier is generous, and the enterprise plan adds the security and support large teams require. It’s a solid choice for any team that wants to standardize LLM access without vendor lock-in.

Frequently Asked Questions

What is LiteLLM?

LiteLLM is an LLM gateway by Berri AI that provides a unified OpenAI-format API to manage authentication, load balancing, and spend tracking across 100+ language models.

Which LLMs does LiteLLM support?

LiteLLM supports over 100 LLMs, including OpenAI, Anthropic, Cohere, Hugging Face, and many others, all accessible through a single endpoint.

How does LiteLLM handle load balancing?

LiteLLM automatically distributes requests across multiple models or providers based on configurable rules, ensuring high availability and optimal performance.

Can LiteLLM track API spending?

Yes, LiteLLM provides built-in spend tracking and logging, allowing you to monitor usage and costs across all models and users in real time.

Is LiteLLM compatible with existing OpenAI code?

Yes, LiteLLM uses the OpenAI format, so you can replace the base URL in your existing code with the LiteLLM endpoint without changing your application logic.

Does LiteLLM offer authentication management?

Yes, LiteLLM includes authentication management features such as API key validation, user-level access control, and rate limiting to secure your LLM usage.

What is LiteLLM?

Application scenarios

Core Features

Target users

How to use LiteLLM?

Pricing and free trial

Effect review

Frequently Asked Questions

LiteLLM - AI Tool Detail