How to Choose an AI Gateway: 5 Solutions Compared (2026)
Hands-on comparison of LLM Gateway, Bifrost, LiteLLM, One API, and OpenRouter — covering deployment, latency, guardrails, pricing, and Claude Code integration to help you pick the right AI gateway.
Use case: solo developer, routing Claude Code / custom backends, wants to save money, stay secure, and avoid ops overhead.
As LLM API calls pile up, an AI gateway (also called an LLM gateway or LLM proxy) has gone from “nice to have” to “you really need one.” It unifies different provider API formats, manages keys and quotas, controls costs, and adds a security layer in between.
But there are a lot of options — open source, hosted, self-managed, each with its own tradeoffs. I tested five mainstream solutions: LLM Gateway, Bifrost, LiteLLM, One API, and OpenRouter, and hit plenty of rough edges along the way. Here’s what I learned, so you can skip the trial and error.
TL;DR — Quick Pick
| Your situation | Best pick | Why |
|---|---|---|
| Want it running in 5 minutes, no infra | LLM Gateway (hosted) | Sign up, set 3 env vars, done |
| Need speed, security, and self-hosting | Bifrost | Go binary, 11μs overhead, best guardrails |
| Team with IaC workflows, widest coverage | LiteLLM | 100+ providers, YAML config as code |
| Managing multi-user API distribution | One API / New API | Mature token & quota system, 30+ providers |
| Try many models quickly, benchmarking | OpenRouter | 200+ models, one key, pay-as-you-go |
My path: started with LLM Gateway hosted → moved to self-hosted Bifrost when I needed more. Details below.
Why You Need an AI Gateway
Calling OpenAI or Anthropic’s API directly works fine at first. But as your setup gets more complex, pain points stack up fast:
Vendor lock-in. Your code is hardwired to the Anthropic SDK. Switching to Gemini or DeepSeek means rewriting a bunch of calls. A gateway sits in the middle doing protocol translation — your app talks to one unified endpoint, and swapping providers is just a config change.
Key management and cost control. Handing out API keys to team members with no way to cap usage or track spend. A gateway’s virtual keys and quota system let you see exactly how much each person or project is burning.
Data security and compliance. Your prompts and responses go straight to the provider. If you need PII redaction, audit logs, or policy enforcement in between, the gateway is that interception point.
Availability and failover. One provider goes down, everything stops. A gateway lets you configure automatic fallback — when provider A is unreachable, requests route to provider B.
For me, the immediate trigger was connecting Claude Code — I needed a middle layer to manage requests, control costs, and keep my real API keys from being exposed.
5 AI Gateway Solutions, Hands-On
I’ll go through each in the order I tried them, covering real experience, who it’s for, and where the gotchas are.
1. LLM Gateway (llmgateway.io) — Easiest Start, Hosted and Ready
This was the first one I tried, for a simple reason: it has a hosted version, no deployment needed.
LLM Gateway is a TypeScript open-source project (AGPL license), around 1.4K GitHub stars. Its biggest selling point is that BYOK (Bring Your Own Key) mode is completely free — you plug in your own provider key, requests route through their hosted service, and there’s zero platform fee. If you use their Credits model instead, the platform takes 5%.
Claude Code in three lines:
export ANTHROPIC_BASE_URL=https://api.llmgateway.io
export ANTHROPIC_AUTH_TOKEN=your-key
claude
The dashboard is well-designed, with cost breakdowns and latency analytics — handy if you want to keep an eye on spending. Adding a custom provider is straightforward: in the dashboard, add a Custom Provider with a lowercase name, base URL, and API token. The catch is it only accepts OpenAI-compatible custom backends. If your backend uses a different protocol, you’re out of luck.
On data privacy, they explicitly state they “only use your data for routing, never log prompt or response content” (metadata only by default), and won’t use your data for model training. But your data still passes through their servers, which isn’t ideal for privacy-sensitive workloads.
Best for: Individual developers who want to get started fast without managing infrastructure. Use the hosted version to validate the concept, then consider self-hosting later.
Gotchas:
- Guardrails are basic — PII redaction and advanced policy enforcement lag behind other options
- Custom backends limited to OpenAI-compatible format only
- AGPL license restricts commercial forks
2. Bifrost — When You Need Speed, Security, and Strong Guardrails
After experiencing LLM Gateway’s hosted convenience, I wanted a self-hosted option. My core requirements were performance and security guardrails. That’s when I found Bifrost.
Bifrost is an open-source Go project by Maxim AI (Apache 2.0 license), around 6.1K GitHub stars. It compiles to a single Go binary with no Python dependency tree — deployment is literally one file. Their benchmark claims ~11 microseconds of overhead at 5,000 RPS — a number that looks almost unbelievable compared to other solutions, and I’ll get to that.
Claude Code setup:
export ANTHROPIC_API_KEY=dummy-key
export ANTHROPIC_BASE_URL=http://localhost:8080/anthropic
claude
It provides Anthropic, OpenAI, and Gemini-compatible endpoints simultaneously, with automatic protocol translation. That means your app can send requests in OpenAI format, and Bifrost auto-translates them to Anthropic format before forwarding to Claude.
Guardrails are where Bifrost really shines: hierarchical spending limits, real-time policy violation detection, and virtual credentials that can be scoped down to “which key can use which provider’s which model.” It also has MCP support — functioning as both MCP client and server, with a deny-by-default allowlist for external tool access.
Semantic caching is worth mentioning too: instead of simple exact-match caching, it uses vector similarity matching. Ask a similar question and it returns the cached result, which saves a meaningful amount of tokens in practice.
Best for: Developers and teams that are latency-sensitive, need strong guardrails and audit capabilities, or must keep data on their own servers.
Gotchas:
- No hosted version — you must self-host (Docker or bare binary)
- Advanced enterprise features (SSO, RBAC, advanced auditing) require a commercial license
- Relatively young project (launched late 2024), community resources and docs still maturing
3. LiteLLM — Biggest Ecosystem, but Config-Heavy
LiteLLM is probably the most well-known LLM gateway, with over 52K GitHub stars and nearly 100 million monthly PyPI downloads. It supports 100+ providers, making it the broadest open-source option by coverage.
But honestly, I couldn’t get into it. The reason is personal: it requires YAML config files to define model routing, load balancing, fallback strategies, and more. For teams used to infrastructure-as-code, this is natural. But for a solo developer like me who just wanted to get something running, the learning curve and setup time were steep.
# Config example (config.yaml)
model_list:
- model_name: claude-sonnet
litellm_params:
model: anthropic/claude-sonnet-4-20250514
api_key: os.environ/ANTHROPIC_KEY
- model_name: gpt-4
litellm_params:
model: openai/gpt-4
api_key: os.environ/OPENAI_KEY
general_settings:
master_key: sk-my-master-key
Start with litellm --config config.yaml, Python environment, quite a few dependencies. It provides OpenAI-compatible /v1/chat/completions and Anthropic-compatible /v1/messages endpoints, with virtual keys, budget management, and fallback chains.
Performance is a real issue. Third-party benchmarks (Helicone) show it adds 50ms+ of latency per request. The Python (FastAPI/Uvicorn) architecture doesn’t hold up as well under high concurrency compared to Go-based alternatives. LiteLLM’s own claim of 8ms P95 overhead is measured under ideal conditions.
Something else worth flagging: In March 2026, LiteLLM’s PyPI packages were hit by a supply chain attack — malicious code was injected into published releases, potentially exposing cloud credentials and private keys. Andrej Karpathy publicly warned about it, and VentureBeat covered the story. The issue has been fixed, but it deepened my awareness of Python dependency chain risks — and it’s one reason I later gravitated toward Go-based solutions like Bifrost.
Best for: Teams that need the widest provider coverage, are comfortable with YAML configuration, and have ops capacity.
Gotchas:
- YAML config learning curve — expect 15–30 minutes of setup for newcomers
- Long Python dependency chain, higher supply chain security risk
- Noticeably higher latency overhead than Go-based alternatives
- No GUI admin panel in the open-source version
4. One API — The Most Popular Option in Asian Developer Communities
If you search for “API relay” or “API gateway” in Asian developer communities, you’ll almost certainly encounter One API. With around 35K GitHub stars, written in Go + React, MIT licensed, it’s the most widely deployed open-source gateway in that ecosystem and has spawned an entire family of forks (notably New API).
One API’s core strength is deep optimization for the Asian market: it supports Baidu ERNIE, Alibaba Qwen, iFlytek Spark, Zhipu ChatGLM, Moonshot, Baichuan, MiniMax, and other regional models, alongside OpenAI, Anthropic, and Gemini — 30+ providers total.
It has a complete token management system: redemption codes, referral rebates, usage-based billing, IP whitelisting. Many people use it to run “API relay stations” — essentially API proxy distribution businesses.
Deployment is lightweight, one Docker command:
docker run -d -p 3000:3000 justsong/one-api
Uses SQLite by default for zero-dependency startup. For production, switch to MySQL + Redis.
Claude Code setup:
export ANTHROPIC_BASE_URL=http://your-oneapi-address:3000
export ANTHROPIC_AUTH_TOKEN=token-from-oneapi
claude
But I didn’t choose it, for three reasons:
First, high-concurrency performance drops noticeably. Community feedback shows SQLite has severe write-locking under concurrent load — you have to switch to MySQL, but even then it’s not as fast as pure Go solutions like Bifrost.
Second, database upgrades are fragile. Schema changes between versions can be incompatible, migration docs are thin, and some people lost data after upgrading.
Third, there’s essentially one maintainer (JustSong), 900+ open issues, and an inconsistent release cadence. The latest version, v0.6.10, was released in February 2025 — over a year ago.
Also, One API doesn’t support Midjourney (which is why the New API fork exists), and it lacks observability and log export features.
Best for: Developers managing multi-user API distribution, especially those working with Asian LLM providers.
Gotchas:
- High-concurrency performance requires dedicated tuning
- Version upgrades can break database schemas
- Single maintainer, 900+ open issues
- Default password is root/123456 — a security red flag
- No observability or log auditing
5. OpenRouter — Most Models, but Not a Traditional Gateway
Strictly speaking, OpenRouter isn’t a gateway — it’s an AI model marketplace and API aggregator. It provides a unified API to access 200+ models (both frontier and open-source), one key for everything, zero deployment required.
In 2025, it closed a $113 million Series B round with participation from NVIDIA and Google, at a $1.3 billion valuation. It processes roughly 100 trillion tokens per month, which is substantial.
Claude Code integration (via third-party Claude Code Router):
# Install claude-code-router first
npm install -g @musistudio/claude-code-router
# Configure with your OpenRouter key
Pricing is markup-based: 5–15% on top of provider rates. Fine for light usage, but if you’re spending hundreds per month, the markup adds up fast. There are plenty of free models too (DeepSeek R1, Llama 4 Maverick, Qwen 3 235B, etc.), though free-tier accounts are now limited to 50 requests per day (cut from 200 in 2026).
The biggest issue is data. All requests and responses pass through OpenRouter’s servers. They offer a Zero Data Retention (ZDR) option, but for sensitive codebases or proprietary data, an extra hop means extra risk. It’s a US-based company, so GDPR compliance requires additional attention.
Best for: Rapid prototyping, model benchmarking, or anyone who wants to try many models with a single key.
Gotchas:
- Not a real gateway — no self-hosting, no custom routing policies
- 5–15% markup gets expensive at high volume
- Data passes through third-party servers
- Free tier significantly reduced
- Can’t connect custom backends or private models
Side-by-Side Comparison
All five at a glance:
| Dimension | LLM Gateway | Bifrost | LiteLLM | One API | OpenRouter |
|---|---|---|---|---|---|
| Pricing | Free (BYOK) / 5% (Credits) | Free (Apache 2.0) | Free (MIT) | Free (MIT) | 5–15% markup |
| Deployment | Hosted / Self-hosted | Self-hosted only | Self-hosted only | Self-hosted only | SaaS only |
| Setup difficulty | Lowest | Low | Medium (YAML) | Low | Lowest |
| GitHub stars | ~1.4K | ~6.1K | ~52K | ~35K | N/A |
| Language | TypeScript | Go | Python | Go | N/A |
| Latency overhead | Moderate | ~11μs (negligible) | 50ms+ | Moderate | ~25ms |
| Provider coverage | Many | 23+ providers | 100+ | 30+ | 200+ models |
| Claude Code support | Native | Native | Native | Supported | Third-party tool |
| Custom backends | OpenAI-compatible only | Yes (protocol translation) | Yes | Yes | No |
| Guardrails | Basic | Strong (tiered limits + MCP) | Available | Weak | None |
| Observability | Dashboard analytics | Prometheus / OTEL | Third-party integrations | None | Basic |
| Data stays local | Not on hosted tier | Yes | Yes | Yes | No |
My Selection Path
Looking back, my journey went something like this:
Step one: LLM Gateway hosted. Got Claude Code running in three minutes. Validated the concept and value of a gateway. Great for anyone just getting started.
Step two: Self-hosted Bifrost. When I needed stronger performance, data sovereignty, and MCP tool governance, I switched to Bifrost. Deploying a Go binary was simpler than I expected — pull the Docker image, set environment variables, done. 11 microseconds of overhead is effectively zero.
Paths I wouldn’t recommend:
- If you don’t want to write config files and lack Python ops experience, LiteLLM’s learning curve may not be worth it. But if your team already has IaC workflows, it’s the most feature-rich option.
- One API has unique strengths for API distribution scenarios, but as a production-grade gateway, database stability and single-maintainer risk are real concerns.
- OpenRouter is better suited for model evaluation and quick prototyping, not as a long-term gateway.
Security and Privacy: The Baseline That’s Easy to Overlook
Choosing a gateway isn’t just about features and performance — data security is the baseline. A few recommendations:
Self-host when possible. Data never passes through third-party servers — the strongest privacy guarantee. Both Bifrost and LiteLLM support fully self-hosted deployment.
Check the data policy and training clauses. Some services include language like “may use your data to improve the service.” Read before deciding. LLM Gateway explicitly states they don’t use your data for model training, which is a plus.
Change default passwords. One API ships with root/123456. That’s not a suggestion to change it — it’s mandatory.
Supply chain security. The LiteLLM incident is a reminder: Python projects have long dependency chains and larger attack surfaces. Go-compiled single-binary solutions (Bifrost, One API) are inherently safer in this regard.
HTTPS and authentication. Regardless of which solution you choose, always enable HTTPS in production, set strong passwords, and restrict IP access. Running bare HTTP with default credentials in production is asking for trouble.
Wrapping Up
| Your situation | Recommended solution |
|---|---|
| Want it running fast, no fuss | LLM Gateway hosted |
| Need performance, security, self-hosted | Bifrost |
| Team with IaC workflows, widest coverage needed | LiteLLM |
| Multi-user API distribution | One API / New API |
| Try many models, benchmarking | OpenRouter |
There’s no one-size-fits-all solution — only the one that fits your current stage. My recommendation is to start with LLM Gateway’s hosted tier, get things working, then migrate to a self-hosted solution as your needs grow. If budget and privacy are both priorities, Bifrost is the best overall choice right now.
FAQ
What’s the difference between an AI gateway and an API relay/proxy service?
An AI gateway is infrastructure you deploy yourself to manage LLM API requests across multiple providers, with data flowing through your own servers. An API relay or proxy service is typically run by a third party — you use their endpoint and keys, and your data passes through their infrastructure. Security depends entirely on the operator’s trustworthiness. Self-hosted gateways (Bifrost, LiteLLM) are significantly more secure than third-party relay services.
Which AI gateway is the most cost-effective for individual developers?
For personal use with Claude Code or ChatGPT API, LLM Gateway’s BYOK mode is completely free with no server deployment needed. To reduce API costs, watch for free tiers and promotional credits from providers. If usage grows, self-hosting Bifrost with semantic caching can meaningfully cut token spend on repeated requests.
Is the LiteLLM supply chain attack a dealbreaker?
The March 2026 supply chain attack was a serious incident affecting PyPI packages. If you’re already using LiteLLM, check whether your installed version was affected and pin to a verified safe version. LiteLLM itself is powerful and has an active community, but if you’re starting a new project, consider Go-based alternatives like Bifrost to sidestep Python dependency chain risks entirely.
Do I need an AI gateway to use Claude Code?
No. Claude Code can connect directly to Anthropic’s API. But a gateway adds benefits: unified key management across providers, automatic failover (switch to a backup when the primary is down), cost and usage controls, and keeping your real API keys from being exposed. If you’re using a single provider and don’t need these features, direct connection works fine.
What server specs do I need to self-host an AI gateway?
Most open-source gateways have modest requirements. Bifrost is a single Go binary — a 1-core, 1GB VPS handles it fine. One API with SQLite is nearly zero-dependency. LiteLLM, being Python-based, is a bit heavier — 2 cores and 2GB RAM is a reasonable minimum. For personal use, any low-spec VPS will do.
Comments
Join the conversation
A quiet margin for reflections after reading.
No comments yet. Be the first to leave a thought.