Today we're opening early access to the ngrok AI gateway—a single endpoint that routes your AI requests to OpenAI, Anthropic, Google, and any other provider you want, with automatic failover when things go wrong.
You've probably been here: your app hammers OpenAI's API, hits a rate limit, and your users stare at spinning loaders while you scramble to add Anthropic as a backup. Or maybe you're burning through API credits on GPT-4o when a cheaper model would work fine. Or you're trying to route to your self-hosted Ollama instance during development but production runs against cloud providers.
All of these problems share the same root cause: your app talks directly to AI providers, and that means you're stuck writing routing, failover, and retry logic yourself.
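To make that concrete, here's a rough sketch of the kind of hand-rolled fallback logic you end up maintaining when your app talks to providers directly. The model names, keys, and error handling are simplified placeholders for illustration:

```typescript
import OpenAI from "openai";
import Anthropic from "@anthropic-ai/sdk";

const openai = new OpenAI({ apiKey: "your-openai-key" });
const anthropic = new Anthropic({ apiKey: "your-anthropic-key" });

// Hand-rolled failover: try OpenAI first, fall back to Anthropic on any error.
// Every new provider, retry rule, or rate-limit workaround grows this code.
async function chat(prompt: string): Promise<string> {
  try {
    const res = await openai.chat.completions.create({
      model: "gpt-4o",
      messages: [{ role: "user", content: prompt }],
    });
    return res.choices[0].message.content ?? "";
  } catch {
    const res = await anthropic.messages.create({
      model: "claude-3-5-sonnet-latest",
      max_tokens: 1024,
      messages: [{ role: "user", content: prompt }],
    });
    const block = res.content[0];
    return block.type === "text" ? block.text : "";
  }
}
```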
The AI gateway sits between your app and AI providers. Point your OpenAI SDK at your AI gateway, and we handle the rest:
```typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://your-gateway.ngrok.app/v1",
  apiKey: "your-api-key",
});

const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Hello!" }],
});
```

That's it. Your code stays the same. Behind the scenes, here's what the gateway gives you:
Use any SDK. We're OpenAI-compatible, so if your SDK can set a baseURL, it works with us—that includes the official OpenAI SDK, Vercel AI SDK, LangChain, and pretty much everything else.
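For example, here's a sketch of pointing the Vercel AI SDK at the same gateway endpoint used above; the gateway URL and API key are placeholders for your own values:

```typescript
import { createOpenAI } from "@ai-sdk/openai";
import { generateText } from "ai";

// Point the Vercel AI SDK's OpenAI-compatible provider at the gateway.
const gateway = createOpenAI({
  baseURL: "https://your-gateway.ngrok.app/v1",
  apiKey: "your-api-key",
});

const { text } = await generateText({
  model: gateway("gpt-4o"),
  prompt: "Hello!",
});
```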
Failover across providers. Configure OpenAI as primary and Anthropic as backup. If OpenAI fails, we try Anthropic. You sleep better.
Rotate API keys. Use multiple API keys for the same provider. When one hits rate limits, we switch to another. Your app keeps running.
Define selection strategies. Control exactly how models get selected using CEL expressions. Prefer cheaper models? Lowest latency? Only models with tool-calling support? Write a one-liner:
```yaml
model_selection:
  strategy:
    # First, prioritise low latency.
    - "ai.models.filter(m, m.metrics.global.latency.upstream_ms_avg < 1000)"
    # If that gets no results, prioritise low cost.
    - "ai.models.sortBy('price')"
```

Route to self-hosted models. Send requests to Ollama, vLLM, LM Studio, or any OpenAI-compatible endpoint. Mix cloud and local inference in the same configuration.
Modify content in transit. Redact PII before requests hit AI providers. Sanitize responses on the way back. Inject system prompts. All in your traffic policy.
We're shipping with:
We're actively working on:
Check out the quickstart guide for the full walkthrough.
This is early access, which means we're building this with you. We want to know:
The feature set will evolve based on what you tell us. That's not marketing speak—we genuinely don't know exactly what this needs to become until we see how people use it.
Ready to tell us what you think? Find the ngrok.ai team on Discord.
Our AI gateway is available in early access today. Sign up at ngrok.ai and read the docs for everything else.