Today we're opening early access to the ngrok AI gateway—a single endpoint that routes your AI requests to OpenAI, Anthropic, Google, and any other provider you want, with automatic failover when things go wrong.
You've probably been here: your app hammers OpenAI's API, hits a rate limit, and your users stare at spinning loaders while you scramble to add Anthropic as a backup. Or maybe you're burning through API credits on GPT-4o when a cheaper model would work fine. Or you're trying to route to your self-hosted Ollama instance during development but production runs against cloud providers.
All of these problems share the same root cause: your app talks directly to AI providers, and that means you're stuck writing routing, failover, and retry logic yourself.
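To make that concrete, here's a rough sketch of the kind of hand-rolled fallback logic you end up maintaining when your app talks to providers directly. The model names, keys, and error handling are simplified placeholders for illustration:

```typescript
import OpenAI from "openai";
import Anthropic from "@anthropic-ai/sdk";

const openai = new OpenAI({ apiKey: "your-openai-key" });
const anthropic = new Anthropic({ apiKey: "your-anthropic-key" });

// Hand-rolled failover: try OpenAI first, fall back to Anthropic on any error.
// Every new provider, retry rule, or rate-limit workaround grows this code.
async function chat(prompt: string): Promise<string> {
  try {
    const res = await openai.chat.completions.create({
      model: "gpt-4o",
      messages: [{ role: "user", content: prompt }],
    });
    return res.choices[0].message.content ?? "";
  } catch {
    const res = await anthropic.messages.create({
      model: "claude-3-5-sonnet-latest",
      max_tokens: 1024,
      messages: [{ role: "user", content: prompt }],
    });
    const block = res.content[0];
    return block.type === "text" ? block.text : "";
  }
}
```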
The AI gateway sits between your app and AI providers. Point your OpenAI SDK at your AI gateway, and we handle the rest:
```typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://your-gateway.ngrok.app/v1",
  apiKey: "your-api-key",
});

const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Hello!" }],
});
```

That's it. Your code stays the same. Behind the scenes, here's what the gateway gives you:
Use any SDK. We're OpenAI-compatible, so if your SDK can set a baseURL, it works with us—that includes the official OpenAI SDK, Vercel AI SDK, LangChain, and pretty much everything else.
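For example, here's a sketch of pointing the Vercel AI SDK at the same gateway endpoint used above; the gateway URL and API key are placeholders for your own values:

```typescript
import { createOpenAI } from "@ai-sdk/openai";
import { generateText } from "ai";

// Point the Vercel AI SDK's OpenAI-compatible provider at the gateway.
const gateway = createOpenAI({
  baseURL: "https://your-gateway.ngrok.app/v1",
  apiKey: "your-api-key",
});

const { text } = await generateText({
  model: gateway("gpt-4o"),
  prompt: "Hello!",
});
```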
Failover across providers. Configure OpenAI as primary and Anthropic as backup. If OpenAI fails, we try Anthropic. You sleep better.
Rotate API keys. Use multiple API keys for the same provider. When one hits rate limits, we switch to another. Your app keeps running.
Define selection strategies. Control exactly how models get selected using CEL expressions. Prefer cheaper models? Lowest latency? Only models with tool-calling support? Write a one-liner:
```yaml
model_selection:
  strategy:
    # First, prioritise low latency.
    - "ai.models.filter(m, m.metrics.global.latency.upstream_ms_avg < 1000)"
    # If that gets no results, prioritise low cost.
    - "ai.models.sortBy('price')"
```

Route to self-hosted models. Send requests to Ollama, vLLM, LM Studio, or any OpenAI-compatible endpoint. Mix cloud and local inference in the same configuration.
Modify content in transit. Redact PII before requests hit AI providers. Sanitize responses on the way back. Inject system prompts. All in your traffic policy.
We're shipping with:
We're actively working on:
Check out the quickstart guide for the full walkthrough.
This is early access, which means we're building this with you. We want to know:
The feature set will evolve based on what you tell us. That's not marketing speak—we genuinely don't know exactly what this needs to become until we see how people use it.
Ready to tell us what you think? Find the ngrok.ai team on Discord.
Our AI gateway is available in early access today. Sign up at ngrok.ai and read the docs for everything else.