Why didn't ngrok go down in last week's AWS outage?
By Peter Yoakum
October 29, 2025
If you're like me and rise early to greet your Monday with some scrolling, you might have found your dreams dashed last week when you noticed many of your feeds and socials were down.
On Monday, October 20, Amazon Web Services experienced a major outage in their us-east-1 region.
Amazon's us-east-1 sits in Northern Virginia, just outside of Washington, D.C., and is the busiest internet traffic hub in the world. This isn't just an Amazon thing; other major cloud providers often put their highest-volume Points of Presence (PoPs) in the same area.
The stakes in this region are incredibly high, and that was proven on Monday when a large portion of the internet you rely on went down due to a cascading DNS issue (cue the "It's always DNS!" memes).
It's no secret that we're built entirely on AWS, but if you're an ngrok customer you might not have noticed any disruption at all. That's thanks to some very intentional design decisions by the ngrok infrastructure team to avoid exactly these scenarios.
The simplest way to not be caught off-guard when us-east-1 falls over is to not use us-east-1 at all.
Why? Beyond the official AWS incidents, us-east-1 exerts tremendous gravity on the global internet infrastructure ecosystem. Because of the scale of traffic in-region and the number of services built primarily in us-east-1, a failure there cascades into downstream impact across the wider internet.
Because ngrok becomes a critical part of your network topology, we choose not to be in us-east-1, opting instead for us-east-2.
While major outages like the one we experienced last week might be rare even for a provider like AWS, these events can and do happen, and they can severely degrade your services when they occur. A quick look over the past couple of years shows several long outages totaling dozens of hours for the world's highest-volume region—yikes!
Our original thesis has played out time and time again: The gravity of us-east-1 as the central point for so much critical infrastructure makes it far more likely... to be a major point of failure.
So how did ngrok keep traffic flowing? The short answer is that we automatically reroute your traffic away from an affected PoP to the next-nearest location by way of DNS-based load balancing. When we detect a service disruption at an ngrok PoP, we remove it from DNS resolution so your clients never hit the affected region.
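To make that pattern concrete, here's a minimal Go sketch of health-check-driven DNS failover: probe each PoP and publish only the healthy ones in DNS. The PoP names, health URLs, and addresses are hypothetical, and this illustrates the general technique rather than ngrok's actual implementation.

```go
// A minimal sketch of health-check-driven DNS failover, not ngrok's actual
// implementation. PoP names, health URLs, and addresses are hypothetical.
package main

import (
	"context"
	"fmt"
	"net/http"
	"time"
)

// pop describes one hypothetical point of presence and the address we would
// publish for it in DNS.
type pop struct {
	name      string
	healthURL string
	addr      string
}

// healthyAddrs probes each PoP and returns only the addresses that should
// remain in DNS resolution; an unhealthy PoP is simply left out, so new
// clients resolve to the next-nearest location instead.
func healthyAddrs(ctx context.Context, pops []pop) []string {
	client := &http.Client{Timeout: 2 * time.Second}
	var out []string
	for _, p := range pops {
		req, err := http.NewRequestWithContext(ctx, http.MethodGet, p.healthURL, nil)
		if err != nil {
			continue
		}
		resp, err := client.Do(req)
		if err != nil {
			continue // unreachable PoP: drop it from the DNS answer set
		}
		resp.Body.Close()
		if resp.StatusCode != http.StatusOK {
			continue // unhealthy PoP: drop it too
		}
		out = append(out, p.addr)
	}
	return out
}

func main() {
	pops := []pop{
		{name: "us-east-2", healthURL: "https://us-east-2.example.internal/healthz", addr: "203.0.113.10"},
		{name: "eu-west-1", healthURL: "https://eu-west-1.example.internal/healthz", addr: "203.0.113.20"},
	}
	fmt.Println(healthyAddrs(context.Background(), pops))
}
```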

At a high level, ngrok consists of two parts: the control plane and the data plane.
The control plane is the orchestration layer for your account and your traffic configuration; it consists of our dashboard and the ngrok API. The data plane consists of ngrok's regional PoPs, which receive and process your traffic and each retain a local copy of your account's configuration.
The data plane also provides the connectivity endpoint for the ngrok Agent, SDK, Docker container, and Kubernetes Operator. Every capability ngrok offers, such as Traffic Policy actions, has parity in every PoP: each PoP processes traffic in-region and does the same work to compute those features without forwarding traffic to an external service. This reduces network latency and increases redundancy during a failover.
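As a rough illustration of that in-region processing, here's a small Go sketch of a PoP-local proxy applying a Traffic-Policy-style step (adding a header) before handing the request to its upstream, with no call out to an external service. The addresses and the header are hypothetical; this is not ngrok's data-plane code, just the shape of the idea.

```go
// A rough sketch of in-region processing: a PoP-local proxy applies a
// Traffic-Policy-style step before forwarding to the upstream, with no call
// out to an external service. Addresses and the header are hypothetical.
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
)

func main() {
	// The upstream this endpoint ultimately delivers traffic to (hypothetical).
	upstream, err := url.Parse("http://localhost:8080")
	if err != nil {
		log.Fatal(err)
	}
	proxy := httputil.NewSingleHostReverseProxy(upstream)

	handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// Apply the policy action locally, in the same PoP that accepted the
		// connection, then hand the request to the proxy.
		r.Header.Set("X-Handled-By-Pop", "us-east-2")
		proxy.ServeHTTP(w, r)
	})

	log.Fatal(http.ListenAndServe(":9090", handler))
}
```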
These decisions mean that even if we did use us-east-1 and it went down, we'd not only pull it out of DNS resolution quickly so traffic keeps flowing, but also guarantee that all the same routing and transformations run wherever that next-closest PoP happens to be.
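From the client side, this kind of failover is largely invisible because ordinary HTTP clients re-resolve DNS when they open new connections (subject to OS-level caching). The hedged Go sketch below, using a hypothetical endpoint hostname, shows why a simple retry is often enough to ride out a PoP being pulled from DNS.

```go
// A client-side view of DNS-based failover, with a hypothetical endpoint
// hostname: each new connection performs a fresh DNS lookup (subject to OS
// caching), so once an affected PoP is removed from DNS a retry lands on a
// surviving one.
package main

import (
	"fmt"
	"net/http"
	"time"
)

func getWithRetry(url string, attempts int) (*http.Response, error) {
	var lastErr error
	for i := 0; i < attempts; i++ {
		resp, err := http.Get(url)
		if err == nil {
			return resp, nil
		}
		lastErr = err
		time.Sleep(time.Second << i) // simple exponential backoff between attempts
	}
	return nil, fmt.Errorf("all %d attempts failed: %w", attempts, lastErr)
}

func main() {
	resp, err := getWithRetry("https://your-endpoint.example.dev", 3)
	if err != nil {
		fmt.Println(err)
		return
	}
	defer resp.Body.Close()
	fmt.Println(resp.Status)
}
```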
If you were impacted by last week's events, you have my sympathies and condolences—don't lose heart. Many folks have spent countless hours getting services back up and running, and I'd love to hear your stories (the good, the bad, and the ugly) over at @ngrokHQ.
On a more personal note—whether your uptime priorities have been on the backlog for a while, are brand-new goals, or you're just looking for more than what us-east-1 can possibly offer, reach out to me at pjy@ngrok.com.