Why didn't ngrok go down in last week's AWS outage?
By Peter Yoakum
October 29, 2025
If you're like me and rise early to greet your Monday with some scrolling, you might have found your dreams dashed last week when you noticed many of your feeds and socials were down.
On Monday, October 20, Amazon Web Services experienced a major outage in their us-east-1 region.
Amazon's us-east-1 sits in Northern Virginia, just outside of Washington, D.C., and is the busiest internet traffic hub in the world. This isn't just an Amazon thing; other major cloud providers often put their highest-volume Points of Presence (PoPs) in the same area.
The stakes in this region are incredibly high, and that was proven on Monday when a large portion of the internet you rely on went down due to a cascading DNS issue (cue the "It's always DNS!" memes).
It's no secret that we're built entirely on AWS, but if you're an ngrok customer you might not have noticed any disruption at all. That's thanks to some very intentional design decisions by the ngrok infrastructure team to avoid exactly these scenarios.
The simplest way to not be caught off-guard when us-east-1 falls over is to not use us-east-1 at all.
Why? Beyond the official AWS incidents, us-east-1 exerts tremendous gravity on the global internet infrastructure ecosystem. Because of the scale of traffic in-region and the number of services built primarily in us-east-1, a failure there cascades into downstream impact across the wider internet.
Because ngrok becomes a critical part of your network topology, we choose not to be in us-east-1, opting instead for us-east-2.
While major outages like the one we experienced last week might be rare even for a provider like AWS, these events can and do happen, and they can severely degrade your services when they occur. A quick look over the past couple of years shows several long outages totaling dozens of hours for the world's highest-volume region—yikes!
Our original thesis has played out time and time again: The gravity of us-east-1 as the central point for so much critical infrastructure makes it far more likely... to be a major point of failure.
So how did ngrok keep traffic flowing? The short answer is that we automatically reroute your traffic away from an affected PoP to the next-nearest location by way of DNS-based load balancing. When we detect a service disruption at an ngrok PoP, we remove it from DNS resolution so your clients never hit the affected region.
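To make that pattern concrete, here's a minimal Go sketch of health-check-driven DNS failover: probe each PoP and publish only the healthy ones in DNS. The PoP names, health URLs, and addresses are hypothetical, and this illustrates the general technique rather than ngrok's actual implementation.

```go
// A minimal sketch of health-check-driven DNS failover, not ngrok's actual
// implementation. PoP names, health URLs, and addresses are hypothetical.
package main

import (
	"context"
	"fmt"
	"net/http"
	"time"
)

// pop describes one hypothetical point of presence and the address we would
// publish for it in DNS.
type pop struct {
	name      string
	healthURL string
	addr      string
}

// healthyAddrs probes each PoP and returns only the addresses that should
// remain in DNS resolution; an unhealthy PoP is simply left out, so new
// clients resolve to the next-nearest location instead.
func healthyAddrs(ctx context.Context, pops []pop) []string {
	client := &http.Client{Timeout: 2 * time.Second}
	var out []string
	for _, p := range pops {
		req, err := http.NewRequestWithContext(ctx, http.MethodGet, p.healthURL, nil)
		if err != nil {
			continue
		}
		resp, err := client.Do(req)
		if err != nil {
			continue // unreachable PoP: drop it from the DNS answer set
		}
		resp.Body.Close()
		if resp.StatusCode != http.StatusOK {
			continue // unhealthy PoP: drop it too
		}
		out = append(out, p.addr)
	}
	return out
}

func main() {
	pops := []pop{
		{name: "us-east-2", healthURL: "https://us-east-2.example.internal/healthz", addr: "203.0.113.10"},
		{name: "eu-west-1", healthURL: "https://eu-west-1.example.internal/healthz", addr: "203.0.113.20"},
	}
	fmt.Println(healthyAddrs(context.Background(), pops))
}
```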

At a high level, ngrok consists of two parts: the control plane and the data plane.
The control plane is the orchestration layer for your account and your traffic configuration; it consists of our dashboard and the ngrok API. The data plane consists of ngrok's regional PoPs, which receive and process your traffic and each retain a local copy of your account's configuration.
The data plane also provides the connectivity endpoint for the ngrok Agent, SDK, Docker container, and Kubernetes Operator. Every capability ngrok offers, such as Traffic Policy actions, has parity in every PoP: each PoP processes traffic in-region and does the same work to compute those features without forwarding traffic to an external service. This reduces network latency and increases redundancy during a failover.
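As a rough illustration of that in-region processing, here's a small Go sketch of a PoP-local proxy applying a Traffic-Policy-style step (adding a header) before handing the request to its upstream, with no call out to an external service. The addresses and the header are hypothetical; this is not ngrok's data-plane code, just the shape of the idea.

```go
// A rough sketch of in-region processing: a PoP-local proxy applies a
// Traffic-Policy-style step before forwarding to the upstream, with no call
// out to an external service. Addresses and the header are hypothetical.
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
)

func main() {
	// The upstream this endpoint ultimately delivers traffic to (hypothetical).
	upstream, err := url.Parse("http://localhost:8080")
	if err != nil {
		log.Fatal(err)
	}
	proxy := httputil.NewSingleHostReverseProxy(upstream)

	handler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// Apply the policy action locally, in the same PoP that accepted the
		// connection, then hand the request to the proxy.
		r.Header.Set("X-Handled-By-Pop", "us-east-2")
		proxy.ServeHTTP(w, r)
	})

	log.Fatal(http.ListenAndServe(":9090", handler))
}
```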
These decisions mean that even if we did use us-east-1 and it went down, we'd not only pull it out of DNS resolution quickly so traffic keeps flowing, but also guarantee that all the same routing and transformations run wherever that next-closest PoP happens to be.
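From the client side, this kind of failover is largely invisible because ordinary HTTP clients re-resolve DNS when they open new connections (subject to OS-level caching). The hedged Go sketch below, using a hypothetical endpoint hostname, shows why a simple retry is often enough to ride out a PoP being pulled from DNS.

```go
// A client-side view of DNS-based failover, with a hypothetical endpoint
// hostname: each new connection performs a fresh DNS lookup (subject to OS
// caching), so once an affected PoP is removed from DNS a retry lands on a
// surviving one.
package main

import (
	"fmt"
	"net/http"
	"time"
)

func getWithRetry(url string, attempts int) (*http.Response, error) {
	var lastErr error
	for i := 0; i < attempts; i++ {
		resp, err := http.Get(url)
		if err == nil {
			return resp, nil
		}
		lastErr = err
		time.Sleep(time.Second << i) // simple exponential backoff between attempts
	}
	return nil, fmt.Errorf("all %d attempts failed: %w", attempts, lastErr)
}

func main() {
	resp, err := getWithRetry("https://your-endpoint.example.dev", 3)
	if err != nil {
		fmt.Println(err)
		return
	}
	defer resp.Body.Close()
	fmt.Println(resp.Status)
}
```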
If you were impacted by last week's events, you have my sympathies and condolences—don't lose heart. Many folks have spent countless hours getting services back up and running, and I'd love to hear your stories (the good, the bad, and the ugly) over at @ngrokHQ.
On a more personal note—whether your uptime priorities have been on the backlog for a while, are brand-new goals, or you're just looking for more than what us-east-1 can possibly offer, reach out to me at pjy@ngrok.com.