Amazon’s Outage Root Cause, $581M Loss Potential And ‘Apology:’ 5 AWS Outage Takeaways

From the root cause of Amazon’s outage to its potential $581 million cost, CRN breaks down the five important results and findings from AWS’ new post-mortem report. ‘We will do everything we can to learn from this event and use it to improve our availability even further,’ AWS says.

Amazon’s outage, which affected thousands of companies and millions of people, was caused by two automated systems updating the same data simultaneously, leading to a DNS (Domain Name System) issue that brought down AWS’ DynamoDB database.

Cyber risk analytics firm CyberCube just released a preliminary insured loss estimate for AWS’ outage, projecting a loss of up to $581 million.

“We apologize for the impact this event caused our customers,” said AWS in its post-mortem report of the outage results and root cause.

[Related: AWS’ 15-Hour Outage: 5 Big AI, DNS, EC2 And Data Center Keys To Know]

“We know this event impacted many customers in significant ways,” AWS said. “We will do everything we can to learn from this event and use it to improve our availability even further.”

CRN breaks down the five biggest things about the AWS outage that every Amazon customer, partner and user needs to know.

No. 1: CyberCube Estimates Up To $581 Million In Losses

CyberCube has released a preliminary insured loss estimate for the AWS outage, projecting losses of between $38 million and $581 million.

Cybersecurity risk analytics provider CyberCube said the outage impacted more than 2,000 large organizations and around 70,000 organizations overall.

AWS is expected to reimburse affected companies for downtime, which may limit insured losses and discourage litigation, according to the security analytics firm.

Because the outage lasted less than a day, CyberCube said many customers might choose not to file claims, a factor contributing to its lower-end loss projection. The company expects the outage to have a low to moderate impact on cyber insurers, with the majority of losses likely to fall at the lower end of CyberCube’s range.

Read through for the four other big things to know about AWS’ outage, including the root cause and changes AWS plans to make.

No. 2: The Root Cause Bug That Caused DynamoDB To Go Down

At around 2:48 a.m. ET on Oct. 20, a critical fault in DynamoDB’s DNS management system cascaded into a roughly 15-hour outage that eventually disrupted millions of people.

The root cause of the outage stemmed from two automated systems that were updating the same data simultaneously.

AWS said the issue was with two programs competing to write the same DNS entry at the same time, which resulted in an empty DNS record for the service’s regional endpoint.

The errors were triggered by “a latent defect” within the service’s automated DNS management system, which controls how user requests are routed to servers, Amazon said.

This led to the accidental deletion of all IP addresses for the database service’s regional endpoint.

The DNS issue brought down AWS’ DynamoDB database, which then created a cascading effect that impacted many AWS services such as EC2 and its Network Load Balancer.
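
In practical terms, an empty DNS record means name resolution itself fails, so clients never even reach the service. The short Python sketch below illustrates that failure mode using the regional endpoint named in AWS’ report; the lookup and retry logic is an assumption for illustration, not AWS SDK code.

```python
import socket

# Minimal sketch of what an "empty DNS record" means for clients. The endpoint
# name comes from AWS' report; the lookup code is illustrative, not the AWS
# SDK's actual resolution logic.
ENDPOINT = "dynamodb.us-east-1.amazonaws.com"

def resolve_endpoint(hostname: str) -> list[str]:
    """Return the IP addresses a hostname resolves to."""
    infos = socket.getaddrinfo(hostname, 443, proto=socket.IPPROTO_TCP)
    return [info[4][0] for info in infos]

try:
    print(resolve_endpoint(ENDPOINT))
except socket.gaierror as err:
    # With no IP addresses published for the endpoint, every lookup fails here,
    # so no request reaches DynamoDB until the record is repaired.
    print(f"DNS resolution failed: {err}")
```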

No. 3: Digging Into DynamoDB’s DNS Root Cause

Amazon said the outage was caused by a race condition in DynamoDB’s automated DNS management system that left an empty DNS record for the service’s regional endpoint.

Amazon’s DNS management system is made up of two separate components: a DNS Planner that monitors load balancer health and builds DNS plans, and a DNS Enactor that applies changes via Amazon Route 53.

The race condition occurred when one DNS Enactor experienced “unusually high delays” while the DNS Planner continued generating new plans, according to Amazon.

A second DNS Enactor then began applying the newer plans and executed a clean-up process just as the first Enactor completed its delayed run.

This “clean-up” deleted the older plan, which immediately removed all IP addresses for the regional endpoint and left the system in an inconsistent state that prevented any DNS Enactor from applying further automated updates.
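
AWS has not published the Planner and Enactor code, but the interleaving it describes can be modeled with a small version-tracking sketch like the Python below. The data structures, function names and version numbers are assumptions made purely for illustration: a delayed Enactor applies a stale plan, then the other Enactor’s clean-up deletes that plan and empties the record.

```python
# Toy model of the interleaving AWS describes. The data structures, names and
# version numbers here are illustrative assumptions, not AWS' implementation.

route53 = {}          # hostname -> list of IPs (stands in for the DNS record)
plans = {}            # plan version -> IPs that plan would publish
applied_version = 0   # the plan version most recently applied by an Enactor

def enactor_apply(version: int) -> None:
    """A DNS Enactor publishes the IP addresses from a given plan."""
    global applied_version
    route53["dynamodb.us-east-1.amazonaws.com"] = plans[version]
    applied_version = version

def clean_up(keep_newer_than: int) -> None:
    """The clean-up step deletes plans older than the newest applied plan."""
    for version in [v for v in plans if v < keep_newer_than]:
        del plans[version]
    # If the record was published from a now-deleted plan, it ends up empty,
    # which is the failure mode the post-mortem describes.
    if applied_version not in plans:
        route53["dynamodb.us-east-1.amazonaws.com"] = []

# The DNS Planner keeps generating plans while one Enactor is unusually slow.
plans[1] = ["10.0.0.1", "10.0.0.2"]   # older plan, stuck with the delayed Enactor
plans[2] = ["10.0.0.3", "10.0.0.4"]   # newer plan

enactor_apply(2)   # the second Enactor applies the newer plan
enactor_apply(1)   # the delayed Enactor finally finishes, applying the stale plan
clean_up(2)        # the second Enactor's clean-up deletes plan 1, emptying the record

print(route53)     # {'dynamodb.us-east-1.amazonaws.com': []}
```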

Until AWS intervened manually, systems connecting to DynamoDB, including customer traffic and internal AWS services, experienced DNS failures that impacted EC2 instance launches and network configuration, Amazon said.

Network Load Balancer Issue

Following the DNS issue, AWS’ Network Manager began propagating a large backlog of delayed network configurations, causing newly launched EC2 instances to experience network configuration delays.

These network delays affected AWS’ Network Load Balancer (NLB) service.

NLB’s health checking subsystem removed new EC2 instances from service when they failed health checks due to network delays, only to restore them when subsequent checks succeeded.
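
This kind of repeated removal and restoration is often called health-check “flapping.” The Python sketch below illustrates the general pattern under assumed names and probabilities; it does not reflect NLB’s internals.

```python
import random

# Toy sketch of health-check "flapping": an instance whose network configuration
# is delayed fails some checks and passes others, so it is repeatedly removed
# from and restored to service. The names and probabilities are illustrative
# assumptions, not NLB internals.

targets = {"i-0abc123456789": True}   # instance id -> currently in service?

def health_check(instance_id: str) -> bool:
    """Pretend check: delayed network configuration makes results intermittent."""
    return random.random() > 0.5

for cycle in range(6):
    for instance_id, in_service in list(targets.items()):
        healthy = health_check(instance_id)
        if in_service and not healthy:
            targets[instance_id] = False   # removed from service on a failed check
        elif not in_service and healthy:
            targets[instance_id] = True    # restored when a later check succeeds
        print(f"cycle {cycle}: {instance_id} in service = {targets[instance_id]}")
```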

With EC2 instance launches impaired, dependent AWS services including Lambda, Elastic Container Service (ECS) and Elastic Kubernetes Service (EKS) all experienced issues.

“The root cause of this issue was a latent race condition in the DynamoDB DNS management system that resulted in an incorrect empty DNS record for the service’s regional endpoint (dynamodb.us-east-1.amazonaws.com) that the automation failed to repair,” Amazon said. “All systems needing to connect to the DynamoDB service in the [AWS North Virginia US-East-1 data center] Region via the public endpoint immediately began experiencing DNS failures and failed to connect to DynamoDB. This included customer traffic as well as traffic from internal AWS services that rely on DynamoDB.”

No. 4: Amazon Making Changes To Prevent Similar Outages

Amazon said it is making several changes to its systems following the outage, including fixing the “race condition scenario” that caused the two automated systems to overwrite each other’s work.

AWS has disabled the DynamoDB DNS Planner and DNS Enactor automation globally until safeguards can be put in place to prevent the race condition from recurring.

Amazon also said it will build an additional test suite to help detect similar bugs in the future and will improve its throttling mechanisms.
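
AWS did not detail the safeguard itself, but one common way to prevent this class of race condition is an optimistic version check that refuses to publish a plan older than the one already live. The Python sketch below shows that general technique under assumed names; it is not AWS’ actual fix.

```python
# Sketch of a general safeguard technique (an optimistic version check that
# rejects stale plans). The class and names below are assumptions for
# illustration only; AWS has not published its actual fix.

class StalePlanError(Exception):
    """Raised when an Enactor tries to publish a plan older than the live one."""

class DnsState:
    def __init__(self) -> None:
        self.applied_version = 0
        self.record: list[str] = []

    def apply_plan(self, version: int, ips: list[str]) -> None:
        if version <= self.applied_version:
            # Refuse to apply anything older than what is already live.
            raise StalePlanError(
                f"plan {version} is older than live plan {self.applied_version}"
            )
        self.applied_version = version
        self.record = ips

state = DnsState()
state.apply_plan(2, ["10.0.0.3", "10.0.0.4"])       # the newer plan lands first
try:
    state.apply_plan(1, ["10.0.0.1", "10.0.0.2"])   # the delayed, stale plan is rejected
except StalePlanError as err:
    print(err)
print(state.record)   # the newer IPs survive: ['10.0.0.3', '10.0.0.4']
```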

“As we continue to work through the details of this event across all AWS services, we will look for additional ways to avoid impact from a similar event in the future, and how to further reduce time to recovery,” said Amazon.

No. 5: Former AWS Executive Says It Was ‘Inevitable’

Former AWS top executive Debanjan Saha told CRN that the AWS outage was “inevitable.”

“Given their massive global scale and the complexity of these distributed systems, it’s actually remarkable that large-scale disruptions like this are as rare as they are,” Saha said in an email to CRN.

AWS’ outage was “inevitable over a long enough horizon,” he said, as both public cloud and private cloud providers will eventually experience an outage.

“The question is not if, but when,” Saha added.

Saha was AWS’ vice president and general manager of its database business from 2014 until 2019, when he jumped to competitor Google Cloud as vice president and general manager for Google’s data analytics business. He became CEO of DataRobot in 2022.

“Every business that relies on cloud infrastructure should have a clear strategy for resiliency,” Saha said. “That means thinking beyond a single data center or region, and ideally beyond a single provider, building for multi-region—and where possible—multi-cloud or hybrid environments.”

Amazon is set to report its third-quarter 2025 financial results on Oct. 30.