The 10 Biggest Cloud Outages Of 2025: AWS, Google And Microsoft

From AWS, Google Cloud and Microsoft to Cloudflare, Ingram Micro and Salesforce—here are the 10 biggest cloud outages of 2025 that rocked millions of customers.

This year saw massive cloud outages affect millions of users and businesses, sometimes lasting for several days, as ransomware attacks and software errors continued to plague the IT industry in 2025.

Amazon Web Services, Google Cloud and Microsoft Azure, the three largest cloud computing companies on the planet, which together hold nearly two-thirds of the global cloud services market, all suffered outages this year that rocked their customers.

There were also cloud outages from Salesforce’s Slack, Cloudflare and SentinelOne, as well as a ransomware attack on Ingram Micro that brought down services for days.

Cloud Outages In 2025

These 10 cloud outages affected millions of users throughout the year, disrupting everything from airlines and financial services to social media platforms and workforce applications.

Some outages took only a few hours to fix, while others took several days to fully resolve.

Critical services like Cloudflare and AWS—which many other services depend on—experienced outages that affected millions due to a cascading effect. While infrastructure hardware has improved over the years, the complexity of modern architectures and evolving external threats present new risks that operators must actively manage.

CRN breaks down the 10 biggest cloud outages of 2025 that you should know about from AWS, Conduent, Google Cloud, Ingram Micro, Microsoft, Cloudflare, Salesforce, SentinelOne and Zoom.

AWS Outage That Affected Millions

In one of the biggest and most widespread cloud outages of the year, the world’s cloud market leader suffered a massive 15-hour outage in October that hit millions of people and businesses.

Over 4 million users reported issues on Downdetector related to AWS’ outage.

It affected over 1,000 companies—from payment service providers and financial trading applications to social media websites and commercial software.

The root cause of AWS’ outage stemmed from a Domain Name System (DNS) error that prevented applications from locating the correct address for DynamoDB. AWS’ DynamoDB is a cloud database that stores user information and other critical data.

The DNS issue triggered a waterfall effect that spread throughout AWS’ vast services portfolio that millions of users leverage every day, including internet websites, digital products and applications.

The AWS issue that caused rippling effects worldwide began in the company’s US-East-1 region in northern Virginia, AWS’ oldest and largest site for its web services.
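To make the failure mode concrete, here is a minimal Python sketch of how a DNS lookup failure surfaces in client code and how an application might retry with exponential backoff. It is an illustration only, not AWS’ code or its fix: the function name, the retry counts and the hostname in the usage note are all hypothetical, and client-side retries could not have resolved an outage on the provider’s side.

```python
import socket
import time


def resolve_with_retry(hostname, resolver=socket.getaddrinfo, attempts=3,
                       base_delay=0.5, sleep=time.sleep):
    """Resolve hostname, retrying with exponential backoff on DNS failure.

    Returns a sorted list of unique IP addresses, or re-raises the last
    socket.gaierror if every attempt fails. The resolver and sleep
    parameters are injectable (hypothetical hooks) so the logic can be
    exercised without touching the network.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            results = resolver(hostname, 443)
            # Each result is (family, type, proto, canonname, sockaddr);
            # sockaddr[0] is the IP address string.
            return sorted({info[4][0] for info in results})
        except socket.gaierror as err:
            # This is what "cannot locate the correct address" looks like
            # to an application: the name lookup itself fails.
            last_error = err
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))
    raise last_error
```

An application would call something like resolve_with_retry("database.example.com") before connecting. If the lookup keeps failing, every service that depends on that address fails with it, which is how a single DNS error can cascade.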

Ingram Micro Ransomware Attack And Global Outage

Ingram Micro was hit with a ransomware attack in July that downed its systems for days, crippling its online ordering systems and product shipments.

The outage halted orders, billing quotes and license management for thousands of Ingram Micro customers.

Key operational platforms, including its AI-powered distribution system Xvantage and its cloud licensing platform Impulse, were taken offline.

The ransomware attack was associated with the cybercriminal group known as SafePay. Using leaked VPN credentials tied to Ingram Micro’s GlobalProtect remote access system, SafePay slipped through the perimeter undetected.

Ingram Micro worked with third-party cybersecurity experts to investigate and remediate the incident. It took around six days to fix the problem and fully restore services.

Ingram Micro said it has implemented additional safeguards and monitoring measures to better protect its network environment.

Microsoft Azure Outage

Every Microsoft Azure region worldwide saw network infrastructure issues on Oct. 29 amid a widespread outage that affected Entra, Purview, Defender and other cloud offerings within the Microsoft 365 suite.

The Azure Front Door (AFD) cloud content delivery network (CDN) and security service was the focus of the issues.

Microsoft blamed the outage on “an inadvertent tenant configuration change within” AFD.

AFD’s issues resulted in latencies, timeouts and errors for a variety of Microsoft products and services, including Azure Active Directory B2C, Azure Databricks, Azure Healthcare APIs, Azure Portal, Azure SQL Database, Azure Virtual Desktop (AVD), Microsoft Copilot for Security, Microsoft Purview, Microsoft Sentinel Threat Intelligence and Video Indexer.

The culprit was the cleanup of tenants with erroneous metadata generated by a particular sequence of profile update operations, a bug previously unknown to Microsoft.

Following the outage, Microsoft said it has implemented additional validation and rollback controls to prevent a similar issue from happening in the future.

Google Cloud’s Outage

On June 12, a Google Cloud outage took out a variety of popular websites and applications including Discord and Spotify.

The issue stemmed from a new feature added to Service Control, the core binary in the check system that makes sure API requests to Google’s endpoints are properly authorized and meet the appropriate policies.

In larger regions, such as us-central-1, Service Control task restarts overloaded the infrastructure. The region took almost three hours to fully resolve, and Google throttled task creation to minimize infrastructure impact.

The incident lasted around three hours with over 70 different Google Cloud services affected.

Moving forward, Google said it would modularize Service Control’s architecture to isolate the functionality and “fail open,” meaning the system defaults to an accessible state if a future failure happens.
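The difference between failing open and failing closed can be shown in a few lines of Python. This is a generic sketch of the concept, not Google’s implementation: the authorize function, the policy_check callable and the fail_open flag are all hypothetical names introduced for illustration.

```python
def authorize(request, policy_check, fail_open=True):
    """Return True if the request may proceed.

    policy_check is a hypothetical callable that returns True or False
    when the check system is healthy and raises an exception when the
    check system itself is broken.
    """
    try:
        return bool(policy_check(request))
    except Exception:
        # The checker failed, not the request. Failing open lets traffic
        # through; failing closed (fail_open=False) rejects everything.
        return fail_open
```

Failing open trades strictness for availability: a crash in the checking layer no longer takes every request down with it, but requests pass without a policy check while the checker is unhealthy, so the choice depends on what the check protects.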

Cloudflare said in its own report on the incident that its Workers KV (key value) data store saw the failure of underlying storage infrastructure that is backed in part by Google Cloud.

Workers KV “is a critical dependency for many Cloudflare products and relied upon for configuration, authentication and asset delivery across the affected services,” Cloudflare said. Cloudflare said it plans to prevent singular dependencies on third-party storage infrastructure to improve recovery.

Zoom’s Outage Affected Tens Of Thousands

Videoconferencing star Zoom saw a two-hour outage on April 16 that affected tens of thousands of users.

Downdetector reports by affected users reached a high of about 67,000.

Zoom blamed “a server block by GoDaddy Registry” due to “a communication error between Zoom’s domain registrar, Markmonitor, and GoDaddy Registry, which resulted in GoDaddy Registry mistakenly shutting down the zoom.us domain.”

“Any start, join, or schedule meetings actions were unable to be completed successfully since the requests required a DNS lookup, which could not be completed,” according to Zoom’s report.

To prevent the issue from happening again, GoDaddy and Markmonitor “put in place a registry lock that restricts server block commands from being placed on the zoom.us domain.”

Salesforce’s Slack Outage Blamed On A Maintenance Action

Salesforce collaboration application Slack saw a pair of issues on Feb. 26 and Feb. 27.

On the first day, “a large percentage of Slack users experienced issues with various features including sending and receiving messages, using workflows, loading channels or threads, and logging into Slack,” according to a Slack report on the incident. “These features may have been degraded or in some cases fully unusable.”

Slack blamed the issue on a maintenance action in one of its database systems, which, “combined with a latent defect in our caching system, caused an overload of heavy traffic to the database”—making half of the instances relying on the database unavailable.

At the peak of the outage, more than 3,000 users reported to Downdetector that they couldn’t access the platform.

Even after the database problem was resolved, users of the Slack Events application programming interface (API) saw continued issues until Feb. 27, with custom applications, integrations and bots not working for some users.

Slack blamed the mitigation measures used for the database issue.

Cloudflare Outage Brings Down ChatGPT, Shopify And More

Cloudflare had a three-hour outage in November that brought down numerous popular websites for many users—including OpenAI’s ChatGPT, X.com and e-commerce platform Shopify—while also causing transportation disruptions such as impacts to the New Jersey Transit system and Uber.

The global network outage was caused by a database permissions change and was not the result of a cyberattack.

Cloudflare initially suspected, wrongly, that the symptoms were caused by a hyperscale DDoS attack, but the vendor was soon able to correctly identify the issue.

The error was caused by a database permissions change in the vendor’s ClickHouse cluster, which caused a configuration file used by its Bot Management service to inadvertently double in size and then propagate throughout its network.
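One common safeguard against this kind of failure is to validate a generated file before it is propagated and to keep the last known good version if validation fails. The Python sketch below illustrates that idea only. It is not Cloudflare’s code, and the function names, the duplicate check and both thresholds (max_entries and max_growth) are hypothetical choices made for the example.

```python
def validate_feature_file(new_entries, previous_entries,
                          max_entries=500, max_growth=1.5):
    """Return a list of reasons to reject new_entries (empty means OK).

    The limits are hypothetical defaults, not values from any vendor.
    """
    problems = []
    if len(new_entries) != len(set(new_entries)):
        problems.append("duplicate entries")
    if len(new_entries) > max_entries:
        problems.append("exceeds hard size limit")
    if previous_entries and len(new_entries) > max_growth * len(previous_entries):
        problems.append("grew too much since last good version")
    return problems


def choose_file(new_entries, previous_entries, **limits):
    """Propagate the new file only if it validates; else keep the old one."""
    if validate_feature_file(new_entries, previous_entries, **limits):
        return previous_entries
    return new_entries
```

With these assumed limits, a file that suddenly doubles in size is rejected and the previous version stays in service, while a file that grows modestly passes through.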

The incident ended up being the “worst” outage since 2019 for Cloudflare, said CEO Matthew Prince.

SentinelOne Platform Outage

In May, cybersecurity company SentinelOne suffered a global platform outage that prevented access to its widely used consoles.

SentinelOne said a software flaw in an outgoing infrastructure control system triggered an automatic function that removed critical network routes on May 29.

SentinelOne engineering confirmed that day “that the manual restoration of all routes was completed and began validating customer console access,” with a subsequent post to the customer and partner portals saying that console access was restored.

The entire data ingestion backlog was cleared by May 30, according to the report.

Conduent’s Outage Caused By Cyberattack

Conduent got off to a rough start in January with a major service outage caused by a cyberattack.

The solution provider saw an outage that affected some support payments and benefits in the U.S. Conduent systems are used to enable government services such as child support payments and food assistance.

Conduent CFO Giles Goodburn said in May that the company incurred “$3 million and accrued $22 million of non-recurring expenses in the first quarter related to the event based on potential notification requirements.”

In a regulatory filing made in April, Conduent said that, in some cases, affected systems weren’t restored to normal operations for days.

Microsoft 365 Outage

On March 1, Microsoft published an alert to social media platform X saying that an issue was preventing users from accessing Outlook features and services.

Outage reports for Outlook peaked at about 35,000 in the U.S. on Downdetector.

About 25,000 Microsoft 365 subscribers in the U.S. reported outages shortly after the Outlook reports started.

A few hours later, Microsoft said that “following our reversion of the problematic code change, we’ve monitored service telemetry and worked with previously impacted users to confirm that service is restored.”