Major outages at CrowdStrike, Microsoft leave the world with BSODs and confusion

  News, Security
image_pdfimage_print
A passenger sits on the floor as long queues form at the check-in counters at Ninoy Aquino International Airport, on July 19, 2024 in Manila, Philippines.
Enlarge / A passenger sits on the floor as long queues form at the check-in counters at Ninoy Aquino International Airport, on July 19, 2024 in Manila, Philippines.
Ezra Acayan/Getty Images

Millions of people outside the IT industry are learning what CrowdStrike is today, and that’s a real bad thing. Meanwhile, Microsoft is also catching blame for global network outages, and between the two, it’s unclear as of Friday morning just who caused what.

After cybersecurity firm CrowdStrike shipped an update to its Falcon Sensor software that protects mission-critical systems, blue screens of death (BSODs) started taking down Windows-based systems. The problems started in Australia and followed the dateline from there.

TV networks, 911 call centers, and even the Paris Olympics were affected. Banks and financial systems in India, South Africa, Thailand, and other countries fell as computers suddenly crashed. Some individual workers discovered that their work-issued laptops were booting to blue screens on Friday morning. The outages took down not only Starbucks mobile ordering, but also a single motel in Laramie, Wyoming.

Airlines, never the most agile of networks, were particularly hard-hit, with American Airlines, United, Delta, and Frontier among the US airlines overwhelmed Friday morning.

CrowdStrike CEO “deeply sorry”

Fixes suggested by both CrowdStrike and Microsoft for endlessly crashing Windows systems range from “reboot it up to 15 times” to individual driver deletions within detached virtual OS disks. The presence of BitLocker drive encryption on affected devices further complicates matters.

CrowdStrike CEO George Kurtz posted on X (formerly Twitter) at 5:45 am Eastern time that the firm was working on “a defect found in a single content update for Windows hosts,” with Mac and Linux hosts unaffected. “This is not a security incident or cyberattack. The issue has been identified, isolated and a fix has been deployed,” Kurtz wrote. Kurtz told NBC’s Today Show Friday morning that CrowdStrike is “deeply sorry for the impact that we’ve caused to customers.”

As noted on Mastodon by LittleAlex, Kurtz was the Chief Technology Officer of security firm McAfee when, in April 2010, that firm sent an update that deleted a crucial Windows XP file that caused widespread outages and required system-by-system file repair.

The costs of such an outage will take some time to be known, and will be hard to measure. Cloud cost analyst CloudZero estimated mid-morning Friday that the CrowdStrike incident had already cost $24 billion, based on a previous estimate.

Multiple outages, unclear blame

Microsoft services were, in a seemingly terrible coincidence, also down overnight Thursday into Friday. Multiple Azure services went down Thursday evening, with the cause cited as “a backend cluster management workflow [that] deployed a configuration change causing backend access to be blocked between a subset of Azure Storage clusters and compute resources in the Central US region.”

A spokesperson for Microsoft told Ars in a statement Friday that the CrowdStrike update was not related to its July 18 Azure outage. “That issue has fully recovered,” the statement read.

News reporting on these outages has so far blamed either Microsoft, CrowdStrike, or an unclear mixture of the two as the responsible party for various outages. It may be unavoidable, given that the outages are all happening on one platform, Windows. Microsoft itself issued an “Awareness” regarding the CrowdStrike BSOD issue on virtual machines running Windows. The firm was frequently updating it Friday, with a fix that may or may not surprise IT veterans.

“We’ve received feedback from customers that several reboots (as many as 15 have been reported) may be required, but overall feedback is that reboots are an effective troubleshooting step at this stage,” Microsoft wrote in the bulletin. Alternately, Microsoft recommend customers that have a backup from “before 19:00 UTC on the 18th of July” restore it, or attach the OS disk to a repair VM to then delete the file (Windows/System32/Drivers/CrowdStrike/C00000291*.sys) at the heart of the boot loop.

Security consultant Troy Hunt was quoted as describing the dual failures as “the largest IT outage in history,” saying, “basically what we were all worried about with Y2K, except it’s actually happened this time.”

United Airlines told Ars that it was “resuming some flights, but expect schedule disruptions to continue throughout Friday,” and had issued waivers for customers to change travel plans. American Airlines posted early Friday that it had re-established its operations by 5 am Eastern, but expected delays and cancellations throughout Friday.

Ars has reached out to CrowdStrike for comment and will update this post with response.

This is a developing story and this post will be updated as new information is available.

https://arstechnica.com/?p=2038106