How a massive part of the internet went down for an hour
The company behind Tuesday’s massive internet outage has apologized for making a costly mistake that knocked out websites, apps and online services across the world.
Fastly, which runs a content delivery network of servers and data centers, also said in a statement late Tuesday that it would work to prevent such a widespread failure in the future.
“Even though there were specific conditions that triggered this outage, we should have anticipated it,” Nick Rockwell, Fastly’s senior vice president of engineering and infrastructure, said in the statement. “We provide mission critical services, and we treat any action that can cause service issues with the utmost sensitivity and priority. We apologize to our customers and those who rely on them for the outage.”
Fastly supports news sites and apps like CNN, the Guardian, the New York Times and many others. It also provides content delivery for Twitch, Pinterest, HBO Max, Hulu, Reddit, Spotify and other services. The 49-minute outage Tuesday morning took down other major internet platforms and sites as well, including Amazon, Target, and the UK government website — Gov.uk. It affected dozens of countries across the Americas, Europe and Asia, as well as South Africa.
The culprit was a bad software update that Fastly applied May 12. The update introduced a bug that could be triggered by a customer configuring their service under specific circumstances — which happened Tuesday. Fastly said the bug caused 85% of its network to return errors.
Fastly was able to work around the bug within an hour, but it is still deploying a permanent fix across its network. The company also said it is reviewing its processes and practices to determine why the bug wasn’t detected when it was introduced last month, and how to bring its network back up faster if an outage like Tuesday’s happens again.
“This outage was broad and severe, and we’re truly sorry for the impact to our customers and everyone who relies on them,” Rockwell said.
Fastly helps improve load times for websites and provides other services to internet sites, apps and platforms, including a global server network designed to absorb traffic overloads that can crash websites, such as those caused by denial-of-service attacks. It does so by storing content and other parts of websites and apps on servers that are physically closer to the users trying to access a particular site or platform.
But because Fastly sits between internet companies and the users trying to reach the online platforms it serves, access to those platforms can be blocked entirely when it goes down.
Bad software updates are rare. But similar goofs have temporarily brought down parts of even larger online platforms, including Google and Amazon, in the past.