On March 1, 2023 we upgraded critical infrastructure in the Ohio region which resulted in overwriting the region's DNS settings. As a result, DNS servers were unable to serve requests, resulting in DNS lookup failures for all services in the region. We updated the affected servers 18 minutes later and service was restored. The time of the impact was 17:50-18:08 UTC.
We upgraded critical infrastructure in Ohio, which reverted the amount of resources available to DNS servers in that region. When the DNS servers restarted, a combination of the increased load due to an empty cache and the lower resources caused them to enter a crash loop. Many of our internal services rely on DNS to function, so what followed was a cascade of failures in other components.
We updated the resources available to the DNS servers, which allowed them to come online and service was restored in all other components shortly afterwards.