On February 22, 2023, we deployed a change to our routing layer that caused certificate errors for traffic to custom domains on a subset of services in the Oregon and Frankfurt regions. We completed a rollback of the change 12 minutes later to mitigate the impact. The impact lasted from 15:16 to 15:28 UTC in the Oregon region and from 15:26 to 15:36 UTC in the Frankfurt region.
The change caused our routing layer to fail to serve any certificate for custom domains, which resulted in connection failures between Cloudflare and our origins. We identified the cause and initiated a rollback 8 minutes after the start of the impact.
We run a suite of tests to verify critical functionality for all changes in a staging environment before deploying to production. While we have tests to verify that our routing layer can serve traffic for web services, this incident exposed a gap in our test coverage related to serving traffic for custom domains. As a result, the problematic change made it through our deployment pipeline undetected and was deployed to production, resulting in customer impact.
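A staging check for this kind of gap could look like the minimal sketch below. It is illustrative only and is not taken from our actual test suite: the function names, the staging hostname, and the custom domain in the usage comment are all hypothetical. The idea is to open a TLS connection to the routing layer with the custom domain as the SNI name and confirm that a certificate covering that domain is served.

```python
import socket
import ssl


def cert_covers_hostname(cert: dict, hostname: str) -> bool:
    """Return True if the certificate's DNS SAN entries cover `hostname`.

    `cert` is the dict form returned by SSLSocket.getpeercert().
    Handles exact matches and a single leftmost wildcard label.
    """
    hostname = hostname.lower().rstrip(".")
    for kind, value in cert.get("subjectAltName", ()):
        if kind != "DNS":
            continue
        pattern = value.lower().rstrip(".")
        if pattern == hostname:
            return True
        if pattern.startswith("*."):
            # "*.example.com" covers "a.example.com" but not
            # "a.b.example.com" or the bare "example.com".
            _, _, rest = hostname.partition(".")
            if rest and rest == pattern[2:]:
                return True
    return False


def fetch_served_cert(edge_host: str, custom_domain: str,
                      port: int = 443, timeout: float = 5.0) -> dict:
    """Connect to `edge_host` and return the certificate served for the SNI name."""
    context = ssl.create_default_context()
    with socket.create_connection((edge_host, port), timeout=timeout) as sock:
        # server_hostname sets the SNI value, so the routing layer has to
        # pick the certificate for the custom domain, not a default one.
        with context.wrap_socket(sock, server_hostname=custom_domain) as tls:
            return tls.getpeercert()


def custom_domain_is_served(edge_host: str, custom_domain: str) -> bool:
    """Return False if the handshake fails or the certificate does not match."""
    try:
        cert = fetch_served_cert(edge_host, custom_domain)
    except OSError:
        # ssl.SSLError is a subclass of OSError, so this also covers the
        # case where no usable certificate is served at all.
        return False
    return cert_covers_hostname(cert, custom_domain)


# Hypothetical usage in a staging test:
#   assert custom_domain_is_served("staging-router.example.internal",
#                                  "app.customer-example.com")
```

Keeping the certificate matching in a pure function means it can be unit tested without a network, while the connection helper exercises the real handshake against staging. This sketch assumes the staging certificates chain to a root trusted by the test host; an environment with a private CA would need to load that CA into the context.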