Services not reachable - "SSL handshake failed Error code 525"
Incident Report for Render
Postmortem

Summary

On February 22, 2023, we deployed a change to our routing layer that resulted in certificate errors for custom-domain traffic to a subset of services in the Oregon and Frankfurt regions. We completed a rollback of the change 12 minutes later to mitigate the impact. The time of impact was 15:16-15:28 UTC for the Oregon region and 15:26-15:36 UTC for the Frankfurt region.

Root Cause

We deployed a change that caused our routing layer to fail to serve any certificate for custom domains, resulting in connection failures between Cloudflare and our origins. We identified the cause and initiated a rollback 8 minutes after the start of the impact.

We run a suite of tests to verify critical functionality for all changes in a staging environment before deploying to production. While we have tests to verify that our routing layer can serve traffic for web services, this incident exposed a gap in our test coverage related to serving traffic for custom domains. As a result, the problematic change made it through our deployment pipeline undetected and was deployed to production, resulting in customer impact.

Mitigations

  • We will improve our test coverage to verify, before deploying changes to production, that traffic sent to web services via custom domains is served successfully.
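
As an illustration only, a staging check of the kind described above could look like the following minimal Python sketch. It is not our actual test suite. The function names (`hostname_matches`, `cert_covers`, `check_custom_domain`) and the `origin_host` and `custom_domain` parameters are hypothetical, and the sketch assumes the staging origin presents a certificate that chains to a trusted CA. It connects to an origin, sends the custom domain as the TLS server name, and reports failure if no valid matching certificate is served.

```python
import socket
import ssl


def hostname_matches(pattern: str, hostname: str) -> bool:
    """Return True if a certificate DNS name (optionally a left-most wildcard) covers hostname."""
    pattern = pattern.lower()
    hostname = hostname.lower()
    if pattern.startswith("*."):
        # A wildcard covers exactly one extra label: *.example.com matches
        # app.example.com but not example.com or a.b.example.com.
        suffix = pattern[1:]  # e.g. ".example.com"
        head = hostname[: -len(suffix)]
        return hostname.endswith(suffix) and head != "" and "." not in head
    return pattern == hostname


def cert_covers(cert: dict, hostname: str) -> bool:
    """Check the subjectAltName entries of a getpeercert() dict against hostname."""
    names = [value for kind, value in cert.get("subjectAltName", ()) if kind == "DNS"]
    return any(hostname_matches(name, hostname) for name in names)


def check_custom_domain(origin_host: str, custom_domain: str,
                        port: int = 443, timeout: float = 5.0) -> bool:
    """Hypothetical smoke check: connect to origin_host, send custom_domain as
    SNI, and report whether a valid, matching certificate is served."""
    context = ssl.create_default_context()  # verifies the chain and the hostname
    try:
        with socket.create_connection((origin_host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=custom_domain) as tls:
                return cert_covers(tls.getpeercert(), custom_domain)
    except (ssl.SSLError, OSError):
        # Covers handshake failures (for example, no certificate served for
        # this server name) as well as connection errors and timeouts.
        return False
```

A deployment pipeline could run `check_custom_domain` against a staging origin with a known test domain and block the rollout if it returns `False`.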
Posted Feb 23, 2023 - 23:38 UTC

Resolved
Some services in the Oregon and Frankfurt regions were unavailable due to a certificate error between 15:14-15:26 UTC. This was caused by a problematic change that has been rolled back.
Posted Feb 22, 2023 - 15:45 UTC
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Feb 22, 2023 - 15:34 UTC
This incident affected: Oregon (Web Services) and Frankfurt (Web Services).