Issues affecting services, deploys, Postgres, and Redis in Oregon.

Incident Report for Render

Postmortem

Summary

Between 2023-03-30 19:00 UTC and 2023-03-30 20:30 UTC, one of our Oregon clusters experienced an outage that affected builds and deploys, creation of new services and datastores, and both inbound and outbound network connectivity.

Root Cause

After performing routine maintenance on our etcd cluster, we noticed that something was severely wrong as each etcd member started back up. We determined that we were affected by a combination of two factors: a data corruption bug in the etcd version we were running, and poor disk performance that occasionally caused the etcd leader to fall behind its followers. Because etcd was in an inconsistent state, all cluster operations ground to a halt, and it took the team approximately 90 minutes to restore the cluster to a healthy state.

Mitigations

We have upgraded our etcd cluster to a version that does not contain the data corruption bug.
We have upgraded the hardware that our etcd processes run on.
We have identified a set of improvements to our etcd maintenance tooling and to etcd reliability. This work will be completed in the coming weeks.

Posted Apr 10, 2023 - 23:51 UTC

Resolved

This incident has been resolved.

Posted Mar 30, 2023 - 21:50 UTC

Update

We have taken steps to mitigate the issue and are seeing recovery. We are continuing to monitor and are taking additional actions to prevent the issue from recurring.

Posted Mar 30, 2023 - 21:13 UTC

Monitoring

A fix has been implemented and we are monitoring the results.

Posted Mar 30, 2023 - 21:13 UTC

Identified

We have identified the issue and are currently working to mitigate it.

Posted Mar 30, 2023 - 20:14 UTC

Investigating

Some services, Postgres instances, and Redis instances are unavailable in the Oregon region. Deploys may also be affected; users may see no output in their deploy logs. We are currently investigating.

Posted Mar 30, 2023 - 19:14 UTC

This incident affected: Oregon (Web Services, Cron Jobs, Background Workers, Builds and Deploys, PostgreSQL, Redis).