What Happened
At approximately 6:51 AM PT on 2021-08-12, an unexpectedly large spike in inbound traffic caused many instances in Render's load balancing layer to become unavailable. The spike, roughly 650x Render's usual peak traffic, was very likely malicious. The impact was visible as an increase in 503 responses returned by customer sites. Render's team was alerted immediately and began investigating, manually scaling our load balancing layer beyond what our autoscaling systems had already initiated.
At approximately 7:02 AM PT, our load balancer became healthy and resumed serving requests, but we then saw an increase in 502 responses, indicating a problem reaching user services from the load balancer. We identified that a core networking component responsible for DNS had begun failing, likely as a result of the initial traffic spike. We increased resource allocations for this component, enabling it to resume serving DNS queries. All errors were resolved by 7:24 AM PT.
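The two error codes pointed at different layers, which is what guided the diagnosis: 503s meant the load balancer itself was unhealthy, while 502s meant it was up but could not reach the services behind it. A minimal triage sketch of that reasoning (the function and its labels are illustrative, not Render's actual tooling):

```python
# Hypothetical triage helper: maps the HTTP status codes seen during
# this incident to the layer most likely at fault. Illustrative only.

def likely_fault(status: int) -> str:
    """Infer which layer is failing from the error returned to clients."""
    if status == 503:
        # Service Unavailable: the load balancer itself cannot serve
        # the request (e.g. its instances are unhealthy or overloaded).
        return "load balancer"
    if status == 502:
        # Bad Gateway: the load balancer is healthy but cannot reach an
        # upstream user service (e.g. DNS resolution is failing).
        return "upstream (service discovery / DNS)"
    return "unknown"
```

In this incident, the shift from 503s to 502s is what moved the investigation from the load balancing layer to the DNS component behind it.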
What We're Doing to Prevent It from Happening Again
We are incredibly sorry for the impact this outage had on our customers and, of course, their customers. Reliability remains the top priority at Render, and we are confident in our ability to prevent similar incidents in the future.