Large WebSocket messages reported to be failing
Incident Report for Render
Postmortem

Summary

Render uses Cloudflare Workers as an HTTP proxy layer that manages caching and cached requests for all customers. A deploy to our Cloudflare Workers introduced new WebSocket compression behavior that mixed compressed and uncompressed messages and caused WebSocket client errors. The regression was reported manually, and the new behavior was then disabled.

Impact

The regression was active for about 17 hours: it was enabled around 2023-10-04 18:46Z (11:46 AM PT) and disabled at 2023-10-05 12:06Z (05:06 AM PT). Several customers whose services depend on large WebSocket messages were affected and helped report and identify the issue.

Root Cause

A deploy intended to enable additional logging in the Cloudflare Workers also enabled a new Cloudflare feature, WebSocket Compression. Cloudflare Workers offer flags that prevent new features from being enabled, but Render had not specifically blocked this feature.

With the WebSocket Compression feature enabled, new WebSocket connections would attempt to use compression and mix compressed and uncompressed data. Messages larger than 1 MB then resulted in WebSocket close codes 1002 (protocol error) or 1009 (message too big) on the client or server.
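
The two close codes can be illustrated with a minimal sketch of how a receiving endpoint might classify an incoming frame under RFC 6455 and RFC 7692. This is not Render's or Cloudflare's code, and the report does not state the exact mechanism behind each error. The function name `close_code_for_frame` and the 1 MiB default limit are assumptions chosen for illustration.

```python
from typing import Optional

CLOSE_PROTOCOL_ERROR = 1002   # RFC 6455 close code: protocol error
CLOSE_MESSAGE_TOO_BIG = 1009  # RFC 6455 close code: message too big

# In the first byte of a frame, bit 0x40 is RSV1. The permessage-deflate
# extension (RFC 7692) sets it on the first frame of a compressed message.
RSV1_MASK = 0x40


def close_code_for_frame(first_byte: int,
                         payload_len: int,
                         deflate_negotiated: bool,
                         max_message_bytes: int = 1024 * 1024) -> Optional[int]:
    """Return the close code a receiver would send, or None to accept.

    Hypothetical helper: max_message_bytes is an assumed receiver limit,
    not a value taken from the incident report.
    """
    rsv1_set = bool(first_byte & RSV1_MASK)
    # A reserved bit set without a negotiated extension that defines it
    # must fail the connection; 1002 is the matching close code.
    if rsv1_set and not deflate_negotiated:
        return CLOSE_PROTOCOL_ERROR
    # A message larger than the receiver is willing to process.
    if payload_len > max_message_bytes:
        return CLOSE_MESSAGE_TOO_BIG
    return None
```

For example, a final text frame has first byte `0x81`; the same frame marked as compressed has first byte `0xC1`. An endpoint that never negotiated compression would reject the second with 1002, which is one way mixed compressed and uncompressed traffic can surface as a protocol error.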

Once Render support and engineering were aware of the regression, several production changes were reverted until the Cloudflare Workers deployment was identified as the cause. The Cloudflare Workers were then rolled back to the previous deployment.

Mitigations

  • Render has disabled the Cloudflare Workers feature around WebSocket Compression for the near term
  • Render has added end-to-end regression testing around this WebSocket behavior with large messages
  • Render is creating an automated test suite that actively tests live WebSocket behavior
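
A large-message regression check of the kind described in the mitigations above could look roughly like the sketch below. It is an illustration only, not Render's test suite: `failing_sizes` and the `echo` callable are hypothetical names, and `echo` stands in for whatever function sends a binary message through the proxy to an echo service and returns the reply.

```python
import os
from typing import Callable, Iterable, List

MIB = 1024 * 1024


def _roundtrips(echo: Callable[[bytes], bytes], payload: bytes) -> bool:
    """True if the echoed reply is byte-for-byte identical to the payload."""
    try:
        return echo(payload) == payload
    except Exception:
        # A dropped connection or close code (e.g. 1002/1009) counts as a failure.
        return False


def failing_sizes(echo: Callable[[bytes], bytes],
                  sizes: Iterable[int] = (MIB - 1, MIB, MIB + 1, 4 * MIB)) -> List[int]:
    """Return the message sizes for which the round trip failed.

    Each size is tried with a highly compressible payload (all zero bytes)
    and an incompressible one (random bytes), since compression-related
    bugs may affect only one of the two.
    """
    failures = []
    for size in sizes:
        payloads = (bytes(size), os.urandom(size))
        if not all(_roundtrips(echo, p) for p in payloads):
            failures.append(size)
    return failures
```

The sizes straddle the 1 MiB boundary mentioned in the report so that a failure affecting only larger messages shows up as a non-empty result. In a real suite, `echo` would wrap a WebSocket client connected to a live endpoint.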
Posted Oct 20, 2023 - 22:18 UTC

Resolved
Customers sending large WebSocket messages (>1 MiB) may have received errors about the message being too large. The cause has been identified and reverted.

Customer impact was between 2023-10-04 18:46 UTC and 2023-10-05 12:06 UTC.
Posted Oct 04, 2023 - 23:00 UTC