Summary

From 3:37 AM ET to 5:40 AM ET on June 9, 2025, Opal experienced significant delays in processing access grants and revocations due to an unexpected volume of access changes that overwhelmed our task processing queue.

Impact

Opal cloud customers experienced delays in access request fulfilment during the incident window.

Severity: SEV 2

This incident was classified as SEV 2 in accordance with our severity guidelines, as it represented a significant performance degradation of a core platform feature affecting multiple customers.

Root Cause Analysis

A series of bursts of access changes from a single customer (around 20,000 to 30,000 revocation tasks followed by an equal number of grant requests) put unexpected pressure on our shared task processing queue.

While we have safeguards in place, such as API-level rate limits, we do not currently have advanced per-tenant concurrency limits at the propagation-task-queue level. Additionally, the queue can only be scaled manually, which limits our ability to respond quickly to unexpected load spikes. Without these controls, the load from a single customer temporarily degraded processing performance for all cloud customers.
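To illustrate what a per-tenant concurrency limit on a shared queue could look like, here is a minimal in-memory sketch. It is not Opal's implementation: the class name `PerTenantLimiter`, its methods, the tenant and task names, and the limit of 2 are all hypothetical, and a production queue would also need persistence, locking, and retries.

```python
from collections import defaultdict, deque


class PerTenantLimiter:
    """Hypothetical shared task queue that caps in-flight tasks per tenant."""

    def __init__(self, max_in_flight_per_tenant):
        self.max_in_flight = max_in_flight_per_tenant
        self.pending = defaultdict(deque)  # tenant -> queued tasks (FIFO)
        self.in_flight = defaultdict(int)  # tenant -> tasks currently running
        self.rotation = deque()            # tenants with pending work, round-robin

    def submit(self, tenant, task):
        # A tenant is in the rotation exactly when it has pending tasks.
        if not self.pending[tenant]:
            self.rotation.append(tenant)
        self.pending[tenant].append(task)

    def next_task(self):
        """Return (tenant, task) for the next runnable task, or None."""
        for _ in range(len(self.rotation)):
            tenant = self.rotation.popleft()
            if self.in_flight[tenant] < self.max_in_flight:
                task = self.pending[tenant].popleft()
                self.in_flight[tenant] += 1
                if self.pending[tenant]:
                    self.rotation.append(tenant)
                return tenant, task
            # Tenant is at its cap: keep it in the rotation and try the next one.
            self.rotation.append(tenant)
        return None

    def complete(self, tenant):
        # Called by a worker when a task finishes, freeing a slot for the tenant.
        self.in_flight[tenant] -= 1


# Usage: one tenant submits a burst, another submits a single task.
queue = PerTenantLimiter(max_in_flight_per_tenant=2)
for i in range(5):
    queue.submit("tenant-a", f"revoke-{i}")
queue.submit("tenant-b", "grant-0")

dispatched = []
while (item := queue.next_task()) is not None:
    dispatched.append(item)
```

In this sketch the burst from `tenant-a` can occupy at most two worker slots at a time, so the single task from `tenant-b` is dispatched without waiting behind the whole burst. The remaining `tenant-a` tasks become runnable only as `complete("tenant-a")` is called.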

Timeline

[Image: Screenshot 2025-06-09 at 10.53.57 am.png]

Next Steps