Between September 10, 2025, at 13:56 UTC and September 11, 2025, at 14:40 UTC, Jira Service Management, Compass Cloud Operations, and Opsgenie customers in the U.S. region experienced degraded performance across the web and mobile applications and the Alert REST API. Some customers had trouble searching alerts and loading pages, particularly alert-related pages. The incident was triggered by an EBS volume upgrade in the cloud-based managed Elasticsearch clusters.
Our automated monitoring systems detected the incident within minutes. We resolved it after 24 hours and 43 minutes by manually scaling the clusters both vertically and horizontally.
Between September 10, 2025, at 13:56 UTC and September 11, 2025, at 14:40 UTC, some customers in the U.S. region experienced degraded performance in Jira Service Management, Opsgenie, and Compass Cloud. The web and mobile applications and the Alert REST API were affected, and some customers saw failing alert searches and slow-loading alert pages. During the incident, 22% of customers in the U.S. region experienced failures in web API functionality, 8.6% encountered failures with the Alert REST API, and 1.3% experienced delayed notification delivery. No data was lost during this event.
The degradation was caused by an EBS volume upgrade in the cloud-based managed Elasticsearch clusters. One of the Elasticsearch nodes approached its shard size threshold, which prompted the upgrade; the upgrade in turn required a blue/green deployment. During this deployment, search and indexing operations suffered elevated latency, an increase in 4xx HTTP responses, and timeouts. Recovery took longer than expected because the blue/green deployment was prolonged, and even after it completed, the Elasticsearch cluster remained unhealthy and did not return to its normal state. In addition, the switchover to the backup region failed because the backup cluster had the same size and configuration as the primary cluster.
We understand that outages can affect your productivity. We are prioritizing the following improvement actions to help prevent similar incidents in the future:
We apologize to customers whose services were impacted during this incident. We are taking steps to improve the platform’s performance and availability.
Thanks,
Atlassian Customer Support