Delays observed at JSM, Opsgenie and Compass alert search functionality in US Region

Incident Report for Opsgenie

Postmortem

SUMMARY

Between September 10, 2025, at 13:56 UTC and September 11, 2025, at 14:40 UTC, Jira Service Management, Compass Cloud Operations, and Opsgenie customers in the U.S. region experienced degraded performance across web and mobile applications, as well as Alert REST APIs. Certain customers experienced difficulties with alert searches and page loading, particularly on alert-related pages. This incident was initiated by an EBS volume upgrade conducted within the cloud-based managed ElasticSearch clusters.

Our automated monitoring systems identified the incident within minutes, and it was resolved after 24 hours and 43 minutes by manually implementing vertical and horizontal scaling measures.
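
As a quick arithmetic check on the duration quoted above, the impact window can be computed from the two UTC timestamps given in the summary. This is only a sketch using the minute-resolution times as stated; the helper name `format_duration` is illustrative. At that resolution the window comes to 24 hours and 44 minutes, so the one-minute difference from the quoted figure is presumably down to seconds that are not shown in this report.

```python
from datetime import datetime, timezone

# Start and end of the impact window as stated in the summary (UTC).
start = datetime(2025, 9, 10, 13, 56, tzinfo=timezone.utc)
end = datetime(2025, 9, 11, 14, 40, tzinfo=timezone.utc)


def format_duration(start, end):
    """Return the elapsed time between two datetimes as 'H hours and M minutes'."""
    total_minutes = int((end - start).total_seconds() // 60)
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours} hours and {minutes} minutes"


print(format_duration(start, end))
```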

IMPACT

Between September 10, 2025, 13:56 UTC and September 11, 2025, 14:40 UTC, some customers in the U.S. region experienced degraded performance in Jira Service Management, Opsgenie, and Compass Cloud. During this time, web and mobile applications, as well as the Alert REST APIs, were impacted. Some customers may have seen issues with alert searches and slow loading of alert pages. During the incident, 22% of customers in the US region experienced failures in web API functionality, 8.6% encountered failures with the REST Alert API, and 1.3% experienced delays in notification delivery. Importantly, there was no loss of data during this event.

ROOT CAUSE

The degradation was caused by an EBS volume upgrade in the cloud-based managed ElasticSearch clusters, which necessitated a blue/green deployment strategy. One of the ElasticSearch nodes approached its shard size threshold, prompting the upgrade and subsequent deployment. This deployment resulted in elevated latency, increased 4xx HTTP response codes, and timeouts affecting both search and indexing operations. Recovery time exceeded expectations due to the prolonged blue/green deployment. After completion, the ElasticSearch cluster remained unhealthy and did not return to its normal state. Additionally, the switchover to the backup region failed because the backup cluster was configured similarly in size and setup to the primary cluster.
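
To illustrate the kind of early warning that could flag a node approaching a shard size threshold before an upgrade becomes urgent, here is a minimal sketch. Everything specific in it is an assumption for illustration and not taken from this incident: the 50 GiB limit, the 80% warning ratio, the index and node names, and the function name `nodes_near_shard_limit`. The record fields are modelled on the JSON rows that Elasticsearch's `_cat/shards` API can return when called with `format=json&bytes=b`, where sizes arrive as strings.

```python
from collections import defaultdict

GIB = 1024 ** 3


def nodes_near_shard_limit(shards, limit_bytes=50 * GIB, warn_ratio=0.8):
    """Group shards whose store size is at or above warn_ratio * limit_bytes by node.

    `shards` is a list of dicts with the keys index, shard, state, store, node.
    The default limit and ratio are hypothetical values, not figures from the report.
    """
    flagged = defaultdict(list)
    for s in shards:
        # Skip shards that are not serving traffic or report no size.
        if s.get("state") != "STARTED" or s.get("store") is None:
            continue
        size = int(s["store"])
        if size >= warn_ratio * limit_bytes:
            flagged[s["node"]].append((s["index"], s["shard"], size))
    return dict(flagged)


# Hypothetical sample rows; names and sizes are made up.
sample = [
    {"index": "alerts-a", "shard": "0", "state": "STARTED",
     "store": str(45 * GIB), "node": "node-1"},
    {"index": "alerts-a", "shard": "1", "state": "STARTED",
     "store": str(12 * GIB), "node": "node-2"},
    {"index": "alerts-b", "shard": "0", "state": "UNASSIGNED",
     "store": None, "node": None},
]

print(nodes_near_shard_limit(sample))
```

With the sample rows, only the 45 GiB shard on `node-1` crosses the assumed 40 GiB warning level. A check like this leaves time to plan capacity changes rather than triggering them under load.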

REMEDIAL ACTIONS PLAN & NEXT STEPS

We understand that outages can affect your productivity. We are prioritizing the following improvement actions to help prevent similar incidents in the future:

Increase ElasticSearch cluster capacity and optimize settings to handle peak search loads efficiently.
Strengthen our high availability (HA) and failover architecture to ensure rapid and reliable recovery in the event of primary region failures.
Optimize cluster upgrade and shard rebalancing strategies to prevent similar issues.

We apologize to customers whose services were impacted during this incident. We are taking steps to improve the platform’s performance and availability.

Thanks,
Atlassian Customer Support

Posted Sep 24, 2025 - 07:57 UTC

Resolved

Between September 10, 2025, 3:19 PM UTC and September 11, 2025, 2:44 PM UTC, there was degraded performance in web experiences and the REST APIs for some Jira Service Management, Opsgenie, and Compass Cloud customers in the US region. We have deployed a fix to mitigate the issue and have verified that the services have recovered. The issue has been resolved and the service is operating normally.

Posted Sep 11, 2025 - 18:45 UTC

Monitoring

We have deployed the fix and all operations are back to normal. We will continue to monitor the operations.

We will continue providing specific updates when available.

Posted Sep 11, 2025 - 15:05 UTC

Update

We have progressed further in preparing our system with a probable fix. Initial testing has been successful. We are in the process of completing further testing and finalising the fix. We expect the fix to be completed in the next few hours.

Alerts continue to successfully notify users, although in some cases the responder name may be missing. Our goal continues to be restoring full functionality for users of Jira Service Management, Opsgenie and Compass.

We will continue providing specific updates when available and will ensure we have a further update within the hour.

Posted Sep 11, 2025 - 14:00 UTC

Update

We are progressing well with the testing of identified potential mitigation steps for this issue, and we are seeing positive early results. We will provide an ETA and further information in the next update.

Alerts continue to successfully notify users, although in some cases the responder name may be missing. Our goal continues to be restoring full functionality for users of Jira Service Management, Opsgenie and Compass.

We will continue providing specific updates when available and will ensure we have a further update within the hour.

Posted Sep 11, 2025 - 13:02 UTC

Update

Atlassian teams continue to test the identified potential resolution steps for this issue. Our goal is to restore full functionality for users of Jira Service Management, Opsgenie and Compass.

Currently, the alerts continue to successfully notify users, except that the responder name may be missing.

We will continue providing specific updates when available and will ensure we have a further update within the hour.

Posted Sep 11, 2025 - 11:53 UTC

Update

Our teams have identified some potential resolution steps for this issue and continue to test them as a priority, in order to restore full functionality for users of Jira Service Management, Opsgenie and Compass.

Currently, the alerts continue to successfully notify users, except that the responder name may be missing.

We will continue providing specific updates when available and will ensure we have a further update within the hour.

Posted Sep 11, 2025 - 10:59 UTC

Update

Atlassian teams remain engaged, with priority on restoring full functionality of new alerts for users of Jira Service Management, Opsgenie and Compass.

Escalation teams also remain engaged on this incident to ensure we stay focused on reaching a resolution as soon as possible.

Currently, the alerts continue to successfully notify users, except that the responder name may be missing.

We will continue providing specific updates when available and will ensure we have a further update within the hour.

Posted Sep 11, 2025 - 09:51 UTC

Update

Atlassian teams remain engaged, with priority on restoring full functionality of new alerts for users of Jira Service Management, Opsgenie and Compass.

Escalation teams are also engaged on this incident to ensure we can get to a resolution as soon as possible.

At this point, the alerts continue to successfully notify users, except that the responder name may be missing.

We will continue providing specific updates when available and will ensure we have a further update within the hour.

Posted Sep 11, 2025 - 08:53 UTC

Update

All concerned teams are engaged and are working to restore full functionality of new alerts for users of Jira Service Management and Opsgenie.

Escalation teams are also engaged on this incident to try and ensure we can get to a resolution as soon as possible.

Currently, the alerts continue to successfully notify users, but without the intended content within the alerts.

We will continue providing specific updates when available and will ensure we have a further update within the hour.

Posted Sep 11, 2025 - 07:50 UTC

Update

Our teams' priority remains restoring full functionality of new alerts for users of Jira Service Management and Opsgenie.

Escalation teams are fully engaged on this incident to try and ensure we can get to a resolution as soon as possible.

At this time, these alerts are still successfully notifying users, but without the intended content within the alerts.

We will continue providing specific updates when available and will ensure we have a further update within the hour.

Posted Sep 11, 2025 - 06:51 UTC

Update

Our infrastructure teams are working diligently to try and ensure all services are restored to Jira Service Management and Opsgenie as soon as possible.

New alerts are continuing to notify users; however, we are still prioritising infrastructure work to restore the content inside these alerts for Web and Mobile UIs.

We will continue providing updates when available and will ensure we have a further update within the hour.

Posted Sep 11, 2025 - 05:51 UTC

Update

Our team is continuing to investigate with the highest level of urgency in order to restore Jira Service Management and Opsgenie services.

At this time, we are prioritising infrastructure work so that new and active alerts are populated with their alert content as soon as possible, while continuing to deliver existing messages as they are ready.

We will continue to provide updates as we progress, and will ensure we have an update posted within the hour.

Posted Sep 11, 2025 - 04:50 UTC

Update

We have received reports that some customers are experiencing further issues loading pages within Opsgenie and Jira Service Management, which may be related to the infrastructure efforts underway to restore services to these products.

Please be aware we also have teams actively investigating these issues to ensure as fast a resolution as possible.

We want to reiterate that your alerts are still sending notifications correctly, and there is no data loss occurring. However, the underlying issue is affecting the propagation of alert details to notifications, and to the Web and Mobile UI.

Viewing schedules may also be impacted at this time while we continue to process the tasks required for recovery.

We will provide further updates as soon as possible.

Posted Sep 11, 2025 - 03:49 UTC

Update

New alerts should continue to notify users as they are generated; however, the alert content will still be missing.

Our teams are still actively engaging with infrastructure teams to try and expedite historical processing of messages to fully restore services.

We will provide further updates within the hour.

Posted Sep 11, 2025 - 03:10 UTC

Update

While new alerts should now correctly notify users, the content of those alerts for impacted customers will remain empty until missed alerts are fully processed.
Our team is continuing to urgently investigate potential infrastructure options to expedite this process.
We will provide further updates within the hour.

Posted Sep 11, 2025 - 02:10 UTC

Update

We are continuing to see recovery of Jira Service Management operations and Opsgenie services.
There are a large number of missed alerts to be processed due to the prior service issue and our team is actively investigating methods to help expedite this processing.
Alerts should trigger in the meantime but the content of these alerts may not be visible at this time.
Our teams are continuing to investigate with urgency.
We will provide further updates within the hour.

Posted Sep 11, 2025 - 01:10 UTC

Update

Our team has been able to identify the root cause of these performance issues and has put a mitigation into place.
We are now continuing to see recovery of Jira Service Management operations and Opsgenie services.
Existing alerts are continuing to be processed at this time.
We will provide a further update as soon as possible.

Posted Sep 11, 2025 - 00:11 UTC

Update

Our team is still actively working to resolve the issue and making progress toward a resolution. Thank you for your patience.

Posted Sep 10, 2025 - 20:10 UTC

Update

We are continuing to work on a fix for this issue.

Posted Sep 10, 2025 - 17:56 UTC

Identified

We continue to work on resolving the incident for Jira Service Management and Opsgenie. We have identified the root cause and taken actions to mitigate the issue and minimize the impact on search functionality.

Posted Sep 10, 2025 - 15:26 UTC

Investigating

We identified degraded performance in web experiences and the Alert REST API for some Jira Service Management and Opsgenie Cloud customers in the US Region. The team has taken actions to mitigate the issue and minimize the impact on search functionality.

Posted Sep 10, 2025 - 15:23 UTC

This incident affected: US (Incident Flow, Alert Flow, Email Notification Delivery, SMS Notification Delivery, Voice Notification Delivery, Mobile Notification Delivery, Heartbeat Monitoring, Incident REST API, Alert REST API, Heartbeat REST API, Incoming Email Service, Incoming Integration Flow, Outgoing Integration Flow, Signup, Login & Authorization, Opsgenie Actions, Web Application, Mobile Application, Configuration REST APIs, Reporting & Analytics, Pricing & Billing, Logs, Incoming Call Routing), EU (Incident Flow, Alert Flow, Email Notification Delivery, SMS Notification Delivery, Voice Notification Delivery, Mobile Notification Delivery, Heartbeat Monitoring, Incident REST API, Alert REST API, Heartbeat REST API, Incoming Email Service, Incoming Integration Flow, Outgoing Integration Flow, Signup, Login & Authorization, Opsgenie Actions, Web Application, Mobile Application, Configuration REST APIs, Reporting & Analytics, Pricing & Billing, Logs, Incoming Call Routing), and Public Website.