All dates and times below are in UTC unless stated otherwise.
Customers using Atlassian products experienced elevated error rates and degraded performance between Oct 20, 2025 06:48 and Oct 21, 2025 04:05. The disruption was triggered by an AWS DynamoDB outage and compounded by subsequent failures in AWS EC2 and AWS Network Load Balancer within the us-east-1 region.
The incident started on Oct 20, 2025 at 06:48 and was detected within six minutes by our automated monitoring systems. Our teams restored all core services by Oct 21, 2025 04:05, and final cleanup of backlogged processes and minor residual issues was completed on Oct 22, 2025.
We recognize the critical role our products play in your daily operations, and we offer our sincere apologies for any impact this incident had on your teams. We are taking immediate steps to enhance the reliability and performance of our services, so that you continue to receive the standard of service you have come to trust.
Before examining product-level impacts, it's helpful to understand Atlassian's service topology and internal dependencies.
Products such as Jira and Confluence are deployed across multiple AWS regions, and each tenant's data is stored and processed exclusively within its designated host region. This design is intentional: it limits the impact of a regional outage to tenants hosted in that region, in this case us-east-1.
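For illustration, the sketch below shows what tenant-to-region pinning looks like in principle; the tenant identifiers, lookup table, and hostnames are hypothetical placeholders rather than our actual implementation.

```python
# Hypothetical sketch of tenant-to-region pinning (placeholder names, not our code).
# Each tenant is assigned exactly one host region; its data is stored and
# processed only there, so a regional outage affects only in-region tenants.

TENANT_HOST_REGION = {
    "tenant-a": "us-east-1",
    "tenant-b": "eu-west-1",
}

def resolve_host_region(tenant_id: str) -> str:
    """Look up the single region that stores and processes this tenant's data."""
    region = TENANT_HOST_REGION.get(tenant_id)
    if region is None:
        raise KeyError(f"unknown tenant: {tenant_id}")
    return region

def tenant_base_url(tenant_id: str) -> str:
    """Build the regional endpoint a request router would target."""
    return f"https://api.{resolve_host_region(tenant_id)}.example.internal"

# Requests for tenant-a always resolve to us-east-1, its pinned host region.
assert tenant_base_url("tenant-a") == "https://api.us-east-1.example.internal"
```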
While in-scope application data is pinned to the region selected by the customer, our systems sometimes need to call other internal services that may be based in a different region. If a problem occurs in the main region where those services operate, systems are designed to fail over automatically to a backup region, usually within three minutes. However, if unexpected issues arise during this failover, it can take longer to restore services, and in rare cases customers in more than one region can be affected.
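To make that failover behavior concrete, here is a minimal sketch of calling a cross-region internal dependency with a backup region; the endpoints, timeout, and error handling are simplified assumptions, not the actual mechanism, which relies on health checks and automated traffic shifting.

```python
# Hypothetical sketch of a cross-region call with automatic failover to a
# backup region. Endpoints and timeout values are illustrative assumptions.
import urllib.error
import urllib.request

REGION_ENDPOINTS = [
    "https://internal-service.us-east-1.example.internal",  # primary region
    "https://internal-service.us-west-2.example.internal",  # backup region
]

def call_with_failover(path: str, per_region_timeout: float = 5.0) -> bytes:
    """Try the primary region first; fall back to the backup on error or timeout."""
    last_error = None
    for endpoint in REGION_ENDPOINTS:
        try:
            with urllib.request.urlopen(endpoint + path, timeout=per_region_timeout) as resp:
                return resp.read()
        except (urllib.error.URLError, TimeoutError) as exc:
            last_error = exc  # region unhealthy or unreachable: try the next one
    raise RuntimeError("all regions failed") from last_error
```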
Jira
Between Oct 20, 2025 06:48 and Oct 20, 2025 20:00, customers with tenants hosted in the us-east-1 region experienced increased error rates when accessing core entities such as Issues, Boards, and Backlogs. This disruption was caused by AWS being unable to allocate EC2 instances and by elevated errors in AWS Network Load Balancer (NLB). During this window, users may also have observed intermittent timeouts, slow page loads, and failures when performing operations such as creating or updating issues, loading board views, and executing workflow transitions.
Between Oct 20, 2025 08:36 and Oct 20, 2025 09:23, customers across all regions experienced elevated failure rates when attempting to load Jira pages. This disruption was caused by the regional frontend service entering an unhealthy state during this specific time interval.
Normally, the frontend service connects to the primary AWS DynamoDB instance located in us-east-1 to retrieve the most recent configuration data necessary for proper operation. The service is also designed with a fallback mechanism that references static configuration data in the event that the primary database becomes inaccessible. Unfortunately, a latent bug existed in this local fallback path: when the frontend service nodes restarted, they were unable to load critical operational configuration data from either the primary or the fallback source, leading to the failures customers observed.
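To illustrate this class of defect, the following is a simplified, hypothetical sketch of a configuration loader with a static fallback; the key names and the specific bug shown (a stale fallback file missing required keys) are illustrative and are not the exact defect in our service.

```python
# Hypothetical sketch of a config loader with a static fallback. The required
# keys and the stale-fallback bug shown here are illustrative only.
import json

REQUIRED_KEYS = {"routing_table", "feature_flags"}

def load_from_primary() -> dict:
    # Placeholder for the primary lookup against DynamoDB in us-east-1.
    raise ConnectionError("primary configuration store unreachable")

def load_config(fallback_path: str = "static-config.json") -> dict:
    try:
        config = load_from_primary()
    except ConnectionError:
        # Fallback path: read the bundled static configuration from disk.
        with open(fallback_path) as f:
            config = json.load(f)
    missing = REQUIRED_KEYS - config.keys()
    if missing:
        # A latent bug like this only surfaces when nodes restart while the
        # primary store is down and the static file is stale or incomplete.
        raise RuntimeError(f"fallback config missing keys: {missing}")
    return config
```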
Between Oct 20, 2025 06:48 and Oct 21, 2025 06:30, customers experienced significant delays and missing Jira in-app notifications across all regions. The notification ingestion service, which is hosted exclusively in us-east-1, exhibited an increased failure rate when processing notification messages due to the AWS EC2 and NLB issues. As a result, notifications were delayed, and in some cases not delivered at all, to users worldwide.
Jira Service Management (JSM)
JSM was impacted similarly to Jira above, with the same timeframes and for the same reasons.
Between Oct 20, 2025 08:36 and Oct 20, 2025 09:23, customers across all regions experienced significantly elevated failure rates when attempting to load JSM pages. This affected all JSM experiences including the Help Centre, Portal, Queues, Work Items, Operations, and Alerts.
Confluence
Between Oct 20, 2025 06:48 and Oct 21, 2025 02:45, customers using Confluence in the us-east-1 region experienced elevated failure rates when performing common operations such as editing pages or adding comments. The primary cause of this degradation was the system's inability to auto-scale to manage peak traffic load effectively, due to the AWS EC2 issues.
Although the AWS outage ended at Oct 20, 2025 21:09, a subset of customers continued to experience failures because some Confluence web server nodes across multiple clusters remained in an unhealthy state. This was ultimately mitigated by recycling the affected nodes.
To protect our systems while AWS recovered, we made a deliberate decision to enable node termination protection. This action successfully preserved our server capacity but, as a trade-off, it extended the time required for a full recovery once AWS services were restored.
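For readers unfamiliar with this control, the sketch below shows one way termination and scale-in protection can be applied through the AWS APIs using boto3; the group and instance identifiers are placeholders, and this is not necessarily the exact mechanism we used.

```python
# Hypothetical example of enabling scale-in and termination protection with
# boto3. The group and instance identifiers are placeholders.
import boto3

autoscaling = boto3.client("autoscaling", region_name="us-east-1")
ec2 = boto3.client("ec2", region_name="us-east-1")

# Prevent the Auto Scaling group from terminating these instances on scale-in.
autoscaling.set_instance_protection(
    AutoScalingGroupName="web-server-asg",        # placeholder group name
    InstanceIds=["i-0123456789abcdef0"],          # placeholder instance id
    ProtectedFromScaleIn=True,
)

# Optionally also block direct API termination of an individual instance.
ec2.modify_instance_attribute(
    InstanceId="i-0123456789abcdef0",
    DisableApiTermination={"Value": True},
)
```

The trade-off described above is inherent to this kind of protection: unhealthy nodes are no longer replaced automatically and must be recycled once the underlying provider recovers.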
Automation
Between Oct 20, 2025 06:55 and Oct 20, 2025 23:59, automation customers whose rules are processed in us-east-1 experienced delays of up to 23 hours in rule execution.
During this window, some events triggering rule executions were processed out of order because they arrived later during backlog processing. This caused potential inconsistencies in workflow executions, as rules were run in the order events were received, not when the action causing the event occurred. Additionally, some rule actions failed because they depend on first-party and third-party systems, which were also affected by the AWS outage. Customers can see most of these failures in their audit logs; however, a few updates were not logged due to the nature of the outage.
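As a simplified illustration of the ordering issue, the sketch below contrasts processing backlogged events in arrival order with replaying them by the time the originating action occurred; the event fields and rule names are hypothetical.

```python
# Hypothetical sketch of the ordering problem during backlog replay. Events
# carry both the time the action occurred and the time we received them.

backlog = [
    {"rule": "notify-assignee", "occurred_at": "2025-10-20T07:02:00Z",
     "received_at": "2025-10-20T18:40:00Z"},
    {"rule": "notify-assignee", "occurred_at": "2025-10-20T06:55:00Z",
     "received_at": "2025-10-20T18:41:00Z"},
]

# During the incident, rules ran in arrival order (received_at), so an action
# that happened later could be processed before an earlier one.
arrival_order = sorted(backlog, key=lambda e: e["received_at"])

# Replaying by occurrence time restores the intended ordering.
occurrence_order = sorted(backlog, key=lambda e: e["occurred_at"])

assert arrival_order != occurrence_order  # the two orderings disagree here
```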
By Oct 21, 2025 05:30, the backlog of rule runs in us-east-1 was cleared. Most of the delayed rules were handled successfully, and some events were replayed to ensure completeness. Our investigation found that a small number of events may never have triggered their associated rules due to the outage.
Between Oct 20, 2025 06:55 and Oct 20, 2025 11:20, all non-us-east-1 regional automation services experienced delays of up to 4 hours in rule execution. This was caused by an upstream service that was unable to deliver events as expected. The delivery service encountered a failure due to a cross-region dependency call to a service hosted in the us-east-1 region. Because of this dependency issue, the delivery service was unable to successfully deliver events throughout this time frame, resulting in customer-defined rules not being executed in a timely manner.
Bitbucket and Pipelines
Between Oct 20, 2025 06:48 and Oct 20, 2025 09:33, Bitbucket experienced intermittent unavailability across core services. During this period, users faced increased error rates and latency when signing in, navigating repositories, and performing essential actions such as creating, updating, or approving pull requests. The primary cause was an AWS DynamoDB outage that impacted downstream services.
Between Oct 20, 2025 06:48 and Oct 20, 2025 22:46, numerous Bitbucket Pipeline steps failed to start, stalled mid-execution, or experienced significant queueing delays. Impact varied, with partial recoveries followed by degradation as downstream components re-synchronized. The primary cause was an AWS DynamoDB outage, compounded by instability in AWS EC2 instance availability and AWS Network Load Balancers.
Beyond this window, Bitbucket Pipelines continued to experience a low but persistent rate of step timeouts and scheduling errors due to AWS bare-metal capacity shortages in select availability zones. Atlassian coordinated with AWS to provision additional bare-metal hosts and cleared a significant backlog of pending pods, fully restoring service by Oct 21, 2025 01:30.
Trello
Between Oct 20, 2025 06:48 and Oct 20, 2025 15:25, Trello users experienced widespread service degradation and intermittent failures due to upstream AWS issues affecting multiple components, including AWS DynamoDB and subsequent AWS EC2 capacity constraints. During this period, customers reported elevated error rates when loading boards, opening cards, and adding comments or attachments.
Login
Between Oct 20, 2025 06:48 and Oct 20, 2025 09:30, a small subset of users experienced failures when attempting to initiate new login sessions using SAML tokens. This resulted in an inability for those users to access Atlassian products during that time period. However, users who already had valid active sessions were not affected by this issue and continued to have uninterrupted access.
The issue impacted all regions globally because regional identity services relied on a write replica located in the us-east-1 region to synchronize profile data. When the primary region became unavailable, the failover to a secondary database in another region failed, which delayed recovery. This failover defect has since been addressed.
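As a simplified illustration of where such a failover defect can hide, the sketch below shows a profile-sync write path that falls back to a secondary region only when the primary is unreachable; the data stores and connection details are stand-ins (SQLite files) rather than our actual infrastructure.

```python
# Hypothetical sketch of a profile-sync write path that fails over to a
# secondary region. SQLite files stand in for the real regional databases.
import sqlite3

PRIMARY_DSN = "profiles-us-east-1.db"    # placeholder for the primary region
SECONDARY_DSN = "profiles-us-west-2.db"  # placeholder for the secondary region

def connect(dsn: str) -> sqlite3.Connection:
    conn = sqlite3.connect(dsn)
    conn.execute("SELECT 1")  # basic health check before accepting writes
    return conn

def get_write_connection() -> sqlite3.Connection:
    try:
        return connect(PRIMARY_DSN)
    except sqlite3.Error:
        # The failover branch runs only when the primary region is down, so a
        # defect here stays hidden until exactly the moment it is needed.
        return connect(SECONDARY_DSN)
```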
Statuspage
Between Oct 20, 2025 06:48 and Oct 20, 2025 09:30, Statuspage customers who were not already logged in to the management portal were unable to log in to create or update incident statuses; users with active sessions at the time were not affected. The root cause was the same as described in the Login section above, and it was resolved by the same remediation steps.
We have completed the following critical actions designed to help prevent cross-region impact from similar issues:
Additionally, we are prioritizing the following improvement actions:
Although disruptions to our cloud services are sometimes unavoidable during outages of the underlying cloud provider, we continuously evaluate and improve our test coverage to strengthen the resilience of our services against these issues.
We recognize the critical importance of our products to your daily operations and overall productivity, and we extend our sincere apologies for any disruptions this incident may have caused your teams. If you were impacted and require additional details for internal post-incident reviews, please reach out to your Atlassian support representative with affected timeframes and tenant identifiers so we can correlate logs and provide guidance.
Thanks,
Atlassian Customer Support