On July 19, 2022, between 05:40 and 07:10 UTC, Atlassian customers in the EU region using Jira, Confluence and Opsgenie experienced problems loading pages through the web UI. The incident was automatically detected at 05:14 UTC by one of Atlassian’s automated monitoring systems. The main disruption was resolved within 16 minutes, with the full recovery taking an additional 74 minutes.
Between 05:40 UTC and 07:10 UTC on July 19, 2022, Jira, Confluence and Opsgenie users saw some web pages fail to load. During the 16-minute period from 06:40 UTC to 06:56 UTC, customers were unable to access the Jira, Confluence and Opsgenie web UIs because the Atlassian Proxy (the ingress point for service requests) was unable to service most requests.
The issue was caused by an AWS-initiated change that degraded Elastic Block Store (EBS) volume performance to such an extent that new instance creation, and therefore auto scaling, was blocked. As a result, the products above, as well as essential internal Atlassian services, could not auto scale to meet the increasing volume of incoming service requests as the EU region came online. Once the AWS change had been rolled back, most Atlassian services recovered. Some internal services required manual scaling because unhealthy nodes prevented scaling from being initiated, which prolonged complete recovery.
We know that outages impact your productivity, and we apologize to customers whose services were impacted during this incident. We see two main avenues to increase our resiliency during an incident where AWS auto scaling is blocked:
We are taking these immediate steps to improve the platform’s resiliency.