DocEvent Status - Incident history

Issue with downloads for all services except for S3

2024-06-12T04:25:00.000+00:00

Jun 12, 04:25:00 GMT+0
Identified - An update was released as a canary deployment, initially to the us-east-1 region and then to ap-southeast-2 and eu-west-1. This caused occasional errors for GET requests on all services except S3. The deployment was later rolled out more widely across all regions, and as it rolled out, more errors were found with GET requests. The release has now been rolled back, and the root cause of the issue in the deployment has since been found. This incident has been resolved.

Jun 12, 14:00:00 GMT+0
Resolved - This incident has been resolved.

Issue in authentication which is affecting all regions

2024-05-10T14:25:20.103+00:00

May 10, 15:32:52 GMT+0
Resolved - This incident has been resolved.

May 10, 15:17:20 GMT+0
Monitoring - We implemented a fix and are currently monitoring the result.

May 10, 14:43:46 GMT+0
Identified - We have identified the issue and are upgrading some clusters to manage database load.

May 10, 14:50:30 GMT+0
Identified - We are continuing to work on a fix for this incident.

May 10, 14:25:20 GMT+0
Investigating - We are currently investigating this incident.

May 10, 15:12:32 GMT+0
Identified - The fix to upgrade the deployment is currently in progress.

us-east-1 AWS outage affecting services

2023-06-13T20:00:00.000+00:00

Jun 13, 20:00:00 GMT+0
Investigating - The us-east-1 console and the verification service for self-hosted instances are currently unavailable. All cloud ftp and sftp services continue to be available. AWS is having an outage in the us-east-1 region affecting AWS Lambda and API Gateway, which these services rely on: https://health.aws.amazon.com/health/status

Jun 13, 20:45:41 GMT+0
Resolved - AWS has implemented a fix, and we see services reconnecting to our APIs now. This incident has been resolved.

Elasticsearch cluster issue

2023-06-09T23:00:00.000+00:00

Jun 9, 23:00:00 GMT+0
Investigating - We are currently investigating this incident.

Jun 9, 23:23:30 GMT+0
Investigating - Our provider elastic.co is currently having an outage, which is preventing our instances from authenticating users logging in to our servers: https://status.elastic.co/incidents/07bw653d2677?u=kpnld432cry6

Jun 9, 23:43:40 GMT+0
Identified - A new update from elastic.co: We have confirmed a proxy outage for us-east-1 that impacts all communication to Elastic Cloud. A resolution is being worked on. Another update will be provided in 30m or sooner.

Jun 10, 00:15:10 GMT+0
Identified - New update from elastic.co: The issue has been identified and a fix applied to a subset of proxies. Full rollout to all proxies is in progress.

Jun 10, 00:28:33 GMT+0
Monitoring - elastic.co has updated its us-east-1 proxies, and our monitoring is returning successful results for connectivity and access.

Jun 10, 00:32:26 GMT+0
Resolved - This incident has been resolved.

Instance failures in eu-west-1

2023-05-16T16:34:00.000+00:00

May 16, 16:34:00 GMT+0
Investigating - We are currently investigating this incident.

May 16, 16:49:00 GMT+0
Monitoring - We have updated instance capacity and are monitoring.

May 16, 16:59:00 GMT+0
Resolved - Our monitoring has verified everything has returned to normal. The incident is now resolved.

Degradations due to DoS

2022-05-17T08:27:00.000+00:00

May 17, 08:27:00 GMT+0
Investigating - We are currently investigating this incident. We have put mitigations in place, and our monitoring has shown degraded service during this period. This means some data IPs were unavailable. This only affects sftp connections that do not go through our load balancer / static IPs.

May 17, 08:45:00 GMT+0
Monitoring - We have been blocking and rerouting traffic, and are monitoring.

May 17, 08:50:00 GMT+0
Monitoring - Note: this affects only eu-west-1, not us-east-1 as previously tagged.

May 17, 12:27:45 GMT+0
Resolved - We're marking this as resolved, and will continue monitoring.

SFTP temporary connection failures

2022-01-03T21:50:08.233+00:00

Jan 3, 21:50:08 GMT+0
Resolved - We resolved an issue where sftp and scp connections were receiving errors. This was happening temporarily on certain instances. These instances have been ejected, and monitoring is continuing.

AWS us-east-1 API outage

2021-12-07T21:00:00.000+00:00

Dec 7, 21:00:00 GMT+0
Investigating - Our login console and public API are experiencing failures. This is due to an AWS outage in the us-east-1 region that affects our API Gateway services in this region.

Dec 7, 23:00:00 GMT+0
Resolved - Update provided by AWS:
[9:37 AM PST] We are seeing impact to multiple AWS APIs in the US-EAST-1 Region. This issue is also affecting some of our monitoring and incident response tooling, which is delaying our ability to provide updates. We have identified the root cause and are actively working towards recovery.
[10:12 AM PST] We are seeing impact to multiple AWS APIs in the US-EAST-1 Region. This issue is also affecting some of our monitoring and incident response tooling, which is delaying our ability to provide updates. We have identified root cause of the issue causing service API and console issues in the US-EAST-1 Region, and are starting to see some signs of recovery. We do not have an ETA for full recovery at this time.
[11:26 AM PST] We are seeing impact to multiple AWS APIs in the US-EAST-1 Region. This issue is also affecting some of our monitoring and incident response tooling, which is delaying our ability to provide updates. Services impacted include: EC2, Connect, DynamoDB, Glue, Athena, Timestream, and Chime and other AWS Services in US-EAST-1. The root cause of this issue is an impairment of several network devices in the US-EAST-1 Region. We are pursuing multiple mitigation paths in parallel, and have seen some signs of recovery, but we do not have an ETA for full recovery at this time. Root logins for consoles in all AWS regions are affected by this issue, however customers can login to consoles other than US-EAST-1 by using an IAM role for authentication.
[12:34 PM PST] We continue to experience increased API error rates for multiple AWS Services in the US-EAST-1 Region. The root cause of this issue is an impairment of several network devices. We continue to work toward mitigation, and are actively working on a number of different mitigation and resolution actions. While we have observed some early signs of recovery, we do not have an ETA for full recovery. For customers experiencing issues signing-in to the AWS Management Console in US-EAST-1, we recommend retrying using a separate Management Console endpoint (such as https://us-west-2.console.aws.amazon.com/). Additionally, if you are attempting to login using root login credentials you may be unable to do so, even via console endpoints not in US-EAST-1. If you are impacted by this, we recommend using IAM Users or Roles for authentication. We will continue to provide updates here as we have more information to share.
[2:04 PM PST] We have executed a mitigation which is showing significant recovery in the US-EAST-1 Region. We are continuing to closely monitor the health of the network devices and we expect to continue to make progress towards full recovery. We still do not have an ETA for full recovery at this time.
[2:43 PM PST] We have mitigated the underlying issue that caused some network devices in the US-EAST-1 Region to be impaired. We are seeing improvement in availability across most AWS services. All services are now independently working through service-by-service recovery. We continue to work toward full recovery for all impacted AWS Services and API operations. In order to expedite overall recovery, we have temporarily disabled Event Deliveries for Amazon EventBridge in the US-EAST-1 Region. These events will still be received & accepted, and queued for later delivery.
[3:03 PM PST] Many services have already recovered, however we are working towards full recovery across services. Services like SSO, Connect, API Gateway, ECS/Fargate, and EventBridge are still experiencing impact. Engineers are actively working on resolving impact to these services.
[4:35 PM PST] With the network device issues resolved, we are now working towards recovery of any impaired services. We will provide additional updates for impaired services within the appropriate entry in the Service Health Dashboard.

Issues connecting, auto disconnecting (us-east-1, ap-southeast-2)

2021-08-19T02:39:22.200+00:00

Aug 19, 02:39:22 GMT+0
Investigating - We are currently investigating this incident.

Aug 19, 02:43:40 GMT+0
Identified - It appears that a recent patch caused the us-east-1 and ap-southeast-2 regions to randomly disconnect users. We have identified the issue and are rolling out a fix. The us-east-1 region is now resolved.

Aug 19, 02:50:39 GMT+0
Monitoring - A fix for ap-southeast-2 has been implemented and released.

Aug 19, 02:52:30 GMT+0
Resolved - We just resolved the issue!