Outage affecting some customers in the US East
Incident Report for StarRez Cloud
Postmortem

US East Outage – 16th September 2023

Root Cause

At 07:24 UTC, a power disruption within our upstream vendor's datacenter impacted underlying network and compute infrastructure, which required manual intervention to mitigate. This affected SQL database hosting for a subset of customers in this region.

Resolution

StarRez initiated Disaster Recovery (DR) into our East US2 region as per our standard process. All customers impacted by the outage were failed over and back online approximately 9 hours after the incident began.

Our upstream provider recovered all impacted services at 21:38 UTC, approximately 14 hours after the incident began.

Once the vendor had brought all services back online within the region and StarRez was comfortable with stability, all databases were moved back into the US East region.

Additional Information

In follow-up to this incident, our upstream vendor has confirmed that an internal process issue and a BIOS bug delayed recovery. Fixes for both are being expedited with the aim of reducing the time to restore.

Posted Sep 27, 2023 - 00:04 UTC

Resolved
This incident has been resolved.

StarRez Engineers will be reviewing failback of customer databases over the coming days and will fail back when it is safe to do so.

A postmortem will be posted in due course once more information is known from our vendor.

Please reach out to the StarCare team if any further information is required.
Posted Sep 16, 2023 - 23:30 UTC
Monitoring
Our upstream vendor has confirmed again that the outage is still ongoing within this region.

All customers impacted by this event have now been failed over into the DR region and are back online.

We will now move into a monitoring state until the underlying outage has been closed out.

- Updates as warranted by change of events.
Posted Sep 16, 2023 - 16:58 UTC
Update
The service-wide outage is still ongoing within this region.

StarRez Engineers are actively in the process of executing our Disaster Recovery plan for all impacted customers.

Over 50% of impacted customers are now back online.

- Updates as warranted by change of events.

Apologies for the inconvenience
StarRez
Posted Sep 16, 2023 - 15:58 UTC
Update
The service-wide outage is still ongoing within this region.

StarRez Engineers are actively in the process of executing our Disaster Recovery plan for all impacted customers.

Customers are starting to come online as this process completes.

- Updates within the next hour or as warranted.

Apologies for the inconvenience
StarRez
Posted Sep 16, 2023 - 14:35 UTC
Identified
The issue has been identified as a service-wide outage in this region.

StarRez Engineers are actively in the process of initiating our Disaster Recovery plan for all impacted customers.

- Updates within the next hour or as warranted.

Apologies for the inconvenience
StarRez
Posted Sep 16, 2023 - 13:40 UTC
Update
StarRez engineers are still working with our upstream vendor to identify the source of the outage.

StarRez Engineers are actively in the process of initiating our Disaster Recovery plan for all impacted customers.

- Updates within the next hour or as warranted.

Apologies for the inconvenience
StarRez
Posted Sep 16, 2023 - 13:25 UTC
Update
StarRez engineers are still waiting on our upstream vendor to identify the source of the outage.

StarRez Engineers are also now in the process of initiating our Disaster Recovery plan for all impacted customers.

- Updates within the next hour or as warranted.

Apologies for the inconvenience
StarRez
Posted Sep 16, 2023 - 12:27 UTC
Update
StarRez engineers continue to work with our upstream vendor to identify the source of the outage. We can confirm that only a subset of customers within this region is currently impacted.

- Updates within the next hour or as warranted.

Apologies for the inconvenience
StarRez
Posted Sep 16, 2023 - 11:42 UTC
Update
We are continuing to investigate the issue with our hosting provider where some customer databases in the US East region have gone offline.
Posted Sep 16, 2023 - 09:45 UTC
Investigating
We are currently investigating an issue with our hosting provider where some customer databases in the US East region have gone offline.
More details to follow...
Posted Sep 16, 2023 - 08:30 UTC
This incident affected: StarRez Regions (East US).