Service Disruption - Central US
Incident Report for StarRez Cloud
Postmortem

StarRez Root Cause Analysis
StarRez's upstream vendor experienced issues within storage and compute infrastructure, resulting in backend database and compute services becoming inaccessible.

Root Cause
At 10:36PM UTC, 18th July 2024, our upstream vendor experienced issues within storage clusters and compute resources in the Central US region. As a result, connectivity to a subset of backend database and compute infrastructure was lost.

Resolution
At 11:25PM UTC, 18th July 2024, a subset of customer services were restored as load was moved to functional compute infrastructure by StarRez engineers. The remaining customer base remained impacted due to connectivity issues with the backend database infrastructure. At 2:40AM UTC, 19th July 2024, backend database connectivity was restored by our vendor and all remaining customer services that were impacted came online.

Next Steps
StarRez will review redundancy in the region to determine whether any adjustments can be made to improve resilience.

Posted Jul 30, 2024 - 01:32 UTC

Resolved
This incident has been resolved.
Posted Jul 19, 2024 - 13:15 UTC
Monitoring
All customer sites are back online.

No disaster recovery was initiated.

We will continue monitoring the situation.
Posted Jul 19, 2024 - 02:45 UTC
Update
We're seeing instability again as the backend vendor is applying mitigations in the region.
Disaster Recovery is underway for the customers impacted.

-Next update expected within 60 minutes, or as warranted by a change of events.
Posted Jul 19, 2024 - 02:01 UTC
Identified
Disaster recovery is being initiated for the remaining customers offline.
-Next update expected within 60 minutes, or as warranted by a change of events.
Posted Jul 19, 2024 - 00:38 UTC
Update
We've been able to mitigate the issue for a subset of customers, who are now back online.
-We are currently verifying stability for these customers.
-For all remaining customers, StarRez engineers are actively working with our backend provider to remediate the issue.
-Next update expected within 60 minutes, or as warranted by a change of events.
Posted Jul 18, 2024 - 23:42 UTC
Update
We have confirmed that the service disruption in Central US is region-wide and at the network layer.
-Engineers are actively working with our upstream provider to remediate the issue.
-Next update expected within 60 minutes, or as warranted by a change of events.
Posted Jul 18, 2024 - 23:00 UTC
Investigating
Customers in the Central US are currently experiencing a service disruption.
-Engineers are actively working to remediate the issue.
-Next update expected within 60 minutes, or as warranted by a change of events.

Apologies for any inconvenience,
StarRez Team
Posted Jul 18, 2024 - 22:48 UTC
This incident affected: StarRez Regions (Central US).