StarRez Cloud Status - Service Disruption - Core Services

Service Disruption - Core Services - Asia Southeast

Incident Report for StarRez Cloud

Postmortem

Southeast Asia Outage – 7th February 2023

A cooling failure within our upstream vendor's datacenter brought down a subset of services: storage accounts and SQL backends. This required StarRez to implement DR processes to bring customer sites back online.

Root Cause

A cooling failure within our upstream vendor's datacenter forced a shutdown of all storage and compute resources within this zone to protect our data. This impacted a subset of services for customers in this region, primarily storage accounts and SQL backends.

Resolution

A subset of services was re-provisioned in a functional zone within the region to bring some customers back online. All remaining impacted services were failed over to the Asia East region as part of the StarRez DR process.

Once the upstream vendor had brought all services back online within the region and StarRez was comfortable with stability, all services were moved back into the Southeast region.

Additional Information

Following this incident, StarRez has reviewed and updated its DR process to ensure quicker recovery should similar incidents occur in the future.

Posted Mar 03, 2023 - 02:08 UTC

Resolved

The incident within this region has been resolved.

StarRez will continue to monitor for stability to determine when it is safe to fail back resources into the region.

Posted Feb 10, 2023 - 02:20 UTC

Monitoring

All customers are now back online after successfully failing over core resources.

Engineers will continue to monitor this situation closely before closing out.

- Next update as warranted by a change of events.

Posted Feb 08, 2023 - 13:33 UTC

Update

All production sites are now back online. The remaining Development sites are being worked on.
There continues to be no ETA from our vendor for restoration of services within the Southeast Asia region.

- Engineers are actively working to remediate the issue.
- Next update as warranted by a change of events.

Posted Feb 08, 2023 - 13:02 UTC

Update

There continues to be no ETA for restoration of services from our vendor.

Engineers are currently failing over these remaining customers to a functional region to bring services back online.

- Engineers are actively working to remediate the issue.
- Next update expected within 60 minutes, or as warranted by a change of events.

Posted Feb 08, 2023 - 11:09 UTC

Update

There continues to be no ETA for restoration of services from our vendor.

The customer dev/test environments impacted by this will continue to sustain an outage at this time. Engineers are actively reviewing if the DR process should be engaged on these sites should the outage remain ongoing.

- Engineers are actively working to remediate the issue.
- Next update expected within 60 minutes, or as warranted by a change of events.

Posted Feb 08, 2023 - 06:05 UTC

Update

There continues to be no ETA for restoration of services. Our vendor has advised that restoration works are still underway within the impacted region.

- Engineers are actively working to remediate the issue.
- Next update as warranted by a change of events.

Apologies for any inconvenience,
StarRez Team

Posted Feb 08, 2023 - 05:07 UTC

Update

The workaround has helped to bring a subset of sites back online. Work continues with the remaining sites to restore service.

There continues to be no ETA for restoration of services within this region from our vendor.

- Engineers are actively working to remediate the issue.
- Next update as warranted by a change of events.

Apologies for any inconvenience,
StarRez Team

Posted Feb 08, 2023 - 01:32 UTC

Update

A workaround is currently being investigated in the hope of restoring services.

- Engineers are actively working to remediate the issue.
- Next update expected within 60 minutes, or as warranted by a change of events.

Apologies for any inconvenience,
StarRez Team

Posted Feb 07, 2023 - 23:25 UTC

Update

We have confirmed this is a Microsoft outage in the datacenter in this region. We will provide updates as they are provided to us.

Posted Feb 07, 2023 - 22:36 UTC

Identified

The upstream provider has identified that they are having issues in this region.

- Engineers are actively working to remediate the issue.
- Next update expected within 60 minutes, or as warranted by a change of events.

Apologies for any inconvenience,
StarRez Team

Posted Feb 07, 2023 - 20:49 UTC

Investigating

Customers in Asia Southeast are experiencing a service disruption with some Core Services.

- Engineers are actively working to remediate the issue.
- Next update expected within 60 minutes, or as warranted by a change of events.

Posted Feb 07, 2023 - 20:47 UTC

This incident affected: StarRez Core Functionality (Email Sending).