Service Disruption - Core Services - Australia East
Incident Report for StarRez Cloud
Postmortem

Australia East Outage – 30th Aug – 1st Sept 2023

Root Cause
A cooling failure within our upstream vendor's datacenter forced a shutdown of storage and compute resources within this zone to protect customer data. This impacted a subset of services for customers in this region, primarily backend storage and SQL database hosting.

Resolution
More than half of the impacted customer services were back online within 8 hours of the initial outage. The remaining customers were impacted for an additional 36-48 hours because issues with database connectivity and data integrity delayed bringing them back online.
Once the upstream vendor had brought all services back online within the region and StarRez was comfortable with stability, any failed-over services were moved back into the Australia East region.

Additional Information
StarRez initiated disaster recovery (DR) 4 hours into the outage, as per standard process. However, data integrity issues occurred that required further engagement with the vendor and customers. This exacerbated the outage and prevented StarRez from finalizing failover into our DR region in a timely manner with data integrity intact.
Following this incident, StarRez continues to engage with our upstream vendor to obtain the full root cause. A further review of redundancy will also take place to determine whether improvements can be made.

Posted Sep 10, 2023 - 22:53 UTC

Resolved
This incident has been resolved.
Posted Sep 01, 2023 - 04:02 UTC
Monitoring
All customers are now back online.

Engineers continue to monitor the situation closely for stability before closing the incident.

Apologies for any inconvenience,
StarRez Team
Posted Aug 31, 2023 - 23:39 UTC
Update
99% of customers have been restored and are operational. Engineers are working with our upstream provider on the remaining affected customers.
Posted Aug 31, 2023 - 20:41 UTC
Update
We are continuing to work on a fix for this issue.
Posted Aug 31, 2023 - 08:08 UTC
Update
There continues to be a subset of customers that are impacted by this outage.
Engineers are closely monitoring updates from our vendor on when these final customers will be restored.

-Next update as warranted by a change of events.

Apologies for any inconvenience,
StarRez Team
Posted Aug 31, 2023 - 06:34 UTC
Update
There continues to be a subset of customers that are impacted by this outage.
Engineers are working closely with our vendor to resolve the issue.

-Next update as warranted by a change of events.

Apologies for any inconvenience,
StarRez Team
Posted Aug 31, 2023 - 00:00 UTC
Update
Most customers have recovered. A subset of customers is still experiencing issues, and engineers are reviewing further options to restore them.
Posted Aug 30, 2023 - 22:02 UTC
Update
We are continuing to review the customers that are still affected, along with all available mitigation options.
Posted Aug 30, 2023 - 19:10 UTC
Update
We are starting to see services slowly recover and will continue to monitor the situation.
Posted Aug 30, 2023 - 17:52 UTC
Update
Customers in the Australia East region are still experiencing a service disruption with core services.
-Engineers are currently preparing to migrate these customers to the Australia South East region.
Posted Aug 30, 2023 - 16:25 UTC
Update
A subset of customers in the Australia East region are still experiencing a service disruption with core services.
-Engineers are actively working to remediate the issue.
-Next update expected within 60 minutes, or as warranted by a change of events.
Posted Aug 30, 2023 - 14:33 UTC
Update
Another batch of customers has come back online as connectivity is slowly restored.
Engineers continue to monitor the situation closely for the remaining sites that are impacted.

-Next update expected within 60 minutes, or as warranted by a change of events.

Apologies for any inconvenience,
StarRez Team
Posted Aug 30, 2023 - 12:57 UTC
Update
Our vendor has confirmed a network-related outage and is working on a resolution.

A very small subset of customers is back online; however, we are still experiencing an outage for the remainder.

-Next update expected within 60 minutes, or as warranted by a change of events.

Apologies for any inconvenience,
StarRez Team
Posted Aug 30, 2023 - 11:57 UTC
Identified
We have identified a major network outage occurring within the region.

This case is being investigated and escalated to our backend provider for further review.

-Next update expected within 60 minutes, or as warranted by a change of events.

Apologies for any inconvenience,
StarRez Team
Posted Aug 30, 2023 - 11:33 UTC
Investigating
A subset of customers in the Australia East region are experiencing a service disruption with core services.
-Engineers are actively working to remediate the issue.
-Next update expected within 60 minutes, or as warranted by a change of events.

Apologies for any inconvenience,
StarRez Team
Posted Aug 30, 2023 - 11:07 UTC
This incident affected: StarRez Regions (Australia East).