Australia East Outage – 30th Aug – 1st Sept 2023
Root Cause
A cooling failure within our upstream vendor's datacenter forced a shutdown of storage and compute resources within this zone to protect customer data. This impacted a subset of services for customers in this region, primarily backend storage and SQL database hosting.
Resolution
Over half of the impacted customer services were back online within 8 hours of the initial down event. The remaining customers were impacted for an additional 36-48 hours, due to issues with database connectivity and data integrity, before being brought back online.
Once the upstream vendor had brought all services back online within the region and StarRez was comfortable with stability, any failed-over services were moved back into the Australia East region.
Additional Information
StarRez initiated DR 4 hours into the outage, as per standard process. However, issues relating to data integrity occurred, which required further engagement with the vendor and customers. This exacerbated the outage and prevented StarRez from finalizing failover into our DR region in a timely manner with data integrity intact.
In follow-up to this incident, StarRez continues to engage with our upstream vendor to obtain the full root cause. A further review of redundancy will also take place to determine whether improvements can be made.