StarRez Root Cause Analysis
Australia East Outage - 7th Nov 2022
Summary
On 7th Nov 2022, a subset of customers in the Australia East region experienced downtime across several core services (StarRez Web, PortalX and Schedule Service).
This led to approximately 30 minutes of downtime for customer applications while the faulty infrastructure was removed from service.
Root Cause
The root cause was determined to be an underlying node affected by two known bugs, which StarRez engineers are currently working with our upstream vendor to resolve.
These bugs affect DNS resolution and network access within the application, which prevented access to the backend database.
Resolution
The problematic node was removed from service, and customer applications were moved to healthy infrastructure within the same cluster.
StarRez engineers will continue to work with our upstream vendor to resolve these ongoing bugs within the platform.