StarRez Root Cause Analysis
Australia East Outage - 10th Nov 2022
Summary
On 10th Nov 2022, a subset of customers within the Australia East region experienced downtime affecting a number of core services.
Known bugs were triggered, impacting two backend nodes within our infrastructure. As a result, applications lost network connectivity and were forced to restart.
Root Cause
The root cause was determined to be known bugs affecting two underlying nodes. StarRez engineers are currently working with our upstream vendor to resolve these bugs.
These bugs disrupt network access within the application, preventing access to the backend database and other external services.
Resolution
All impacted application pods were moved to healthy infrastructure, and the hosts that caused the original outage were removed from service.
StarRez engineers will be moving customer application pods to an updated cluster that has several fixes applied, which should prevent these issues from recurring.