Service Disruption - Mercury
Incident Report for StarRez Cloud
Postmortem

StarRez Root Cause Analysis
A global event occurred in which an update distributed by our threat protection vendor caused blue screen events on all Windows-based hosts that received it.

Root Cause
At 5:05 AM UTC on 19 July 2024, the Mercury Cloud platform was impacted by an update distributed by our threat detection vendor. As a result, a subset of our infrastructure experienced continual blue screen events and reboot loops, causing either complete failure or ongoing disruption of the affected hosts.

Resolution
After the vendor provided mitigation steps, several remediation approaches were used to restore services. A subset of systems recovered automatically once the vendor pushed an update. The remaining affected hosts required manual intervention to remove the relevant update file so that they could boot successfully.
At 3:37 PM UTC on 19 July 2024, all Mercury services were back online.

Next Steps
We will conduct post-incident reviews to determine whether we can improve our processes for cases where vendor-pushed updates disrupt service.

Posted Jul 30, 2024 - 23:01 UTC

Resolved
Services have been restored to all Mercury customers.
Posted Jul 19, 2024 - 17:47 UTC
Update
All services have been restored to EMEA customers.

-Restoration efforts are ongoing for the subset of customers still down in North America.
-Engineers are actively working on this issue.
-Next update expected within the next 1 hour, or as warranted by a change of events.

Thank you for your patience
StarRez
Posted Jul 19, 2024 - 14:44 UTC
Update
Restoration efforts for the subset of customers down in North America and EMEA are still ongoing.

-Engineers are actively working on this issue.
-Next update expected within the next 1 hour, or as warranted by a change of events.

Thank you for your patience
StarRez
Posted Jul 19, 2024 - 14:21 UTC
Update
A subset of customers continue to remain down in both the EMEA and North America regions.

Further work is underway to bring these remaining customers back online.

-Engineers are actively working on this issue.
-Next update expected within the next 1 hour, or as warranted by a change of events.

Thank you for your patience
StarRez
Posted Jul 19, 2024 - 12:52 UTC
Update
Services are starting to come online within the North America region.

-Engineers are actively working on this issue.
-Next update expected within the next 1 hour, or as warranted by a change of events.
Posted Jul 19, 2024 - 11:46 UTC
Update
Services are starting to come online within the EMEA region.

-Engineers are actively working on this issue.
-Next update expected within the next 1 hour, or as warranted by a change of events.
Posted Jul 19, 2024 - 11:12 UTC
Update
StarRez engineers are continuing to work on mitigations to bring customers back online.

Due to the nature of the situation, this is proceeding more slowly than anticipated.

-Engineers are actively working on this issue.
-Next update expected within the next 1 hour, or as warranted by a change of events.
Posted Jul 19, 2024 - 10:05 UTC
Update
StarRez engineers are continuing to work on mitigations to bring stability and service back to the platform.

-Engineers are actively reviewing this issue.
-Next update expected within the next 1 hour, or as warranted by a change of events.
Posted Jul 19, 2024 - 08:33 UTC
Identified
Engineers continue to actively review how to stabilize the platform and prevent further outages.

Efforts are being made to mitigate this issue.

-Next update expected within the next 1 hour, or as warranted by a change of events.
Posted Jul 19, 2024 - 07:28 UTC
Update
StarRez can confirm that the Mercury platform is currently impacted by a larger global event related to the use of CrowdStrike.

Engineers are actively reviewing the situation in an effort to restore service.

-Engineers are actively reviewing this issue.
-Next update expected within the next 1 hour, or as warranted by a change of events.
Posted Jul 19, 2024 - 06:35 UTC
Investigating
Mercury customers in all regions are experiencing a service disruption when accessing Mercury and related services.
-Engineers are actively reviewing this issue.
-Next update expected within the next 3 hours, or as warranted by a change of events.
Posted Jul 19, 2024 - 05:45 UTC
This incident affected: Mercury Cloud (North America, EMEA).