7/24/2024 3:28 AM
On Thursday, July 18th, Learn365 availability was impacted by a Microsoft Azure outage. The outage was limited to the Central US region. When we initially updated our Zensai Status Page, we reported that the outage also impacted the North Europe region. This was a clerical error on our part, and we apologize for any confusion it may have caused.
For specifics on the technical background of the outage's root cause, please refer to the direct quotes from the Microsoft team below.
"We have determined that a backend cluster management workflow deployed a configuration change causing backend access to be blocked between a subset of Azure Storage clusters and compute resources in the Central US region. This resulted in the compute resources automatically restarting when connectivity was lost to virtual disks. We have confirmed mitigation for Azure Storage and Compute resources. We are continuing to investigate and mitigate a number of downstream impacted services that have not returned to a fully healthy state."
"Services which were impacted by this outage recovered progressively and engineers from the respective teams intervened where further manual recovery was needed. Following an extended monitoring period, we determined that impacted services had returned to their expected availability levels."
As per Microsoft:
"Services which were impacted by this outage recovered progressively and engineers from the respective teams intervened where further manual recovery was needed. Following an extended monitoring period, we determined that impacted services had returned to their expected availability levels."
We’re happy to report the issue has now been resolved. Thank you for your patience and please let us know if you see any further issues.
As per Microsoft:
"We have determined that a backend cluster management workflow deployed a configuration change causing backend access to be blocked between a subset of Azure Storage clusters and compute resources in the Central US region. This resulted in the compute resources automatically restarting when connectivity was lost to virtual disks. We have confirmed mitigation for Azure Storage and Compute resources. We are continuing to investigate and mitigate a number of downstream impacted services that have not returned to a fully healthy state."
We can confirm that customers in the Central US database region can now access Learn365. Our monitoring system recorded the first successful requests at 04:19 AM UTC (12:19 AM EDT). We will continue to monitor the situation until it is fully resolved.
Microsoft has identified the underlying issues with Azure services and is currently working to deploy a fix.
We anticipate Learn365 services will be restored once Azure services are fully operational.
We encourage those impacted to follow along with the status updates posted below.
https://azure.status.microsoft/en-us/status
We are continuing to investigate this issue.
7/18/2024 11:19 PM
We are continuing to investigate this issue.
7/18/2024 11:13 PM
We are aware of an issue impacting user access to Learn365. We believe this is related to a Microsoft outage. We will continue to investigate and triage.
https://azure.status.microsoft/en-us/status
For current system status information about LMS365, check out our system status page. During an incident, you can also subscribe to updates on our status page. A summary of our post-mortem investigation is usually posted here a few days after the incident has ended. If you have additional questions about this incident, please log a ticket with us.