Microsoft outage highlights 'extreme' difficulty of identifying IT risks: Richard Hilton
We have just witnessed this in the perfect storm of two unrelated internet infrastructure issues that collided on Friday, July 19, 2024. The outcome was mass disruption around the world to public and private entities alike, creating huge customer service confusion, revenue loss and uncapped reputational damage.
On Thursday night Microsoft’s Azure Central US region experienced a widespread outage. This was followed on Friday morning by a CrowdStrike configuration file update to the Falcon sensor driver that caused the dreaded Blue Screen of Death (BSoD) on an estimated 8.5 million Windows devices (less than one per cent of the global total, according to Microsoft).
The fix? Either manually deleting the offending file on every affected device or carrying out full system restorations from backups taken before 4:09am UTC. One can only imagine the time and resources this cost businesses, and who picks up the tab?
This raises several concerns for UK businesses that many leaders may not have recognised until now. The last decade has seen a shift from multi-vendor on-prem strategies to single cloud vendors like Microsoft. That shift raises the risk of systemic failure, because reliance on one provider means any outage can disrupt all applications and data hosted on that cloud.
It also highlights how extremely difficult it is for a business to identify risks in the interconnected world we all now live in, and to establish who has ultimate liability. Public Cloud services are not sufficient if you have critical systems (those needing more than three nines, or 99.9 per cent, availability), and the providers offer no redress, accountability or liability for any of your business loss and damage.
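To put "three nines" in concrete terms, the short Python sketch below converts an availability percentage into the downtime it permits over a year. The function name and the 365-day year are assumptions made for illustration; real service agreements define their own measurement windows and exclusions.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: a non-leap year of 8,760 hours


def allowed_downtime_hours(availability_percent: float) -> float:
    """Hours of downtime per year permitted by a given availability target."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    # Compare common availability targets, from two nines to four nines.
    for label, pct in [("two nines", 99.0), ("three nines", 99.9), ("four nines", 99.99)]:
        hours = allowed_downtime_hours(pct)
        print(f"{label} ({pct}%): {hours:.2f} hours of downtime per year")
```

On these assumptions, three nines still leaves room for almost nine hours of outage a year, while four nines allows under an hour.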
Many businesses may not realise that Clause 6B of the Microsoft Services Agreement states: ‘Microsoft is not liable for any disruption or loss you may suffer as a result. In the event of an outage, you may not be able to retrieve your content or data that you have stored.’ It goes on: ‘We recommend that you regularly backup your content and data that you store on the services or store using third party apps and services.’
Often in the tech world, systems run in the background with very little questioning of their purpose and function. Perhaps now is a time for reflection: to consider the risk appetite of your business, to ask the right questions of your ‘IT team’, and to know what mitigation is or is not in place.
You need to understand the risks that exist within your business when all your intellectual property, systems and data are handled by one provider. As business leaders you need to identify the critical business systems that must keep working when everything around them fails; Public Cloud infrastructures offer limited options here. This incident also showed bad actors and hackers how easily they can infiltrate a downed business, by setting up malicious websites that appear to offer software updates.
While it would be difficult to summarise the true impact on all businesses of this outage and faulty update, we can help you understand the business risks and put mitigation in place to ensure essential future-proofing for your business.
Richard Hilton – Private Sector CSO at Claritas Solutions Ltd