Over the last month, we've witnessed two major outages from two very big players in the IT space, which have affected millions of people and tens of thousands of businesses around the world. The losses in time and business are hard to fathom.
While many businesses were affected, there are still a large number who weren't. But even if your business wasn't affected, these events should have you sitting up and paying attention because while it's not a system or software you use today, it could be one you use tomorrow.
Today, we're going to go over the details of these outages and some of the lessons you can learn from them to help reduce the risk in your own organization(s).
CDK Global Outage
The Who: CDK Global is a software company serving the automotive and heavy truck/equipment industry. One of their primary offerings is a popular dealer management system (DMS) used by auto dealerships. The platform handles everything in a given dealership, from processing vehicle sales to tracking tickets, parts ordering and tech jobs in the service department. This software is cloud-based and accessible in a variety of ways.
The What: On June 19th, 2024, CDK experienced a ransomware attack which caused the cloud-side services of the platform to go offline. The end result was that thousands of auto dealers across the US and Canada were unable to operate, bringing business to a standstill.
The Impact: Services were not fully restored until after July 4th, a span of over 2 weeks. Many dealerships scrambled to restore some sort of business capability in the interim, with stories of some dealerships digging up 30-year-old paper forms from the 90s to keep operating. The outage also affected things like accounting and payroll, with some employees taking a substantial hit to their own bottom line.
The Lesson: The cloud isn't perfect. The best way to summarize the cloud is "it's someone else's computer". Don't get me wrong, the cloud has a vast number of benefits, as we've detailed previously, but it is extremely important to understand that the cloud isn't impervious to outages or even ransomware attacks. A lot of small business owners love the cloud because, in their mind, they no longer have to worry about things like servers or backups or even security. The truth, however, is that you're simply delegating those responsibilities to another company; they don't just go away. So review your company's software "stack", that is, the software and services your company uses in day-to-day operations, and determine what you are and are not comfortable delegating to people who aren't a direct part of your organization. Do this on a regular basis as your business grows and evolves to avoid leaving your business exposed.
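One way to make that stack review concrete is to keep a simple inventory and flag the gaps. The sketch below is only an illustration: the field names, the `exposure_gaps` helper and the example entries are all hypothetical, and the rule it applies (a vendor-hosted service with no local copy of the data or no manual fallback counts as a gap) is an assumption you should adjust to your own comfort level.

```python
from dataclasses import dataclass


@dataclass
class Service:
    """One entry in the software stack inventory (all fields are illustrative)."""
    name: str
    business_function: str
    vendor_hosted: bool        # True if another company runs it (cloud/SaaS)
    has_local_export: bool     # True if you keep your own recent copy of the data
    has_manual_fallback: bool  # True if staff can do the job without the system


def exposure_gaps(services):
    """Return names of vendor-hosted services lacking a data copy or a fallback."""
    gaps = []
    for service in services:
        covered = service.has_local_export and service.has_manual_fallback
        if service.vendor_hosted and not covered:
            gaps.append(service.name)
    return gaps


# Hypothetical inventory for a small dealership
stack = [
    Service("Dealer management system", "sales and service", True, False, False),
    Service("Payroll", "paying staff", True, True, True),
    Service("File server", "shared documents", False, True, False),
]

for name in exposure_gaps(stack):
    print(f"Review needed: {name}")
```

Re-running a check like this whenever a service is added or changed keeps the review from becoming a one-time exercise.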
CrowdStrike Windows BSOD Loop
The Who: CrowdStrike is a cybersecurity company which produces a number of security products, including Falcon Sensor, endpoint security software installed on PCs and servers to detect and respond to potential threats. This product is used in large enterprises and small businesses alike.
The What: On Friday, July 19th, 2024, CrowdStrike issued an update for the Falcon Sensor software which included a faulty .sys file. On any Windows system with the update applied, this file caused an endless reboot loop, with a Blue Screen of Death (BSoD) on every attempt to start up. This took millions of devices (servers, desktops, laptops and more) offline and made them effectively unusable.
The Impact: As with the CDK outage, this affected tens of thousands of companies, including banks, airlines, manufacturers and healthcare providers, bringing operations to a complete halt. The initial fix was to manually boot each system into safe mode, remove the faulty C-00000291*.sys file, and then reboot back to the normal Windows desktop. While this sounds relatively straightforward, repeating it across dozens, hundreds or even thousands of devices within an org becomes a big problem. Fortunately, CrowdStrike has since released a new option to fix systems more easily in bulk, available by contacting their support.
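For a sense of what the manual file-removal step amounts to once a machine is in safe mode, here is a minimal Python sketch. It is an illustration only, not CrowdStrike's official tooling: the function names are hypothetical, the directory shown is the default location given in CrowdStrike's public guidance rather than anything stated in this article, and the script defaults to a dry run so that nothing is deleted unless you ask for it.

```python
from pathlib import Path

# Assumption: default driver folder from CrowdStrike's public guidance,
# not something taken from this article. Verify it on your own systems.
DEFAULT_DRIVER_DIR = Path(r"C:\Windows\System32\drivers\CrowdStrike")

# File name pattern for the faulty file, as given in the article.
BAD_FILE_PATTERN = "C-00000291*.sys"


def find_faulty_files(driver_dir):
    """Return files in driver_dir matching the faulty pattern, sorted by name."""
    return sorted(Path(driver_dir).glob(BAD_FILE_PATTERN))


def remove_faulty_files(driver_dir, dry_run=True):
    """Delete matching files unless dry_run is True; return the matched paths."""
    matches = find_faulty_files(driver_dir)
    if not dry_run:
        for path in matches:
            path.unlink()
    return matches
```

Calling `remove_faulty_files(DEFAULT_DRIVER_DIR)` only reports what would be removed; passing `dry_run=False` performs the deletion. Even with a script, someone still has to get each machine into safe mode first, which is why the fix scaled so poorly.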
The Lesson: Even the software meant to protect you can be an issue at times. In the coming days and months you will likely hear "always test your updates before rolling them out" (which is good advice!), but the sheer number of updates across all of the various systems and software can make this anywhere from difficult to impossible for SMBs. In fact, the update issued to the CrowdStrike software was considered so trivial that it was pushed automatically to all endpoints, ignoring the configurable update rules in place. So in this instance, testing the update from the end-user standpoint was practically impossible. That is why it is equally important to regularly work on an Operational Impact Assessment and Incident Response Plan, and to include this type of scenario in them, so your organization has a plan of action that helps reduce the damage such an incident may cause.
One more valuable lesson
Last but not least, businesses and organizations today rely on technology more than ever before, and it's a trend that's only increasing. Yet many business owners and C-suite executives tend to look at IT as something more akin to office supplies than a core component of their organization. The truth is, technology is as crucial to your business as a good legal firm, financier or accountant, and treating it that way will not only improve the bottom line and drive growth, but will also protect what is already in place and help you avoid costly, costly lessons.