As a 20-year veteran of the technology industry, I have seen my share of technical incidents.

When I heard that CrowdStrike was linked to the recent Windows crash, I immediately thought of my experience developing BlackBerry's Windows drivers 15 years ago. At that time, I dealt with countless blue screens and crashes, so I have a good sense of what went wrong.

The issue from last Friday is complex and can be hard to follow, even for people who know software development but are unfamiliar with Windows drivers or kernel space.

The purpose of this article is to simplify the situation so that it is understandable even to readers without a technical background, grandparents included.

Imagine your Windows machine is your home

User Space

To make this easier to understand, let's assume your Windows computer is your home. Imagine the rooms in your house, such as the kitchen, living room, and bedroom. These are the places where you go about your daily activities: you cook meals, watch TV, and relax. They are the visible, accessible areas where you directly use what the house provides.

We will call this part of your home user space. On a computer, user space is the part you see and interact with: the windows, icons, and applications on the screen that you point at and click with the mouse.

Kernel Space

Your home also has a hidden world. Behind the scenes are the foundation, plumbing, electricity, and heating. You do not see these systems while going about your day, but they are essential for everything to work properly. They handle critical functions such as supplying water and electricity and keeping the house running.

This is the internal infrastructure of your home. In a computer, this area is called kernel space.

CrowdStrike is your new smart thermostat

Now consider adding a new feature to your home, such as a smart thermostat. This new addition was not part of the original house, but to function correctly, it needs to connect to and cooperate with the existing heating and cooling systems (the internal infrastructure).

It monitors the temperature of the house and works together with the existing infrastructure. Because neither you, the homeowner, nor the original builder of the house made this component, it is called a third-party product.

Similarly, CrowdStrike has built an application called Falcon that runs on Windows machines and monitors them for security threats. To do this properly, it needs to connect to the computer's kernel space, its internal infrastructure.

Since Falcon is not created by Microsoft or the user, it is called a third-party application.

Certification Process

When companies like Google build home thermostats (Nest), they need to prove that the product works with customers' heating and cooling systems before selling it.

Now, let's say you buy a Nest thermostat, bring it home, and install it on your heating and cooling system. The Nest starts monitoring the temperature of your home and tells the heating and cooling system when changes occur.

Similarly, you can purchase Falcon from CrowdStrike and install it on your computer. This allows you to monitor the security of your computer and do things to protect the computer's internal infrastructure that resides in the kernel space.

Applications that access kernel space on a Windows machine must be certified by Microsoft before they can run there; without certification, they cannot run. Certification takes time, however, and serious new security threats can appear at any moment, so Falcon needs to be updated quickly.

To get around this, companies may split applications into two parts: one that operates in kernel space and another that runs only in user space. The kernel space part requires certification, while the user space part does not.

So when a new attack appears, the company can quickly send new information to the user space part, and the kernel space part reads that information and uses it to protect against the attack. Because the kernel space part itself did not change, it did not need to be certified again.
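
For readers who write software, here is a minimal sketch of that split in Python. It is purely illustrative and assumes nothing about how Falcon is actually built: the names `DetectionEngine` and `load_rules` and the rule format are hypothetical. The engine's code plays the role of the certified kernel space part and never changes; new protection arrives as a data file it reads at run time.

```python
import json


# Hypothetical "kernel space" part: certified once, its code rarely changes.
class DetectionEngine:
    def __init__(self) -> None:
        self.rules: list[dict] = []

    def load_rules(self, update_text: str) -> None:
        # New protection arrives as data, not code, so no recertification is needed.
        self.rules = json.loads(update_text)

    def is_malicious(self, process_name: str) -> bool:
        # Flag a process if any rule's pattern appears in its name.
        return any(rule["pattern"] in process_name for rule in self.rules)


# Hypothetical fast update: a data file describing a newly discovered attack.
update = '[{"pattern": "evil_tool"}]'

engine = DetectionEngine()
engine.load_rules(update)
print(engine.is_malicious("evil_tool.exe"))  # True
print(engine.is_malicious("notepad.exe"))    # False
```

The engine's code stays the same, so only the data needs to ship. The catch is that the engine must cope with whatever data it receives.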

However, and here is the big one: they must be careful. Updating the user space information can accidentally confuse the kernel space part of the application and cause things to go sideways. So anyone using this approach needs to test every update properly.
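
What "testing properly" might look like, as a minimal sketch: assume a hypothetical update format in which every rule carries exactly three fields, and a stand-in kernel space reader (`kernel_read_rule`, hypothetical) that reads those fields by position without checking. Before release, the check runs every rule of the new update through that same reader and blocks the update if any rule would make it fail.

```python
# Hypothetical format: the kernel space part reads exactly three fields per rule.
EXPECTED_FIELDS = 3


def kernel_read_rule(rule: list[str]) -> tuple[str, ...]:
    # Stand-in for the certified kernel space reader: it trusts the data
    # and reads each field by position without checking the rule's length.
    return tuple(rule[i] for i in range(EXPECTED_FIELDS))


def safe_to_ship(update: list[list[str]]) -> bool:
    """Pre-release test: run every rule through the existing reader first."""
    for rule in update:
        try:
            kernel_read_rule(rule)
        except IndexError:
            return False  # the reader would have failed on this rule
    return True


good_update = [["process", "evil_tool.exe", "block"]]
bad_update = [["process", "evil_tool.exe"]]  # third field missing

print(safe_to_ship(good_update))  # True
print(safe_to_ship(bad_update))   # False
```

In a real release process this would be only one step; the update would also be tried on test machines running the actual driver before it reached customers.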

Do you see where this is going?

How did everything go wrong?

With your new Nest smart thermostat, you can control your home's heating and cooling system either from the panel on the wall or from an app on your smartphone.

Next, let's say Google releases a software update for Nest. This update improves the mobile app you use to set the temperature but does not change the part of the thermostat that directly controls the heating and cooling system. Remember that the infrastructure side needs to be certified, so the company does not want to change it often. Updating only the mobile app is much faster, and the infrastructure side is expected to keep working as before.

This is where things go wrong. The updated app introduces new features or settings that are supposed to work with the existing thermostat. However, the thermostat's core control system, which manages heating and cooling, did not handle the changes as expected. The developers did not test the new app with the existing infrastructure, so they never realized that the core system also needed changes to work correctly.

As a result, when you set the temperature using the new app, the updated data is passed to the thermostat's control system. However, since the core system was not updated, it could not interpret or process this new data correctly. This mismatch causes the thermostat to send incorrect signals to the heating and cooling systems. Consequently, the heating may turn on when it is already warm, or the cooling system may not turn on when it gets hot, making the home uncomfortable.

In this case, the only way to fix the system is to remove it from the wall, connect the display to a computer, update the software, and then reconnect it to the wall.

Falcon is like a smart thermostat in the context of a Windows machine. There are components or "drivers" that operate in kernel space (like the core control system of the thermostat) and another component in user space (like the mobile app you use). If CrowdStrike updates only the user space component without updating the kernel space driver, the system may experience issues.

When the Falcon kernel space driver reads data from the updated user space component, an incompatibility can cause a serious failure, potentially a blue screen of death (BSOD). Fixing this means manually rebooting the computer into safe mode (like taking the thermostat off the wall), deleting the problematic update file, and then rebooting the computer normally.
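
A minimal Python sketch of that kind of mismatch, under an assumed format that is not CrowdStrike's actual code: the kernel space part was written to read three fields per rule, and a new update supplies a rule with only two. In Python, the bad read raises an `IndexError` that the program can catch. In kernel space, a comparable bad memory read usually cannot be recovered from, so Windows stops the whole machine with a blue screen to protect itself. The second function, `defensive_reader` (hypothetical), checks the shape first and skips rules it does not understand.

```python
from typing import Optional

# Hypothetical: the kernel space part was built to read three fields per rule.
EXPECTED_FIELDS = 3


def trusting_reader(rule: list[str]) -> tuple[str, ...]:
    # Reads by position and assumes the update has the expected shape.
    # A short rule makes rule[2] fail; in a real driver this would be an
    # out-of-bounds memory read rather than a catchable exception.
    return tuple(rule[i] for i in range(EXPECTED_FIELDS))


def defensive_reader(rule: list[str]) -> Optional[tuple[str, ...]]:
    # Checks the shape first and ignores rules it does not understand.
    if len(rule) != EXPECTED_FIELDS:
        return None
    return tuple(rule)


new_update = [["process", "evil_tool.exe", "block"], ["process", "other.exe"]]

print([defensive_reader(rule) for rule in new_update])
# [('process', 'evil_tool.exe', 'block'), None]

try:
    for rule in new_update:
        trusting_reader(rule)
except IndexError:
    print("trusting reader failed on the new update")
```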

This is exactly what happened on Friday, July 19. An update to CrowdStrike's Falcon application caused a serious failure in kernel space, resulting in blue screens on the Windows machines running Falcon that received it.

Imagine the temperature control breaking in every home that uses a Nest! Homes without a Nest would be fine, and in the same way, Falcon did not break every Windows computer. It only broke the Windows computers it was running on, so this was not a Windows crash but a CrowdStrike crash. Falcon also runs on Mac and Linux machines, but that part of the code works differently and was not affected.

In summary, updating a smart thermostat's app without making sure the core control system can handle the change can break temperature control in the home. In the same way, an update that a kernel space driver is not prepared to handle can cause serious problems for Windows machines.

This example illustrates why it is essential to maintain all parts of a system, whether it be a thermostat or a computer, to ensure smooth operation.

Need more details?

If you want to delve deeper into this issue, I highly recommend this video.

Best of luck to the IT teams working to resolve this issue and to the developers at CrowdStrike who made mistakes. We are all human, and we all make mistakes. Hopefully, not only CrowdStrike but many others will learn some valuable lessons from this unfortunate event.
