Static Analysis: How to Do It Right?
Today, vulnerability detection via static analysis is increasingly talked about as a mandatory stage of the development process. However, the problems of static analysis are in the spotlight as well. If you have tried any serious tool, you may have been put off by long reports with confusing recommendations, difficult tool configuration, and false positives. So, is static analysis even needed?
From our experience, the answer is “yes”. Many problems that arise when using a tool for the first time are quite solvable. Today, we’ll talk about what a user can do and what an analyzer should offer in order to be useful, instead of just being “one more unnecessary tool security officers require”.
So, we’ve already discussed the theoretical limitations of static analysis. For example, deep static analysis attempts to solve exponentially complex problems. Therefore, each tool seeks a tradeoff between analysis time, resource consumption, the number of detected vulnerabilities, and the false positive rate.
Why do we need deep analysis at all? Any IDE finds errors very quickly, sometimes even security-related ones, so what exponentially complex problems are we talking about? A classic example is an SQL injection (or any other injection, such as XSS, RCE, and the like) that passes through several functions, i.e. user-provided data is read in one function and the query is executed in another. Finding it requires interprocedural dataflow analysis, which is an exponentially complex problem, and analysis can hardly be deemed deep without searching for such vulnerabilities. For the same reason, the code must be analyzed in its entirety, not in parts; otherwise, interprocedural vulnerabilities may be missed.
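To make this concrete, here is a minimal Python sketch of such an interprocedural vulnerability (the `request` and `db` objects are illustrative stand-ins for a web framework and a database connection):

```python
import sqlite3  # any SQL client would do; sqlite3 is in the standard library

def read_user_input(request):
    # Source: user-controlled data enters the program here
    return request.args.get("name")

def find_user(db, name):
    # Sink: the tainted value is concatenated into the query text,
    # so quotes inside `name` become part of the SQL statement
    query = "SELECT * FROM users WHERE name = '" + name + "'"
    return db.execute(query).fetchall()  # SQL injection
    # Safe variant: db.execute("SELECT * FROM users WHERE name = ?", (name,))

def handle_request(request, db):
    # The tainted value travels through a call chain, so only an analysis
    # that connects the source in read_user_input() to the sink in
    # find_user() across function boundaries will flag this
    return find_user(db, read_user_input(request))
```

A per-function (intraprocedural) check sees nothing suspicious in either function alone; the vulnerability only exists in their combination.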
In recent years, I have talked a lot with (potential) customers of various static analyzers, in particular about what they disliked about the tools after their first (pilot) use. In one way or another, most complaints stem from the theoretical limitations of the technology. Besides, a tool may simply lack functionality a user needs. Still, in my opinion, analyzers can become (and are becoming) more user-friendly in solving the problems discussed below. And you need to know how to use analyzers properly to neutralize their shortcomings, which turns out to be not so difficult at all. Okay, let’s get started then.
Just imagine a possible scenario: you have decided to try the technology, or you are evaluating a static analyzer in a pilot project. No doubt, you do not trust vendor-provided test examples and want to analyze your own code (and even find and fix real vulnerabilities). You are provided with an installer or a ready-to-use virtual machine with the system for a short period.
First, you need to run the analysis. When starting the tool, you expect something like a single-click operation: just upload an archive with the source code and click “analyze”. Nothing of the kind! You get several forms with different fields you must somehow complete. You must specify programming languages, some analyzer settings, select vulnerability packages (what the hell am I talking about?), and so on. Finally, you complete this challenge, and the analysis begins. Ah, no – a scan error is returned: “Format does not meet requirements”, “Code build is required for this language”, “Files for scanning are not found”... If you did not write the code yourself, you will have to go hat in hand to the developers.
A developer hands over source code for testing.
Special attention should be paid to code build requirements. For some languages (JVM languages such as Java, Scala, and Kotlin; C/C++; Objective-C; C#), most analyzers require the code to be built during analysis. You understand how painful that is: recreating a large project’s build environment on a new machine. On the other hand, these requirements are quite reasonable, since they arise from the analysis technology and language specifics.
How do analyzers address these problems? First, starting the analysis should be automated to the maximum possible extent. Ideally, it should be enough to upload a file in any format, and the analyzer should “understand” which languages are used, how to perform a build, and how to set the other default settings to obtain results that are as complete as possible. Clearly, it’s impossible to foresee everything; however, you can try to handle most cases.
Build requirements should be made as soft as possible. For example, for JVM languages, a build during analysis is not needed – just upload the artifacts, i.e. the built code along with its sources (which is much simpler). For Xcode, in the case of Objective-C, the build can be automated in most cases. If the build has failed, the analyzer may try to perform a partial analysis. While the results will be incomplete, this is better than nothing. It is also convenient when the analysis module is installed on a developer’s machine where the build process is already configured, while the architecture allows other modules and interfaces to be moved to a different machine.
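For JVM projects, “artifacts along with sources” can be as simple as one archive containing both. A hypothetical packaging script (the directory layout is illustrative and will differ per build system):

```python
import zipfile
from pathlib import Path

def pack_for_analysis(project_root: str, out_path: str) -> None:
    """Bundle sources and already-built artifacts into one archive,
    so the analyzer does not have to rebuild the project itself."""
    root = Path(project_root)
    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as zf:
        # Java sources, so the analyzer can show readable traces
        for f in root.glob("src/**/*.java"):
            zf.write(f, f.relative_to(root))
        # Compiled classes and jars produced by the regular CI build
        for pattern in ("build/**/*.class", "build/libs/*.jar"):
            for f in root.glob(pattern):
                zf.write(f, f.relative_to(root))

pack_for_analysis(".", "scan-input.zip")
```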
Finally, an analyzer should impose the softest possible format requirements and handle input files itself. It’s great if an analyzer supports archives with source code, nested archives, archives from a repository, links to a repository, archives from the production environment, and executable files from the production environment.
Keep in mind, however, that an analyzer has no artificial intelligence and cannot foresee everything. Therefore, if errors occur, read the manual, which often contains a lot of useful information about preparing code for analysis. And don’t worry: all these preparations are made only once per code base, during analyzer implementation. Most often, the analyzer is then integrated into the CI cycle, which avoids build problems altogether.
Well, we’ve launched scanning. One hour has passed – no results. A progress bar is frozen somewhere in the middle, with no percentage and ETA displayed. Another hour has passed – the progress bar has reached 99% and been hanging there for half an hour. The third hour has passed – the analyzer crashes with an out-of-memory error or hangs for one more hour and finally returns results. Anyone expecting analysis to last as long as it takes to pop out for a coffee will be greatly disappointed.
Yes, a good static analyzer may consume a lot of resources because, as I’ve said, finding advanced vulnerabilities is an exponentially complex task. So, the more resources and time available, the better the results will be (with a good engine, of course). It is really difficult to predict the time and resources required, since the running time of static analysis algorithms heavily depends on things like language constructs, code complexity, and call nesting depth – factors that are hard to estimate beforehand.
High resource consumption is an inevitable evil. Allocate resources carefully, wait patiently for scanning to complete, and keep in mind that no one can accurately predict an analyzer’s resource consumption, even for a given code base – so be ready for these figures to change. Moreover, they may change even without any code base update, simply because of an analyzer update.
Nevertheless, the analyzer can help a little with this problem too, since it can distribute the resource-intensive part (the engines) and the interface across different machines. This way, you will avoid overloading machines with unnecessary programs slowing them down, while still being able to use the system interface irrespective of the scanning load (for example, to view and edit results). This also simplifies scaling without reinstalling the entire system: just deploy an analyzer on a new VM, specify the main machine’s IP address – and that’s it!
In addition, the analyzer may allow you to select analysis depth, disable “heavy” checks, and use incremental analysis (to check changed code only). Be very careful when using these features, as they can greatly affect scanning results. If you use such functionality, we strongly recommend conducting a full analysis from time to time.
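One common way to set up incremental analysis, assuming a git repository and an analyzer CLI that accepts an explicit file list (the `analyzer-cli` name and `--files` flag are hypothetical):

```python
import subprocess

def changed_files(base_ref: str = "origin/main") -> list[str]:
    # Ask git which files differ from the last fully analyzed revision
    out = subprocess.run(
        ["git", "diff", "--name-only", base_ref, "HEAD"],
        capture_output=True, text=True, check=True,
    )
    return [line for line in out.stdout.splitlines() if line]

files = changed_files()
if files:
    # Scan only the changed code; a full scan should still be
    # run periodically, since incremental results may be partial
    subprocess.run(["analyzer-cli", "scan", "--files", *files], check=True)
```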
Well, finally, we are ready to talk about scanning results. You have eagerly awaited them and are greatly surprised by what you see: 156 critical, 1,260 moderate, and 3,210 low severity vulnerabilities. You go to the results page – and get lost in a pile of findings. You export a PDF report and see many thousands of pages. Guess what a developer would say when faced with such a long read?
A security officer delivers a vulnerability report to a developer
Still, let’s try to review the results and give the security officer a chance. Having closely examined a few dozen findings, you begin to understand why there are so many vulnerabilities. Some of them really look substantial, and you understand you must fix them. However, right away you also find a dozen or so false positives and, furthermore, a huge number of vulnerabilities in library code. You are not going to fix libraries, are you? Finally, you realize how much time you would have to spend analyzing the results. And this procedure must be repeated every day or week, or at least for each release (which is actually not that bad).
First, “false positive” can be understood in very different ways. Some people consider everything false except critical vulnerabilities that can be exploited right now, while others count only obvious analyzer errors as false positives. A lot depends on what you want from the tool. We recommend reviewing almost all findings, since even a low-severity vulnerability that cannot be exploited right now may turn into a serious problem tomorrow due to code or external condition changes.
Well, you must review all the findings, and this is still a huge amount of work. This is where analyzers can be of great help. An analyzer’s most important function is the ability to track vulnerabilities between scans of the same project while being tolerant of the small changes typical of code development. This removes the need to repeat these long vulnerability reviews: you spend more time on the first run, removing false positives and adjusting finding severity, but after that you only need to look at new vulnerabilities, which requires dramatically less time.
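One way such tracking is often implemented is by fingerprinting each finding on properties that survive unrelated edits (the field names below are assumptions):

```python
import hashlib

def fingerprint(finding: dict) -> str:
    """A stable identity for a finding: use the rule, the file, and the
    normalized vulnerable statement rather than the raw line number,
    which shifts whenever code above it changes."""
    normalized = " ".join(finding["snippet"].split())  # collapse whitespace
    key = f'{finding["rule_id"]}|{finding["file"]}|{normalized}'
    return hashlib.sha256(key.encode()).hexdigest()

def carry_over_triage(old: list[dict], new: list[dict]) -> list[dict]:
    # Reuse earlier verdicts ("false positive", adjusted severity, etc.)
    # for findings whose fingerprints match; only the rest need review
    verdicts = {fingerprint(f): f.get("verdict") for f in old}
    for f in new:
        f["verdict"] = verdicts.get(fingerprint(f)) or "needs review"
    return [f for f in new if f["verdict"] == "needs review"]
```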
OK, but is it really necessary to review all the vulnerabilities during the first run? We recommend doing so, but in general it’s not a must. First, analyzers allow result filtering by directories and files: for example, when starting a scan, you can immediately exclude certain components, libraries, and test code from analysis. This will also speed up the analysis. Second, analyzers allow result filtering by vulnerability type, i.e. when you start scanning, you can limit the set of vulnerability types searched for. Finally, in addition to severity, an analyzer may report something like a false positive probability (i.e. its confidence that a given finding is a real vulnerability). You can filter the results using this metric.
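A simple post-filter over the findings might look like this (the finding fields and thresholds are assumptions; adjust them to your analyzer’s output):

```python
EXCLUDED_PREFIXES = ("test/", "third_party/", "vendor/")

def worth_reviewing(finding: dict) -> bool:
    # Skip test and third-party code, low severity, and findings
    # that the analyzer itself marks as probable noise
    if finding["file"].startswith(EXCLUDED_PREFIXES):
        return False
    if finding["severity"] == "low":
        return False
    return finding.get("confidence", 1.0) >= 0.5

findings = [
    {"file": "src/app.py", "severity": "critical", "confidence": 0.9},
    {"file": "vendor/lib.py", "severity": "critical", "confidence": 0.9},
    {"file": "src/util.py", "severity": "low", "confidence": 0.8},
]
review_queue = [f for f in findings if worth_reviewing(f)]  # keeps only the first
```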
We should also mention Software Composition Analysis technology (now supported by more and more tools, albeit to different extents). This allows you to discover libraries used in your code, determine their names and versions, and see known vulnerabilities and licenses. Moreover, this technology can separate library code from your own code, which will help with result filtering.
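At its simplest, SCA can identify a dependency by the hash of its binary and look it up in a vulnerability index; the index below is a tiny stand-in for the large curated databases real tools use:

```python
import hashlib
from pathlib import Path

# Stand-in for a known-vulnerability index keyed by file hash;
# real SCA tools match against curated databases of advisories
KNOWN_VULNERABLE = {
    "<sha1-of-jar>": ("log4j-core", "2.14.1", "CVE-2021-44228"),
}

def sha1_of(path: Path) -> str:
    return hashlib.sha1(path.read_bytes()).hexdigest()

def scan_dependencies(libs_dir: str) -> None:
    for jar in Path(libs_dir).glob("**/*.jar"):
        hit = KNOWN_VULNERABLE.get(sha1_of(jar))
        if hit:
            name, version, advisory = hit
            print(f"{jar}: {name} {version} has known issue {advisory}")

scan_dependencies("build/libs")
```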
It turns out that the result avalanche problem is rather easy to solve. While the first review can take a lot of time, subsequent re-scan reviews will get faster and faster. One reminder: filter results carefully so as not to miss a vulnerability, because even well-known libraries may contain vulnerabilities. And if a vulnerability type is poorly detected now (i.e. the tool returns many false positives) and you exclude it, then after an analyzer update you may miss a real vulnerability.
Well, you’ve managed to deal with that huge report and the many false positives, and now you want to go further and check that the analyzer finds those vulnerabilities which, you know for sure, definitely exist (maybe you embedded them intentionally, or discovered them with another tool).
You should understand that an analyzer may miss a vulnerability for various reasons. The simplest one is that the scan was incorrectly configured (pay attention to error messages). But there may also be technological reasons. A static analyzer consists of two important components: an engine (containing all the algorithmic complexity and mathematics) and a vulnerability search rule base. Sometimes the engine is capable of finding a certain vulnerability, but the corresponding rule is missing from the rule base; in that case, adding the necessary rule is easy. It is quite a different situation when the engine itself does not support such vulnerabilities – then a very significant modification may be required. I gave an example of this at the beginning of the post: you’ll never find an SQL injection without dataflow analysis algorithms.
A static analyzer engine should have a set of algorithms covering vulnerability classes available for a given programming language (control flow, dataflow, interval analysis, etc.). A good analyzer will allow you to add your own vulnerability search rules, which eliminates the first reason for missing a vulnerability.
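Custom rules are often declarative: the engine already knows how to propagate tainted data, and the rule only names sources, sinks, and sanitizers. The schema below is purely illustrative – each analyzer has its own format:

```python
# Illustrative shape of a user-defined dataflow rule (hypothetical schema);
# the listed Java methods are real APIs commonly used as taint endpoints
CUSTOM_RULE = {
    "id": "my-org-sqli-001",
    "severity": "critical",
    "languages": ["java"],
    "sources": ["javax.servlet.http.HttpServletRequest.getParameter"],
    "sinks": ["java.sql.Statement.executeQuery"],
    "sanitizers": ["org.owasp.esapi.Encoder.encodeForSQL"],
    "message": "User input reaches an SQL query without sanitization",
}
```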
Therefore, if an existing vulnerability is absent from scan results, first identify the underlying cause – and this is where a vendor can usually help. If the cause is within a rule base or scanning configuration, then the problem can be solved quite easily. The most important thing is to evaluate analysis depth, i.e. what an engine can find by definition.
Here, you might assume that using such a tool requires deep developer expertise, since you need to understand which findings are false positives and which are not. In my opinion, this depends on how user-friendly the tool is. If it offers easy-to-use and clear functionality, plain vulnerability descriptions with examples, links and recommendations in different languages, and shows traces for dataflow-related vulnerabilities, then you don’t need deep developer expertise or knowledge of all the specifics of a programming language and its frameworks. However, you do need some minimal development background to read code.
At the end of the post, we’ll briefly touch on one of the most important tool use aspects, which will be discussed in detail in subsequent posts. Suppose you have decided to use a static analyzer. However, you have an established (in terms of both technology and organization) development process, and you do not want to change it (to be honest, nobody will allow you to).
A tool must have a full-fledged non-graphical interface (such as a CLI or a REST API) that you can use to integrate it into any of your processes. It’s even better if an analyzer has out-of-the-box integrations with various components: IDE or build system plugins, version control system integrations, CI/CD server (Jenkins, TeamCity) plugins, integrations with project management systems (JIRA) or user directory systems (Active Directory).
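As a sketch of what such integration might look like in a CI job – start a scan over a REST API, wait for it, and fail the build on new critical findings. The endpoint paths and response fields here are hypothetical, not any particular analyzer’s API:

```python
import sys
import time
import requests  # pip install requests

BASE = "https://analyzer.example.com/api"   # hypothetical endpoint
HEADERS = {"Authorization": "Bearer <token>"}

def run_gate(project_id: str) -> None:
    # Start a scan, poll until it finishes, then gate the build
    scan = requests.post(f"{BASE}/projects/{project_id}/scans",
                         headers=HEADERS).json()
    while True:
        status = requests.get(f"{BASE}/scans/{scan['id']}",
                              headers=HEADERS).json()
        if status["state"] in ("done", "failed"):
            break
        time.sleep(30)
    new_criticals = [f for f in status.get("findings", [])
                     if f["severity"] == "critical" and f["is_new"]]
    if new_criticals:
        print(f"{len(new_criticals)} new critical findings - failing build")
        sys.exit(1)

run_gate("my-project")
```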
Integration of static analysis into the development process (so-called SDLC) is the most effective option, provided that the process is well-established and all participants are on the same page when it comes to knowing why and how to do it. Continuous code analysis after modifications or analyzer updates will allow you to find vulnerabilities as early as possible. Separation of developer and security officer roles, clear indication of information security requirements, and soft integration into the existing process (for example, at first, system use may be recommended, not mandatory) will allow you to use the tool effectively and without frustration. Otherwise, you may still use a tool manually.
This post contains basic recommendations on how to start using a static analyzer. A good analyzer is much better than any lightweight checker and looks for problems of a fundamentally different complexity. Therefore, you should take the technological specifics of static analysis into account, and choose a tool whose functionality smooths those specifics out as much as possible.