Wikipedia defines static code analysis like this:
Static code analysis is the analysis of computer software that is performed without actually executing programs built from that software [...]. In most cases the analysis is performed on some version of the source code [...]. The term is usually applied to the analysis performed by an automated tool, with human analysis being called program understanding, program comprehension or code review.
To make it short: Static analysis typically finds mistakes. But some of the mistakes don't matter at all.
What is most important is to find the intersection of stupid and important. Furthermore, whether a bug matters or not highly depends on the context. Static code analysis, at best, might catch 5-10% of the software quality problems in code. This may extend to 80+% for certain specific defects, but overall it is not the magic bullet you are looking for. Still, using static analysis is cheaper and more effective than any other technique for catching bugs. If you want to catch more bugs, you need to take a full-blown approach to testing (see picture one, taken from a JavaOne presentation by William Pugh, the FindBugs lead).
My employer has kindly given me permission to use the msg java measuring station (let's call it JMP for short) to do the analysis (thanks, Rainer!). And a co-worker of mine is kindly supporting me in hunting bugs in it and doing configurations (thanks, Jochen!).
The JMP is a collection of popular code analysis tools: three focus on static code analysis, one on architectural compliance, and one measures test coverage.
Besides the plain results from the tools, I also try to add some expert views (code review :-)).
What metrics are covered?
The JMP covers all metrics generated by the individual tools. That is an enormous count of about 52 different numbers to interpret. To make this more convenient for readers, I picked the most common ones.
Having a part I available indicates that there will be a part II :) If you are looking for CCD, ACD, Ca, Ce, I, A, I/A and D, you have to wait for the next post.
Non Commenting Source Statements (NCSS)
Determines the complexity of methods, classes and files by counting the Non Commenting Source Statements (NCSS). Statements for NCSS are not statements as specified in the Java Language Specification but include all kinds of declarations too.
Roughly speaking, NCSS is approximately equivalent to counting ';' and '{' characters in Java source files.
The NCSS of a class is the sum of the NCSS of all its methods, the NCSS of its nested classes and the number of member variable declarations.
The NCSS of a file is the sum of the NCSS of all its top level classes, the number of imports and the package declaration.
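To make the counting rule tangible, here is a minimal, hypothetical example (the class and its members are made up); the annotations follow the approximation above and are not actual tool output:

```java
public class Customer {                      // class declaration   -> 1
    private String name;                     // field declaration   -> 1
    private int orderCount;                  // field declaration   -> 1

    public void addOrder() {                 // method declaration  -> 1
        orderCount++;                        // statement           -> 1
        if (orderCount > 10) {               // statement           -> 1
            System.out.println("frequent");  // statement           -> 1
        }
    }
}
```

By the rules above, addOrder() has an NCSS of roughly 4 (its declaration plus three statements), and the class sums to roughly 6: the method's 4 plus its two member variable declarations.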
Impact:
Overly large methods and classes are hard to read and costly to maintain. A large NCSS number often means that a method or class has too many responsibilities and/or functionalities which should be decomposed into smaller units.
Threshold:
Derived from experience, you can set some default values that define thresholds like the following:
The maximum count of classes per package must not exceed 40.
The maximum count of functions per class must not exceed 20.
The maximum NCSS per function must not exceed 25.
Cyclomatic Complexity Number (CCN)
CCN is also known as the McCabe metric. It determines the complexity of classes and files by counting control flow statements like 'if', 'for', 'while', etc. in methods. Whenever the control flow of a method splits, the "CCN counter" gets incremented by one. Each method has a minimum value of 1 by default.
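For illustration, here is a hypothetical method with its decision points annotated; depending on the tool, boolean operators like '&&' may count as an additional split:

```java
public int classify(int value, boolean strict) {
    int result = 0;                       // base complexity of the method: 1
    if (value > 100) {                    // 'if'      -> +1
        result = 3;
    } else if (value > 10) {              // 'else if' -> +1
        result = 2;
    }
    for (int i = 0; i < value; i++) {     // 'for'     -> +1
        if (strict && result > 0) {       // 'if'      -> +1 ('&&' may add +1)
            result--;
        }
    }
    return result;                        // CCN: 5 (or 6 if '&&' is counted)
}
```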
Impact:
Overly complex methods and classes are hard to understand and test, and costly to maintain. A high CCN often stands for methods or classes that have too many responsibilities and possibly a flawed design.
Threshold:
Having a CCN below 10 is quite normal. The maximum must not exceed 25.
Findbugs total warnings and density
FindBugs is a program to find bugs in Java programs. It looks for instances of "bug patterns": code constructs that are likely to be errors. A complete list of bug patterns is available at http://findbugs.sourceforge.net/bugDescriptions.html.
The metric itself is the density. It refers to the defects per thousand lines of non-commenting source statements.
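For illustration (the numbers are made up): a project with 50 FindBugs warnings in 5,000 NCSS has a density of 50 / 5,000 × 1,000 = 10.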
Impact:
The density condenses the overall code impression into a single number. Code with a high density most probably has many errors in it.
Threshold:
The quality of this metric depends on the distribution of the bugs across the categories. Most projects align around a density of 10.
There are several categories of bugs reported by this tool, so it is not only the metric that counts but every single bug. Looking only at the metric is not enough; with this tool you should take a deeper look at the reported bugs.
First, review the correctness warnings. Developers would want to fix most of the high and medium priority correctness warnings reported. Once you've reviewed those, you might want to look at some of the other categories.
Next come the bad practice warnings, which are violations of recommended and essential coding practice. Examples include hashCode and equals problems, the cloneable idiom, dropped exceptions, and serializable problems.
Dodgy warnings cover code that is confusing, anomalous, or written in a way that lends itself to errors. Examples include dead local stores, switch fall through, unconfirmed casts, and redundant null checks of values known to be null.
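As an illustration, here are two hypothetical snippets (the classes are made up) showing the kind of patterns FindBugs flags, one from the bad practice category and one from the dodgy category:

```java
// Bad practice: equals() defined without hashCode().
// Equal objects may land in different hash buckets.
public class Point {
    private final int x, y;

    public Point(int x, int y) { this.x = x; this.y = y; }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Point)) {
            return false;
        }
        Point p = (Point) o;
        return p.x == x && p.y == y;
    }
    // missing: @Override public int hashCode()
}

// Dodgy: switch fall through. The missing 'break' is legal
// Java but frequently unintended; category 1 ends up with rate 5.
class Discount {
    static int rate(int category) {
        int rate = 0;
        switch (category) {
            case 1:
                rate = 10;  // falls through to case 2
            case 2:
                rate = 5;
                break;
            default:
                rate = 0;
        }
        return rate;
    }
}
```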
CheckStyle Errors
Checkstyle reviews Java code for its compliance with coding standards. The standard checks are applicable to general Java coding style; optional checks are available for JEE artifacts.
The single metric generated here is the number of errors the tool finds. This is highly dependent on the configuration used with Checkstyle. The config used for analyzing the JSF libraries is based on the complete set of Sun's Java Coding Standards without any tailoring. It is not practical to follow all of them, but for a comparison this should be a good place to start. Even if some projects have tailored some of the checks, it should be visible if and how the projects care about code style.
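To give a feel for what such errors look like, here is a hypothetical snippet with a few violations that Sun's conventions (and thus an untailored Checkstyle setup) would typically flag:

```java
package com.example.shop; // hypothetical package

public class order_item {                   // type names should be UpperCamelCase: OrderItem
    public static final int maxItems = 10;  // constants should be UPPER_CASE: MAX_ITEMS
    private int Quantity;                   // member names should be lowerCamelCase: quantity

    public int getQuantity() {              // Sun style also expects Javadoc on public methods
        return Quantity;
    }
}
```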
Impact:
Code conventions are important to programmers for a number of reasons. 80% of the lifetime cost of a piece of software goes to maintenance. Hardly any software is maintained for its whole life by the original author.
Following coding conventions improves the readability of the software, allowing engineers to understand new code more quickly and thoroughly.
Threshold:
Depending on the real project setting and its commitment to the set of checks, this should find no errors at all.
The second part will cover CCD, ACD, Ca, Ce, I, A, I/A and D. Stay tuned.
Hi Markus,
Great article - thank you for writing it. Question for you about the msg java measuring station: what would be the comparable/closest set of tools out there that could resemble the station? Some of the tools you mentioned one can find in Sonar, but I assume there is more to it.
Hi Arkady,
Thanks for your comment. I believe that http://kenai.com/projects/sqe/pages/Home or even PMD are closest to that.
Under the hood, the msg Java Messplatz is nothing more than a collection of different tools with a combined reporting.
Thanks,
M
Thank you for the response, I will look into it!