Software metrics are currently on everyone's lips and a frequently discussed topic. There are many conference talks, blog posts and other presentations that talk about software metrics. But to me it seems as if this subject is a closed book for many developers, so I decided to write this little post about a special category of software metrics, the complexity metrics.
Complexity metrics are a theoretical approach to measure the subjective complexity of a software fragment, where the words software fragment stand for a paraphrase for functions, methods, classes and nearly every logical unit that can be found in a software system. The most prevalent procedure to calculate complexity values is static code analysis, where an application parses the raw source code of a project, counts different statements and expressions and packs up the determined results in simple classification numbers. And with this information you already know the main concepts behind most software metrics, classification numbers and counting. As you can see there is no magic behind the scene, the only thing required is a good background knowledge to interpret those values.
The Cyclomatic Complexity Number or short CCN is the
oldest complexity metrics. The first time this software
metric was mentioned was 1976 by Thomas J. McCabe. This
metric counts the available decision paths in a software
fragment to determine its complexity. Each decision path
starts with one of the conditional statements from the
following list, so that it is fairly easy to detect them in
existing source code.
- ?
- case
- elseif
- for
- foreach
- if
- while
A look at this list of statements may result in the questions:
Is this list wrong, it doesn't list else and default?
But it is correct. The assumption is that both statements
will contain the defaut execution path of a program which
also exists when there are no special cases to capture.
Each decision path gets the value 1 and the sum of all these
values represents the Cyclomatic Complexity of the analyzed
software fragment. Note that each function and method also
counts with a value of 1 With this knowlegde we can now
calculate the complexity of the following example code:
<?php
class CyclomaticComplexityNumber
{
public function exampe( $x, $y ) // (1)
{
if ( $x > 23 || $y < 42 ) // (1)
{
for ( $i = $x; $i >= $x && $i <= $y; ++$i ) // (1)
{
}
}
else
{
switch ( $x + $y )
{
case 1: // (1)
break;
case 2: // (1)
break;
default:
break;
}
}
}
file_exists('/tmp/log') or touch('/tmp/log');
}
Based on the previous definition the Cyclomatic Complexity
Number of the example code example is 5. But you may have
noticed that this approach does not capture all decision paths
that exist. We haven't catched those paths that came from the
by the boolean expression || line 6 and && line 8, and
the logical or expression in line 25. A variation of the
Cyclomatic Complexity Number that also captures those paths
is the so called CCN2. The CCN2 is the most widely used
variation of this software metrics. Tools like PHPUnit, PMD
and Checkstyle report it as Cyclomatic Complexity of an
analyzed software fragment.
Now we get a complexity value of 8 when we apply the CCN2
to the previous example, what is a growt of the software's
complexity of 60%.
Due to the fact that Cyclomatic Complexity Number was
originally invented for procedural programming languages,
this definition for the Cyclomatic Complexity Number still
misses one element to measure the complexity of an object
oriented software system. With the concept of exceptions a
software gets additional decision paths for each catch
statement used in the source code. While try contains
the code for the regular execution code without special
cases, similar to else and default statements.
- ?
- &&
- ||
- or
- and
- xor
- case
- catch
- elseif
- for
- foreach
- if
- while
Now that we know what the Cyclomatic Complexity Number is,
what can we do with the measured information? We can find
the complexity hotspots in a system, for example the top
ten artifacts with the highest complexity, but this is only
important during an initial analyses phase to get the big
picture of an application. For a continuous inspection this
information is not so important. A continuous analyses
requires thresholds that help to categories calculated
values. During the time four values have emerged as good
thresholds for the Cyclomatic Complexity Number of a
software system.
- A software fragment with a CCN value between 1-4 has
low complexity.
- A complexity value between 5-7 is moderate and still
easy to understand.
- Everything between 6-10 has a high complexity, while
everything greater 10 is very complex and hard to
understand.
You may ask, why should I care about the complexity of a
software system, where is the value of benefit in this
metric?
Mostly the complex parts of an application contain business
critical logic. But this complexity has negative impacts on
the readability and understandability of source code. Those
parts will normally become a maintainence and bug fixing
nightmare, because no one knows all the constraints, side
effects and what's exactly going on in that part of the
software. This situation results in the well known saying
"Never touch a running system" which in turn mostly ends
in copy&paste programming. The situation can even become
more critical when the original author leaves the
development team or the company.
Finally a small example how to apply the new knowledge
about the Cyclomatic Complexity Number, thresholds and the
negative impacts of complex software to an existing
development process. The following source listing shows
a complex method taken from PHP_Depend's source. This method
has a Cyclomatic Complexity Number of 16 and I must admit
that the original author needed some time to understand what
was going on in this method.
<?php
// ...
private function _countCalls(AbstractCallable $callable)
{
$callT = array(
TokenizerI::T_STRING,
TokenizerI::T_VARIABLE
);
$chainT = array(
TokenizerI::T_DOUBLE_COLON,
TokenizerI::T_OBJECT_OPERATOR,
);
$called = array();
$tokens = $callable->getTokens();
$count = count($tokens);
for ($i = 0; $i < $count; ++$i) {
// break on function body open
if ($tokens[$i]->type === TokenizerI::T_CURLY_BRACE_OPEN) {
break;
}
}
for (; $i < $count; ++$i) {
// Skip non parenthesis tokens
if ($tokens[$i]->type !== TokenizerI::T_PARENTHESIS_OPEN) {
continue;
}
// Skip first token
if (!isset($tokens[$i - 1]) || !in_array($tokens[$i - 1]->type, $callT)) {
continue;
}
// Count if no other token exists
if (!isset($tokens[$i - 2]) && !isset($called[$tokens[$i - 1]->image])) {
$called[$tokens[$i - 1]->image] = true;
++$this->_calls;
continue;
} else if (in_array($tokens[$i - 2]->type, $chainT)) {
$identifier = $tokens[$i - 2]->image . $tokens[$i - 1]->image;
for ($j = $i - 3; $j >= 0; --$j) {
if (!in_array($tokens[$j]->type, $callT)
&& !in_array($tokens[$j]->type, $chainT)
) {
break;
}
$identifier = $tokens[$j]->image . $identifier;
}
if (!isset($called[$identifier])) {
$called[$identifier] = true;
++$this->_calls;
}
} else if ($tokens[$i - 2]->type !== TokenizerI::T_NEW
&& !isset($called[$tokens[$i - 1]->image])
) {
$called[$tokens[$i - 1]->image] = true;
++$this->_calls;
}
}
}
The first thing to do is to make sure that the test suite
is good enough to ensure that the required refactorings
will not change the public behavior of the component or
class. When this is donw and we are sure our that api
breaks will be detected by the test suitewe can start to
extract logic into separate methods.
The following example shows the result of the refactoring:
<?php
// ...
private function _countCalls(AbstractCallable $callable)
{
$called = array();
$tokens = $callable->getTokens();
$count = count($tokens);
for ($i = $this->_findOpenCurlyBrace($tokens); $i < $count; ++$i) {
if ($this->_isCallableOpenParenthesis($tokens, $i) === false) {
continue;
}
if ($this->_isMethodInvocation($tokens, $i) === true) {
$image = $this->_getInvocationChainImage($tokens, $i);
} else if ($this->_isFunctionInvocation($tokens, $i) === true) {
$image = $tokens[$i - 1]->image;
} else {
$image = null;
}
if ($image !== null) {
$called[$image] = $image;
}
}
$this->_calls += count($called);
}
The subjective feeling of readability heavily depends on the
complexity of control structures, as we can see by a
comparison of the original and the refactored version of the
method example. The new version with its Cyclomatic Complexity
Number of 5 is much easier to read and understand.
This text is the first of two blog posts. The second article
will give a short introduction into the NPath Complexity
You liked this article and you are interested in this and
other quality assurence related topics? - Then you should
now order your copy of the Book Quality Assurance in PHP
Projects. The book talks about nearly all aspect of
quality assurence, with practical tips and expert knowledge
contributed by certain PHP professionals.