The value of complexity metrics - Cyclomatic Complexity (1/2)

Software metrics are currently on everyone's lips and a frequently discussed topic. There are many conference talks, blog posts and other presentations that talk about software metrics. But to me it seems as if this subject is a closed book for many developers, so I decided to write this little post about a special category of software metrics, the complexity metrics.

Complexity metrics are a theoretical approach to measure the subjective complexity of a software fragment, where the words software fragment stand for a paraphrase for functions, methods, classes and nearly every logical unit that can be found in a software system. The most prevalent procedure to calculate complexity values is static code analysis, where an application parses the raw source code of a project, counts different statements and expressions and packs up the determined results in simple classification numbers. And with this information you already know the main concepts behind most software metrics, classification numbers and counting. As you can see there is no magic behind the scene, the only thing required is a good background knowledge to interpret those values.

The [Cyclomatic Complexity Number](Thomas J. McCabe: A Complexity Measure) or short CCN is the oldest complexity metrics. The first time this software metric was mentioned was 1976 by Thomas J. McCabe. This metric counts the available decision paths in a software fragment to determine its complexity. Each decision path starts with one of the conditional statements from the following list, so that it is fairly easy to detect them in existing source code.

  • ?
  • case
  • elseif
  • for
  • foreach
  • if
  • while

A look at this list of statements may result in the questions: Is this list wrong, it doesn't list else and default? But it is correct. The assumption is that both statements will contain the defaut execution path of a program which also exists when there are no special cases to capture.

Each decision path gets the value 1 and the sum of all these values represents the Cyclomatic Complexity of the analyzed software fragment. Note that each function and method also counts with a value of 1 With this knowlegde we can now calculate the complexity of the following example code:

<?php                                 
class CyclomaticComplexityNumber
{
    public function exampe( $x, $y ) // (1)
    {                                               
        if ( $x > 23 || $y < 42 ) // (1)
        {                                 
            for ( $i = $x; $i >= $x &&amp; $i <= $y; ++$i ) // (1)
            {
            }
        }
        else
        {
            switch ( $x + $y )
            {
                case 1: // (1)
                    break;
                case 2: // (1)
                    break;
                default:
                    break;
            }
        }
    }
    file_exists('/tmp/log') or touch('/tmp/log');
}

Based on the previous definition the Cyclomatic Complexity Number of the example code example is 5. But you may have noticed that this approach does not capture all decision paths that exist. We haven't catched those paths that came from the by the boolean expression || line 6 and && line 8, and the logical or expression in line 25. A variation of the Cyclomatic Complexity Number that also captures those paths is the so called CCN2. The CCN2 is the most widely used variation of this software metrics. Tools like PHPUnit, PMD and Checkstyle report it as Cyclomatic Complexity of an analyzed software fragment.

Now we get a complexity value of 8 when we apply the CCN2 to the previous example, what is a growt of the software's complexity of 60%.

Due to the fact that Cyclomatic Complexity Number was originally invented for procedural programming languages, this definition for the Cyclomatic Complexity Number still misses one element to measure the complexity of an object oriented software system. With the concept of exceptions a software gets additional decision paths for each catch statement used in the source code. While try contains the code for the regular execution code without special cases, similar to else and default statements.

  • ?
  • &&
  • ||
  • or
  • and
  • xor
  • case
  • catch
  • elseif
  • for
  • foreach
  • if
  • while

Now that we know what the Cyclomatic Complexity Number is, what can we do with the measured information? We can find the complexity hotspots in a system, for example the top ten artifacts with the highest complexity, but this is only important during an initial analyses phase to get the big picture of an application. For a continuous inspection this information is not so important. A continuous analyses requires thresholds that help to categories calculated values. During the time four values have emerged as good thresholds for the Cyclomatic Complexity Number of a software system.

  • A software fragment with a CCN value between 1-4 has low complexity.
  • A complexity value between 5-7 is moderate and still easy to understand.
  • Everything between 6-10 has a high complexity, while everything greater 10 is very complex and hard to understand.

You may ask, why should I care about the complexity of a software system, where is the value of benefit in this metric?

Mostly the complex parts of an application contain business critical logic. But this complexity has negative impacts on the readability and understandability of source code. Those parts will normally become a maintainence and bug fixing nightmare, because no one knows all the constraints, side effects and what's exactly going on in that part of the software. This situation results in the well known saying "Never touch a running system" which in turn mostly ends in copy&paste programming. The situation can even become more critical when the original author leaves the development team or the company.

Finally a small example how to apply the new knowledge about the Cyclomatic Complexity Number, thresholds and the negative impacts of complex software to an existing development process. The following source listing shows a complex method taken from PHP_Depend's source. This method has a Cyclomatic Complexity Number of 16 and I must admit that the original author needed some time to understand what was going on in this method.

<?php
// ...
private function _countCalls(AbstractCallable $callable)
{
    $callT  = array(
        TokenizerI::T_STRING,
        TokenizerI::T_VARIABLE
    );
    $chainT = array(
        TokenizerI::T_DOUBLE_COLON,
        TokenizerI::T_OBJECT_OPERATOR,
    );
    $called = array();

    $tokens = $callable->getTokens();
    $count  = count($tokens);
    for ($i = 0; $i < $count; ++$i) {
        // break on function body open
        if ($tokens[$i]->type === TokenizerI::T_CURLY_BRACE_OPEN) {
            break;
        }
    }

    for (; $i < $count; ++$i) {
        // Skip non parenthesis tokens
        if ($tokens[$i]->type !== TokenizerI::T_PARENTHESIS_OPEN) {
            continue;
        }
        // Skip first token
        if (!isset($tokens[$i - 1]) || !in_array($tokens[$i - 1]->type, $callT)) {
            continue;
        }
        // Count if no other token exists
        if (!isset($tokens[$i - 2]) &&amp; !isset($called[$tokens[$i - 1]->image])) {
            $called[$tokens[$i - 1]->image] = true;
            ++$this->_calls;
            continue;
        } else if (in_array($tokens[$i - 2]->type, $chainT)) {
            $identifier = $tokens[$i - 2]->image . $tokens[$i - 1]->image;
            for ($j = $i - 3; $j >= 0; --$j) {
                if (!in_array($tokens[$j]->type, $callT)
                    &&amp; !in_array($tokens[$j]->type, $chainT)
                ) {
                    break;
                }
                $identifier = $tokens[$j]->image . $identifier;
            }

            if (!isset($called[$identifier])) {
                $called[$identifier] = true;
                ++$this->_calls;
            }
        } else if ($tokens[$i - 2]->type !== TokenizerI::T_NEW
            &&amp; !isset($called[$tokens[$i - 1]->image])
        ) {
            $called[$tokens[$i - 1]->image] = true;
            ++$this->_calls;
        }
    }
}

The first thing to do is to make sure that the test suite is good enough to ensure that the required refactorings will not change the public behavior of the component or class. When this is donw and we are sure our that api breaks will be detected by the test suitewe can start to extract logic into separate methods.

The following example shows the result of the refactoring:

<?php
// ...
private function _countCalls(AbstractCallable $callable)
{
    $called = array();

    $tokens = $callable->getTokens();
    $count  = count($tokens);
    for ($i = $this->_findOpenCurlyBrace($tokens); $i < $count; ++$i) {

        if ($this->_isCallableOpenParenthesis($tokens, $i) === false) {
            continue;
        }

        if ($this->_isMethodInvocation($tokens, $i) === true) {
            $image = $this->_getInvocationChainImage($tokens, $i);
        } else if ($this->_isFunctionInvocation($tokens, $i) === true) {
            $image = $tokens[$i - 1]->image;
        } else {
            $image = null;
        }

        if ($image !== null) {
            $called[$image] = $image;
        }
    }
    $this->_calls += count($called);
}

The subjective feeling of readability heavily depends on the complexity of control structures, as we can see by a comparison of the original and the refactored version of the method example. The new version with its Cyclomatic Complexity Number of 5 is much easier to read and understand.

This text is the first of two blog posts. The second article will give a short introduction into the [NPath Complexity](NPATH: a measure of execution path complexity and its applications) You liked this article and you are interested in this and other quality assurence related topics? - Then you should now order your copy of the Book [Quality Assurance in PHP Projects](Quality Assurance in PHP Projects). The book talks about nearly all aspect of quality assurence, with practical tips and expert knowledge contributed by certain PHP professionals.