Monthly archives: September, 2009

Software metrics (PHP focused) part 2

In part 1 I talked about lines of code and average lines per day, two indexes that are quite naive. Let’s look at some better metrics, most of them focused on object-oriented programming.

Third stop: Tests and code coverage
If you have unit tests, you can easily check that nothing vital gets broken by each contribution. A unit test exercises a class, so when the class changes you can verify that the expected behavior remains the same. Moreover, you can get an analysis of which lines of code are actually executed by a test suite: that is, the code coverage of the tests. You will see high values in well-tested classes and low values in classes that need more tests.
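To make the idea of line coverage concrete, here is a minimal sketch in Python (not what PHPUnit does internally, just the arithmetic behind the percentage). The function name and the line numbers are made up for illustration.

```python
def line_coverage(executable_lines, executed_lines):
    """Return the percentage of executable lines hit at least once."""
    executable = set(executable_lines)
    if not executable:
        return 100.0  # nothing to cover (a convention chosen for this sketch)
    covered = executable & set(executed_lines)
    return 100.0 * len(covered) / len(executable)


# Hypothetical data: each class has 20 executable lines (numbered 1..20).
well_tested = line_coverage(range(1, 21), range(1, 20))  # tests hit 19 lines
barely_tested = line_coverage(range(1, 21), [1, 2, 3])   # tests hit 3 lines

print(f"well tested:   {well_tested:.1f}%")    # 95.0%
print(f"barely tested: {barely_tested:.1f}%")  # 15.0%
```

The second class is the kind of low value that tells you where more tests are needed.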

Tools? PHPUnit, of course, with the help of Xdebug to collect the code coverage.

Fourth stop: Cyclomatic complexity
Cyclomatic complexity is the number of independent paths that execution can take through the code. For example, our main project has a total of 3048 paths. This value is useful for detecting places in the code that have become too complex and may need some cleanup.
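As a rough illustration of how such a number comes about, the usual shortcut is to count decision points and add one. The Python sketch below does that over raw text, which is only an approximation: real tools work on parsed code, and this pattern would miscount things like `<?php` or `??`. The PHP function is a hypothetical example used only as input.

```python
import re

# Decision points counted by this sketch. Matching raw text is an
# approximation: keywords inside strings or comments are counted too.
DECISION_PATTERN = re.compile(
    r"\b(?:if|elseif|for|foreach|while|case|catch)\b|&&|\|\||\?"
)


def approx_cyclomatic_complexity(php_source):
    """Return 1 plus the number of decision points found in the source."""
    return 1 + len(DECISION_PATTERN.findall(php_source))


# Hypothetical PHP function, used only as input text.
PHP_FUNCTION = """
function shippingCost($order) {
    if ($order->isEmpty()) {
        return 0;
    }
    $cost = 5;
    foreach ($order->items() as $item) {
        if ($item->isHeavy() && !$order->isLocal()) {
            $cost += 10;
        }
    }
    return $cost;
}
"""

complexity = approx_cyclomatic_complexity(PHP_FUNCTION)
print(complexity)  # 5: two ifs, one foreach, one &&, plus 1
```

Summing this value over every function gives a project-wide total of the kind quoted above.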

Fifth stop: Pure OO software metrics
There are some interesting software metrics for object-oriented code organized in packages, used basically to assess which packages (groups of classes) are of better quality than others. These values show the relations among classes, their dependencies, their resilience to change, etc.

Last stop: Ratios
Combining some of the previous values you can get interesting ratios. For example:
– Cyclomatic complexity per line of code: 0.2, a very good value.
– Lines of code per method (class function): 22.02, a normal value that we have to lower.
– Methods per class: 8.25, a good value.
– Average number of extended classes: 0.4, a good value.
– etc…
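
These ratios are plain divisions over the raw totals. A minimal Python sketch, assuming you already have the totals from your tools; the input numbers are invented for illustration and are not the figures of our project.

```python
def ratios(total_complexity, lines_of_code, methods, classes):
    """Combine raw project totals into the ratios listed above."""
    return {
        "complexity_per_line": total_complexity / lines_of_code,
        "lines_per_method": lines_of_code / methods,
        "methods_per_class": methods / classes,
    }


# Hypothetical totals for a small project.
project = ratios(total_complexity=300, lines_of_code=1500, methods=75, classes=10)

for name, value in project.items():
    print(f"{name}: {value:.2f}")
```

Watching how these values move from one commit to the next tends to tell you more than any single absolute number.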

Tools? The excellent pdepend is used here. Have a look at the end of its example page to see the amount of data (and funny but interesting diagrams) you can get.

Finally, all those values and tests are compiled every time a programmer sends a commit to our code repository, and I get a mail with all the details, including the lines added, the author’s name, and all the values that changed. So with a quick look I can check whether the contribution is OK or there is some code to improve. I wonder how many companies that develop in PHP use something like this. I bet fewer than 100 in the world!


Software metrics (PHP focused) part 1

Managing a software project with various programmers and around 10 contributions per day is a complex thing.
How can we measure the quality of every single contribution?
How can we keep track of the programmers’ work?

First stop: Lines of code

The easiest thing to measure in a project is the number of source lines of code it has. For example, the main project we are developing at work has ~30k lines (k=1000). That is, 24k of pure code and 7k of comments (and I’m counting only pure PHP OO code, without HTML, CSS or JavaScript).

Is it really that simple to measure? No, it’s not. The main problem is that programming is brain work, where creativity, problem-solving skills, and smartness come into play. It’s like writing a novel: can you say that one novel is better than another just by counting its pages? Can you say a Boeing 717 is worse than a 747 because it weighs less? As the Wikipedia article says, this metric is only useful when comparing two projects that differ by an order of magnitude. However, the ratio of comments to lines of code (in my case 1 comment every 4 lines) is usually a good index of quality.

Second stop: Average lines per day

Calculating the average lines of code per day can be tricky as well. At my job, a senior developer usually writes ~50 lines per day, but a junior programmer writes ~130 lines. Is that a lot? Well, having a look at some metrics about a similar project, phpMyAdmin, in Ohloh (a website with metrics for open source software), and doing some maths, the result is just 17 lines per day! On the other hand, if you google for “lines of code per day” you will get really wide values, from peaks of 1k (using code generators, which is cheating) to normal values of less than 100. Moreover, the deviation from this average value is actually huge: one day I can write 2 lines, another day 200.
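
To see how little the average says on its own, here is a small Python sketch that computes the mean and the standard deviation of a week of daily line counts. The numbers are hypothetical, chosen only to mimic the "2 lines one day, 200 another" pattern.

```python
from statistics import mean, pstdev

# Hypothetical lines of code written on seven consecutive working days.
daily_lines = [2, 200, 40, 15, 90, 0, 60]

average = mean(daily_lines)   # arithmetic mean of the daily counts
spread = pstdev(daily_lines)  # population standard deviation

print(f"average per day: {average:.1f}")
print(f"standard deviation: {spread:.1f}")
```

With data like this the deviation is larger than the average itself, so a single "lines per day" figure should be read with a lot of care.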

The senior writes 50 lines, the junior 130… is the senior a slacker? Of course not… usually the senior’s code is better: less prone to errors, more adaptable, with more precise function names, more elegant, and it does more things in fewer lines. The more the better? Actually, the less code needed to solve a problem, the better. On this topic, I recommend reading the article Code is your enemy!

Tools? We use phploc for counting lines of code, and phpcpd for detecting cases of duplicated code. Both tools are developed by Sebastian Bergmann, the author of PHPUnit (the most popular testing framework for PHP).

Next, in part 2 of this post, I’ll be speaking about other metrics, like code coverage, cyclomatic complexity and some interesting ratios based on software package metrics.