Category: Development

Software metrics (PHP focused) part 1

Managing a software project with several programmers and around 10 contributions per day is a complex thing.
How can we measure the quality of every single contribution?
How can we control the work of the programmers?

First stop: Lines of code

The easiest thing to measure in a project is the number of source lines of code it has. For example, the main project we are developing at work has ~30k lines (k=1000). That is, 24k of pure code and 7k of comments (and I’m counting only pure object-oriented PHP code, without HTML, CSS or JavaScript).

Is it really that simple? No, it’s not. The main problem is that programming is brain work, where creativity, problem-solving skills and smartness come into play. It’s like writing a novel: can you say that one novel is better than another just by counting its pages? Can you say a Boeing 717 is worse than a 747 because it weighs less? As the Wikipedia article says, this metric is only useful when comparing two projects that differ by an order of magnitude. However, the ratio of comments to lines of code (in my case, about 1 comment line for every 4 lines) is usually a good index of quality.
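To make the "pure code vs. comments" split concrete, here is a rough sketch of a line classifier. It is not how phploc works, just a simplified illustration written in C: the names line_counts, closes_block and count_lines are made up for this example, string literals are not parsed, and a line that mixes code with a trailing comment is simply counted as code.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Totals for one source text (hypothetical helper type). */
struct line_counts {
    int code;
    int comment;
    int blank;
};

/* Returns 1 if the two-character block-comment terminator occurs in [p, stop). */
static int closes_block(const char *p, const char *stop)
{
    for (; p + 1 < stop; p++) {
        if (p[0] == '*' && p[1] == '/') {
            return 1;
        }
    }
    return 0;
}

/* Classifies every line of src as code, comment or blank.
 * Simplifying assumptions: only the first non-space characters of a line
 * decide its type, and strings containing comment markers are not handled. */
struct line_counts count_lines(const char *src)
{
    struct line_counts c = {0, 0, 0};
    int in_block = 0; /* 1 while inside an unterminated block comment */

    assert(src != NULL);
    while (*src != '\0') {
        const char *end = strchr(src, '\n');
        const char *stop = end ? end : src + strlen(src);
        const char *p = src;

        while (p < stop && isspace((unsigned char)*p)) {
            p++; /* skip leading whitespace */
        }
        size_t rest = (size_t)(stop - p);

        if (in_block) {
            c.comment++;
            if (closes_block(p, stop)) {
                in_block = 0;
            }
        } else if (rest == 0) {
            c.blank++;
        } else if (p[0] == '#' || (rest >= 2 && p[0] == '/' && p[1] == '/')) {
            c.comment++; /* single-line comment */
        } else if (rest >= 2 && p[0] == '/' && p[1] == '*') {
            c.comment++; /* block comment starts here */
            if (!closes_block(p + 2, stop)) {
                in_block = 1;
            }
        } else {
            c.code++;
        }
        src = end ? end + 1 : stop;
    }
    return c;
}
```

Dividing the comment total by the code total gives the ratio discussed above. A real tool has to deal with many more cases (doc blocks, heredocs, inline HTML), so treat the numbers from a sketch like this as approximate.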

Second stop: Average lines per day

Calculating the average lines of code per day can be tricky as well. At my job, a senior developer usually writes ~50 lines per day, but a junior programmer writes ~130 lines. Is that a lot? Well, having a look at some metrics for a similar project, phpMyAdmin, on Ohloh (a website with metrics for open source software), and doing some maths, the figure it suggests is just 17 lines per day! On the other hand, if you google for “lines of code per day” you will get a really wide range of values, from highs of 1k (using code generators, which is cheating) to more normal values below 100. Moreover, the deviation from this average is actually huge: one day I can write 2 lines, another day 200.

Senior does 50 lines, junior 130… is the senior one a slacker? Of course not… the senior’s code is usually better: less error-prone, more adaptable, with more precise function names, more elegant, and doing more in fewer lines. The more the better? Actually, the less code needed to solve a problem, the better. On this topic, I recommend reading the article “Code is your enemy!”.

Tools? We use phploc for counting lines of code, and phpcpd for detecting cases of duplicated code. Both tools are developed by Sebastian Bergmann, the author of PHPUnit (the most popular testing framework for PHP).

Next, in part 2 of this post, I’ll be speaking about other metrics, like code coverage, cyclomatic complexity and some interesting ratios based on software package metrics.

Looking for quality content in the web 2.0

How can we induce users to participate more in our website?
Surely a lot of people have this question on their minds. Since the arrival of Web 2.0, the value of a website is based on its users and the content they create. The greater the quantity and quality of user-created content, the greater the value of the website.

The first step is to simplify the UI as much as possible, to help users overcome their laziness and participate. The state of the art includes clever AJAX tools, browser plugins, and desktop applications. Some websites go one step further and reward their most valuable users somehow, like the badges on one website for programmers, where you get medals for doing things (like “silver medal for good answer”: voted up 25 times).

But what about the quality of the content?
If you help users to add content, that doesn’t mean you will have great content, just a lot of content. In some cases you can end up with a website flooded by low quality content (read “Facebook”). This is not a bad thing per se; the same happens on quite a lot of TV channels: despite their low quality, people continue watching them. But it seems that specific (or thematic) websites have better quality than generic websites (this also applies to TV channels). Just compare the ratio of interesting content to total content on Flickr vs. Facebook: of course you can find some bad pictures on Flickr, but you can find tons of uninteresting content on Facebook. On Flickr you are somehow induced to publish only good pictures; on Facebook you are just tempted to publish a lot.

So, the balance between quantity and quality rules the net as it does in other places. The challenge, as a website creator, is to find the most profitable ratio (in terms of personal satisfaction and/or monetary ROI).

Lately I’ve been thinking about resurrecting an old pet project, a website for creating and playing games. Is that specific group (the gamers) enough to pay the bills, or just enough to pay for a few whims? Is the “create game” part too specific, or just what I need to make the difference? How could I work effectively on this project while keeping my day job?… too sunny to think!

Feeling as a senior programmer

Here is a morning conversation with my new Indian junior coworker. He is in New Delhi, working for us remotely. I’m teaching him quite a lot of things, and we love discussing programming problems…

Me – have you read about Composite Pattern?
He – i use to say to my friends that am good at programming, then I found you :(
Me – hahaha
He – right now reading Pragmatic Programmer, :)
Me – really? really good
He – planning to apply for zend certification so I’ve so much to study
Me – This book changed the way I think about programming

Testing (during winter solstice)

What was I doing on the shortest day of the year (in the Northern Hemisphere)?
Enjoying the “code coverage” used in testing, while traveling to my hometown by train for Xmas.
Testing, code coverage… what’s that? you ask. Let me explain.

A really important part of developing code is testing the code you write. It’s the way to verify that your code really works. The easiest way to do so is to insert some “prints” here and there, to verify that variables store the expected values. Or you can write an external program to exercise your code with some inputs and verify the expected outputs. But the best way is to use a testing tool for creating automated tests. Then you normally write some unit tests (that exercise modules/objects) and test suites (that aggregate unit tests).

Until recently I didn’t use automated tests. But some months ago I discovered PHPUnit, and now I’m “safe”. Automated tests protect you and your team against careless modifications (made by the new intern, or by yourself on a bad day).

And what’s code coverage? Well, when you write tests, there is a way to see if your tests stress all the code lines: code coverage. But let me show you a really small example of this useful tool.

Imagine this function/method:

And this (obviously uncompleted) test:

When you launch PHPUnit and ask for a code coverage analysis in HTML, you get something like this:
[Screenshot: Code coverage with PHPUnit]
Isn’t this wonderful? You get a lot of information! You quickly discover ways to improve your tests and sometimes your source code… how could I have lived without this?

Merry Christmas, by the way!

Professional PHP: A difficult task, and even worse in Spain

This is a post about tech books. Some time ago I discovered that the Internet is not, and should not be, the only information source for an IT professional. It’s a good place to search for technical reference and to scan for chunks of information, but it definitely isn’t a place to read long texts. Books are the alternative. We have got used to scanning text on the Internet (Nielsen dixit), but we read books in another, more relaxed way. And book authors write books differently from the way they write on the Internet. We need books (or big and comfortable e-books) in order to understand “the big picture”.

PHP is a flexible language, and since version 5 it’s a modern object-oriented language. It’s also an easy language to learn, which helps newbies. But, precisely because of this, most PHP programmers have a low programming level. This could explain why almost every PHP book seems to be written for newbies. PHP5 is object-oriented so, as an example: how many books explain Design Patterns using Java? More than 30 (according to an Amazon search). How many with PHP5? Just 4. Another example… let’s talk about unit testing: how many books are out there on JUnit? Around a dozen. And on PHPUnit (something like JUnit, but in PHP)? Just ONE!

And things get worse if we talk about trying to buy advanced PHP books in Spain. Bookshops in Spain avoid books written in languages other than Spanish, and of course none of the “advanced level” PHP books get a Spanish translation. So the reality is that it’s almost impossible to get a decent book. Last year I was surprised when I went to London, walked into an ordinary bookshop and found a lot of advanced PHP and AJAX books. I tried to find a good bookshop here in Barcelona, but it’s impossible: so finally I ordered some books from Amazon UK!

“Hello World!” is too hard to program

10 years ago, when I saw a Java program for the first time, the classic “Hello World!”, I thought for a moment “booo, it’s counterintuitive… you need to create a class just to say Hello”. Later I got used to doing all the nasty work with classes.

But I never imagined something like the following:

Malbolge, invented by Ben Olmstead in 1998, is an esoteric programming language designed to be as difficult to program in as possible. The first “Hello, world!” program written in it was produced by a Lisp program using genetic algorithms.
– from Malbolge description

8-0 No comments.

Playing with the bits of double variables

[Most of my readers will not understand the following, but the mathematicians will love it]

A couple of years ago, a mathematician friend told me about some problems he had optimizing a program (maths calculations), and asked me for suggestions. I answered with some ideas, but then he asked me about the double format, the way the computer stores a double variable, with its separate sign, mantissa and exponent (actually, IEEE 754). He intended to manipulate those parts directly, to improve the speed of the calculations. I couldn’t give him a clear answer.

In the last few months I’ve been taking part in some programming contests on TopCoder. Today one of the problems was really boring but at the same time really interesting. In brief, it says that division is a computationally expensive operation, except when the divisor is a power of 2 (then you can just use bit shifts). The statement asks you to program a division, approximating the result using a series of divisions by powers of 2.

I tried to use bit shifts and bit masks to get the exponent of a double (keeping in mind the double binary format), but I discovered that C++ (like C) doesn’t allow bitwise operations on doubles.

Nevertheless, I later asked in the forums and got an interesting idea. Using a “union” you bind two or more variables so that they share the same memory position. That way you can manipulate the value in memory from different angles.

Here is an example in pure C:
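What follows is a minimal sketch of the union idea, not a tested contest solution. It assumes a 64-bit IEEE 754 double that shares its byte order with uint64_t (true on common platforms), and the names double_bits, double_sign, double_exponent, double_mantissa and divide_by_pow2 are made up for illustration. Reading the member you did not write last is accepted in C, but it is formally undefined behaviour in C++.

```c
#include <assert.h>
#include <stdint.h>

/* Two views of the same 8 bytes: as a double and as raw bits. */
union double_bits {
    double   value;
    uint64_t bits;
};

/* Sign bit: 0 for positive, 1 for negative. */
int double_sign(double x)
{
    union double_bits u;
    assert(sizeof(double) == sizeof(uint64_t)); /* layout assumption */
    u.value = x;
    return (int)(u.bits >> 63);
}

/* Unbiased exponent of a normal double (the stored field has a bias of 1023). */
int double_exponent(double x)
{
    union double_bits u;
    u.value = x;
    return (int)((u.bits >> 52) & 0x7FF) - 1023;
}

/* The 52 stored mantissa bits (the leading 1 is implicit and not stored). */
uint64_t double_mantissa(double x)
{
    union double_bits u;
    u.value = x;
    return u.bits & 0xFFFFFFFFFFFFFULL;
}

/* Divides x by 2^k by subtracting k from the exponent field.
 * Only valid while x is a normal number and the result stays normal;
 * zero, subnormals, infinities and NaN are rejected by the assert. */
double divide_by_pow2(double x, int k)
{
    union double_bits u;
    int biased;

    u.value = x;
    biased = (int)((u.bits >> 52) & 0x7FF);
    assert(k >= 0 && biased > k && biased != 0x7FF);
    u.bits -= (uint64_t)k << 52;
    return u.value;
}
```

For instance, double_exponent(8.0) gives 3 and divide_by_pow2(40.0, 3) gives 5.0, without any division instruction. Whether this is actually faster than a plain division depends on the compiler and the CPU, so it is worth measuring before relying on it.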


KISS : Keep It Simple and Stupid

Sometimes I forget this principle. The other day was one of such days. I was trying to solve a problem at TopCoder (a website that runs computer programming contests)…

The statement was something like “given n males and m females, seat them at a circular table in such a way that, if you start removing every Kth person, after m steps only males remain”. It looks like a trivial problem (and it is), but the difficulties arrive when you consider a big K, because you start looping around the table with a different number of people each time.

I started using a lot of modulus operations and trying to get an elegant solution as well. But the modulus is quite expensive (in computational terms), and the code was getting confusing (and of course it was really difficult to trace). I worked through a lot of examples with pen and paper, trying to abstract something useful. Suddenly I realized: why don’t I just program what I am doing with pen and paper? Just let the computer iterate every step, counting and placing a female at every Kth step. I programmed it, and later I verified (peeking at other people’s code) that it was the correct solution: simple, stupid, but effective. [Well, actually the complexity this way is O(m*K), and using only modulus it is O(m), but in a small domain (as they said, K<=1000) the first solution is faster.]
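The pen-and-paper approach can be sketched roughly like this. It is a reconstruction for illustration, not my contest submission: the function name seat_people is made up, and the counting convention (the person at the starting point counts as 1, and counting resumes just after the removed seat) is an assumption.

```c
#include <assert.h>
#include <string.h>

/* Fills out[0 .. males+females-1] with 'M'/'F' and terminates the string.
 * out must have room for males + females + 1 characters.
 * Every seat starts as 'M'; a seat marked 'F' is treated as already removed. */
void seat_people(int males, int females, int k, char *out)
{
    int total = males + females;
    int pos = 0; /* current position around the table */

    assert(males >= 0 && females >= 0 && k >= 1 && out != NULL);
    memset(out, 'M', (size_t)total);
    out[total] = '\0';

    for (int removed = 0; removed < females; removed++) {
        int counted = 0;
        for (;;) {
            if (out[pos] == 'M') { /* seat still occupied */
                counted++;
                if (counted == k) {
                    break; /* this is the Kth person */
                }
            }
            if (++pos == total) {
                pos = 0; /* wrap around the table without a modulus */
            }
        }
        out[pos] = 'F'; /* a female must sit here; she leaves the circle */
    }
}
```

With 5 males, 3 females and K = 2, the buffer ends up holding MFMFMFMM. The inner loop just walks the table seat by seat, which is exactly what I was doing by hand.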

This reminds me of a classic quote:

“Everyone knows that debugging is twice as hard as writing a program in the first place. So if you’re as clever as you can be when you write it, how will you ever debug it?” – Brian Kernighan

Chinese stealing my code

This page was part of the last project I developed at my company: Buff catalog.

And this page was part of the website of our client’s competitor in China, recently published: KranGear catalog.

The funny thing is that their JavaScript files even include MY comments, in Spanish.

The idea was to use a thermometer as the scroll bar, reflecting the different temperature protection of every product. The original page was developed around December 2006. I spent a couple of weeks tweaking the details, to get the smoothest result. I think it was an innovation, because the design department wanted to make it with Flash, and I showed them that it could be made with just JavaScript (including the zoom-in effect when you pass over the designs). And now a Chinese competitor copies me… hence I did a good job (even though I could have done far better, but we were in a hurry to publish the website).

This reminds me of another UI effect, made with JavaScript, which I programmed for the Whisher home page (now they have a new website, but my work can be seen, without images, at webarchive’s capture). The idea was some kind of quick movement between slides, and I spent a couple of days creating the right framework (divs, CSS, and JavaScript) to bring it to life. Now it’s quite a common effect that most JavaScript libraries have (like this example), but in those days it was new stuff.

Curious, because my most innovative work was in this area (JavaScript), when my position at the company was more server-side focused (PHP + MySQL, mainly).

Stupid robots generating the Spanish Congress’ website

The talk of the week in Spain is the end of the football league. But I don’t follow it!!

Let’s restart: the talk of the week in Spain, regarding the “web world”, is the new website of the Spanish Congress. This new version cost a bit less than 200 thousand euros. And it’s probably the worst HTML I’ve ever seen… just thousands of lines of tag soup. Just try to look at the source code (right click on the page, and “View Source Code”)… even if you don’t know anything about HTML, you can see some really weird things, like a thousand lines of pointless CSS definitions. It’s almost impossible to do worse!

The truth is that it’s quite clear that some kind of robot has generated all this tag soup. Humans are not capable of this magnitude of stupidity. To top it off, all government websites must (by law) follow standards and usability guidelines: I guess they have no idea what this really means.

“Welcome to Spain”… What a shame!