Monthly archives: August, 2007

More about playing with word count and position

In the last few weeks I’ve seen some examples of using word positions within sentences… like my “Your blog written by monkeys”.

One way to experiment is simply to look for a chunk of a sentence, a “meaning” block, that relates two concepts. For example, by collecting every occurrence of “is the new”, you can create a nice diagram showing new trends.
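To make the idea concrete, here is a minimal Python sketch of looking for the “is the new” block. The sample text, the `find_relations` name and the one-word-on-each-side rule are my own assumptions for illustration, not how any of the linked diagrams was actually built.

```python
import re
from collections import Counter

# Hypothetical sample text; any plain-text corpus would do.
SAMPLE = (
    "Some say orange is the new black. Others insist that small is the new big. "
    "My neighbour claims orange is the new black, too."
)

# Assumption: a single word on each side of the marker phrase is enough.
PATTERN = re.compile(r"\b(\w+) is the new (\w+)\b", re.IGNORECASE)


def find_relations(text):
    """Count (newcomer, predecessor) pairs joined by 'is the new'."""
    pairs = Counter()
    for newcomer, predecessor in PATTERN.findall(text):
        pairs[(newcomer.lower(), predecessor.lower())] += 1
    return pairs


relations = find_relations(SAMPLE)
for (newcomer, predecessor), count in relations.most_common():
    # Each line is one edge of the diagram, with its weight.
    print(f"{newcomer} -> {predecessor} ({count})")
```

Each counted pair can become an arrow in the diagram, with the count as its thickness.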

A similar path is to gather statistics on how often words appear in a text and how they relate to each other. Later you can paint nice pictures with this data, like one visualizing the Holy Books of five world religions.
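As a rough sketch of what those statistics could look like, the Python below counts how often each word appears and how often one word directly follows another. Treating “relation” as plain adjacency is my simplification, and `tokenize`, `word_stats` and the sample sentence are hypothetical names and data.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes as words."""
    return re.findall(r"[a-z']+", text.lower())


def word_stats(text):
    """Return (word counts, counts of adjacent word pairs)."""
    words = tokenize(text)
    counts = Counter(words)
    # Pair every word with the one that follows it.
    pairs = Counter(zip(words, words[1:]))
    return counts, pairs


# Hypothetical sample sentence.
SAMPLE = "the cat saw the dog and the dog saw the cat"
counts, pairs = word_stats(SAMPLE)

print(counts.most_common(3))
print(pairs.most_common(3))
```

The two counters are all a visualization needs: word counts for the size of each node, pair counts for the links between them.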

Finally, you don’t even need to know how to program these systems. You can do it yourself with just paper, a pencil and your favorite song.

The cooking company

Imagine a big restaurant with a lot of cooks. A few bosses manage all the work, assigning dishes to individual cooks. So far, this seems quite reasonable. But imagine the bosses decide to start a single dish with 3 cooks, who begin heating the frying pan, and then the bosses pull those 3 cooks off and assign another 2. In the middle of the cooking, for some weird reason, the managers change the instructions for the dish, modifying some ingredients, and reassign other cooks to the task. Can you imagine the final result? Well, actually I don’t really know how a big restaurant manages all its tasks, but… you guessed it, I’m thinking about software companies.

It’s curious to find this idea in a brilliant article by Paul Graham:

There is a contradiction in the very phrase “software company”. The two words are pulling in opposite directions. Any good programmer in a large organization is going to be at odds with it, because organizations are designed to prevent what programmers strive for.

The software industry doesn’t yet have a good framework or structure for organizing, managing and assigning projects to programmers. Everybody knows (even some bosses) that the more programmers you assign to a project, the lower the quality you get. But most companies prefer to treat programmers as interchangeable resources. “Assign this resource to that project”, they like to say. The problem is that programmers are not resources, like machines. Besides, programming is in some ways an art.

In Spain, the big companies create a lot of job levels, like junior programmer, senior programmer, junior analyst and senior analyst. The inhabitants of the lower levels are reassigned from time to time to different tasks, which messes up their work. Even worse, everybody assumes that programming is bad (because the inhabitants of the higher levels don’t program, and everybody wants to climb levels); so imagine that: engineers who have just finished their degree and want to program as little as possible! A really unhealthy industry. In the rest of Europe things are a bit better, as they are in America. But, of course, they haven’t found the holy grail either.

Some people say the future is pair programming, and others say small groups (3 or 4 people) are the best option. We can also observe another way of managing projects: the methodology (or lack of it) in open source projects. I’d like to see some analysis of the projects at SourceForge: statistics relating things like the number of (key) developers, activity, and… let’s dream, quality (maybe measured by bug count plus features accomplished plus something else). We might find the key to creating a software company, or the key to avoiding creating one.

Your blog written by monkeys

Do you want to try my new tiny script?
Just enter your blog’s URL here, click the button and watch the surprise.

And now, let me explain it.
Some weeks ago I was discussing NLP with another engineer, and the importance of having some knowledge of the language in order to process a text effectively. He wanted to do some stuff with pure text, written in an unknown language. I suggested that you can do more and better things if a human expert helps you, but he told me he couldn’t pay an expert. In that case, I pointed out, you can use some kind of statistical approach, like recording the appearances of every single word, or the relation between a word and the previous and next ones. But those methods are just random, almost like monkeys playing with the words.

This evening I created a PHP script that reads your blog (actually only the words within paragraphs) and generates new text using some statistics (the data used can be seen at the end of the page source). It looks real, it might even make sense… but it’s just random. Monkeys rulez!
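For the curious, here is a minimal sketch of the general idea in Python. It is not the PHP script itself: it works on a string in memory instead of fetching a blog, and it uses the simplest statistic I can think of, namely which words have been seen right after each word. The names `build_chain` and `generate` and the sample text are made up for this example.

```python
import random
from collections import defaultdict


def build_chain(text):
    """Map every word to the list of words seen right after it."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        # Repeated followers stay in the list, so frequent pairs are picked more often.
        chain[current].append(following)
    return chain


def generate(chain, length=20, seed=None):
    """Walk the chain at random and return `length` words as one string."""
    if not chain or length <= 0:
        return ""
    rng = random.Random(seed)
    starts = list(chain)
    word = rng.choice(starts)
    output = [word]
    for _ in range(length - 1):
        followers = chain.get(word)
        # Dead end (a word never seen with a follower): jump to a random start.
        word = rng.choice(followers) if followers else rng.choice(starts)
        output.append(word)
    return " ".join(output)


# Hypothetical sample text standing in for the words of a blog.
SAMPLE = "monkeys type words and monkeys type blogs and blogs look real but blogs are random"
chain = build_chain(SAMPLE)
print(generate(chain, length=15, seed=1))
```

Because followers are drawn in proportion to how often they appeared, the output keeps the local flavor of the source while meaning nothing as a whole, which is exactly the monkey effect.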