Development – Liopic on Tech

2017 focus: ML

At the end of 2016 I was still amazed by the result of the AlphaGo vs. Lee Sedol match in March (the first time a machine beat a top professional Go player), and at the same time I was looking for a subject to focus on in 2017, so I chose Machine Learning. During my university years I tried out some related tools (genetic algorithms, basic neural networks, etc.), but I hadn't looked at the field again in 10 years.

The first stop was the famous Machine Learning course by Andrew Ng on Coursera, since everybody points you there. Although it explains a lot of complex material in an intuitive way, you soon get tired of so much maths and of using Octave/Matlab when you should be using Python.

After a year of learning about Machine Learning, I have quite a list of recommendations on how to start exploring the field. Disclaimer: this may reflect my preferred way of learning, which is through text rather than videos. If you have no previous experience, this could be a good way to start:

Don't watch Coursera's ML course; read the notes somebody took on it instead.
Learn Python, but especially the libraries NumPy, Pandas and scikit-learn, and how to run a Jupyter notebook. The best way to install them all is via the Anaconda distribution.
Buy a copy (paper or ebook) of the book “Python Machine Learning” by Sebastian Raschka.
Join Kaggle and have a look at the Titanic tutorials and its new Learn section. They also have a video course on Udacity in case you like watching videos.
Don't rush into deep learning (aka neural networks): first you'll have to learn about classic ML models, and also about a lot of related processes: data cleaning, feature engineering and data visualization.
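To make the last point concrete, here is a minimal sketch of that classic workflow with Pandas and scikit-learn, assuming a tiny made-up table with a few Kaggle Titanic-style columns (`Sex`, `Age`, `Pclass`, `SibSp`, `Parch`, `Survived`). The helper names `clean` and `engineer` and all the values are hypothetical; median imputation, a family-size feature and logistic regression are just common first choices, not the only ones.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical rows loosely modelled on Kaggle's Titanic columns (values invented).
raw = pd.DataFrame({
    "Sex": ["male", "female", "female", "male", "female", "male", "male", "female"],
    "Age": [22, 38, None, 35, 27, None, 54, 4],
    "Pclass": [3, 1, 3, 1, 2, 3, 1, 3],
    "SibSp": [1, 1, 0, 0, 0, 0, 0, 1],
    "Parch": [0, 0, 0, 0, 2, 0, 0, 1],
    "Survived": [0, 1, 1, 0, 1, 0, 0, 1],
})


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Data cleaning: fill missing ages with the median age."""
    df = df.copy()
    df["Age"] = df["Age"].fillna(df["Age"].median())
    return df


def engineer(df: pd.DataFrame) -> pd.DataFrame:
    """Feature engineering: encode sex as 0/1 and derive a family-size feature."""
    df = df.copy()
    df["IsFemale"] = (df["Sex"] == "female").astype(int)
    df["FamilySize"] = df["SibSp"] + df["Parch"] + 1
    return df.drop(columns=["Sex", "SibSp", "Parch"])


data = engineer(clean(raw))
X = data.drop(columns=["Survived"])
y = data["Survived"]

# A classic (non-deep) model; max_iter is raised to avoid convergence warnings.
model = LogisticRegression(max_iter=1000)
model.fit(X, y)
print(model.score(X, y))  # accuracy on the training rows only
```

Scoring on the training rows only checks that the pipeline runs; for a real estimate you would hold out a test set, for example with scikit-learn's `train_test_split`.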

My first real-world input came in May, when I attended the PyData conference in Barcelona. It was a turning point: I found lots of ideas to apply, but above all I got a feel for the industry's pulse.

During the summer I challenged myself to apply it at work and to give a conference talk. The subject was customer segmentation using unsupervised algorithms, on a dataset I prepared myself from our company's data. In the end, the talk became a 2-hour workshop.
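The post doesn't say which unsupervised algorithms the workshop used, so the following is only an illustrative sketch of customer segmentation with k-means from scikit-learn. The two features (orders per year and average basket in euros), the nine customers and the choice of three clusters are all invented assumptions, not the company's data.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical per-customer features: [orders per year, average basket in euros].
customers = np.array([
    [2, 20.0], [3, 25.0], [1, 18.0],     # occasional buyers, small baskets
    [25, 30.0], [30, 28.0], [28, 35.0],  # frequent buyers, medium baskets
    [4, 150.0], [5, 160.0], [3, 140.0],  # rare buyers, large baskets
])

# Standardize so orders and euros contribute comparably to the distances.
scaled = StandardScaler().fit_transform(customers)

# k-means with several random restarts; random_state makes the run repeatable.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(scaled)

# Describe each segment by the average of its members' original features.
for segment in range(3):
    members = customers[labels == segment]
    print(segment, members.mean(axis=0))
```

Scaling first matters because k-means relies on Euclidean distances; without it, the basket value in euros would dominate the order count. In practice the number of clusters is itself a choice to explore, for instance by comparing inertia or silhouette scores for several values of `n_clusters`.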

It was the first time I gave a presentation about Machine Learning in English. Although the audience was satisfied with the workshop and some people had interesting conversations with me afterwards, I felt that I should have worked harder while preparing it.

Now that 2017 has finished and 2018 has started, I'll keep focusing on ML, but with a more practical approach. At my day job we have developed a recommendation system that will evolve to have several ML models working together, and after work I'll try to play more with Kaggle by taking part in some competitions.

In 2018 I'll try deep learning too: Andrew Ng's course with TensorFlow, a creative applications course and some video tutorials on PyTorch. I'll also try to improve my engineering approach to ML, since practices like version control, testing and deployment are rarely seen in a world with more people from academia than from industry. Finally, I plan to complete a nice course on data visualization with D3.js.

I hope all these links help somebody too!

Teaching students about real industry work

Some months ago I had the chance to teach University students about how we develop in the real world, as part of a “companies’ seminars” event.

There is an ongoing discussion in our industry: do you need a degree in Computer Science to become a successful developer? Some people say that the subjects taught at University become outdated quickly, basically due to the lightning speed of technology. Others say that nowadays a JavaScript course is enough to learn to program. Still others say that you must spend 4 to 5 years at University.

I'm on the side of formal University education: students need foundations to truly understand how things work. But it's true that they also need to know how the industry really works. Virtualization, code versioning, code quality (“clean code”), tradeoffs, etc. are unfortunately not taught at University.

During the seminar I taught students about general subjects, like the tradeoffs we have to make in our company, but also about trending technologies like Docker. Still, the subject they loved most was my introduction to clean code, which opened their eyes. Let's hope it inspires them.

Here are the links to the slides I used:
– Professional development
– Clean Code
– OOP and SOLID principles
– Introduction to docker
– Seminar conclusion

The best advice I gave them: Find a job in a company where you can learn.

PyData conference in Barcelona

I was lucky to attend the PyData conference in Barcelona this year, hosted at ESADE.

Although I'm basically a PHP developer, I've lately been playing with data science tools from Python's stack. I have no real experience in data science, apart from a couple of prediction scripts using linear regression, but I was curious.

With a novice's spirit, I set some clear objectives: find out whether data science is like teenage sex (everybody talks about it, few actually do it) or companies are really using it; get a feel for the community; and learn as much as I could.

First of all, the community is vibrant, far more so than the PHP one in Barcelona. The organization was smooth too, and all the people I talked with were really nice. Everybody had things to learn, so everybody came with an open mind.

It was funny to see that I was on the “data owners” side, while most people were on the “looking for datasets” side. This led to several conversations in which people asked me how we use data in our company.

Quite a lot of the talks were about tools. Python's scientific stack has a wide range of evolving tools, which reminds me of PHP circa 2008, when basic tools (PHPUnit, for example) were becoming popular. It's good to polish your tools and master them, so I welcomed those talks.

There were also some talks on theory, which surprised me, as I had never seen university professors at software conferences. Mathematical and computer science concepts were explained, for instance on optimization. This contrasts with the common industry solution: if some code is slow, just add more machine instances, which is far cheaper than spending time trying to optimize things (at least 99% of the time). That doesn't mean I didn't like those talks (one was really mind-blowing), but I would love to see more professors at other conferences, getting a real feel for industry practices.

I was looking for talks showing “real fire”: real examples from companies. We heard about hotels trying to predict cancellations (in order to overbook); we saw IBM's Watson analyzing customers' personalities; and we heard about predicting which employees will leave a big company, ideas for reacting when bad weather is on its way, the best weekday to publish job offers and schedule interviews, and other extremely interesting stuff… but I want more!

My overall feeling is that I learned a lot. Python is not really used as a language so much as an interface to some amazing libraries. It looks like I have no option but to start exploring the data at ulabox!

I'd like to thank ulabox (my employer), which paid for the ticket, and all the people in the organization, who did a great job!

I published some of my (unedited) notes too.

Virtual disk design kata

At my current job (ulabox) we hold an internal training session every Thursday, usually prepared by one of our department members. Some months ago I prepared a code kata on design patterns, with instructions in 5 steps. The idea was to push the team to debate different approaches to a common problem, and to show them some classical design patterns as a way to polish our weapons. The result was good, but the discussion only really happened at the end, when I showed them those patterns.
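The kata's actual five steps live in the GitHub repository and aren't reproduced here. As a hypothetical illustration of the kind of classical pattern a virtual disk invites, here is a minimal Composite sketch in Python, assuming files and directories that both report a size; the class names `Node`, `File` and `Directory` are invented for this example.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class Node(ABC):
    """Common interface for anything stored on the (hypothetical) virtual disk."""

    def __init__(self, name: str) -> None:
        self.name = name

    @abstractmethod
    def size(self) -> int:
        """Return the size in bytes."""


class File(Node):
    """Leaf: a file knows its own size."""

    def __init__(self, name: str, size: int) -> None:
        super().__init__(name)
        self._size = size

    def size(self) -> int:
        return self._size


class Directory(Node):
    """Composite: a directory's size is the sum of its children's sizes."""

    def __init__(self, name: str) -> None:
        super().__init__(name)
        self.children: list[Node] = []

    def add(self, node: Node) -> Directory:
        self.children.append(node)
        return self  # returning self allows chained add() calls

    def size(self) -> int:
        return sum(child.size() for child in self.children)


root = Directory("/")
docs = Directory("docs").add(File("notes.txt", 120)).add(File("talk.pdf", 3000))
root.add(docs).add(File("readme.md", 80))
print(root.size())  # 3200
```

The point of the pattern is that client code can call `size()` on any node without checking whether it is a file or a directory.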

Some weeks later I heard about a code conference in Barcelona, organized by the Barcelona Software Craftsmanship group, so I took the chance to polish my kata and propose it for the event. It was rejected for the main event.

Later I heard about the Monday katas: every Monday this group organizes a code kata with up to 20 developers. I offered my kata and our office as the venue, and on December 12th we did it! All participants agreed: the kata runs smoothly and makes you think about the subjects it reveals later.

I published my kata on github. Have fun!

Category: Development