February 2018 – Liopic on Tech

At the end of 2016 I was still amazed by the result of the AlphaGo vs. Lee Sedol match in March (the first time a machine beat a top professional Go player), and at the same time I was looking for a subject to focus on in 2017, so I chose Machine Learning. During my university years I had tried out some related tools (genetic algorithms, basic neural networks, etc.), but I hadn’t looked at the field again for 10 years.

The first stop was the famous Machine Learning course by Andrew Ng on Coursera, as everybody points you there. Although it explains a lot of complex stuff in an intuitive way, you soon get tired of so much maths and of using Octave/Matlab when you should be using Python.

After one year of learning about Machine Learning, I think I have quite a list of recommendations on how to start exploring the field. Disclaimer: this may be related to my preferred way of learning, that is, with text instead of videos. This could be a good way to start if you have no previous experience:

Do not watch that Coursera ML course; just read the notes somebody took on it instead.
Learn Python, and especially the libraries NumPy, Pandas and scikit-learn. Also learn how to run a Jupyter notebook. The best way to install them all is via the Anaconda distribution.
Buy a copy (paper or ebook) of the book “Python Machine Learning” by Sebastian Raschka.
Join Kaggle and have a look at the Titanic tutorials and its new Learn section. They also have a video course on Udacity in case you like watching videos.
Don’t be in a rush to learn deep learning (aka neural networks), because you’ll first have to learn about classic ML models and also a lot of related processes: data cleaning, feature engineering and data visualization.
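To give a flavour of what those "related processes" look like in practice, here is a minimal sketch using Pandas and scikit-learn. The data is made up and only loosely Titanic-like (it is not the real Kaggle dataset), and the column names, the threshold and the choice of logistic regression are assumptions made just for this illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Made-up, Titanic-like data (NOT the real Kaggle dataset), just to show the steps.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "sex": rng.choice(["male", "female"], size=n),
    "age": rng.uniform(1, 70, size=n),
    "fare": rng.uniform(5, 100, size=n),
})

# Synthetic target: in this toy data, women and higher fares "survive" more often.
score = (df["sex"] == "female") * 2.0 + df["fare"] / 50.0 + rng.normal(0, 1, size=n)
df["survived"] = (score > 2.0).astype(int)

# Knock out some ages so there is something to clean.
df.loc[rng.choice(n, size=20, replace=False), "age"] = np.nan

# Data cleaning: fill the missing ages with the median age.
df["age"] = df["age"].fillna(df["age"].median())

# Feature engineering: turn the text column into a number.
df["is_female"] = (df["sex"] == "female").astype(int)

X = df[["is_female", "age", "fare"]]
y = df["survived"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# A classic ML model: logistic regression.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```

Most of the lines deal with preparing the data rather than with the model itself, which is roughly how it tends to go with real datasets too.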

My first real-world input was in May, when I attended the PyData conference in Barcelona, which was a turning point: I found lots of ideas to apply, but above all I felt the industry’s pulse.

During the summer I challenged myself to apply it at work and to give a conference talk. The subject was customer segmentation with unsupervised algorithms, using a dataset I prepared myself from our company’s data. In the end the talk became a 2-hour workshop.
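For readers who have never seen customer segmentation, a rough sketch of the idea with scikit-learn could look like the following. This is not the code or the data from the workshop: the customers are invented, the two features (orders per year and average basket value) are hypothetical, and k-means with three clusters is simply one common unsupervised choice.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Invented customer data (NOT the company dataset from the talk).
# Columns: orders per year, average basket value.
rng = np.random.default_rng(42)
occasional = rng.normal(loc=[2, 20], scale=[1, 5], size=(50, 2))
regular = rng.normal(loc=[12, 35], scale=[2, 5], size=(50, 2))
big_spenders = rng.normal(loc=[6, 150], scale=[2, 15], size=(50, 2))
customers = np.vstack([occasional, regular, big_spenders])

# Scale first, so that basket value does not dominate the distances.
scaled = StandardScaler().fit_transform(customers)

# The number of segments (3) is a choice made for this example.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
segments = kmeans.fit_predict(scaled)

# Describe each segment by its size and its average customer.
for segment in range(3):
    members = customers[segments == segment]
    print(segment, len(members), members.mean(axis=0).round(1))
```

No labels are given to the algorithm; the groups come only from how similar the customers are to each other, and it is up to a person to decide whether the resulting segments make business sense.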

It was the first time I gave a presentation about Machine Learning in English. Although the audience was satisfied with the workshop and some people had interesting conversations afterwards, I felt that I should’ve worked harder while preparing it.

Now that 2017 has finished and 2018 has started, I’ll continue focusing on ML, but with a more practical approach. In my day job we have developed a recommendation system that will evolve with several ML models working together, and after work I’ll try to play more with Kaggle, taking part in some competitions.

In 2018 I’ll try deep learning too, through Andrew Ng’s course with TensorFlow, a creative apps course and some video tutorials on PyTorch. I’ll also try to improve my engineering approach to ML, as things like version control, testing and deployment are rarely seen in a world with more university people than industry people. Finally, I plan to complete a nice course on data visualization with D3.js.

I hope all these links help somebody too!


2017 focus: ML