Applied machine learning in Python with scikit-learn
Abstract
The demand for software engineers with data analytics and machine learning skills is growing rapidly, and Python / NumPy is one of the best environments for quickly prototyping scalable data-centric applications.
scikit-learn is a very active open source project that implements a variety of state-of-the-art machine learning algorithms. The goal of this project and tutorial is to take the algorithms out of the academic papers and make them work on a selection of real-world tasks to unleash the value of your data.
We will focus on providing hints on how to perform the right data preprocessing steps and how to select the right algorithm and parameters. We will also introduce tools and methodologies to measure the performance of the trained models as objectively as possible.
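As a rough illustration of that workflow (preprocessing, parameter selection, and objective evaluation), here is a minimal sketch. It is not taken from the tutorial material: the iris dataset, the SVC classifier, and the parameter grid are assumptions chosen for brevity, and the imports assume a recent scikit-learn release that provides the `sklearn.model_selection` module.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Small built-in dataset, used here only for illustration (an assumption).
X, y = load_iris(return_X_y=True)

# Hold out a test set that is never touched during model selection.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Chain preprocessing and the estimator so that the scaler is fit
# on the training folds only during cross-validation.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("clf", SVC()),
])

# Hypothetical grid of candidate parameters for the classifier step.
param_grid = {"clf__C": [0.1, 1, 10], "clf__gamma": [0.01, 0.1, 1]}

# 5-fold cross-validated search over the grid, using training data only.
search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(X_train, y_train)

best_params = search.best_params_
cv_score = search.best_score_              # mean accuracy across the folds
test_score = search.score(X_test, y_test)  # accuracy on the held-out set
print(best_params, cv_score, test_score)
```

Keeping the scaler inside the pipeline and reporting the score on data the search never saw are two simple ways to keep the measured performance honest.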
Prior experience with standard file processing in Python (parsing HTML, JSON, ...) and numerical computation with NumPy is highly recommended. An undergraduate level in mathematics will help you gain some theoretical insights but is not required to go through the exercises.