These are the setup instructions to prepare for the tutorial "Advanced Machine Learning with scikit-learn".
We will use Python 2.7, as support for Python 3 is not yet complete (we are working on it). Python 2.6 should also mostly work for the tutorial.
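We will need the following packages: numpy, scipy, matplotlib (which provides pylab), scikit-learn, IPython (including the notebook and IPython.parallel) and psutil.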
Under Windows, the easiest way to install recent binary packages for all of these is probably to get them from Christoph Gohlke's Python Package binary archive.
Be careful to download the 32-bit versions if you have the 32-bit version of Python, or the 64-bit versions otherwise. We won't need more than 2 GB of RAM, so both versions should work for the tutorial.
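If you are not sure which build of Python you have, a quick check like the following sketch prints the interpreter version and whether it is a 32-bit or 64-bit build. It only uses the standard library (`sys`, `struct`, `platform`); the helper name `describe_python` is hypothetical. Run it with the same `python` executable you plan to use for the tutorial.

```python
from __future__ import print_function

import platform
import struct
import sys


def describe_python():
    """Return the (major.minor version string, pointer size in bits) of this interpreter."""
    version = "{0}.{1}".format(sys.version_info[0], sys.version_info[1])
    # A C pointer ("P") is 4 bytes on a 32-bit interpreter and 8 bytes on a
    # 64-bit one. This reflects the Python build, not the bitness of the OS.
    bits = struct.calcsize("P") * 8
    return version, bits


if __name__ == "__main__":
    version, bits = describe_python()
    print("Python {0} ({1}-bit) on {2}".format(version, bits, platform.system()))
```

A 64-bit Windows system can run a 32-bit Python, which is why the check looks at the interpreter itself rather than at the operating system.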
Launch a new IPython notebook session by typing the following in a console (without the "$" prompt):
$ ipython notebook
The web browser should open a new window or tab for the IPython user interface: click the "New Notebook" button, then try to import all the modules by typing:
In [1]: import numpy
In [2]: import scipy
In [3]: import pylab
In [4]: import sklearn
In [5]: import IPython.parallel
In [6]: import psutil
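If one of these imports fails, it can help to check all the modules at once and note their versions. The following sketch is a hypothetical helper named `check_imports` that relies only on the standard library `importlib`. It tries each import from the list above and reports either the module's `__version__` (when the module defines one) or the error that was raised. It can be pasted into a notebook cell or saved as a script and run with `python`.

```python
from __future__ import print_function

import importlib

# Module names taken from the import checks above.
MODULES = ["numpy", "scipy", "pylab", "sklearn", "IPython.parallel", "psutil"]


def check_imports(names):
    """Try to import each module and map its name to a version string or an error."""
    report = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except Exception as exc:  # ImportError, or any error raised while importing
            report[name] = "FAILED: {0!r}".format(exc)
        else:
            # Not every module defines __version__ (e.g. pylab, IPython.parallel).
            report[name] = str(getattr(module, "__version__", "ok (no __version__)"))
    return report


if __name__ == "__main__":
    for name, status in sorted(check_imports(MODULES).items()):
        print("{0:20s} {1}".format(name, status))
```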
If you get any error message, please send me an email at olivier.grisel@ensta.org with "[PyCon 2013 Tutorial]" in the subject and:
Updated: download the dataset archive: datasets.zip (~100MB)
Updated: download the tutorial material archive from github: parallel_ml_tutorial-master.zip and unzip it.
Or:
git clone https://github.com/ogrisel/parallel_ml_tutorial.git
You can then put the datasets.zip file inside the parallel_ml_tutorial folder and run:
python fetch_data.py
from that folder to unzip the datasets and make the data files ready.
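Because the dataset archive is about 100 MB, an interrupted download can leave a truncated file. The sketch below is an optional sanity check using the standard library `zipfile` module; it is not part of `fetch_data.py` and does not replace it. The helper name `check_archive` is hypothetical, and the script assumes it is run from the parallel_ml_tutorial folder, next to datasets.zip.

```python
from __future__ import print_function

import os
import zipfile


def check_archive(path):
    """Return the member names of a zip archive, raising ValueError if it looks broken."""
    if not zipfile.is_zipfile(path):
        raise ValueError("{0} is missing or is not a valid zip file".format(path))
    with zipfile.ZipFile(path) as archive:
        # testzip() reads every member and returns the name of the first one
        # whose CRC check fails, or None if all members are intact.
        bad_member = archive.testzip()
        if bad_member is not None:
            raise ValueError("corrupted member in {0}: {1}".format(path, bad_member))
        return archive.namelist()


if __name__ == "__main__":
    # Assumption: datasets.zip sits in the current working directory.
    path = os.path.join(".", "datasets.zip")
    try:
        members = check_archive(path)
    except Exception as exc:
        print("Problem with the archive:", exc)
    else:
        print("{0}: {1} members, CRC checks passed".format(path, len(members)))
```

If the check fails, downloading datasets.zip again before running `python fetch_data.py` is the simplest fix.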
A set of USB keys with the material will also be available during the tutorial itself, but it's faster to download everything before the session.
You can also have a look at the README of the parallel_ml_tutorial repo on github.
scikit-learn uses the numpy array data structure extensively. If you are not familiar with it, you should have a look at the first chapters of this tutorial. You should also get familiar with the scipy sparse data structures such as CSR and COO matrices.
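As a quick refresher, the sketch below builds the same small, mostly-zero matrix three ways: as a dense numpy array, as a COO matrix assembled from (row, column, value) triplets, and as a CSR matrix converted from the COO one. The variable names are illustrative only; the calls are the standard `numpy` and `scipy.sparse` ones.

```python
import numpy as np
import scipy.sparse as sp

# Dense numpy array: 3 samples (rows) x 4 features (columns), mostly zeros.
X_dense = np.array([[0.0, 1.0, 0.0, 2.0],
                    [0.0, 0.0, 0.0, 0.0],
                    [3.0, 0.0, 4.0, 0.0]])

# COO format: only the non-zero entries, stored as (row, column, value) triplets.
rows = np.array([0, 0, 2, 2])
cols = np.array([1, 3, 0, 2])
vals = np.array([1.0, 2.0, 3.0, 4.0])
X_coo = sp.coo_matrix((vals, (rows, cols)), shape=(3, 4))

# CSR format: the same non-zeros, stored row by row.
X_csr = X_coo.tocsr()

# indptr[i]:indptr[i + 1] delimits the entries of row i in indices and data.
print(X_csr.indptr)   # [0 2 2 4]  (row 1 has no non-zero entries)
print(X_csr.indices)  # [1 3 0 2]  (column index of each non-zero)
print(X_csr.nnz)      # 4

# All three representations describe the same matrix.
assert np.array_equal(X_coo.toarray(), X_dense)
assert np.array_equal(X_csr.toarray(), X_dense)
```

COO is convenient for assembling a matrix from triplets but does not support slicing, while CSR supports efficient row slicing and matrix-vector products.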
This tutorial targets people with prior experience with scikit-learn. If you are new to scikit-learn and have not registered for Jake's introductory tutorial at PyCon, you are strongly advised to follow the tutorials from the official documentation or from the SciPy Lecture Notes.