Wednesday
9 a.m.–12:20 p.m.
A beginner's introduction to Pydata: how to build a minimal recommendation engine.
- Audience level: Novice
- Category: Big Data
Description
In this tutorial we'll set ourselves the goal of building a minimal recommendation engine, and in the process learn about Python's excellent PyData and related projects: NumPy, pandas, and PyTables.
A recommendation engine is a software system that analyzes large amounts of transactional data and distills personal profiles to present its users with relevant products/information/content.
Abstract
Environment setup
Checklist of the software required before the tutorial. Please make sure you can invoke a Python shell on your system and that pip install <package> works correctly.
- Python >= 2.6
- Virtualenv
- Pip
Update: See updated tutorial preparation instructions at A beginner's introduction to Pydata: how to build a minimal recommendation engine
The recommendation problem
Estimated duration: 10'
- Definition of a recommender system
- Problem statement
Different types of recommender systems
Estimated duration: 15'
- Content-based recommenders
- Collaborative filters
- Hybrid solutions
Our goal: a minimal content-based recommendation engine
Estimated duration: 15'
- Problem domain for our example: recommending grocery items
- Sample dataset
- Flow-chart of the intended system
- Write pseudo-code for the chosen recommendation strategy
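As a rough illustration of what such pseudo-code might turn into, here is a minimal pure-Python sketch of a content-based strategy. It is not the tutorial's own routine: the catalogue, the tags, the `jaccard` and `recommend` helpers, and the choice of Jaccard overlap as the similarity measure are all assumptions made for the example, and it assumes Python 3.

```python
# Hypothetical grocery catalogue: item name -> set of descriptive tags.
CATALOGUE = {
    "whole milk": {"dairy", "drink"},
    "yogurt": {"dairy", "snack"},
    "orange juice": {"fruit", "drink"},
    "apple": {"fruit", "snack"},
    "bread": {"bakery"},
}


def jaccard(a, b):
    """Overlap between two tag sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(purchased, catalogue, top_n=2):
    """Rank items the user has not bought by similarity to their profile."""
    # The user profile is the union of tags over everything they bought.
    profile = set()
    for item in purchased:
        profile |= catalogue[item]
    scores = {
        item: jaccard(profile, tags)
        for item, tags in catalogue.items()
        if item not in purchased
    }
    # Sort by descending score, then by name so ties are deterministic.
    ranked = sorted(scores, key=lambda item: (-scores[item], item))
    return ranked[:top_n]


recommendations = recommend(["whole milk"], CATALOGUE)
```

A shopper who bought whole milk is pointed at items sharing a "dairy" or "drink" tag. A real system would weight tags and draw on the full purchase history rather than a single item.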
A sample in-memory system: intro to Numpy
Estimated duration: 40'
pip install numpy
- The ndarray.
- Operations between 1-d arrays and 2-d arrays
- Basics on broadcasting rules
- Translate recommendation strategy into a simple numpy-based routine
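The topics above can be sketched together in a few lines. This is only one possible numpy-based routine, not necessarily the one built in the tutorial: the item names, the feature matrix, and the cosine-style scoring are assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical item-by-feature matrix: one row per grocery item,
# one column per feature (dairy, fruit, drink, snack).
items = ["whole milk", "yogurt", "orange juice", "chocolate milk"]
features = np.array([
    [1.0, 0.0, 1.0, 0.0],
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 1.0, 0.0],
    [1.0, 0.0, 1.0, 1.0],
])

# Broadcasting: the (4, 1) column of row norms is stretched across the
# 4 columns of the (4, 4) matrix, so each row becomes unit length.
norms = np.linalg.norm(features, axis=1, keepdims=True)
unit = features / norms

# User profile: the mean feature vector of the items already bought.
bought = [0]  # assumed purchase history: whole milk only
profile = unit[bought].mean(axis=0)

# A 2-d array times a 1-d array: one cosine-style score per item.
scores = unit.dot(profile)
scores[bought] = -np.inf  # never recommend what was already bought

best = items[int(np.argmax(scores))]
```

Here `best` comes out as "chocolate milk", the item whose feature vector points most nearly in the same direction as the profile.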
Dealing with missing data: intro to Pandas
Estimated duration: 40'
pip install pandas
- The Series and DataFrame
- Descriptive stats of our sample dataset
- TBD
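For a flavour of how pandas treats missing data, the sketch below builds a small ratings table with gaps. The shoppers, items, and ratings are invented for the example, and filling gaps with the per-item mean is just one simple strategy, not necessarily the tutorial's.

```python
import numpy as np
import pandas as pd

# Hypothetical ratings table: rows are shoppers, columns are grocery items.
# NaN marks an item the shopper has never rated.
ratings = pd.DataFrame(
    {
        "whole milk": [5.0, 4.0, np.nan],
        "yogurt": [np.nan, 5.0, 2.0],
        "apple": [3.0, np.nan, 4.0],
    },
    index=["ana", "ben", "cho"],
)

# A single column is a Series; missing values are skipped by default.
milk_mean = ratings["whole milk"].mean()

# Descriptive statistics per item, again ignoring NaN.
summary = ratings.describe()

# Count the gaps per shopper, then fill them with each item's mean rating.
missing_per_shopper = ratings.isna().sum(axis=1)
filled = ratings.fillna(ratings.mean())
```

Because the reductions skip NaN, `milk_mean` is computed from the two available ratings only, and `summary` reports a count of 2 for every item.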
Adding a persistence layer: intro to Pytables
Estimated duration: 40'
pip install tables
- The HDF5 format
- Caching intermediate results
- TBD
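Caching an intermediate result in HDF5 could look roughly like the following. This is a sketch under assumptions: the similarity matrix, the node name `similarity`, and the temporary file location are hypothetical, and it uses the PyTables 3.x function names (`open_file`, `create_array`).

```python
import os
import tempfile

import numpy as np
import tables

# Hypothetical intermediate result: an item-to-item similarity matrix.
similarity = np.array([
    [1.0, 0.5, 0.2],
    [0.5, 1.0, 0.1],
    [0.2, 0.1, 1.0],
])

path = os.path.join(tempfile.mkdtemp(), "cache.h5")  # throwaway location

# Write the matrix to an HDF5 file as a named array node.
with tables.open_file(path, mode="w") as h5:
    h5.create_array(h5.root, "similarity", obj=similarity,
                    title="Item-to-item similarity")

# Later (or in another process): read it back instead of recomputing.
with tables.open_file(path, mode="r") as h5:
    cached = h5.root.similarity.read()
```

The array read back from disk matches the one that was written, so an expensive similarity computation only has to run once.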
Putting it all together
Estimated duration: 20'
- Back to the flow-chart: filling out the implementation details
- Where to go next