PyCon 2016 in Portland, OR

Saturday 9 a.m.–12:20 p.m.

PyData 101: Essential data science skills for every programmer, from data to model to visualization

Andy Terrel, Christine Doig

Audience level:
Novice
Category:
Science

Description

Data science is fun, and with the PyData toolset it is something you can start building with in an afternoon. Join us as we start with a few datasets and learn how to munge, model, and materialize them into simple web applications for predictions. At the end of the day you will come away with a solid understanding of the PyData ecosystem and the tools data scientists use every day.

Abstract

Data science applications are all around us. We find directions on our phones, recommendations on e-commerce sites, and predictions of attacks on our servers. As the opportunities to use data in our applications grow, exposure to the ideas and paradigms of data science applications becomes an essential skill for programmers. We start from the beginning and build out concepts and demonstrations to begin your journey into the wide world of data science applications.

# Learning goals

In this tutorial we will start with basic examples of taking raw data and producing insights. A set of exercises with different data properties will help students work through the various challenges of working with data. For these tasks we will use the excellent Pandas library, which has become the industry standard for building data applications.

After we have a good understanding of our data sets and the observations present, we will begin the process of modelling, or predicting, phenomena with that data. Answering questions like "Will it rain today?" or "Will this shopper buy this book?" has never been easier than with the Python data ecosystem. Using Scikit Learn, we will explore how to build a model and evaluate its fitness for use in real applications.

Finally, having data and being able to predict phenomena is much less meaningful without a way to communicate it to the world. In the final exercise we will use the up-and-coming Bokeh library to build interactive visualizations that can be deployed to the web or viewed on a local machine. By building interactive exploratory data visualizations, you will communicate your predictions in a way that appeals to a mass audience.

Come learn the joy of building data applications the PyData way.
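To give a feel for the munge-then-model workflow described in the learning goals, here is a minimal sketch rather than the tutorial's actual exercise. The weather data is synthetic, and the column names (`humidity`, `pressure`, `rained`), the rule that generates rain, and the choice of logistic regression are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, hypothetical weather observations (not a dataset from the tutorial).
rng = np.random.default_rng(0)
n = 200
humidity = rng.uniform(20, 100, size=n)
pressure = rng.normal(1013, 8, size=n)
# Assumed toy rule: rain becomes likely once noisy humidity exceeds 70.
rained = (humidity + rng.normal(0, 10, size=n) > 70).astype(int)

raw = pd.DataFrame({"humidity": humidity, "pressure": pressure, "rained": rained})
# Simulate sensor dropouts by blanking out every 25th humidity reading.
raw.loc[raw.index % 25 == 0, "humidity"] = np.nan

# Munge: drop observations with missing values.
clean = raw.dropna()

# Model: hold out a quarter of the rows to evaluate the fitted classifier.
X = clean[["humidity", "pressure"]]
y = clean["rained"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Scale the features, then fit a logistic regression ("Will it rain today?").
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)

# Evaluate on rows the model has not seen.
accuracy = model.score(X_test, y_test)
print(f"rows kept: {len(clean)} of {len(raw)}")
print(f"held-out accuracy: {accuracy:.2f}")
```

Scoring on a held-out split, rather than on the training rows, is one simple way to judge whether a model is fit for use on new data.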
# Tutorial Prerequisite Knowledge

- A basic understanding of the Python programming language
- Understanding of basic mathematics and statistics
- Curiosity and willingness to work on real life exercises

# Installation Prerequisites

- Basic libraries:
    - Pandas ([http://pandas.pydata.org](http://pandas.pydata.org))
    - Scikit Learn ([http://scikit-learn.org/](http://scikit-learn.org))
    - Bokeh ([http://bokeh.pydata.org](http://bokeh.pydata.org))
- Recommended ways to install:
    - Anaconda ([https://www.continuum.io/downloads](https://www.continuum.io/downloads))
    - Python 2.7 or Python 3.3+
    - Using pip from the command line: `pip install pandas scikit-learn bokeh`
    - Your favorite Linux package management system
- Other useful things to download:
    - IPython notebook ([http://ipython.org/notebook.html](http://ipython.org/notebook.html))
    - Your favorite census dataset ([http://www.census.gov/data/developers/data-sets.html](http://www.census.gov/data/developers/data-sets.html))
    - An interesting research dataset ([https://archive.ics.uci.edu/ml/datasets.html](https://archive.ics.uci.edu/ml/datasets.html))
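
Once the basic libraries listed above are installed, a quick check along these lines can confirm they are importable before the tutorial starts. This is a small sketch using only the standard library; the helper name `missing_packages` is hypothetical. Note that Scikit Learn is imported under the name `sklearn`.

```python
import importlib.util

# Import names of the three basic libraries listed above.
# Scikit Learn is imported as `sklearn`.
REQUIRED = ["pandas", "sklearn", "bokeh"]


def missing_packages(names):
    """Return the import names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All tutorial libraries are importable.")
```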

Student Handout

No handouts have been provided yet for this tutorial