PyCon 2019 in Cleveland, Ohio

Thursday 1:20 p.m.–4:40 p.m. in Room 19

Data Science Best Practices with pandas

Kevin Markham


The pandas library is a powerful tool for multiple phases of the data science workflow, including data cleaning, visualization, and exploratory data analysis. However, the size and complexity of the pandas library make it challenging to discover the best way to accomplish any given task. In this tutorial, you'll use pandas to answer questions about multiple real-world datasets. Through each exercise, you'll learn important data science skills as well as "best practices" for using pandas. By the end of the tutorial, you'll be more fluent at using pandas to correctly and efficiently answer your own data science questions. Participants should have an intermediate knowledge of pandas and an interest in data science, but they are not required to have any experience with the data science workflow. Datasets will be provided by the instructor.
