Wednesday 1:20 p.m.–4:40 p.m. in Room 22

Faster Python Programs - Measure, don't Guess

Mike Müller

Description

Optimization can often make Python programs faster or make them use less memory. Developing a strategy, establishing solid measuring and visualization techniques, and knowing the basics of algorithms and data structures are the foundation of successful optimization. This tutorial covers these topics. Examples will give you hands-on experience with how to approach optimization efficiently.

Python is a great language, but for certain types of tasks it can be slow compared to other languages. If applied appropriately, optimization may reduce program runtime or memory consumption considerably. But this often comes at a price. Optimization can be time-consuming, and the optimized program may be more complicated. This, in turn, means more maintenance effort. How do you find out if it is worthwhile to optimize your program? Where should you start? This tutorial will help you answer these questions.

You will learn how to find an optimization strategy based on quantitative and objective criteria. You will see that one's gut feeling about what to optimize is often wrong. The solution to this problem is: "Measure, Measure, and Measure!"

You will learn how to measure program run times and how to profile CPU and memory usage. There are great tools available, and you will learn how to use some of them. Measuring is not easy because, by definition, as soon as you start measuring, you influence your system. Keeping this impact as small as possible is important. Therefore, we will cover different measuring techniques.

Furthermore, we will look at algorithmic improvements. You will see that the right data structure for the job can make a big difference. Finally, you will learn about different caching techniques.

## Software Requirements

You will need Python 2.7 or 3.6 installed on your laptop. Python 2.6 or 3.4/3.5 should also work. Python 3.x is strongly preferred. If it has been released by the time of the tutorial, we will use Python 3.7.
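To check these requirements against your own setup, a short script can report which interpreter is running and whether its version is among the ones listed. This is a minimal sketch: `LISTED_VERSIONS` and `check_version` are hypothetical names introduced here, and the table contains only the versions named above (2.6, 2.7, and 3.4 through 3.7), so a newer Python is reported as "not listed" rather than as unsupported.

```python
import sys

# Hypothetical table: only the versions named in the requirements above.
LISTED_VERSIONS = [(2, 6), (2, 7), (3, 4), (3, 5), (3, 6), (3, 7)]


def check_version(version_info=None):
    """Return a short message on whether the (major, minor) version is listed."""
    if version_info is None:
        version_info = sys.version_info
    major_minor = (version_info[0], version_info[1])
    if major_minor not in LISTED_VERSIONS:
        return "Python %d.%d is not among the listed versions" % major_minor
    if major_minor[0] == 2:
        return "Python %d.%d is listed, but Python 3 is strongly preferred" % major_minor
    return "Python %d.%d is listed" % major_minor


if __name__ == "__main__":
    # Full path of the interpreter that is actually running this script.
    print(sys.executable)
    print(check_version())
```

Run it with the same `python` command you plan to use during the tutorial; `sys.executable` shows the full path of that interpreter, which helps when more than one Python is installed.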
### Jupyter Notebook

I will use a Jupyter Notebook for the tutorial because it makes a very good teaching tool. You are welcome to use the setup you prefer, e.g. an editor, IDE, or REPL. If you would also like to use a Jupyter Notebook, I recommend `conda` for easy installation. Like `virtualenv`, `conda` lets you create isolated environments, but it also provides binary installs for all platforms.

There are two ways to install `Jupyter` via `conda`:

1. Use [Miniconda][1]. This is a small install. After installing it, you can use the `conda` command to create an environment: `conda create -n pycon2018 python=3.6`. Now you can change into this environment: `source activate pycon2018`. The prompt should change to `(pycon2018)`. Now you can install Jupyter: `conda install jupyter`.
2. Install [Anaconda][2]. You are then ready to go, provided you don't mind installing lots of packages from the scientific field.

### Working with `conda` environments

After you create a new environment, your shell might still use some stale settings. Even when the command `which` tells you that you are using an executable from your environment, this might actually not be the case. If you see strange behavior when using a command-line tool in your environment, run `hash -r` (which clears the shell's cache of command locations) and try again.

### Tools

You can install these with `pip` (or, in the active `conda` environment, with `conda`):

* [SnakeViz][3]
* [line_profiler][4]
* [Pympler][6]
* [memory_profiler][7]

[1]: https://conda.io/miniconda.html
[2]: http://continuum.io/downloads
[3]: http://jiffyclub.github.io/snakeviz/
[4]: https://pypi.python.org/pypi/line_profiler/
[6]: https://pypi.python.org/pypi/Pympler
[7]: https://pypi.python.org/pypi/memory_profiler
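SnakeViz, from the tool list above, visualizes profile data written by the standard library's `cProfile` module. The following minimal sketch produces such a file. The workload is hypothetical: `count_in_list` and `count_in_set` perform the same membership tests, once against a list and once against a set, so the profile also hints at how much the choice of data structure can matter. `run_profiled` and the file name `example.prof` are likewise illustrative names, not part of any tool's API.

```python
import cProfile
import pstats


def count_in_list(n):
    # Hypothetical workload: each membership test scans the list, O(n) per lookup.
    items = list(range(n))
    return sum(1 for i in range(n) if i in items)


def count_in_set(n):
    # Same workload with a set: each membership test is O(1) on average.
    items = set(range(n))
    return sum(1 for i in range(n) if i in items)


def main(n=2000):
    return count_in_list(n), count_in_set(n)


def run_profiled(func, path):
    """Run func under cProfile, save the raw data to path, return (result, stats)."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = func()
    profiler.disable()
    profiler.dump_stats(path)  # binary profile file that SnakeViz can open
    return result, pstats.Stats(profiler)


if __name__ == "__main__":
    result, stats = run_profiled(main, "example.prof")
    # Print the five entries with the largest cumulative time.
    stats.sort_stats("cumulative").print_stats(5)
```

Afterwards, `snakeviz example.prof` opens an interactive view of the saved data in the browser. A file like this can also be produced without changing the code via `python -m cProfile -o example.prof your_script.py`, where `your_script.py` stands for your own program.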
