Python is used in a large number of web sites where the performance of the web tier is a significant cost. There are multiple ways to improve the performance of these applications: improving the Python code itself, moving code out of Python using tools like Cython, and extreme options like directly improving the performance of the Python interpreter.
In this talk we’ll explore some of the changes we’ve made to the CPython runtime to improve the performance of our workload. We’ll start with a high level overview of our architecture which isn’t atypical for a Python web application and see opportunities and challenges that has provided for optimization. Then we’ll go deep down the rabbit hole and look at common hot spots in the Python runtime and the results we’ve had in reducing the overhead of them.
Along the way we’ll look at both targeted optimization opportunities and classic techniques such as inline caching, a JIT compiler, and leveraging type annotations for performance. We’ll cover techniques that we’ve proven successful, and ones that are still experimental. We’ll see how these can be applied to the Python runtime and what are the performance results of doing so: overall we’ve seen a 20-30% improvement in our production workload and up to 7x improvement on benchmarks.