Tuesday 4:15 p.m.–5 p.m.
Python Sorted Collections
Grant Jenks
- Audience level:
- Intermediate
- Category:
- Python Libraries
Description
C++, Java, and .NET provide sorted collections types. Wish Python did too? Look around and you'll find Pandas DataFrame indexes, Sqlite in-memory databases, even redis-py sorted set commands. The SortedContainers module was designed to fill this gap with sorted list, dict and set implementations. It's written in pure-Python but generally faster than C-extension modules. Come see how it works.
Abstract
Python's standard library is great until you need sorted collections types. Most go far with the standard library's `bisect`, `heapq`, and `queue.PriorityQueue`. Others mistake `collections.OrderedDict` for a dictionary that maintains sort order. But when you really need a sorted list, dict, or set, you're faced with a dozen varying options on PyPI. Together we'll look at when to use sorted collections types and the tradeoffs of those options.
Of particular focus will be the SortedContainers module, an Apache2 licensed collections library, written in pure-Python and as fast as modules with C-extensions. Performance is a feature in SortedContainers and benchmarks across alternatives, runtimes, and workloads will be highlighted. After only a couple years, SortedContainers has gained popularity and unseated decade-old incumbents.
Our discussion will survey what makes SortedContainers so fast. We'll see why benchmarks matter most in claims about performance and why the strengths and weakness of modern processors affect how you choose your data structures. It's possible to write fast code in pure-Python! Come see where algorithms and theory intersect processors and machines.