Perhaps Gregory Pfister said it best in his book, In Search of Clusters. To paraphrase, there are three ways to do anything faster: work harder, work smarter, or get help. In computer-speak, this roughly translates to: increase processor speed, improve algorithms, or exploit parallelism. With processor speeds no longer doubling every eighteen months and little or no room left for improvements in serial algorithms, exploiting parallelism is the one frontier with the potential to deliver huge improvements in performance. In this tutorial we shall review three distinct approaches to parallel computing that can be used to solve problems in all manner of domains, including machine learning, natural language processing, finance, and computer vision. The first two approaches are embarrassingly parallel in nature, while the third leverages fine-grained parallelism.
At the conclusion of the tutorial, the audience will possess not only a conceptual understanding of the what, why, and how of general-purpose parallelism, but also a much better appreciation of the tradeoffs involved in exploiting three specific forms of parallelism.
All examples will leverage open source packages. However, given the rather large number of package dependencies and the size of the datasets, we will provide a small Ubuntu-based VirtualBox image, with prerequisites pre-installed, that runs on Windows, OS X, and Linux hosts. A link to the download location, as well as instructions for validating the setup, will be provided two weeks before the tutorial. We encourage attendees to try out the VirtualBox image as soon as possible.
Background on the need for parallelism and review of the parallel landscape given the rise of multi-core architectures on cheap hardware
Basic terminology on parallelism that not only accounts for the moment but transcends it, i.e. terminology that can be used to better understand both existing and proposed parallel solutions
Survey of distinct forms of parallelism: inter-node (i.e. across servers) and intra-node (i.e. within each server)
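As a rough illustration of the intra-node, embarrassingly parallel case mentioned above, the following minimal sketch splits a range into independent chunks and counts primes in each chunk on a separate worker process. It is not taken from the tutorial materials: the function names (count_primes, split_range, parallel_count_primes) and the choice of prime counting as the workload are hypothetical, chosen only because each chunk can be processed without any communication between workers.

```python
from concurrent.futures import ProcessPoolExecutor


def count_primes(bounds):
    """Count primes in the half-open range [lo, hi) by trial division."""
    lo, hi = bounds
    count = 0
    for n in range(max(lo, 2), hi):
        # n is prime if no d in [2, sqrt(n)] divides it evenly.
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def split_range(limit, parts):
    """Split [0, limit) into at most `parts` contiguous chunks (limit > 0)."""
    step = -(-limit // parts)  # ceiling division
    return [(lo, min(lo + step, limit)) for lo in range(0, limit, step)]


def parallel_count_primes(limit, workers=4, executor_cls=ProcessPoolExecutor):
    """Hypothetical helper: count primes below `limit` using independent workers.

    Each chunk is handled in isolation, so the only coordination needed
    is the final sum of the per-chunk counts.
    """
    chunks = split_range(limit, workers)
    with executor_cls(max_workers=workers) as pool:
        return sum(pool.map(count_primes, chunks))


if __name__ == "__main__":
    # Worker processes are started here; the guard keeps child processes
    # from re-running this block on platforms that spawn new interpreters.
    print(parallel_count_primes(100000))
```

The executor class is a parameter so that the same sketch can be pointed at a different pool type; the inter-node case would need a mechanism for distributing chunks across servers, which this example does not attempt.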
Update: See updated tutorial preparation instructions at Applied Parallel Computing with Python - Essential VirtualBox