The Journey to Give Every Scientist a Supercomputer

Type:
Talk
Audience level:
Novice
Category:
Cloud
March 10th 10:25 a.m. – 11:05 a.m.

Description

The recent cloud buzz has hugely benefited Python web devs. But, for Python's formidable scientific community, the cloud has been less ambitious--until now. PiCloud is a Python-based cloud platform that tackles a noble cause: giving every scientist in the world instant access to a supercomputer. The talk will cover how Python inspired the design of PiCloud, which has now processed over 100M jobs.

Abstract

Background (2 min)

The recent cloud buzz has largely benefited Python web application developers using web frameworks such as Django. Google AppEngine, for example, allows anybody in the world to easily host their own website. But, for Python's formidable scientific community, the cloud has been noticeably less helpful--until now. PiCloud is a Python-based cloud platform that tackles a noble cause: giving every scientist in the world instant access to a supercomputer. The talk will cover how Python inspired the design of PiCloud, which has now processed over 100M jobs.

PiCloud's Vision (3 min)

PiCloud offers the easiest way to utilize the cloud for compute-intensive applications; more specifically, applications in scientific computing, high-performance computing, and batch processing. With only a couple lines of Python code, scientists, developers, and engineers on our platform can leverage thousands of cores of computational power on-demand.

Our goal is to make computing power a utility for scientists in the same way that electricity is for modern society: available to everyone, seemingly infinite in quantity, and readily accessible with a flip of a switch. We achieve this by being a serverless cloud. In other words, our users each get the power of a supercomputer at their fingertips without having to design, provision, or administer any servers.

How Python Inspired PiCloud (10 min)

PiCloud was started thanks to a serendipitous collision of computer vision and curiosity about the Python language. In 2009, we were working on a fun application called AutoTagger to automatically tag people's faces on Facebook. While we coded the project in Python and C++ using extensions, we realized we were spending over half of our time administering the cluster that processed thousands of photos in parallel, rather than working on our Python vision algorithms. At the same time, being fairly new to Python, we were exploring some of Python's more intricate features: pickling, function and module introspection, variable keyword arguments, and higher-order functions.

With Python's amazing toolbox at our disposal, we realized we could provide a Pythonic way for anyone, but especially scientists, to use a seemingly unlimited number of cores without ever having to halt the development of their algorithms to manage servers. The result was the cloud package, which takes advantage of many of Python's introspective features to automatically move computational work to the cloud. Here's a simple example to showcase how easy it is:

import cloud
from autotagger.vision import face_detector
from autotagger.facebook import get_friends_photos

list_of_photos = get_friends_photos()

# detect the faces of my friends in parallel on the cloud
job_ids = cloud.map(face_detector, list_of_photos)

Two new lines of code unlock the power of potentially thousands of cores on the cloud. We'll also show the a-ha moment where we realized the automagic of PiCloud was possible.
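
For readers without a PiCloud account, the shape of that map call can be sketched locally with the standard library. This is only an analogy under stated assumptions: local_map and count_vowels are hypothetical names, the work runs in a local thread pool rather than on the cloud, and the function returns results directly rather than PiCloud job IDs.

```python
from concurrent.futures import ThreadPoolExecutor


def count_vowels(text):
    """Hypothetical stand-in for a per-item function such as face_detector."""
    return sum(1 for ch in text.lower() if ch in "aeiou")


def local_map(func, items, max_workers=4):
    """Apply func to every item concurrently; results keep the input order.

    Local analogy only: a thread pool on this machine, not cloud workers.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(func, items))


results = local_map(count_vowels, ["python", "cloud", "supercomputer"])
print(results)
```

The point of the pattern is that the per-item function stays unchanged; only the call that dispatches it is swapped.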

Accelerating the Speed of Science (8 min)

Python's strength is its diversity of users. Thanks to its readability, ease of use, and extensive libraries (NumPy, SciPy, etc.), Python is popular among scientists of all kinds, not just computer scientists. Likewise, PiCloud's focus on simplicity caters to those who aren't necessarily programmers by training. We believe accessibility is critical in an age where science has become increasingly reliant on server farms for data analysis.

Examples of what's been done on PiCloud with Python:

We've opened up the accessibility and availability of computing to everyone in the world who doesn't work at a national laboratory, or a Google. It's for this reason that we were credited by a user as "accelerating the speed of science." PiCloud has processed over 100 million jobs for our three thousand users worldwide.

Going from Concept to Platform (5 min)

Initially, as a two-man team, we were hard-pressed to develop fast and release quickly to validate our idea to ourselves and potential investors. Python was a great choice of language for us because it allowed us to develop fast without sacrificing readability, and with only modest sacrifices in performance; we made sure that all the heavy lifting was done in Python libraries that pushed work to C. We're happy to share stories of features that had to be built in 0 days, and were, thanks to Python. :)

For audience members looking to start a new project: a lot of technology building blocks are readily available and robust, which makes it incredibly easy to go from concept to product in a short time frame with a small team.

Architecture (5 min)

Since PiCloud is built almost entirely in Python for Python, we will share novel portions of our architecture. This section is dedicated to educating the audience on the next generation of distributed systems, made possible by Python and the ease of server provisioning on the cloud.

  • PiCloud grows and shrinks on its own. Our automated scaling system, written in Python using SciPy, estimates the computational load on our servers in real time and, without human intervention, scales the number of servers in our farm to match demand.
  • PiCloud has a customized serializer (we've open-sourced it), which allows for the automatic serialization of files, modules, and byte code for anonymous functions, to make the process of offloading computation to the cloud as easy as possible.
  • Extremely fast scheduler written in pure Python
  • Control Panel using Django
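
To give a feel for the serializer item above: the standard pickle module stores functions by reference to their module and name, so an anonymous function cannot be pickled directly. The following is a minimal sketch of the general idea of shipping a function by its byte code instead. It is not PiCloud's serializer; dump_function and load_function are hypothetical names, and the sketch assumes a simple function with no closure variables and no references to module-level globals.

```python
import builtins
import marshal
import types


def dump_function(func):
    """Serialize a simple function (no closure) to bytes via its code object."""
    return marshal.dumps((func.__code__, func.__name__, func.__defaults__))


def load_function(payload, global_ns=None):
    """Rebuild a function from bytes produced by dump_function."""
    code, name, defaults = marshal.loads(payload)
    # Assumption: the function only needs builtins unless a namespace is given.
    namespace = {"__builtins__": builtins} if global_ns is None else global_ns
    return types.FunctionType(code, namespace, name, defaults)


# An anonymous function, which plain pickle cannot serialize by reference.
square_plus = lambda x, offset=1: x * x + offset

payload = dump_function(square_plus)   # bytes that could be sent to a worker
restored = load_function(payload)      # the worker side rebuilds the function
print(restored(3))
```

Note that the marshal format is specific to the Python version, so both ends would need to run the same interpreter version; a production serializer also has to handle closures, globals, and imported modules, which this sketch leaves out.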

Conclusion (1 min)

We owe our gratitude to the entire Python community. More than just forming the bedrock of our platform, they've enabled us to change the face of parallel computing and empower thousands of scientists around the world.