Saturday 2:35 p.m.–3:05 p.m.
Data intensive biology in the cloud: instrumenting ALL the things
Titus Brown
- Audience level: Intermediate
- Category: Science
Description
Cloud computing offers some great opportunities for science, but most
cloud computing platforms are I/O- and memory-limited, and hence
are poor matches for data-intensive computing. After 4 years of
research software development, we are now instrumenting and benchmarking
our analysis pipelines; numbers, lessons learned, and future plans
will be discussed. Everything is open source.
Abstract
The cloud provides great opportunities for a variety of important
computational science challenges, including reproducible science,
standardized computational workflows, comparative benchmarking, and
focused optimization. It can also be a disruptive force for the
betterment of science by eliminating the need for large infrastructure
investments and supporting exploratory computational science on
previously challenging scales. However, most cloud computing use in
science so far has focused on relatively mundane "pleasantly parallel"
problems. Our lab has spent many moons addressing a large,
non-parallelizable "big data/big graph" problem -- sequence assembly
-- with a mixture of Python and C++, some fun new data structures and
algorithms, and a lot of cloud computing. Most recently we have been
working on open computational "protocols", workflows, and pipelines
for democratizing certain kinds of sequence analysis. As part of this
work we are tackling issues of standardized test data sets to support
comparative benchmarking, targeted optimization, reproducible science,
and computational standardization in biology. In this talk I'll
discuss our efforts to understand where our computational bottlenecks
are, what kinds of optimization and parallelization efforts make sense
financially, and how the cloud is enabling us to be usefully
disruptive. As a bonus I'll talk about how the focus on pleasantly
parallelizable tasks has warped everyone's brains and convinced them
that engineering, not research, is really interesting.