Clint Howarth, Maia Hansen, Jeffrey Larimer, Matthew Pearson, Andrew Roberts, Christian Stolte, Shailu Gargeya, Jennifer Wortman, Bruce Birren
Broad Institute of Harvard and MIT, Genome Sequencing & Analysis Program, Cambridge, MA, 02142
The Broad Institute is a world leader in genomic research. Our scientists sequence, analyze, and publish hundreds of genomes every year, which together contain millions of genes and billions of nucleotides. We study the bacteria that cause tuberculosis, the parasite responsible for malaria, the human and infant microbiomes, the West Nile and dengue viruses, and many more organisms and projects of importance to human health. In so doing, we consume and produce a tremendous amount of data.
The Analysis and Annotation Engineering group (A2E) uses CPython and Jython as fundamental technologies to help our scientists keep up with every order of magnitude of growth. Jython powers our fast, nimble, and friendly application layer in front of a solid Java/Hibernate-managed database and job management model. We recently replaced our awkward Java web stack with a sleek CPython/Flask genome navigation web application (olive.broadinstitute.org). Finally, we have a growing suite of novel Python applications to help the infectious diseases community in its work: gene naming (Genepidgin, open source), workflow management via an internal Django application (Atomation), and visualization of genomes through multiple annotations over time (Accordion, open source, coming 2012).
A2E is investing heavily in Python as our primary development platform because Python and Jython allow our small team to accomplish big things.
This project has been funded in part with Federal funds from the National Institute of Allergy and Infectious Diseases, National Institutes of Health, Department of Health and Human Services, under Contract No.: HHSN272200900018C.