19. Millions of Genes with Python and Jython

Clint Howarth

Type: Poster
Audience level: Novice
Category: Industry Uses

March 11th 8:25 a.m. – 8:30 a.m.

Description

The Analysis and Annotation Engineering group at the Broad Institute uses CPython and Jython as fundamental technologies to help sequence, analyze, and publish hundreds of bacterial and viral genomes every year. This poster outlines how Python lets our small team accomplish big things.

Abstract

Millions of Genes with Python and Jython

Clint Howarth, Maia Hansen, Jeffrey Larimer, Matthew Pearson, Andrew Roberts, Christian Stolte, Shailu Gargeya, Jennifer Wortman, Bruce Birren

Broad Institute of Harvard and MIT, Genome Sequencing & Analysis Program, Cambridge, MA, 02142

The Broad Institute is a world leader in genomic research. Our scientists sequence, analyze, and publish hundreds of genomes every year, containing millions of genes and billions of nucleotides. We study the bacteria that cause tuberculosis, the parasite responsible for malaria, the human and infant microbiomes, the West Nile and dengue viruses, and many other organisms and projects of importance to human health. In so doing, we consume and produce a tremendous amount of data.

The Analysis and Annotation Engineering group (A2E) uses CPython and Jython as fundamental technologies to help our scientists keep up with every order of magnitude of growth. Jython powers our fast, nimble, and friendly application layer in front of a solid Java/Hibernate-managed database and job management model. We recently replaced our awkward Java web stack with a sleek genome navigation web application built on CPython and Flask (olive.broadinstitute.org). Finally, we have a growing suite of novel Python applications to help the infectious diseases community in its work: gene naming (Genepidgin, open source), workflow management via an internal Django application (Atomation), and visualizing genomes through multiple annotations over time (Accordion, open source, coming 2012).

A2E is investing heavily in Python as our primary development platform because using Python and Jython allows our small team to accomplish big things.

This project has been funded in part with Federal funds from the National Institute of Allergy and Infectious Diseases, National Institutes of Health, Department of Health and Human Services, under Contract No.: HHSN272200900018C.