Computer-based methods and tools now play an increasingly critical role in modern drug discovery. With an explosive amount of data generated every day across the various stages of scientific studies, there is growing interest in extracting the most value from these data to better guide and support current and future scientific discoveries.
At the Novartis Institutes for BioMedical Research (NIBR), we use Python as one of our major scientific programming languages in computational chemistry and cheminformatics research. We find Python a great language for quick prototyping thanks to its elegance, ease of use, flexibility, and rich third-party libraries. We also take advantage of Python's excellent support for gaining extra performance by progressively replacing components with C/C++ implementations. As a case study, we present an example system that spans both hard-core computation and web-based delivery and takes full advantage of Python's support for quick prototyping and fast iteration.
Using Python and the RDKit, an open source cheminformatics toolkit with a first-class Python API, we built an in-house tool to mine experimental data for relevant pharmacophores. A pharmacophore is a set of molecular features that may be responsible for a molecule's biological activity, and learning pharmacophores is an important step in understanding the mechanism of drug action. Python proved to be an ideal tool for prototyping the initial version. We then used Boost.Python to rewrite performance-critical components in C++ while keeping the rest of the code intact and the programming interface pythonic (read: nice to use).
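To make the definition concrete, a pharmacophore is often encoded as a set of typed features plus the distances between them. The sketch below is a hypothetical, simplified encoding, not the in-house tool's or RDKit's data model: each feature has a family label (such as "Donor" or "Aromatic") and 3D coordinates, a molecule's signature is the sorted list of (family, family, binned distance) triples, and "mining" is reduced to intersecting signatures across active molecules. In practice the features would be assigned by a toolkit such as the RDKit; the coordinates in the usage note are made up.

```python
import math
from dataclasses import dataclass
from itertools import combinations
from typing import List, Set, Tuple

Triple = Tuple[str, str, int]


@dataclass(frozen=True)
class Feature:
    family: str                       # e.g. "Donor", "Acceptor", "Aromatic"
    pos: Tuple[float, float, float]   # 3D coordinates in angstroms


def pharmacophore_signature(features: List[Feature], bin_width: float = 1.0) -> List[Triple]:
    """Return sorted (family_a, family_b, distance_bin) triples for every feature pair."""
    sig = []
    for f1, f2 in combinations(features, 2):
        a, b = sorted((f1.family, f2.family))  # order-independent pair label
        dist = math.dist(f1.pos, f2.pos)       # Euclidean distance (Python 3.8+)
        sig.append((a, b, int(dist // bin_width)))
    return sorted(sig)


def shared_pairs(molecules: List[List[Feature]]) -> Set[Triple]:
    """Feature pairs present in every molecule: a crude stand-in for pharmacophore mining."""
    sets = [set(pharmacophore_signature(m)) for m in molecules]
    return set.intersection(*sets) if sets else set()
```

For example, a molecule with a donor at the origin and an acceptor at (3, 4, 0) contributes the triple `("Acceptor", "Donor", 5)`; any other active molecule whose donor and acceptor are also 5 to 6 Å apart shares that triple, which then survives `shared_pairs`.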
To reach the largest possible audience, we developed a web-based UI on top of the API. Leveraging Django, Piston, and Celery, we were able to get the tool into the hands of scientists quickly. Fast iteration was essential in gradually shaping the tool and making it practical and useful. Tools such as Django-South and Fabric allow us to keep improving and growing it incrementally, at a fast pace, without disruptive side effects.