This talk covers how to build a highly scalable image-processing pipeline using Python and third-party open-source libraries and tools such as OpenCV, NumPy, Tesseract, ImageMagick, Tornado, Nginx, and MySQL.
We’ll take as an example the Python-based pipeline we built at Endorse.com, which processes hundreds of thousands of receipt pictures sent by our users from their mobile phones. A distributed architecture processes each image, extracts the product-level purchase data, and stores it in our back-end storage for handling by our downstream business layer.
After an overview of the system architecture, we will dive into the specifics of each area of the pipeline.
We will then open up for a Q&A session with the audience.