Introduction to Parallel Computing on an NVIDIA GPU using PyCUDA
log in to bookmark this presentaton
Abstract
There are two approaches to parallelizing a computationally heavy procedure: use a messaging queue such as AMQP to distribute tasks among a networked cluster or increase the number of processors in a single machine. This talk focuses on techniques for adapting mathematical code to run on specialized multi-core graphic processors.
Modern graphic processors have hard-coded transistors for common vector and matrix operations, making them ideal for general scientific computing. However, the NVIDIA CUDA's unique design requires knowledge of its hardware to adapt algorithms effectively. This talk covers basic CUDA architecture, API functions and several examples to illustrate the different kinds of problems that will benefit from parallelization.