top band

DistArray - Distributed array computing for Python

Kurt Smith, Robert Grant

Audience level:
Python Libraries


DistArray is an up-and-coming Python package providing distributed NumPy-like multidimensional arrays, ufuncs, and IO to bring the strengths of NumPy to data-parallel high-performance computing (HPC). We build on widely-used Python HPC libraries and have introduced the Distributed Array Protocol to exchange arrays without copying with external distributed libraries like Trilinos.


DistArray provides general multidimensional NumPy-like distributed arrays to Python. It intends to bring the strengths of NumPy to data-parallel high-performance computing. DistArray has a similar API to NumPy. DistArray is for users who - know and love Python and NumPy, - want to scale NumPy to larger distributed datasets, - want to interactively play with distributed data but also - want to run batch-oriented distributed programs, - want an easier way to drive and coordinate existing MPI-based codes, - have a lot of data that may already be distributed, - want a global view ("think globally") with local control ("act locally"), - need to tap into existing parallel libraries like Trilinos, PETSc, or Elemental, and - want the interactivity of IPython and the performance of MPI. DistArray is designed to work with other packages that implement the [Distributed Array Protocol]( Distributed Array Protocol). DistArray was started by Brian Granger in 2008 and is currently being developed at Enthought by a team led by Kurt Smith, in partnership with Bill Spotz from Sandia's (Py)Trilinos project.
bottom band background