Overview and getting started¶

Introduction¶
Let's start with an overview of IPython's architecture for parallel and distributed computing. This architecture abstracts out parallelism in a very general way, which enables IPython to support many different styles of parallelism including:
- Single program, multiple data (SPMD) parallelism
- Multiple program, multiple data (MPMD) parallelism
- Message passing using MPI or ØMQ
- Task farming
- Data parallel
- Coordination of distributed processes
- Combinations of these approaches
- Custom user defined approaches
Most importantly, IPython enables all types of parallel applications to
be developed, executed, debugged and monitored interactively. Hence,
the I in IPython. Some example use cases for
IPython.parallel:
Quickly parallelize algorithms that are embarrassingly parallel using a number of simple approaches. Many simple things can be parallelized interactively in one or two lines of code.
Steer traditional MPI applications on a supercomputer from an IPython session on your laptop.
Analyze and visualize large datasets (that could be remote and/or distributed) interactively using IPython and tools like matplotlib/TVTK.
Develop, test and debug new parallel algorithms (that may use MPI or PyZMQ) interactively.
Tie together multiple MPI jobs running on different systems into one giant distributed and parallel system.
Start a parallel job on your cluster and then have a remote collaborator connect to it and pull back data into their local IPython session for plotting and analysis.
Run a set of tasks on a set of CPUs using dynamic load balancing.
Architecture overview¶

The IPython architecture consists of four components:
- The IPython engine
- The IPython hub
- The IPython schedulers
- The cluster client
These components live in the IPython.parallel package and are
installed with IPython.
IPython engine¶
The IPython engine is a Python instance that accepts Python commands over a network connection. When multiple engines are started, parallel and distributed computing becomes possible. An important property of an IPython engine is that it blocks while user code is being executed. Read on for how the IPython controller solves this problem to expose a clean asynchronous API to the user.
IPython controller¶

The IPython controller processes provide an interface for working with a
set of engines. At a general level, the controller is a collection of
processes to which IPython engines and clients can connect. The
controller is composed of a Hub and a collection of
Schedulers, which may be in processes or threads.
The controller provides a single point of contact for users who
wish to utilize the engines in the cluster. There is a variety of
different ways of working with a controller, but all of these
models are implemented via the View.apply method, after
constructing View objects to represent different collections engines.
The two primary models for interacting with engines are:
- A Direct interface, where engines are addressed explicitly.
- A LoadBalanced interface, where the Scheduler is trusted with assigning work to appropriate engines.
Advanced users can readily extend the View models to enable other styles of parallelism.
The Hub¶
The center of an IPython cluster is the Hub. The Hub can be viewed as an über-logger, which keeps track of engine connections, schedulers, clients, as well as persist all task requests and results in a database for later use.
Schedulers¶
All actions that can be performed on the engine go through a Scheduler. While the engines themselves block when user code is run, the schedulers hide that from the user to provide a fully asynchronous interface to a set of engines. Each Scheduler is a small GIL-less function in C provided by pyzmq (the Python load-balanced scheduler being an exception).
ØMQ and PyZMQ¶
All of this is implemented with the lovely ØMQ messaging library, and pyzmq, the lightweight Python bindings, which allows very fast zero-copy communication of objects like numpy arrays.
IPython client and views¶
There is one primary object, the Client, for
connecting to a cluster. For each execution model, there is a
corresponding View. These views allow users to
interact with a set of engines through the interface. Here are the two
default views:
- The
DirectViewclass for explicit addressing. - The
LoadBalancedViewclass for destination-agnostic scheduling.
Getting Started¶
Starting the IPython controller and engines¶
To follow along with this tutorial, you will need to start the IPython
controller and four IPython engines. The simplest way of doing this is
to use the ipcluster command:
$ ipcluster start -n 4
There isn't time to go into it here, but ipcluster can be used to start engines and the controller with various batch systems including:
- SGE
- PBS
- LSF
- MPI
- SSH
- WinHPC
More information on starting and configuring the IPython cluster in the IPython.parallel docs.
Once you have started the IPython controller and one or more engines, you are ready to use the engines to do something useful.
To make sure everything is working correctly, let's do a very simple demo:
from IPython import parallel
rc = parallel.Client()
rc.block = True
rc.ids
[0, 1, 2, 3]
def mul(a,b):
return a*b
mul(5,6)
30
What does it look like to call this function remotely?
Just turn f(*args, **kwargs) into view.apply(f, *args, **kwargs)!
rc[0].apply(mul, 5, 6)
30
And the same thing in parallel?
rc[:].apply(mul, 5, 6)
[30, 30, 30, 30]
Python has a builtin map for calling a function with a variety of arguments
map(mul, range(1,10), range(2,11))
[2, 6, 12, 20, 30, 42, 56, 72, 90]
So how do we do this in parallel?
view = rc.load_balanced_view()
view.map(mul, range(1,10), range(2,11))
[2, 6, 12, 20, 30, 42, 56, 72, 90]
I'll go into further detail about different views, and asynchronous communication later. First, let's peek at the performance of the IPython.parallel infrastructure, so we can see what level of granularity we can consider using.