
Use joblib

In Python, there are other third-party packages that can make parallel computing easier, especially for everyday tasks. joblib is one of them: it provides a simple way to do parallel computing (and has many other uses as well).

First, you need to install it by running:

pip install joblib
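
If you want a quick sanity check that the installation worked, you can import the package and print its version (a minimal check, not required for the example below):

import joblib
print(joblib.__version__)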

Let’s see how we can run the previous example using this new package.

from joblib import Parallel, delayed
import numpy as np

def random_square(seed):
    # Seed the generator so each task produces a reproducible result
    np.random.seed(seed)
    random_num = np.random.randint(0, 10)
    return random_num**2

# Run random_square for 1,000,000 different seeds on 8 worker processes
results = Parallel(n_jobs=8)\
    (delayed(random_square)(i) for i in range(1000000))

We can see that the parallel part of the code becomes one line when we use the joblib library, which is very convenient. Parallel is a helper class that essentially provides a convenient interface to the multiprocessing module we saw before. delayed is used to capture the arguments of the target function, in this case random_square. We ran the above code with 8 CPUs. If you want to use all of the computational power on your machine, set n_jobs=-1 to use all CPUs; if you set it to -2, all CPUs but one are used.
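
To see what delayed actually does, note that it does not call the function; it just packages the function together with its arguments so that Parallel can dispatch the call to a worker later. A minimal illustration (the exact return value is an implementation detail of joblib and may vary between versions):

# delayed(random_square)(42) captures the function and its arguments
# without calling random_square; in current joblib versions this is
# simply a (function, args, kwargs) tuple.
task = delayed(random_square)(42)
print(task)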

In addition, you can turn on the verbose argument to output status messages.

results = Parallel(n_jobs=-1, verbose=1)\
    (delayed(random_square)(i) for i in range(1000000))
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 12 concurrent workers.
[Parallel(n_jobs=-1)]: Done  60 tasks      | elapsed:    0.1s
[Parallel(n_jobs=-1)]: Done 176056 tasks      | elapsed:    3.0s
[Parallel(n_jobs=-1)]: Done 787056 tasks      | elapsed:   12.4s
[Parallel(n_jobs=-1)]: Done 1000000 out of 1000000 | elapsed:   15.5s finished

joblib supports multiple backends, that is, different ways of doing the parallel computing. If you set the backend to multiprocessing, under the hood it creates a multiprocessing pool that uses separate Python worker processes to execute tasks concurrently on separate CPUs.

results = \
    Parallel(n_jobs=-1, backend='multiprocessing', verbose=1)\
    (delayed(random_square)(i) for i in range(1000000))
[Parallel(n_jobs=-1)]: Using backend MultiprocessingBackend with 12 concurrent workers.
[Parallel(n_jobs=-1)]: Done 220 tasks      | elapsed:    0.0s
[Parallel(n_jobs=-1)]: Done 457032 tasks      | elapsed:    1.9s
[Parallel(n_jobs=-1)]: Done 1000000 out of 1000000 | elapsed:    3.8s finished
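
For comparison, a roughly equivalent version that uses the multiprocessing module directly might look like the sketch below (assuming the same random_square function defined above); joblib's Parallel hides this pool setup and cleanup from you.

import multiprocessing as mp

if __name__ == '__main__':
    # Create a pool of worker processes (one per CPU core by default)
    # and map random_square over all the seeds.
    with mp.Pool() as pool:
        results_mp = pool.map(random_square, range(1000000))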
