This notebook contains an excerpt from the Python Programming and Numerical Methods - A Guide for Engineers and Scientists; the content is also available at Berkeley Python Numerical Methods.

The copyright of the book belongs to Elsevier. We also have this interactive book online for a better learning experience. The code is released under the MIT license. If you find this content useful, please consider supporting the work on Elsevier or Amazon!


The Profiler

Using the magic command

Even if it does not change the Big-O complexity of a program, many programmers will spend long hours making their code run twice as fast, or to gain even smaller improvements.

There are ways to check the run time of code in the Jupyter notebook; here we introduce the magic commands %time, %timeit, %%time, and %%timeit to do that.

Notice that the double percent magic commands (%%time, %%timeit) measure the run time for all the code in a cell, while the single percent commands (%time, %timeit) only work for a single statement.

%time sum(range(200))
CPU times: user 6 µs, sys: 1 µs, total: 7 µs
Wall time: 9.06 µs
%timeit sum(range(200))
1.24 µs ± 70.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
%%time
s = 0
for i in range(200):
    s += i
CPU times: user 15 µs, sys: 0 ns, total: 15 µs
Wall time: 17.9 µs
%%timeit
s = 0
for i in range(200):
    s += i
7.06 µs ± 414 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

WARNING! Sometimes it may not be appropriate to use %timeit, since it runs the code in many loops. If a single run of your code already takes a long time, timing it over many loops will take a really long time.
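When a single run is slow, you can cap the number of repetitions yourself. A minimal sketch using the standard-library timeit module (the same machinery behind the magic commands), with number=1 so the statement executes only once per measurement; in the notebook, the -n and -r options of %timeit serve the same purpose:

```python
import timeit

# Time a single execution of the statement instead of
# letting timeit pick a large number of loops.
elapsed = timeit.timeit("sum(range(200))", number=1)
print(f"one run took {elapsed:.2e} seconds")

# repeat() gives several independent single-run measurements;
# the minimum is usually the least noisy estimate.
runs = timeit.repeat("sum(range(200))", number=1, repeat=5)
print(f"best of 5 runs: {min(runs):.2e} seconds")
```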

Use Python Profiler

You could also use the Python profiler (you can read more in the Python documentation) to profile the code you write. In the Jupyter notebook, the magic command for this is %prun.

Let's see the following example, which sums random numbers over and over again.

import numpy as np
def slow_sum(n, m):

    for i in range(n):
        # we create a size m array of random numbers
        a = np.random.rand(m)

        s = 0
        # in this loop we iterate through the array
        # and add elements to the sum one by one
        for j in range(m):
            s += a[j]   
%prun slow_sum(1000, 10000)

You should see something like the following table:

         1004 function calls in 1.413 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    1.320    1.320    1.413    1.413 <ipython-input-20-cc5de53096ac>:1(slow_sum)
     1000    0.093    0.000    0.093    0.000 {method 'rand' of 'mtrand.RandomState' objects}
        1    0.000    0.000    1.413    1.413 {built-in method builtins.exec}
        1    0.000    0.000    1.413    1.413 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

The table shows the following columns (descriptions from the Python profiler documentation):

ncalls: the number of calls.
tottime: the total time spent in the given function (excluding time spent in calls to sub-functions).
percall: the quotient of tottime divided by ncalls.
cumtime: the cumulative time spent in this and all subfunctions (from invocation till exit); this figure is accurate even for recursive functions.
percall: the quotient of cumtime divided by primitive calls.
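Outside the notebook, the same report is available from the standard-library cProfile and pstats modules, which is what %prun uses under the hood. A minimal sketch, where waste_time is just a hypothetical stand-in for the function you want to profile:

```python
import cProfile
import io
import pstats

def waste_time(n):
    # A deliberately slow stand-in: build and sum a list n times.
    total = 0
    for _ in range(n):
        total += sum([i * i for i in range(1000)])
    return total

profiler = cProfile.Profile()
profiler.enable()
waste_time(200)
profiler.disable()

# Sort by internal time, like the %prun table, and print the report.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("tottime")
stats.print_stats(5)  # show only the 5 most expensive entries
print(stream.getvalue())
```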

Use Line Profiler

Many times, we want to get a sense of which line in our code takes a long time, so that we can rewrite that line to make it more efficient. This can be done using line_profiler, which profiles the code line by line. It is not shipped with Python, so we first need to install it; then we can use its magic command:

# Note, you only need run this once. 
!conda install line_profiler
Solving environment: done

# All requested packages already installed.

After installing the package, we can load the line_profiler extension:

%load_ext line_profiler

We use line_profiler to profile the code as follows:

%lprun -f slow_sum slow_sum(1000, 10000)

After running the above command, we get the results of the line-by-line profiling:

Timer unit: 1e-06 s

Total time: 6.1411 s
File: <ipython-input-20-cc5de53096ac>
Function: slow_sum at line 1

Line #      Hits         Time  Per Hit   % Time  Line Contents
     1                                           def slow_sum(n, m):
     3      1001        301.0      0.3      0.0      for i in range(n):
     4                                                   # we create a size m array of random numbers
     5      1000      87876.0     87.9      1.4          a = np.random.rand(m)
     7      1000        439.0      0.4      0.0          s = 0
     8                                                   # in this loop we iterate through the array
     9                                                   # and add elements to the sum one by one
    10  10001000    2463579.0      0.2     40.1          for j in range(m):
    11  10000000    3588901.0      0.4     58.4              s += a[j]

The results include a summary for each line of the function; we can clearly see that lines 10 and 11 take the majority of the total running time.

Usually when code takes longer to run than you would like, there is a bottleneck where much of the time is being spent. That is, there is a line of code that is taking much longer to execute than the other lines in the program. Addressing the bottleneck in a program will usually lead to the biggest improvement in performance, even if there are other areas of your code that are more easily improved.
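For the slow_sum example above, the bottleneck at lines 10 and 11 can be removed by replacing the Python-level inner loop with a single vectorized NumPy call. A sketch of a faster version (named fast_sum here for illustration, and returning the last sum so the result can be inspected):

```python
import numpy as np

def fast_sum(n, m):
    # Same work as slow_sum, but let NumPy sum the array in C
    # instead of adding elements one by one in a Python loop.
    for i in range(n):
        a = np.random.rand(m)
        s = a.sum()
    return s

# The interpreted inner loop over m elements is gone, which is
# exactly where the line profiler showed the time was being spent.
print(fast_sum(10, 1000))
```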

TIP! Start at the bottleneck when improving the performance of code.
