{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"\n",
"\n",
"\n",
"*This notebook contains an excerpt from the [Python Programming and Numerical Methods - A Guide for Engineers and Scientists](https://www.elsevier.com/books/python-programming-and-numerical-methods/kong/978-0-12-819549-9), the content is also available at [Berkeley Python Numerical Methods](https://pythonnumericalmethods.berkeley.edu/notebooks/Index.html).*\n",
"\n",
"*The copyright of the book belongs to Elsevier. We also have this interactive book online for a better learning experience. The code is released under the [MIT license](https://opensource.org/licenses/MIT). If you find this content useful, please consider supporting the work on [Elsevier](https://www.elsevier.com/books/python-programming-and-numerical-methods/kong/978-0-12-819549-9) or [Amazon](https://www.amazon.com/Python-Programming-Numerical-Methods-Scientists/dp/0128195495/ref=sr_1_1?dchild=1&keywords=Python+Programming+and+Numerical+Methods+-+A+Guide+for+Engineers+and+Scientists&qid=1604761352&sr=8-1)!*"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"\n",
"< [8.2 Complexity Matters](chapter08.02-Complexity-Matters.ipynb) | [Contents](Index.ipynb) | [8.4 Summary and Problems](chapter08.04-Summary-and-Problems.ipynb) >"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"# The Profiler"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"## Using the magic command\n",
"Even if it does not change the Big-O complexity of a program, many programmers will spend long hours to make their code run twice as fast or to gain even smaller improvements.\n",
"\n",
"There are ways to check the run time of the code in the Jupyter notebook, here we will introduce the magic commands to do that:\n",
"\n",
"* %time: Get the run time of a single statement\n",
"* %timeit: Get the repeated run time of a single statement\n",
"* %%time: Get the run time of all the code in the cell\n",
"* %%timeit: Get the repeated run time of a cell\n",
"\n",
"Notice that the double percent magic command will measure the run time for all the code in a cell, while the single percent command only works for a single statement. "
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 6 µs, sys: 1 µs, total: 7 µs\n",
"Wall time: 9.06 µs\n"
]
},
{
"data": {
"text/plain": [
"19900"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"%time sum(range(200))"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1.24 µs ± 70.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)\n"
]
}
],
"source": [
"%timeit sum(range(200))"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CPU times: user 15 µs, sys: 0 ns, total: 15 µs\n",
"Wall time: 17.9 µs\n"
]
}
],
"source": [
"%%time\n",
"s = 0\n",
"for i in range(200):\n",
" s += i"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"7.06 µs ± 414 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)\n"
]
}
],
"source": [
"%%timeit\n",
"s = 0\n",
"for i in range(200):\n",
" s += i"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"**WARNING!** Sometimes it is may not be proper to use the *timeit*, since it will run many loops for the code. If you have the code will run a long time, time it with many loops will take a really long time. "
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"## Use Python Profiler\n",
"\n",
"You could also use the Python profiler (you can read more in the Python documentation) to profile the code you write. In Jupyter notebook the magic commands are:\n",
"\n",
"* %prun: Run a single statement through the python code profiler\n",
"* %%prun: Run a cell through the python code profiler\n",
"\n",
"Let's see the following example, that sums random numbers over and over again. "
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"button": false,
"collapsed": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"outputs": [],
"source": [
"import numpy as np"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"button": false,
"collapsed": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"outputs": [],
"source": [
"def slow_sum(n, m):\n",
"\n",
" for i in range(n):\n",
" # we create a size m array of random numbers\n",
" a = np.random.rand(m)\n",
"\n",
" s = 0\n",
" # in this loop we iterate through the array\n",
" # and add elements to the sum one by one\n",
" for j in range(m):\n",
" s += a[j] "
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" "
]
}
],
"source": [
"%prun slow_sum(1000, 10000)"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"You should see something like the following table:\n",
"\n",
"```\n",
" 1004 function calls in 1.413 seconds\n",
"\n",
" Ordered by: internal time\n",
"\n",
" ncalls tottime percall cumtime percall filename:lineno(function)\n",
" 1 1.320 1.320 1.413 1.413 :1(slow_sum)\n",
" 1000 0.093 0.000 0.093 0.000 {method 'rand' of 'mtrand.RandomState' objects}\n",
" 1 0.000 0.000 1.413 1.413 {built-in method builtins.exec}\n",
" 1 0.000 0.000 1.413 1.413 :1()\n",
" 1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"The table is showing the following columns (from Python Profiler)\n",
"\n",
">**ncalls**: for the number of calls, \n",
"**tottime**: for the total time spent in the given function (and excluding time made in calls to sub-functions), \n",
"**percall**: is the quotient of tottime divided by ncalls \n",
"**cumtime** is the total time spent in this and all subfunctions (from invocation till exit). This figure is accurate even for recursive functions. \n",
"**percall** is the quotient of cumtime divided by primitive calls\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"## Use Line Profiler\n",
"\n",
"Many times, we want to get a sense which line in my code scripts takes a long time, so that we could rewrite this line to make it more efficient. This could be done using the *line\\_profiler*, which could profile the code line by line. But it is not shipped with Python, therefore, we need first install it. Then we could use the magic command:\n",
"\n",
"* %lprun: Run the line by line profile on a single statement."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Solving environment: done\n",
"\n",
"# All requested packages already installed.\n",
"\n"
]
}
],
"source": [
"# Note, you only need run this once. \n",
"!conda install line_profiler"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"After you installed this package, we can load the *line_profiler* extension:"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"button": false,
"collapsed": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"outputs": [],
"source": [
"%load_ext line_profiler"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"The way we use the line_profiler to profile the code is shown as the following:"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"button": false,
"collapsed": true,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"outputs": [],
"source": [
"%lprun -f slow_sum slow_sum(1000, 10000)"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"After you run the above command, we will get the results from the line by line profiling:\n",
"\n",
"```\n",
"Timer unit: 1e-06 s\n",
"\n",
"Total time: 6.1411 s\n",
"File: \n",
"Function: slow_sum at line 1\n",
"\n",
"Line # Hits Time Per Hit % Time Line Contents\n",
"==============================================================\n",
" 1 def slow_sum(n, m):\n",
" 2 \n",
" 3 1001 301.0 0.3 0.0 for i in range(n):\n",
" 4 # we create a size m array of random numbers\n",
" 5 1000 87876.0 87.9 1.4 a = np.random.rand(m)\n",
" 6 \n",
" 7 1000 439.0 0.4 0.0 s = 0\n",
" 8 # in this loop we iterate through the array\n",
" 9 # and add elements to the sum one by one\n",
" 10 10001000 2463579.0 0.2 40.1 for j in range(m):\n",
" 11 10000000 3588901.0 0.4 58.4 s += a[j]\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"We could see that the results include a summary for each line of the function, we clearly see that line 10 and 11 takes the majority time of the total running time. \n",
"\n",
"Usually when code takes longer to run than you would like, there is a **bottleneck** where much of the time is being spent. That is, there is a line of code that is taking much longer to execute than the other lines in the program. Addressing the bottleneck in a program will usually lead to the biggest improvement in performance, even if there are other areas of your code that are more easily improved.\n",
"\n",
"**TIP!** Start at the bottleneck when improving the performance of code."
]
},
{
"cell_type": "markdown",
"metadata": {
"button": false,
"new_sheet": false,
"run_control": {
"read_only": false
}
},
"source": [
"\n",
"< [8.2 Complexity Matters](chapter08.02-Complexity-Matters.ipynb) | [Contents](Index.ipynb) | [8.4 Summary and Problems](chapter08.04-Summary-and-Problems.ipynb) >"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.8"
}
},
"nbformat": 4,
"nbformat_minor": 2
}