Data scientists often need to write a lot of complex, slow, CPU- and I/O-heavy code — whether you’re working with large matrices, millions of rows of data, reading in data files, or web-scraping.
Wouldn’t you hate to waste your time refactoring one section of your code, trying to wring out every last ounce of performance, when a few simple changes to another section could speed up your code tenfold?
If you’re looking for a way to speed up your code, a profiler can show you exactly which parts are taking the most time, allowing you to see which sections would benefit most from optimization.
A profiler measures the time or space complexity of a program. There’s certainly value in theorizing about the big O complexity of an algorithm but it can be equally valuable to examine the real complexity of an algorithm.
Where is the biggest slowdown in your code? Is your code I/O bound or CPU bound? Which specific lines are causing the slowdowns?
Once you’ve answered those questions you’ll A) have a better understanding of your code and B) know where to target your optimization efforts in order to get the biggest boost with the least effort.
Let’s dive into some quick examples using Python.
The Basics
You might already be familiar with a few methods of timing your code. You could check the time before and after a line executes like this:
In [1]: import time
...: start_time = time.time()
...: a_function() # Function you want to measure
...: end_time = time.time()
...: time_to_complete = end_time - start_time
...: time_to_complete
Out[1]: 1.0110783576965332
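If you find yourself sprinkling these start/end checks everywhere, you can wrap the pattern in a small context manager. This is just a convenience sketch (the `timer` helper is hypothetical, not part of the standard library), using `time.perf_counter`, which is better suited to interval timing than `time.time`:

```python
import time
from contextlib import contextmanager

@contextmanager
def timer(label="block"):
    """Time the code inside a `with` block and print the elapsed seconds."""
    start = time.perf_counter()  # high-resolution clock for measuring intervals
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        print(f"{label}: {elapsed:.4f} s")

# Usage: any code inside the block gets timed
with timer("sleep demo"):
    time.sleep(0.1)
```

The try/finally makes sure the elapsed time is printed even if the timed code raises an exception.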
Or, if you’re in a Jupyter Notebook, you could use the magic command %time to time the execution of a statement, like this:
In [2]: %time a_function()
CPU times: user 14.2 ms, sys: 41 µs, total: 14.2 ms
Wall time: 1.01 s
Or, you could use the other magic command %timeit, which gets a more accurate measurement by running the statement multiple times, like this:
In [3]: %timeit a_function()
1.01 s ± 1.45 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
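Outside a notebook, the standard-library timeit module offers the same repeated-measurement approach. A minimal sketch, with `a_function` standing in for whatever you want to measure:

```python
import timeit

def a_function():
    """A stand-in for the function being measured (hypothetical workload)."""
    return sum(i * i for i in range(10_000))

# Call the function 100 times and report the total elapsed seconds
total = timeit.timeit(a_function, number=100)
print(f"{total / 100:.6f} s per call (average over 100 runs)")
```

Averaging over many runs smooths out noise from the operating system and other processes, which is why %timeit reports a mean and standard deviation.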
Alternatively, if you want to time your whole script, you can use the bash command time, like so:
$ time python my_script.py
real 0m1.041s
user 0m0.040s
sys 0m0.000s
These techniques are great for getting a quick sense of how long a script or a section of code takes to run, but they are less useful when you want a more comprehensive picture. It would be a nightmare to wrap each line in time.time() checks. In the next section, we’ll look at how to use Python’s built-in profiler.
Diving Deeper with cProfile
When you’re trying to get a better understanding of how your code is running, the first place to start is cProfile, Python’s built-in profiler. cProfile will keep track of how often and for how long parts of your program were executed.
Just keep in mind that cProfile shouldn’t be used to benchmark your code. It’s written in C, which makes it fast, but it still introduces some overhead that can throw off your measurements.
There are multiple ways to use cProfile but one simple way is from the command line.
Before we demo cProfile, let’s start by looking at a basic sample program that downloads some text files, counts the words in each one, and then saves the top 10 words from each to a file. That said, what the code does isn’t too important; we’ll just be using it to show how the profiler works.
import re
from collections import Counter
import urllib.request

def get_book(url):
    """Load text from a URL"""
    with urllib.request.urlopen(url) as response:
        text = response.read().decode(errors='ignore')
    return text.lower()

def split_words(book):
    """Extract the words with regex"""
    return re.split("[^A-Za-z]+", book)

def count_words(words):
    """Create a dictionary with word counts"""
    return Counter(words)

def read_books(urls):
    """For each url in urls,
    load the book and count the words"""
    # Create a place to store word counts
    word_counts = {}
    # For each book: load it, count words, store the counts
    for title, path in urls.items():
        book = get_book(path)
        words = split_words(book)
        counts = count_words(words)
        word_counts[title] = counts.most_common()[:10]
    return word_counts

def save_results(results, path):
    """Save the results to a text file"""
    with open(path, 'w') as f:
        for book, words in results.items():
            f.write(f"BOOK: {book}\n")
            for word, count in words:
                f.write(f"{' ' * 8}{word:10s}{count:6d}\n")

urls = {
    'pride-and-prejudice':
        'https://www.gutenberg.org/files/1342/1342-0.txt',
    'alice-in-wonderland':
        'https://www.gutenberg.org/files/11/11-0.txt',
    'sherlock-holmes':
        'https://www.gutenberg.org/files/1661/1661-0.txt',
    'moby-dick':
        'https://www.gutenberg.org/files/2701/2701-0.txt',
    'count-of-monte-cristo':
        'https://www.gutenberg.org/files/1184/1184-0.txt'
}

output_file = "my-results.txt"

# Download the books and count words
results = read_books(urls)

# Save the results to a text file
save_results(results, output_file)
Now, with the following command, we’ll profile our script.
$ python -m cProfile -o profile.stat script.py
The -o flag specifies an output file where cProfile will save the profiling statistics.
Next, we can fire up python to examine the results using the pstats module (also part of the standard library).
In [1]: import pstats
...: p = pstats.Stats("profile.stat")
...: p.sort_stats(
...: "cumulative" # sort by cumulative time spent
...: ).print_stats(
...: "script.py" # only show fn calls in script.py
...: )
Fri Aug 07 08:12:06 2020    profile.stat

         46338 function calls (45576 primitive calls) in 6.548 seconds

   Ordered by: cumulative time
   List reduced from 793 to 6 due to restriction <'script.py'>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
1 0.008 0.008 5.521 5.521 script.py:1(<module>)
1 0.012 0.012 5.468 5.468 script.py:19(read_books)
5 0.000 0.000 4.848 0.970 script.py:5(get_book)
5 0.000 0.000 0.460 0.092 script.py:11(split_words)
5 0.000 0.000 0.112 0.022 script.py:15(count_words)
1 0.000 0.000 0.000 0.000 script.py:32(save_results)
Wow! Look at all that useful info!
For each function called, we’re seeing the following information:
- ncalls: number of times the function was called
- tottime: total time spent in the given function (excluding calls to sub-functions)
- percall: tottime divided by ncalls
- cumtime: total time spent in this function and all sub-functions
- percall (again): cumtime divided by ncalls
- filename:lineno(function): the file name, line number, and function name
When reading this output, note that we’re hiding a lot of data: we’re only seeing 6 out of 793 rows. The hidden rows are all the sub-functions being called from within functions like urllib.request.urlopen or re.split. Also, note that the <module> row corresponds to the code in script.py that isn’t inside a function.
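As a side note, cProfile can also be driven programmatically, which is handy when you only want to profile one section of a longer program. A minimal sketch, with a hypothetical `work` function standing in for the real code:

```python
import cProfile
import pstats
import io

def work():
    """A stand-in workload (hypothetical)."""
    return sorted(range(100_000), key=lambda x: -x)

profiler = cProfile.Profile()
profiler.enable()   # start collecting stats
work()
profiler.disable()  # stop collecting

# Print the 5 most expensive calls by cumulative time
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(5)
print(stream.getvalue())
```

Because enable/disable bracket just the calls you care about, the report stays focused instead of covering the whole script.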
Now let’s look back at the results, sorted by cumulative duration.
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.008 0.008 5.521 5.521 script.py:1(<module>)
1 0.012 0.012 5.468 5.468 script.py:19(read_books)
5 0.000 0.000 4.848 0.970 script.py:5(get_book)
5 0.000 0.000 0.460 0.092 script.py:11(split_words)
5 0.000 0.000 0.112 0.022 script.py:15(count_words)
1 0.000 0.000 0.000 0.000 script.py:32(save_results)
Keep in mind the hierarchy of function calls. The top level, <module>, calls read_books and save_results. read_books calls get_book, split_words, and count_words. By comparing cumulative times, we see that most of <module>’s time is spent in read_books, and most of read_books’s time is spent in get_book, where we make our HTTP request, making this script (unsurprisingly) I/O bound.
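pstats can also make that call hierarchy explicit with print_callees, which shows, for each matching function, which sub-functions it called and how the time was split among them. A self-contained sketch, using hypothetical parent and helper functions rather than our script:

```python
import cProfile
import pstats

def helper():
    """A small workload called repeatedly (hypothetical)."""
    return sum(range(200_000))

def parent():
    """Calls helper three times, mirroring read_books calling get_book."""
    return [helper() for _ in range(3)]

profiler = cProfile.Profile()
profiler.enable()
parent()
profiler.disable()

# For each function matching "parent", list the functions it called
# and how the cumulative time was divided among them
pstats.Stats(profiler).print_callees("parent")
```

Running this against a saved profile.stat works the same way: load it with pstats.Stats("profile.stat") and call print_callees with a pattern such as "read_books".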
Next, let’s take a look at how we can be even more granular by profiling our code line-by-line.
Profiling Line-by-Line
Once we’ve used cProfile to get a sense of what function calls are taking the most time, we can examine those functions line-by-line to get an even clearer picture of where our time is being spent.
For this, we’ll need to install the line-profiler library with the following command:
$ pip install line-profiler
Once installed, we just need to add the @profile decorator to the function we want to profile. Here’s the updated snippet from our script:
@profile
def read_books(urls):
    """For each url in urls,
    load the book and count the words"""
    # Create a place to store word counts
    word_counts = {}
    # For each book: load it, count words, store the counts
    for title, path in urls.items():
        book = get_book(path)
        words = split_words(book)
        counts = count_words(words)
        word_counts[title] = counts.most_common()[:10]
    return word_counts
Note that we don’t need to import the profile decorator: it is injected by line-profiler when the script runs.
Now, to profile our function, we can run the following:
$ kernprof -l -v script-prof.py
kernprof is installed along with line-profiler. The -l flag tells line-profiler to profile line-by-line, and the -v flag tells it to print the result to the terminal rather than save it to a file.
The result for our script would look something like this:
Wrote profile results to script-profile.py.lprof
Timer unit: 1e-06 s
Total time: 5.62982 s
File: script-profile.py
Function: read_books at line 19
Line # Hits Time Per Hit % Time Line Contents
========================================================
19 @profile
20 def read_books(urls):
21 """For each url in urls,
22 load the book and count the words"""
23 # Create a place to store word counts
24 1 1.0 1.0 0.0 word_counts = {}
25 # Per book: load, count words, store counts
26 6 15.0 2.5 0.0 for title, path in urls.items():
27 5 5038674.0 1007734.8 89.5 book = get_book(path)
28 5 431378.0 86275.6 7.7 words = split_words(book)
29 5 121321.0 24264.2 2.2 counts = count_words(words)
30 5 38429.0 7685.8 0.7 word_counts[title] = counts.most_common()[:10]
31 1 0.0 0.0 0.0 return word_counts
The key column to focus on here is % Time. As you can see, 89.5% of the time spent parsing each book goes to the get_book function, i.e. making the HTTP request: further validation that our program is I/O bound rather than CPU bound.
Now, with this new info in mind, if we wanted to speed up our code we wouldn’t waste time trying to make our word counter more efficient; it takes only a fraction of the time the HTTP requests do. Instead, we’d focus on speeding up our requests, possibly by making them asynchronous.
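To make the idea concrete, here is one common way to overlap I/O-bound requests: a thread pool from the standard library. This is a sketch, not our actual script; the network call is replaced by a time.sleep stand-in so the example is self-contained, and the URLs are made up:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def get_book(url):
    """Stand-in for the real HTTP request: sleep to simulate I/O latency."""
    time.sleep(0.2)
    return f"text of {url}"

urls = [f"https://example.com/book-{i}.txt" for i in range(5)]

start = time.perf_counter()
# With threads, the five simulated requests wait concurrently,
# so total wall time is roughly one request's latency, not five
with ThreadPoolExecutor(max_workers=5) as pool:
    books = list(pool.map(get_book, urls))
elapsed = time.perf_counter() - start

print(f"Fetched {len(books)} books in {elapsed:.2f} s")
```

Threads work well here precisely because the profiler told us the program spends its time waiting on I/O rather than computing.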
Here, the results are hardly surprising, but on a larger and more complicated program, line-profiler is an invaluable tool in our programming tool belt, allowing us to peer under the hood of our program and find the computational bottlenecks.
Profiling Memory
In addition to profiling the time-complexity of our program, we can also profile its memory-complexity.
In order to do line-by-line memory profiling, we’ll need to install the memory-profiler library, which uses the same @profile decorator to determine which function to profile.
$ pip install memory-profiler
$ python -m memory_profiler script.py
The result of running memory-profiler on our same script should look something like the following:
Filename: script.py
Line # Mem usage Increment Line Contents
================================================
19 38.426 MiB 38.426 MiB @profile
20 def read_books(urls):
21 """For each url in urls,
22 load the book and count the words"""
23 # Create a place to store word counts
24 38.426 MiB 0.000 MiB word_counts = {}
25 # For each book: load it, count words, store the counts
26 85.438 MiB 0.000 MiB for title, path in urls.items():
27 72.137 MiB 9.281 MiB book = get_book(path)
28 95.852 MiB 23.715 MiB words = split_words(book)
29 85.438 MiB 0.496 MiB counts = count_words(words)
30 85.438 MiB 0.254 MiB word_counts[title] = counts.most_common()[:10]
31 85.438 MiB 0.000 MiB return word_counts
There are currently some issues with the accuracy of the “Increment” column, so just focus on the “Mem usage” column for now.
Our script had peak memory usage on line 28 when we split the books up into words.
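If you’d rather stay in the standard library, the tracemalloc module can give a similar, though coarser, view of where memory is being allocated. A minimal sketch, with a large list standing in for the word-splitting step:

```python
import tracemalloc

tracemalloc.start()

# Allocate something sizeable, analogous to splitting a book into words
words = ["word"] * 1_000_000

# Current and peak traced memory, in bytes
current, peak = tracemalloc.get_traced_memory()
print(f"current: {current / 1e6:.1f} MB, peak: {peak / 1e6:.1f} MB")

# Top allocation sites, grouped by source line
for stat in tracemalloc.take_snapshot().statistics("lineno")[:3]:
    print(stat)

tracemalloc.stop()
```

tracemalloc only tracks allocations made after start(), so place that call before the code you want to measure.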
Conclusion
Hopefully, now you’ll have a few new tools in your programming tool belt to help you write more efficient code and quickly determine how to best spend your optimization-time.
You can read more in the documentation for cProfile, line-profiler, and memory-profiler. I also highly recommend the book High Performance Python, by Micha Gorelick and Ian Ozsvald [1].
References
[1] M. Gorelick and I. Ozsvald, High Performance Python 2nd Edition (2020), O’Reilly Media, Inc.