This post explains my current profiling strategy (mostly for my own reference). It has proven useful to first get the code working, i.e., doing what you want it to do. Based on this working version, bottlenecks can then be identified. For that, a two-tier strategy has proven useful:
- Profiling at the macro level: Which function is the bottleneck?
- Profiling at the micro level: Which line in the function identified in the first step is the bottleneck?
Brett Slatkin gives a good overview in his book "Effective Python":
“Python provides a built-in profiler for determining which parts of a program are responsible for its execution time. This lets you focus your optimization efforts on the biggest sources of trouble and ignore parts of the program that don’t impact speed. [...] Python provides two built-in profilers, one that is pure Python (profile) and another that is a C-extension module (cProfile). The cProfile built-in module is better because of its minimal impact on the performance of your program while it’s being profiled.”
cProfile can be called like this:
python -m cProfile <myscript.py>
You can also sort the output by cumulative time spent per function:
python -m cProfile -s cumulative <myscript.py>
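The output can also be saved to a file with the -o option. Note that this file is not ASCII text but a binary dump in the pstats format, which is what the visualization tools below read: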
python -m cProfile -o <cProfile_output.prof> <myscript.py>
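The saved file can also be inspected from Python with the standard-library pstats module. The following is only a minimal sketch: the workload `slow_sum` and the file name `example.prof` are hypothetical and chosen purely for illustration.

```python
import cProfile
import io
import os
import pstats
import tempfile


def slow_sum(n):
    """Hypothetical workload, used only to have something to profile."""
    total = 0
    for i in range(n):
        total += i * i
    return total


# Profile one call and dump the statistics to a file. This is the same
# file format that `python -m cProfile -o <file>` writes for a script.
prof_path = os.path.join(tempfile.mkdtemp(), "example.prof")
profiler = cProfile.Profile()
profiler.enable()
result = slow_sum(100_000)
profiler.disable()
profiler.dump_stats(prof_path)

# Load the file again and print the ten most expensive entries,
# sorted by cumulative time.
buffer = io.StringIO()
stats = pstats.Stats(prof_path, stream=buffer)
stats.strip_dirs().sort_stats("cumulative").print_stats(10)
report = buffer.getvalue()
print(report)
```

Reading the file this way is handy when no graphical tool is available, e.g., on a remote machine.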
Snakeviz can read the output file generated by cProfile and creates a "sunburst" visualization.
gprof2dot can read the output file generated by cProfile
gprof2dot.py -f pstats <cProfile_output.prof> | dot -Tpng -o output.png
and generates a dot graph, rendered here to output.png.
line_profiler is a module for doing line-by-line profiling of functions.
In your script, decorate the functions you want to profile with @profile:
@profile
def slow_function(a, b, c):
    ...
Then run the script through kernprof:
kernprof -l <myscript.py>
and view the results:
python -m line_profiler myscript.py.lprof
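kernprof makes the @profile decorator available only while it runs the script, so executing the same file with plain python fails with a NameError. One possible workaround is a no-op fallback. This is a sketch and not part of line_profiler itself; the function `sum_of_squares` is a hypothetical stand-in for the code you want to profile.

```python
import builtins

# kernprof places `profile` in builtins while it runs the script.
# If it is missing (plain `python myscript.py`), fall back to a
# decorator that returns the function unchanged.
if hasattr(builtins, "profile"):
    profile = builtins.profile
else:
    def profile(func):
        return func


@profile
def sum_of_squares(n):
    """Hypothetical function to be profiled line by line."""
    total = 0
    for i in range(n):
        total += i * i
    return total


print(sum_of_squares(1000))
```

With this guard in place, the decorators can stay in the code between profiling sessions.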
The timeit module can be used to measure the performance of small code snippets (e.g., when you have identified the worst-performing line via line_profiler and are testing out alternatives).
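As a rough sketch of such a comparison, the snippet below times two hypothetical alternatives (`squares_loop` and `squares_comprehension`, both made up for this example). The repeat and number values are arbitrary assumptions and should be adapted to the snippet at hand.

```python
import timeit


def squares_loop(n):
    """Hypothetical original version: explicit loop with append."""
    result = []
    for i in range(n):
        result.append(i * i)
    return result


def squares_comprehension(n):
    """Hypothetical alternative: list comprehension."""
    return [i * i for i in range(n)]


# Both candidates must produce the same result before timing them.
assert squares_loop(1000) == squares_comprehension(1000)

# repeat() returns one total time per run of `number` calls; the
# minimum is usually the measurement least disturbed by other processes.
loop_time = min(timeit.repeat(lambda: squares_loop(1000), number=200, repeat=5))
comp_time = min(timeit.repeat(lambda: squares_comprehension(1000), number=200, repeat=5))

print(f"loop:          {loop_time:.4f} s for 200 calls")
print(f"comprehension: {comp_time:.4f} s for 200 calls")
```

Checking that both variants return the same result first avoids optimizing towards a faster but wrong implementation.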
cProfile as decorator
- this is another useful overview
- one could use pystone to make performance results comparable between different hardware (e.g., between different iOS hardware)
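The "cProfile as decorator" idea listed above could look roughly like the following. This is only a sketch under my own assumptions: the decorator name `profiled`, its parameters, and the example function `count_words` are hypothetical and not taken from any particular library.

```python
import cProfile
import functools
import io
import pstats


def profiled(sort_by="cumulative", limit=10):
    """Hypothetical decorator: profile every call of the wrapped function."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            profiler = cProfile.Profile()
            try:
                # runcall() profiles exactly this one function call.
                return profiler.runcall(func, *args, **kwargs)
            finally:
                # Format the statistics and keep the text for later use.
                buffer = io.StringIO()
                stats = pstats.Stats(profiler, stream=buffer)
                stats.sort_stats(sort_by).print_stats(limit)
                wrapper.last_report = buffer.getvalue()
                print(wrapper.last_report)
        wrapper.last_report = ""
        return wrapper
    return decorator


@profiled()
def count_words(text):
    """Hypothetical function to be profiled."""
    return len(text.split())


n_words = count_words("profile first, optimize second")
```

Compared with running the whole script under cProfile, this restricts the report to one function of interest. Nesting two decorated functions may not work on every Python version, since only one profiler can be active at a time on some of them.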