Notes
Here is a visual performance analysis tool, pyinstrument — a profiler for Python code execution
Original Link
https://www.zhihu.com/question/485980118
Main Text
Among all martial arts, speed alone is unbeatable.
The same goes for programming: if your code runs fast, and you can quickly pinpoint why slow code is slow, your coding skills are excellent.
Today I’m sharing an extremely practical Python performance analysis tool, pyinstrument, which can quickly locate the slowest parts of your code and help you improve its performance. It supports Python 3.7+ and can profile asynchronous code. With a single command, it shows where the time in your code actually goes. Python developers should definitely try it out.
Installation
pip install pyinstrument
Simple Usage
Start the pyinstrument Profiler at the beginning of your program, stop it at the end, and print the analysis results as follows:
from pyinstrument import Profiler
profiler = Profiler()
profiler.start()
# The code you want to analyze here
profiler.stop()
profiler.print()
For example, in this script 123.py, we can clearly see that the list comprehension dominates the time, while the generator expression costs almost nothing:
from pyinstrument import Profiler
profiler = Profiler()
profiler.start()
# The code you want to analyze here
a = [i for i in range(100000)]
b = (i for i in range(100000))
profiler.stop()
profiler.print()
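A quick way to see why the list comprehension shows up in the profile while the generator expression barely registers: the comprehension materializes all 100000 elements up front, while the generator does no work until it is iterated. A minimal stdlib check of this, independent of pyinstrument:

```python
import sys

# The list comprehension allocates and fills all 100000 elements immediately.
a = [i for i in range(100000)]

# The generator expression only creates a small generator object; elements
# are produced lazily, one at a time, when iterated.
b = (i for i in range(100000))

print(len(a))                          # 100000 elements already in memory
print(sys.getsizeof(b))                # the generator object itself is tiny
print(sys.getsizeof(a) > sys.getsizeof(b))
```

This is why the profiler attributes measurable time to the list comprehension line but essentially none to the generator line.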
The approach above requires modifying the source code. With the command-line tool, there is no need to touch the source at all; just run pyinstrument xxxx.py.
For example, there is this sorting program c_sort.py:
import time

import numpy as np

arr = np.random.randint(0, 10, 10)

def slow_key(el):
    time.sleep(0.01)
    return el

arr = list(arr)
for i in range(10):
    arr.sort(key=slow_key)

print(arr)
In this code, time.sleep(0.01) is deliberately inserted to slow things down. Let’s see whether pyinstrument can find it by running pyinstrument c_sort.py:
From the results, the program ran for 1.313 seconds, and the sleep function took 1.219 seconds, which is clearly the bottleneck. Now, let’s remove it and look at the results again:
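The 1.219 seconds attributed to sleep matches simple arithmetic: list.sort computes the key exactly once per element, so sleep is hit once per element per sort pass. A back-of-the-envelope check:

```python
elements = 10   # length of arr
passes = 10     # the for-loop sorts the list 10 times
delay = 0.01    # time.sleep(0.01) inside slow_key

# list.sort calls the key function once per element, so sleep runs
# elements * passes times in total.
expected_sleep = elements * passes * delay
print(f"expected time in sleep: {expected_sleep:.2f} s")  # 1.00 s
```

The observed 1.219 s is slightly higher than the expected 1.00 s because time.sleep typically oversleeps by the OS scheduler's timer granularity.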
After removing it, the slowest remaining part is numpy’s import-time initialization in __init__.py. However, since that code is not ours and is not particularly slow, there is no need to worry about it.
Analyzing Flask Code
Web applications can also use this tool to find performance bottlenecks. With Flask, for example, you just need to start a profiler before each request and stop it after the request finishes. Simply add this to Flask’s request hooks:
from flask import Flask, g, make_response, request
from pyinstrument import Profiler

app = Flask(__name__)

@app.before_request
def before_request():
    if "profile" in request.args:
        g.profiler = Profiler()
        g.profiler.start()

@app.after_request
def after_request(response):
    if not hasattr(g, "profiler"):
        return response
    g.profiler.stop()
    output_html = g.profiler.output_html()
    return make_response(output_html)
Suppose there is this API:
@app.route("/dosomething")
def do_something():
    import requests
    requests.get("http://google.com")
    return "Google says hello!"
To find the bottleneck in this API, add the profile parameter to the URL: http://127.0.0.1:5000/dosomething?profile. The slower lines of code will be clearly visible:
Analyzing Django Code
Analyzing Django code is also very simple; just add the following to the MIDDLEWARE setting in Django’s configuration file:
"pyinstrument.middleware.ProfilerMiddleware",
Then you can add the profile parameter to the URL to view performance data:
If you don’t want everyone to see the results and only administrators should view them, add this code to settings.py:
def custom_show_pyinstrument(request):
    return request.user.is_superuser

PYINSTRUMENT_SHOW_CALLBACK = "%s.custom_show_pyinstrument" % __name__
If you’d rather not trigger profiling by adding a URL parameter, you can add this in settings.py:
PYINSTRUMENT_PROFILE_DIR = 'profiles'
This way, every time a Django interface is accessed, the analysis result will be saved as an HTML file under the profiles folder in the project directory.
Analyzing Asynchronous Code
Simple asynchronous code analysis:
async_example_simple.py:
import asyncio

from pyinstrument import Profiler

async def main():
    p = Profiler()
    with p:
        print("Hello ...")
        await asyncio.sleep(1)
        print("... World!")
    p.print()

asyncio.run(main())
More complex asynchronous code analysis:
import asyncio
import time

import pyinstrument

def do_nothing():
    pass

def busy_wait(duration):
    end_time = time.time() + duration
    while time.time() < end_time:
        do_nothing()

async def say(what, when, profile=False):
    if profile:
        p = pyinstrument.Profiler()
        p.start()

    busy_wait(0.1)
    sleep_start = time.time()
    await asyncio.sleep(when)
    print(f"slept for {time.time() - sleep_start:.3f} seconds")
    busy_wait(0.1)

    print(what)

    if profile:
        p.stop()
        p.print(show_all=True)

loop = asyncio.get_event_loop()
loop.create_task(say("first hello", 2, profile=True))
loop.create_task(say("second hello", 1, profile=True))
loop.create_task(say("third hello", 3, profile=True))
loop.run_forever()
loop.close()
How It Works
Pyinstrument samples your program every 1 ms, recording the entire call stack at that point. It does this with a C extension and PyEval_SetProfile, but only takes a reading once per millisecond. You might think the reported sample count looks low, but don’t worry: it does not reduce accuracy. The 1 ms interval is only a lower bound for recording a stack sample; if a single function call takes a long time, it is recorded at the end of that call, so the samples are efficiently “packed” without losing long-running frames.
Pyinstrument is a statistical profiler: it does not trace every function call your program makes, but instead records the call stack every millisecond. Statistical profilers therefore have much lower overhead than tracing profilers.
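To make the sampling idea concrete, here is a minimal stdlib-only sketch of a statistical profiler. This is not how pyinstrument is actually implemented (it uses a C extension, and sys._current_frames is a CPython implementation detail); it only illustrates the principle: a background thread wakes roughly every millisecond, grabs the main thread's current call stack, and counts which functions the samples land in.

```python
import sys
import threading
import time
import traceback
from collections import Counter

def sample_stacks(target_thread_id, samples, stop, interval=0.001):
    """Record the target thread's call stack every `interval` seconds."""
    while not stop.is_set():
        frame = sys._current_frames().get(target_thread_id)
        if frame is not None:
            # Store the chain of function names from outermost to innermost.
            stack = tuple(f.name for f in traceback.extract_stack(frame))
            samples[stack] += 1
        time.sleep(interval)

def busy():
    # Burn CPU for ~0.2 s so the sampler has something to catch.
    end = time.time() + 0.2
    total = 0
    while time.time() < end:
        total += 1
    return total

samples = Counter()
stop = threading.Event()
sampler = threading.Thread(
    target=sample_stacks, args=(threading.get_ident(), samples, stop)
)
sampler.start()
busy()
stop.set()
sampler.join()

# Functions where time was actually spent dominate the sample counts.
hot = Counter()
for stack, count in samples.items():
    hot[stack[-1]] += count
print(hot.most_common(3))
```

Because the sampler only looks at the stack at fixed intervals, its cost is roughly constant no matter how many function calls the program makes; that is the overhead advantage over a tracing profiler like cProfile, which pays a price on every single call and return.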
For example, if I want to figure out why web requests in Django are slow, using cProfile might give me this:
151940 function calls (147672 primitive calls) in 1.696 seconds
Ordered by: cumulative time
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 1.696 1.696 profile:0(<code object <module> at 0x1053d6a30, file "./manage.py", line 2>)
1 0.001 0.001 1.693 1.693 manage.py:2(<module>)
1 0.000 0.000 1.586 1.586 __init__.py:394(execute_from_command_line)
1 0.000 0.000 1.586 1.586 __init__.py:350(execute)
1 0.000 0.000 1.142 1.142 __init__.py:254(fetch_command)
43 0.013 0.000 1.124 0.026 __init__.py:1(<module>)
388 0.008 0.000 1.062 0.003 re.py:226(_compile)
158 0.005 0.000 1.048 0.007 sre_compile.py:496(compile)
1 0.001 0.001 1.042 1.042 __init__.py:78(get_commands)
153 0.001 0.000 1.036 0.007 re.py:188(compile)
106/102 0.001 0.000 1.030 0.010 __init__.py:52(__getattr__)
1 0.000 0.000 1.029 1.029 __init__.py:31(_setup)
1 0.000 0.000 1.021 1.021 __init__.py:57(_configure_logging)
2 0.002 0.001 1.011 0.505 log.py:1(<module>)
Seeing that, you might still be confused: it is hard to relate your own code to such a trace. Pyinstrument records the entire call stack, so tracking down expensive calls is much easier. By default it also hides library frames, letting you focus on the application code that affects performance:
_ ._ __/__ _ _ _ _ _/_ Recorded: 14:53:35 Samples: 131
/_//_/// /_\ / //_// / //_'/ // Duration: 3.131 CPU time: 0.195
/ _/ v3.0.0b3
Program: examples/django_example/manage.py runserver --nothreading --noreload
3.131 <module>  manage.py:2
└─ 3.118 execute_from_command_line  django/core/management/__init__.py:378
      [473 frames hidden]  django, socketserver, selectors, wsgi...
         2.836 select  selectors.py:365
      0.126 _get_response  django/core/handlers/base.py:96
      └─ 0.126 hello_world  django_example/views.py:4
Final words
This article showed how to use pyinstrument. With this performance analysis tool, optimizing your code will save you a lot of time down the road. An efficiency tool like this is well worth sharing; after all, life is short, and isn’t it better to spend more of it doing something interesting? If you found this useful, please like, bookmark, and share. Thank you for your support!