How to Quickly Identify the Slowest Code in Python?

Notes

Here is a visualization analysis tool, viztracer, a performance profiler for Python code execution.

Original Link

https://www.zhihu.com/question/485980118

Main Text

Among all martial arts, only speed is unbeatable.

The same goes for programming: if your code runs fast, and when it doesn't you can quickly find the cause, your coding skills are in good shape.

Today I'm sharing an extremely practical Python profiling tool, pyinstrument, which quickly locates the slowest part of your code and helps you improve its performance. It supports Python 3.7+, can profile asynchronous code, and with a single command shows exactly where time is being spent. Every Python developer should give it a try.

Installation

pip install pyinstrument

Simple Usage

Start the pyinstrument Profiler at the beginning of your program, stop it at the end, and print the analysis results as follows:

from pyinstrument import Profiler

profiler = Profiler()
profiler.start()

# The code you want to analyze here

profiler.stop()

profiler.print()

For example, profiling this script 123.py clearly shows that the list comprehension is the slower of the two lines (the generator expression does no work until it is iterated):

from pyinstrument import Profiler

profiler = Profiler()
profiler.start()

# The code you want to analyze here
a = [i for i in range(100000)]
b = (i for i in range(100000))

profiler.stop()
profiler.print()
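The gap between those two lines can also be confirmed without a profiler. Here is a stdlib-only sketch (not part of the original article) using timeit: building the list runs the whole loop eagerly, while creating the generator object does no iteration work at all.

```python
import timeit

# Creating the list runs the entire loop up front; creating the
# generator object merely constructs a lazy iterator.
list_time = timeit.timeit("[i for i in range(100000)]", number=100)
gen_time = timeit.timeit("(i for i in range(100000))", number=100)

print(f"list comprehension:   {list_time:.4f} s")
print(f"generator expression: {gen_time:.4f} s")
```

The generator expression finishes orders of magnitude faster, because the cost of iterating is deferred until the generator is actually consumed.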

The above analysis requires modifying the source code. If you use the command-line tool, there is no need to modify the source code; just run pyinstrument xxxx.py:

For example, there is this sorting program c_sort.py:

import sys
import time

import numpy as np

arr = np.random.randint(0, 10, 10)

def slow_key(el):
    time.sleep(0.01)  # deliberate delay so each key computation is slow
    return el

arr = list(arr)

for i in range(10):
    arr.sort(key=slow_key)

print(arr)

In this code, time.sleep(0.01) is deliberately inserted to slow things down. Let's see whether pyinstrument can spot it by running the command pyinstrument c_sort.py:

From the results, the program ran for 1.313 seconds, and the sleep function took 1.219 seconds, which is clearly the bottleneck. Now, let’s remove it and look at the results again:
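The numbers add up: list.sort computes the key exactly once per element, so 10 sorts of a 10-element list trigger 100 calls to slow_key, and 100 × 0.01 s ≈ 1 s of sleeping, which accounts for almost all of the reported 1.219 s. A stdlib-only sketch (with a shortened delay, so it runs quickly) confirms the call count:

```python
import time

calls = 0

def slow_key(el):
    global calls
    calls += 1
    time.sleep(0.001)  # shortened from 0.01 s to keep the sketch fast
    return el

arr = list(range(10))
for _ in range(10):
    arr.sort(key=slow_key)

# list.sort calls the key once per element, never once per comparison
print(calls)
```

This prints 100, matching the arithmetic above.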

After removing it, the slowest part becomes numpy's import-time initialization code (__init__.py). Since that code is not ours and is not especially slow, there is no need to worry about it.

Analyzing Flask Code

Web applications can also use this tool to find performance bottlenecks. With Flask, for example, you just start the profiler before each request and stop it afterwards; simply add this to Flask's request hooks:

from flask import Flask, g, make_response, request
from pyinstrument import Profiler

app = Flask(__name__)

@app.before_request
def before_request():
    if "profile" in request.args:
        g.profiler = Profiler()
        g.profiler.start()


@app.after_request
def after_request(response):
    if not hasattr(g, "profiler"):
        return response
    g.profiler.stop()
    output_html = g.profiler.output_html()
    return make_response(output_html)

Suppose there is this API:

@app.route("/dosomething")
def do_something():
    import requests
    requests.get("http://google.com")
    return "Google says hello!"

To test the bottleneck of this API, you can add the parameter profile to the URL: http://127.0.0.1:5000/dosomething?profile. The lines of code that run slower will be clearly visible:

Analyzing Django Code

Analyzing Django code is also very simple; just add the following to the MIDDLEWARE setting in Django’s configuration file:

"pyinstrument.middleware.ProfilerMiddleware",

Then you can add the profile parameter to the URL to view performance data:

If you don't want everyone to see the results and would rather restrict them to administrators, you can add this code to settings.py:

def custom_show_pyinstrument(request):
    return request.user.is_superuser

PYINSTRUMENT_SHOW_CALLBACK = "%s.custom_show_pyinstrument" % __name__

If you would rather not add a URL parameter every time you check performance, you can add this in settings.py:

PYINSTRUMENT_PROFILE_DIR = 'profiles'

This way, every time a Django interface is accessed, the analysis result will be saved as an HTML file under the profiles folder in the project directory.

Analyzing Asynchronous Code

Simple asynchronous code analysis:

async_example_simple.py:

import asyncio

from pyinstrument import Profiler


async def main():
    p = Profiler()
    with p:
        print("Hello ...")
        await asyncio.sleep(1)
        print("... World!")
    p.print()

asyncio.run(main())

More complex asynchronous code analysis:

import asyncio
import time

import pyinstrument


def do_nothing():
    pass


def busy_wait(duration):
    end_time = time.time() + duration

    while time.time() < end_time:
        do_nothing()


async def say(what, when, profile=False):
    if profile:
        p = pyinstrument.Profiler()
        p.start()

    busy_wait(0.1)
    sleep_start = time.time()
    await asyncio.sleep(when)
    print(f"slept for {time.time() - sleep_start:.3f} seconds")
    busy_wait(0.1)

    print(what)
    if profile:
        p.stop()
        p.print(show_all=True)


async def main():
    await asyncio.gather(
        say("first hello", 2, profile=True),
        say("second hello", 1, profile=True),
        say("third hello", 3, profile=True),
    )

asyncio.run(main())
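Profiling async code per task matters because awaiting overlaps across tasks: each task's own wall-clock time can add up to far more than the program's actual runtime. A stdlib-only sketch (no pyinstrument, shortened delays) of that overlap:

```python
import asyncio
import time

async def say(what, when):
    start = time.perf_counter()
    await asyncio.sleep(when)  # this task suspends; the others keep running
    print(f"{what}: awaited {time.perf_counter() - start:.2f} s")

async def main():
    t0 = time.perf_counter()
    await asyncio.gather(
        say("first hello", 0.3),
        say("second hello", 0.1),
        say("third hello", 0.2),
    )
    return time.perf_counter() - t0

total = asyncio.run(main())
# Wall time is roughly the longest delay, not the 0.6 s sum
print(f"total: {total:.2f} s")
```

This is why pyinstrument tracks time spent out of the event loop for each task separately: summing per-task durations would overcount the waiting.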

How It Works

Pyinstrument interrupts the program every 1 ms and records the entire call stack at that moment. It does this using a C extension and PyEval_SetProfile, but only takes a sample once per millisecond. The reported sample count might look low, but don't worry: it does not reduce accuracy. The 1 ms interval is a lower bound on sampling; if a single function call runs longer than that, the accumulated samples are efficiently "packed" and recorded when the call returns.

Pyinstrument is a statistical profiler: it does not trace every function call your program makes, but records the call stack every millisecond instead. This gives statistical profilers much lower overhead than tracing profilers such as cProfile.
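The sampling idea can be illustrated with nothing but the stdlib. In this toy sketch (not how pyinstrument is implemented internally), a background thread periodically grabs the main thread's current frame via sys._current_frames() and tallies which function it lands in. The profiled code itself is never instrumented, which is why sampling costs so little:

```python
import collections
import sys
import threading
import time

samples = collections.Counter()
stop = threading.Event()
main_id = threading.get_ident()  # identifier of the main thread

def sampler():
    # Every ~1 ms, note which function the main thread is executing
    while not stop.is_set():
        frame = sys._current_frames().get(main_id)
        if frame is not None:
            samples[frame.f_code.co_name] += 1
        time.sleep(0.001)

def busy():
    total = 0
    for i in range(5_000_000):  # CPU-bound work for the sampler to catch
        total += i
    return total

t = threading.Thread(target=sampler)
t.start()
busy()
stop.set()
t.join()

# Most samples should land inside busy()
print(samples.most_common())
```

Functions that consume more time simply collect more samples, so the hot spots emerge from the counts without tracing a single call.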

For example, if I want to figure out why web requests in Django are slow, using cProfile might give me this:

151940 function calls (147672 primitive calls) in 1.696 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    1.696    1.696 profile:0(<code object <module> at 0x1053d6a30, file "./manage.py", line 2>)
        1    0.001    0.001    1.693    1.693 manage.py:2(<module>)
        1    0.000    0.000    1.586    1.586 __init__.py:394(execute_from_command_line)
        1    0.000    0.000    1.586    1.586 __init__.py:350(execute)
        1    0.000    0.000    1.142    1.142 __init__.py:254(fetch_command)
       43    0.013    0.000    1.124    0.026 __init__.py:1(<module>)
      388    0.008    0.000    1.062    0.003 re.py:226(_compile)
      158    0.005    0.000    1.048    0.007 sre_compile.py:496(compile)
        1    0.001    0.001    1.042    1.042 __init__.py:78(get_commands)
      153    0.001    0.000    1.036    0.007 re.py:188(compile)
  106/102    0.001    0.000    1.030    0.010 __init__.py:52(__getattr__)
        1    0.000    0.000    1.029    1.029 __init__.py:31(_setup)
        1    0.000    0.000    1.021    1.021 __init__.py:57(_configure_logging)
        2    0.002    0.001    1.011    0.505 log.py:1(<module>)

After seeing that, you might still be confused: it is hard to tell how your own code relates to this trace. Because pyinstrument records the entire call stack, tracing expensive calls is much easier. It also hides library frames by default, letting you focus on the application code that affects performance:

_     ._   __/__   _ _  _  _ _/_   Recorded: 14:53:35  Samples:  131
 /_//_/// /_\ / //_// / //_'/ //    Duration: 3.131     CPU time: 0.195
/   _/                    v3.0.0b3

Program: examples/django_example/manage.py runserver --nothreading --noreload

3.131 <module>  manage.py:2
└─ 3.118 execute_from_command_line  django/core/management/__init__.py:378
      [473 frames hidden]  django, socketserver, selectors, wsgi...
         2.836 select  selectors.py:365
         0.126 _get_response  django/core/handlers/base.py:96
         └─ 0.126 hello_world  django_example/views.py:4

Final words

This article showed how to use pyinstrument. With this profiling tool, optimizing your code will take far less time in the future. An efficiency tool like this is definitely worth sharing; after all, life is short, and isn't it better to have more time for something interesting? If you found it useful, please like, bookmark, and share. Thank you for your support!

How to Quickly Locate the Slowest Code in Python? - Response by Huicacao - Zhihu
https://www.zhihu.com/question/485980118/answer/2139431728

How to Quickly Locate the Slowest Code in Python? - Answer by Klein Hu - Zhihu
https://www.zhihu.com/question/485980118/answer/2124117608