Why does calling time.sleep with different values alter the execution time of parts that do not depend on the sleep?

Question

I ran this code multiple times with different values of SLEEP_TIME, for example SLEEP_TIME=0, SLEEP_TIME=1e-3 and SLEEP_TIME=10e-3, and also with the time.sleep line omitted altogether. For every value of SLEEP_TIME the measured average work time changes, even though the sleep is outside the measured code. This makes zero sense to me: why would calling time.sleep change the way the process behaves when the code absolutely does not depend on the sleep?

I tested the following code on both Linux and Windows and the behavior is similar (though on Windows, omitting the sleep altogether causes the performance to degrade significantly).

import numpy as np
import multiprocessing 
import time

SLEEP_TIME = 1e-3

def do_work():
    total_time = 0
    time_to_run = 500
    for i in range(time_to_run):
        t0 = time.time()
        
        # start work
        nparr = np.ones((1000,100,30))
        nparr[nparr == 0] = 1
        sp = nparr.shape # to synchronize previous call
        # end work
        
        t1 = time.time()                
        total_time += t1 - t0        
        time.sleep(SLEEP_TIME) # WHY DOES THIS MATTER???? THIS IS OUTSIDE THE WORK AND OUTSIDE MEASUREMENT
    
    print(f"avg work time: {1000 * total_time / time_to_run:.2f}ms")        

if __name__ == '__main__':
    
    p1 = multiprocessing.Process(target=do_work)        
    p1.start()
    p2 = multiprocessing.Process(target=do_work)        
    p2.start()

    p1.join()
    p2.join()
    

Example results (on Linux):

No sleep (commenting out time.sleep)

Output:

> avg work time: 4.50ms
>
> avg work time: 4.56ms

SLEEP_TIME = 0

Output:

> avg work time: 4.46ms
>
> avg work time: 4.52ms

SLEEP_TIME = 1e-3

Output:

> avg work time: 4.76ms
>
> avg work time: 4.82ms

SLEEP_TIME = 10e-3

Output:

> avg work time: 7.05ms
>
> avg work time: 7.07ms

What is happening here? Is the OS trying (and failing) to optimize my process? And how can I execute the work part as fast as possible regardless of the amount of previous sleep time?

ChatGPT suggested I should add to the top of the file:

import os
os.environ["OMP_NUM_THREADS"] = "1"  # or whatever number you choose

While it improves the execution time with long sleeps, the execution time still differs.

EDIT: I fixed the join strategy, as some have rightly suggested. Though it doesn't affect the problem in question, it is better to write the code correctly to avoid confusion.

Answer 1

Score: 2

I reproduced the behavior of your Python script on my Ubuntu machine.
In my case it was not specific to Python: I found similar performance degradation in a C++ program that sleeps between each computation.

There are various mechanisms in Linux that reduce the frequency of the CPU(s) in order to save power when the system load is low. In my case, the "CPU frequency scaling governor" was set to "powersave" on all CPUs. You can check this by running:

cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
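The same check can be done from Python, for example to log the governor and current frequency next to each benchmark run. This is a minimal sketch: the helper name `read_cpufreq_info` is hypothetical, and it assumes the standard Linux cpufreq sysfs layout (`cpuN/cpufreq/scaling_governor` and `scaling_cur_freq`, the latter in kHz). Where those files do not exist, such as on Windows or in many VMs, it simply returns an empty dict.

```python
import glob
import os


def read_cpufreq_info(base="/sys/devices/system/cpu"):
    """Map each CPU (e.g. "cpu0") to (governor, current frequency in kHz).

    The frequency is None if scaling_cur_freq is missing or unreadable.
    Returns an empty dict when no cpufreq files exist (non-Linux systems,
    many VMs and containers).
    """
    info = {}
    pattern = os.path.join(base, "cpu[0-9]*", "cpufreq", "scaling_governor")
    for gov_path in sorted(glob.glob(pattern)):
        cpufreq_dir = os.path.dirname(gov_path)
        cpu_name = os.path.basename(os.path.dirname(cpufreq_dir))
        try:
            with open(gov_path) as f:
                governor = f.read().strip()
        except OSError:
            continue  # skip CPUs whose governor cannot be read
        cur_khz = None
        try:
            with open(os.path.join(cpufreq_dir, "scaling_cur_freq")) as f:
                cur_khz = int(f.read().strip())
        except (OSError, ValueError):
            pass  # not every cpufreq driver exposes a readable value
        info[cpu_name] = (governor, cur_khz)
    return info


if __name__ == "__main__":
    for cpu, (governor, khz) in read_cpufreq_info().items():
        freq = f"{khz / 1000:.0f} MHz" if khz is not None else "unknown"
        print(f"{cpu}: governor={governor}, current={freq}")
```

Printing this alongside benchmark results at least records which governor was active when the numbers were taken.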

In my case, switching to "performance" gives similar timings with and without sleep, and both are now even lower than the no-sleep timing I measured under "powersave".
To change these settings, run:

echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

Note that this will use more power and could lead to more heat generation, so you might want to monitor the temperature of your CPUs to make sure they don't overheat.

Answer 2

Score: 0

Try measuring multiple runs, for example by changing do_work to:

RUNS = 10

def do_work():
    time_to_run = 50
    min_total_time = None
    for j in range(RUNS):
        total_time = 0
        for i in range(time_to_run):
            t0 = time.time()

            # start work
            nparr = np.ones((1000,100,30))
            nparr[nparr == 0] = 1
            sp = nparr.shape # to synchronize previous call
            # end work

            t1 = time.time()
            total_time += t1 - t0
            time.sleep(SLEEP_TIME) # WHY DOES THIS MATTER???? THIS IS OUTSIDE THE WORK AND OUTSIDE MEASUREMENT
        if min_total_time is None:
            min_total_time = total_time
        else:
            min_total_time = min(min_total_time, total_time)
    
    print(f"min work time: {1000 * min_total_time / time_to_run:.2f}ms")

I think you'll find the differences disappear.

The thing is, the time taken by your computer is very inconsistent, because your OS and other programs running in the background are constantly taking small slices of CPU time. Sometimes your OS just needs to do something: it may check for updates, or some scheduled task may kick in.

Testing the average isn't really useful because one single outlier can have an outsized effect.
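To illustrate the outlier point with made-up numbers (not measurements from the question), take nine iterations at 4.5 ms and one at 50 ms, for example because a background task ran at that moment. The mean roughly doubles, while the median and the minimum stay at the typical value.

```python
import statistics

# Hypothetical per-iteration timings in ms: nine "normal" iterations and one
# that was slowed down by some background activity.
timings_ms = [4.5] * 9 + [50.0]

mean_ms = statistics.mean(timings_ms)
median_ms = statistics.median(timings_ms)
min_ms = min(timings_ms)

print(f"mean={mean_ms:.2f} ms, median={median_ms:.2f} ms, min={min_ms:.2f} ms")
# prints: mean=9.05 ms, median=4.50 ms, min=4.50 ms
```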

The more runs you try, the better you'll be able to compare differences.
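A sketch of a reusable version of the loop above, assuming Python 3.3+ for `time.perf_counter`, which is a monotonic, high-resolution clock intended for measuring short intervals and is generally a better fit than `time.time` here. The function name `time_repeated` and its `pause` parameter are hypothetical: `pause` runs after each call but outside the timed region, mirroring where the question puts `time.sleep`. It reports both the minimum and the median over batches rather than only the mean.

```python
import statistics
import time


def time_repeated(work, repeats=10, number=50, pause=None):
    """Time `work()` over `repeats` batches of `number` calls each.

    Only the calls to `work` are timed; `pause` (if given) runs after each
    call, outside the timed region. Returns (min, median) of the per-batch
    average time per call, in milliseconds.
    """
    batch_avgs_ms = []
    for _ in range(repeats):
        total = 0.0
        for _ in range(number):
            t0 = time.perf_counter()
            work()
            total += time.perf_counter() - t0
            if pause is not None:
                pause()  # e.g. a sleep; deliberately not timed
        batch_avgs_ms.append(1000 * total / number)
    return min(batch_avgs_ms), statistics.median(batch_avgs_ms)
```

For the question's workload, `work` would wrap the `np.ones(...)` and masking lines, and `pause` would be `lambda: time.sleep(SLEEP_TIME)`.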

Answer 3

Score: 0

You have a rather fast machine, and this is a rather short computation,
so it exaggerates the effect.

On my machine, for example, the same computation takes 14.2 ms. I am indeed able to reproduce the effect, but it is more subtle: I had to run it several times to compute the mean, and then a p-value to be sure it wasn't just luck (with single runs, I sometimes get the opposite trend by chance).

So it is around 14 ms with no sleep, 14.2 ms for a 1 ms sleep, 14.5 ms for 10 ms, and also 14.5 ms for 100 ms. The effect does exist, but it is nothing like the sleep time being counted into the work time. For 100 ms I had to wait about a minute before getting the result, with my machine load practically at 0 (so no CPU in use): most of the time is spent sleeping, and yet the work time barely changes.
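The answer does not say which test produced the p-value. One simple, assumption-light option is a permutation test on the two sets of timings; the sketch below is that swapped-in technique, not necessarily what was used here. The helper name `permutation_test` is hypothetical, and the sample lists in the usage block are made-up illustrative numbers, not the measurements above.

```python
import math
import random


def permutation_test(a, b, n_resamples=10_000, seed=0):
    """Approximate two-sided p-value for a difference in mean timing.

    Repeatedly shuffles the pooled samples, splits them into groups of the
    original sizes, and counts how often the absolute difference in means is
    at least as large as the observed one. The +1 terms keep the estimate
    from ever being exactly 0.
    """
    def mean(xs):
        return math.fsum(xs) / len(xs)  # fsum: correctly rounded, order-independent

    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    rng = random.Random(seed)  # seeded so the result is reproducible
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        if abs(mean(pooled[:n_a]) - mean(pooled[n_a:])) >= observed:
            hits += 1
    return (hits + 1) / (n_resamples + 1)


if __name__ == "__main__":
    # Illustrative per-run averages in ms (made up, not real measurements).
    no_sleep = [14.0, 13.9, 14.1, 14.0, 13.8, 14.2]
    sleep_10ms = [14.5, 14.6, 14.4, 14.5, 14.7, 14.3]
    print(f"p = {permutation_test(no_sleep, sleep_10ms):.4f}")
```

A small p-value only says the difference is unlikely to be pure run-to-run noise; it says nothing about why the timings differ.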

I can't be sure why. But I would bet on @thebjorn's hypothesis: because your CPU is bored with a process that only sleeps, it switches to other processes... or something like that.

Note that if you do use your CPU, so that it really has something to do, for example by using 8 processes instead of 2 (on my 4-core machine; if you have more cores, which is likely, you should use even more processes), then the effect is inverted:
With 8 concurrent processes, the work time is 52, 48 and 42 ms for sleep times of 0, 1 and 10 ms respectively.

Which makes sense: the more you sleep, the less your processes compete with each other. If I use a huge sleep time, it goes back to around 15 ms, because the processes are asleep most of the time, so the probability that they end up competing with each other for cores is negligible.

So what you are measuring here is how much your processes compete with each other, and with other processes.
If your processes do too little work to bother each other (2 processes on a 4-core machine do not compete with each other, even if they never sleep), then what matters is the competition with other processes. And if you sleep a lot, maybe your processes are interrupted more often, letting the small competition from outside bother them. I don't know. The scheduler works in mysterious ways.
If your processes have more than enough work for your CPU (which I guess should be the case, otherwise you wouldn't bother measuring performance and using multiple processes), then what matters is the competition with each other, and on the contrary, the more you sleep, the less you compete with the others.

huangapple
  • Published on June 15, 2023 21:41:21
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76483104.html