Why does sleeping before a write slow down the subsequent write?
Question
I'm seeing occasionally slow performance in writing to a file (or socket) and in debugging, I ended up with a small test harness to reproduce it. I'm seeing strange behavior whereby the longer I sleep in between writes, the longer the write itself takes. I reproduced the test harness in both Go and Python and see similar results in both. I also see similar results on both my mac and linux machines.
The key confusion is that the timing code in both cases looks like
sleep(x)
startTime = now()
write(msg)
endTime = now()
writeTime = endTime - startTime
where writeTime gets significantly longer as the sleep increases, despite the sleep not being included in the timing at all!
Python:
import time
f = open("/dev/null", "w")
msg = "0" * 1024
for sleepMicros in [0, 1, 10, 100, 1000]:
    totalSendTime = 0
    cntOver10us = 0
    for _ in range(10000):
        time.sleep(sleepMicros / 1000 / 1000)
        startSendTime = time.time()
        f.write(msg)
        sentTime = time.time()
        timeInSend = sentTime - startSendTime
        totalSendTime += timeInSend
        if timeInSend > 10 / 1000 / 1000:
            cntOver10us += 1
    print(f"sleep: {sleepMicros}us\tTotal time in send: {totalSendTime * 1000}ms\tCount over 10us: {cntOver10us}")
Go:
package main

import (
    "fmt"
    "os"
    "time"
)

func main() {
    f, _ := os.Create("/dev/null")
    msg := make([]byte, 1024)
    for _, sleepMicros := range []time.Duration{0, 1, 10, 100, 1000} {
        totalSendTime := time.Duration(0)
        cntOver10us := 0
        sleepDuration := sleepMicros * time.Microsecond
        for i := 0; i < 10000; i++ {
            time.Sleep(sleepDuration)
            startSendTime := time.Now()
            _, err := f.Write(msg)
            sentTime := time.Now()
            if err != nil {
                fmt.Println("SendMessage got error", err)
            }
            timeInSend := sentTime.Sub(startSendTime)
            totalSendTime += timeInSend
            if timeInSend > 10*time.Microsecond {
                cntOver10us++
            }
        }
        fmt.Println("sleep:", sleepDuration, "\tTotal time in send:", totalSendTime, "\tCount over 10us:", cntOver10us)
    }
}
Results:
Python:
(venv) % python write_to_devnull_mini.py
sleep: 0us Total time in send: 2.109527587890625ms Count over 10us: 0
sleep: 1us Total time in send: 1.9390583038330078ms Count over 10us: 0
sleep: 10us Total time in send: 1.9574165344238281ms Count over 10us: 0
sleep: 100us Total time in send: 5.476236343383789ms Count over 10us: 9
sleep: 1000us Total time in send: 50.98319053649902ms Count over 10us: 1179
Go:
(venv) % go run write_to_devnull_mini.go
sleep: 0s Total time in send: 5.724469ms Count over 10us: 0
sleep: 1µs Total time in send: 6.404582ms Count over 10us: 1
sleep: 10µs Total time in send: 4.346918ms Count over 10us: 0
sleep: 100µs Total time in send: 10.909356ms Count over 10us: 8
sleep: 1ms Total time in send: 112.627213ms Count over 10us: 1307
My best guess is that this is some artifact of the scheduler, but I would love to understand what's happening. How does sleeping for 1ms in between writes cause the total time spent between calls to time.Now() to go up by 20x, and the number of "slow" (>10us) writes to increase 200-fold?
Answer 1
Score: 1
Wait - this looks suspicious:
totalSendTime := time.Duration(0)
cntOver10us := 0
for _, sleepMicros := range []time.Duration{0, 1, 10, 100, 1000} {
    sleepDuration := sleepMicros * time.Microsecond
    for i := 0; i < 10000; i++ {
        ...
    }
    fmt.Println("sleep:", sleepDuration, "\tTotal time in send:", totalSendTime, "\tCount over 10us:", cntOver10us)
}
Don't you really want to reset totalSendTime and cntOver10us on each outer iteration? Otherwise, totalSendTime and cntOver10us are going to keep going up no matter how long you sleep. You could restructure it like this:
for _, sleepMicros := range []time.Duration{0, 1, 10, 100, 1000} {
    totalSendTime := time.Duration(0)
    cntOver10us := 0
    sleepDuration := sleepMicros * time.Microsecond
    for i := 0; i < 10000; i++ {
        ...
    }
    fmt.Println("sleep:", sleepDuration, "\tTotal time in send:", totalSendTime, "\tCount over 10us:", cntOver10us)
}
Otherwise, your totalSendTime for each outer loop is only going to go up, since you never reset it.
Answer 2
Score: 1
You omitted details like core count, scheduler, and load from other processes.
It could be that frequent sleeping is reducing your dynamic priority. That's different from the user-set nice value. Your process is predicted to use only a few cycles per second, so the scheduler considers it fine to pre-empt the CPython bytecode interpreter in favor of a higher-priority process with a tighter deadline.
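A quick way to see the user-set side of that from Python (Unix-only; the kernel's dynamic priority itself only shows up in tools such as ps -o pri or the PR column in top):

import os

# The user-settable nice value for this process (0 unless someone changed it);
# the dynamic priority the scheduler computes is separate from this.
print("nice:", os.getpriority(os.PRIO_PROCESS, 0))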
But it could also be that frequent long sleeps are allowing other processes to dirty the cache, so you suffer more cache misses. Or, similarly, you lack affinity and keep getting rescheduled on different cores. Instrument the cache, or look at the PSR column reported by ps, if you want to investigate that.
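If you want to check the affinity/migration idea from inside the process, here is a minimal Linux-only sketch (os.sched_getaffinity and os.sched_setaffinity are not available on macOS, and the ps invocation assumes a procps-style ps):

import os
import subprocess

# CPUs this process is currently allowed to run on (Linux-only API).
print("allowed CPUs:", sorted(os.sched_getaffinity(0)))

# PSR is the core the process last ran on; sampling it before and after the
# write loop shows whether the process is migrating between cores.
psr = subprocess.run(["ps", "-o", "psr=", "-p", str(os.getpid())],
                     capture_output=True, text=True).stdout.strip()
print("last ran on CPU:", psr)

# Pinning to a single core rules out migration entirely:
# os.sched_setaffinity(0, {0})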
A simple thing to look at would be

import resource
...
r = resource.getrusage(resource.RUSAGE_SELF)

and note patterns in how the voluntary (r.ru_nvcsw) and involuntary (r.ru_nivcsw) context-switch counts increment. Also, verify that r.ru_minflt remains constant.
Hmmm, come to think of it, slowness due to an L3 cache miss should show up as non-constant user and sys mode deltas reported by getrusage().
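For example, a minimal sketch of how the question's Python harness could be extended to record those deltas per sleep setting (the rusage fields used are standard resource.struct_rusage members; the output format is just illustrative):

import resource
import time

f = open("/dev/null", "w")
msg = "0" * 1024
for sleepMicros in [0, 1, 10, 100, 1000]:
    before = resource.getrusage(resource.RUSAGE_SELF)
    for _ in range(10000):
        time.sleep(sleepMicros / 1000 / 1000)
        f.write(msg)
    after = resource.getrusage(resource.RUSAGE_SELF)
    # Per-setting deltas: voluntary/involuntary context switches,
    # minor page faults, and user/system CPU time.
    print(f"sleep: {sleepMicros}us\t"
          f"nvcsw: {after.ru_nvcsw - before.ru_nvcsw}\t"
          f"nivcsw: {after.ru_nivcsw - before.ru_nivcsw}\t"
          f"minflt: {after.ru_minflt - before.ru_minflt}\t"
          f"utime: {after.ru_utime - before.ru_utime:.4f}s\t"
          f"stime: {after.ru_stime - before.ru_stime:.4f}s")

If the involuntary context-switch or minor-fault deltas climb sharply at the 1ms sleep setting, that would point at scheduling or cache/paging effects rather than the write itself.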