Why does bash "forget" about my background processes?
Question
I have this code:
#!/bin/bash
pids=()
for i in $(seq 1 999); do
sleep 1 &
pids+=( "$!" )
done
for pid in "${pids[@]}"; do
wait "$pid"
done
I expect the following behavior:
- spin through the first loop
- wait about a second on the first pid
- spin through the second loop
Instead, I get this error:
./foo.sh: line 8: wait: pid 24752 is not a child of this shell
(repeated 171 times with different pids)
If I run the script with a shorter loop (50 instead of 999), I get no errors.
What's going on?
Edit: I am using GNU bash 4.4.23 on Windows.
Answer 1
Score: 4
POSIX says:
>The implementation need not retain more than the {CHILD_MAX} most recent entries in its list of known process IDs in the current shell execution environment.
{CHILD_MAX} here refers to the maximum number of simultaneous processes allowed per user. You can get the value of this limit using the getconf utility:
$ getconf CHILD_MAX
13195
Bash stores the statuses of at most twice that many exited background processes in a circular buffer, and says not a child of this shell when you call wait on the PID of an old one that has been overwritten. You can see how it's implemented here.
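One way to avoid depending on how many exit statuses bash remembers is to never let many un-waited jobs pile up. The following is a minimal sketch of that idea, not a confirmed fix for the asker's environment. The names batch_size, total and failures are hypothetical, and the value 50 is only the loop size the question reports as working.

```shell
#!/bin/bash
# Sketch: wait in batches so that few exited jobs are ever left un-waited.
batch_size=50   # assumed value; the question reports no errors at 50
total=100       # hypothetical number of jobs, for illustration only
failures=0
pids=()

for ((i = 1; i <= total; i++)); do
    sleep 0.1 &
    pids+=("$!")
    # Once a batch is full, collect every status before starting more jobs.
    if (( ${#pids[@]} >= batch_size )); then
        for pid in "${pids[@]}"; do
            wait "$pid" || failures=$((failures + 1))
        done
        pids=()
    fi
done

# Collect the final, possibly partial, batch.
for pid in "${pids[@]}"; do
    wait "$pid" || failures=$((failures + 1))
done

echo "CHILD_MAX: $(getconf CHILD_MAX)"
echo "jobs that failed or were forgotten: $failures"
```

If the individual exit statuses are not needed, a plain wait with no arguments waits for all currently running children and avoids the per-PID lookup altogether.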
Answer 2
Score: 3
The way you might reasonably expect this to work, as it would if you wrote a similar program in most other languages, is:
- sleep is executed in the background via a fork + exec.
- At some point, sleep exits, leaving behind a zombie.
- That zombie remains in place, holding its PID, until its parent calls wait to retrieve its exit code.
However, shells such as bash actually do this a little differently. They proactively reap their zombie children and store their exit codes in memory so that the system resources those processes were using can be released. Then, when you call wait, the shell just hands you whatever value is stored in memory, but the zombie could be long gone by then.
Now, because all of these exit statuses are being stored in memory, there is a practical limit to how many background processes can exit without you calling wait before the shell runs out of the space it has set aside for them. I expect that you're hitting this limit somewhere in the several hundreds of processes in your environment, while other users manage to make it into the several thousands in theirs. Regardless, the outcome is the same: eventually there is nowhere left to store information about your children, so that information is lost.
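The stored-status behavior described above can be observed with a small experiment. This is a sketch under the assumption that one second is enough for the child to exit and be reaped. The variable names are hypothetical.

```shell
#!/bin/bash
# Start a background child that exits immediately with a recognizable status.
(exit 7) &
pid=$!

# Give the child time to exit; the shell reaps it and records its status.
sleep 1

# After reaping, the PID normally no longer refers to a live process,
# so signal 0 (an existence check) is expected to fail here.
if kill -0 "$pid" 2>/dev/null; then
    echo "process $pid still exists"
else
    echo "process $pid is gone"
fi

# wait can still report the exit status, because the shell remembered it.
wait "$pid"
status=$?
echo "remembered status of $pid: $status"
```

The status is available even though nothing called wait while the child was still running, which is the difference from the zombie model in the list above.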
Answer 3
Score: 1
I can reproduce this on ArchLinux with docker run -ti --rm bash:5.0.18 bash -c 'pids=; for ((i=1;i<550;++i)); do true & pids+=" $!"; done; wait $pids' and with any earlier version. I can't reproduce it with bash:5.1.0.
> What's going on?
It looks like a bug in your version of Bash. There were a couple of improvements to jobs.c and wait.def in Bash 5.1, and the changelog mentions "Make sure SIGCHLD is blocked in all cases where waitchld() is not called from a signal handler". From the look of it, this seems to be an issue with handling a SIGCHLD signal while already handling another SIGCHLD signal.
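To see which side of that version boundary a given shell falls on, a script can inspect bash's own BASH_VERSINFO array. The helper name bash_older_than_5_1 is hypothetical, and the 5.1 threshold comes only from the reproduction attempts above, not from a confirmed bug report.

```shell
#!/bin/bash
# Hypothetical helper: succeeds when the running bash predates 5.1.
bash_older_than_5_1() {
    local major=${BASH_VERSINFO[0]} minor=${BASH_VERSINFO[1]}
    (( major < 5 || (major == 5 && minor < 1) ))
}

if bash_older_than_5_1; then
    echo "bash $BASH_VERSION predates 5.1; the behavior above may be reproducible here"
else
    echo "bash $BASH_VERSION is 5.1 or newer"
fi
```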