Why does bash "forget" about my background processes?

Question

I have this code:

#!/bin/bash
pids=()
for i in $(seq 1 999); do
  sleep 1 &
  pids+=( "$!" )
done
for pid in "${pids[@]}"; do
  wait "$pid"
done

I expect the following behavior:

  • spin through the first loop
  • wait about a second on the first pid
  • spin through the second loop

Instead, I get this error:

./foo.sh: line 8: wait: pid 24752 is not a child of this shell

(repeated 171 times with different pids)

If I run the script with a shorter loop (50 instead of 999), I get no errors.

What's going on?

Edit: I am using GNU bash 4.4.23 on Windows.

Answer 1

Score: 4

POSIX says:
> The implementation need not retain more than the {CHILD_MAX} most recent entries in its list of known process IDs in the current shell execution environment.

{CHILD_MAX} here refers to the maximum number of simultaneous processes allowed per user. You can get the value of this limit using the getconf utility:

$ getconf CHILD_MAX
13195

Bash stores the statuses of at most twice that many exited background processes in a circular buffer, and reports "not a child of this shell" when you call wait on the PID of an old one whose entry has been overwritten. You can see how it's implemented here.
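Given that bounded buffer, a script can avoid relying on bash remembering hundreds of finished children by waiting for its jobs in batches. This is a minimal sketch, not something from the answer: total, batch_size, failures and the wait_for_batch helper are hypothetical names, the batch size of 50 is an arbitrary choice based on the question's report that 50 jobs work, and true stands in for the real sleep 1 job so the sketch finishes quickly.

```bash
#!/bin/bash
# Sketch: wait for background jobs in fixed-size batches so that bash never
# has to remember more than batch_size un-waited children at once.
# batch_size=50 is an illustrative assumption, not a limit defined by bash.
total=999
batch_size=50
failures=0
pids=()

# Hypothetical helper: wait for every PID collected so far, then reset the list.
wait_for_batch() {
  local pid
  for pid in "${pids[@]}"; do
    # wait returns the child's exit status, or 127 if bash does not know the PID
    if ! wait "$pid"; then
      failures=$((failures + 1))
    fi
  done
  pids=()
}

for ((i = 1; i <= total; i++)); do
  true &            # placeholder job; substitute the real command, e.g. sleep 1
  pids+=("$!")
  if (( ${#pids[@]} >= batch_size )); then
    wait_for_batch
  fi
done
wait_for_batch      # collect the final, partial batch

echo "failures: $failures"
```

Because each batch is collected before the next one is started, at most batch_size background jobs are ever waiting to be collected, which keeps well clear of the CHILD_MAX-based limit described above.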

Answer 2

Score: 3

The way you might reasonably expect this to work, as it would if you wrote a similar program in most other languages, is:

  1. sleep is executed in the background via a fork+exec.
  2. At some point, sleep exits leaving behind a zombie.
  3. That zombie remains in place, holding its PID, until its parent calls wait to retrieve its exit code.

However, shells such as bash actually do this a little differently. They proactively reap their zombie children and store their exit codes in memory so that they can release the system resources those processes were using. Then, when you call wait, the shell just hands you whatever value is stored in memory, but the zombie could be long gone by then.
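A small experiment can make this visible. In the sketch below (assumed to run under bash on a POSIX-like system), a background subshell exits immediately with status 3, the script sleeps for a second, and only then calls wait. The script cannot observe exactly when bash reaps the child, but by then it has most likely happened, and wait still reports the exit status the shell kept in memory.

```bash
#!/bin/bash
# Sketch: a background child exits with status 3 long before we call wait.
# By the time wait runs, bash has most likely already reaped the zombie,
# yet wait still reports the exit status it stored for that PID.
( exit 3 ) &
pid=$!

sleep 1          # give the child time to exit and the shell time to reap it

wait "$pid"
status=$?
echo "child $pid exited with status $status"
```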

Now, because all of these exit statuses are being stored in memory, there is a practical limit to how many background processes can exit before you call wait on them; beyond that point, the shell has filled all the space it has available for this information. I expect that you're hitting this limit somewhere in the several hundreds of processes in your environment, while other users manage to get into the several thousands in theirs. Regardless, the outcome is the same: eventually there's nowhere to store information about your children, so that information is lost.
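If a script does not need each job's individual exit code, one workaround consistent with this explanation is to call wait with no operands. It waits for all currently active children and returns 0, so it never has to look up a stored status for a specific PID. A sketch, using an illustrative job count of 100 (the variable n is hypothetical) rather than the question's 999:

```bash
#!/bin/bash
# Sketch: when per-job exit codes are not needed, call wait with no operands.
# It waits for every currently active child and returns 0, without looking up
# the remembered status of any particular PID.
n=100   # illustrative job count; the question used 999
for ((i = 1; i <= n; i++)); do
  sleep 1 &
done
wait
wait_status=$?
echo "all $n jobs finished; wait returned $wait_status"
```

The trade-off is that a failing job goes unnoticed; if exit codes matter, collect them per PID in smaller batches instead.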

Answer 3

Score: 1

I can reproduce this on Arch Linux with the bash:5.0.18 Docker image, and with any earlier one:

docker run -ti --rm bash:5.0.18 bash -c 'pids=; for ((i=1;i<550;++i)); do true & pids+=" $!"; done; wait $pids'

I can't reproduce it with bash:5.1.0.

> What's going on?

It looks like a bug in your version of Bash. Bash 5.1 includes a couple of improvements to jobs.c and wait.def, and its changelog mentions "Make sure SIGCHLD is blocked in all cases where waitchld() is not called from a signal handler". From the look of it, the problem is in handling a SIGCHLD signal while another SIGCHLD signal is already being handled.
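Acting on this, a script that launches many background jobs could warn when it runs on a bash older than 5.1. The sketch below reads BASH_VERSINFO. The 5.1 cutoff comes from the reproduction results in this answer, not from an official statement of which release fixed the problem, and the helper name bash_older_than_5_1 is hypothetical.

```bash
#!/bin/bash
# Sketch: warn when running on a bash older than 5.1, the first version in
# which this answer could no longer reproduce the lost-child error.
# The cutoff is an assumption based on that testing, not on bash documentation.
bash_older_than_5_1() {
  local major=${BASH_VERSINFO[0]} minor=${BASH_VERSINFO[1]}
  (( major < 5 || (major == 5 && minor < 1) ))
}

if bash_older_than_5_1; then
  old_bash=1
  echo "warning: bash $BASH_VERSION may lose track of background jobs; consider 5.1 or newer" >&2
else
  old_bash=0
fi
echo "old_bash=$old_bash"
```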

huangapple
  • Published on February 6, 2023 at 13:23:38
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75357592.html