Go exec.CommandContext is not being terminated after context timeout
Question
In Go, I can usually use context.WithTimeout() in combination with exec.CommandContext() to have a command killed automatically (with SIGKILL) after the timeout.
But I'm running into a strange issue: if I wrap the command with sh -c AND buffer the command's output by setting cmd.Stdout = &bytes.Buffer{}, the timeout no longer works and the command runs forever.
Why does this happen?
Here is a minimal reproducible example:
package main

import (
	"bytes"
	"context"
	"os/exec"
	"time"
)

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 100*time.Millisecond)
	defer cancel()

	cmdArgs := []string{"sh", "-c", "sleep infinity"}
	bufferOutputs := true

	// Uncommenting *either* of the next two lines will make the issue go away:
	// cmdArgs = []string{"sleep", "infinity"}
	// bufferOutputs = false

	cmd := exec.CommandContext(ctx, cmdArgs[0], cmdArgs[1:]...)
	if bufferOutputs {
		cmd.Stdout = &bytes.Buffer{}
	}
	_ = cmd.Run()
}
I've tagged this question with Linux because I've only verified that this happens on Ubuntu 20.04 and I'm not sure whether it would reproduce on other platforms.
Answer 1
Score: 3
My issue was that the child sleep process was not being killed when the context timed out. The sh parent process was killed, but its child sleep process was left running.
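Normally this would still let the cmd.Wait() call return, because Wait would only need the sh process to exit. The problem is that cmd.Wait() waits both for the process to exit and for its output to be copied. Because we've assigned cmd.Stdout to a bytes.Buffer (not an *os.File), exec connects the child's stdout to a pipe and copies from it in a goroutine, and Wait blocks until that copy sees EOF. EOF only arrives once every copy of the pipe's write end is closed, and the orphaned sleep process inherited one of those copies from sh, so the pipe never closes while sleep is still running.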
To kill child processes too, we can instead start the process as its own process group leader by setting the Setpgid field of cmd.SysProcAttr. We can then send the signal to the negative of its PID, which targets the whole process group: the process itself as well as any subprocesses it spawned (unless a subprocess moves itself into a different group).
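A quick way to check what Setpgid does: the sketch below (assuming Linux or macOS; startInOwnGroup is a hypothetical helper) starts sh -c "sleep 30" with Setpgid set and uses syscall.Getpgid to confirm that the child's process group ID equals its PID, i.e. the child leads its own group. It then cleans up with syscall.Kill on the negative PID, which signals every process in that group.

```go
package main

import (
	"fmt"
	"os/exec"
	"syscall"
)

// startInOwnGroup (hypothetical helper) starts a shell with Setpgid enabled
// and returns the child's PID together with its process group ID.
func startInOwnGroup() (pid, pgid int, err error) {
	cmd := exec.Command("sh", "-c", "sleep 30")
	cmd.SysProcAttr = &syscall.SysProcAttr{Setpgid: true}
	if err := cmd.Start(); err != nil {
		return 0, 0, err
	}
	pid = cmd.Process.Pid
	pgid, err = syscall.Getpgid(pid)

	// Clean up: a negative PID targets the whole process group,
	// so this kills sh and the sleep it forked.
	_ = syscall.Kill(-pid, syscall.SIGKILL)
	_ = cmd.Wait()
	return pid, pgid, err
}

func main() {
	pid, pgid, err := startInOwnGroup()
	fmt.Println(pid == pgid, err)
}
```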
Here is a drop-in replacement for exec.CommandContext that I came up with, which does exactly this:
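import (
	"context"
	"os/exec"
	"syscall"
)

type Cmd struct {
	ctx context.Context
	*exec.Cmd
	waitDone chan struct{} // closed by Wait so the watcher goroutine can exit
}

// NewCommand is like exec.CommandContext but ensures that subprocesses
// are killed when the context times out, not just the top level process.
func NewCommand(ctx context.Context, command string, args ...string) *Cmd {
	return &Cmd{ctx: ctx, Cmd: exec.Command(command, args...)}
}

func (c *Cmd) Start() error {
	// Force-enable Setpgid so that we can kill child processes when the
	// context times out or is canceled.
	if c.Cmd.SysProcAttr == nil {
		c.Cmd.SysProcAttr = &syscall.SysProcAttr{}
	}
	c.Cmd.SysProcAttr.Setpgid = true
	if err := c.Cmd.Start(); err != nil {
		return err
	}
	pid := c.Cmd.Process.Pid
	done := make(chan struct{})
	c.waitDone = done
	go func() {
		select {
		case <-c.ctx.Done():
			// Kill by negative PID to kill the process group, which includes
			// the top-level process we spawned as well as any subprocesses
			// it spawned.
			_ = syscall.Kill(-pid, syscall.SIGKILL)
		case <-done:
			// Wait has returned: stop watching, so the goroutine does not leak
			// and cannot signal a reused process group ID later.
		}
	}()
	return nil
}

// Wait waits for the command to finish, then stops the watcher goroutine.
func (c *Cmd) Wait() error {
	err := c.Cmd.Wait()
	if c.waitDone != nil {
		close(c.waitDone)
		c.waitDone = nil
	}
	return err
}

func (c *Cmd) Run() error {
	if err := c.Start(); err != nil {
		return err
	}
	return c.Wait()
}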