Go exec.CommandContext is not being terminated after context timeout

Question

In Go, I can usually combine context.WithTimeout() with exec.CommandContext() so that a command is killed automatically (with SIGKILL) once the timeout expires.

But I've run into a strange issue: if I wrap the command with sh -c AND buffer the command's output by setting cmd.Stdout = &bytes.Buffer{}, the timeout no longer works and the command runs forever.

Why does this happen?

Here is a minimal reproducible example:

package main

import (
	"bytes"
	"context"
	"os/exec"
	"time"
)

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 100*time.Millisecond)
	defer cancel()

	cmdArgs := []string{"sh", "-c", "sleep infinity"}
	bufferOutputs := true

	// Uncommenting *either* of the next two lines will make the issue go away:

	// cmdArgs = []string{"sleep", "infinity"}
	// bufferOutputs = false

	cmd := exec.CommandContext(ctx, cmdArgs[0], cmdArgs[1:]...)
	if bufferOutputs {
		cmd.Stdout = &bytes.Buffer{}
	}
	_ = cmd.Run()
}

I've tagged this question with Linux because I've only verified that this happens on Ubuntu 20.04 and I'm not sure whether it would reproduce on other platforms.

Answer 1

Score: 3

The problem was that the child sleep process was not being killed when the context timed out. The sh parent process was killed, but its child sleep was left running.

Normally that would still let cmd.Wait() return, but cmd.Wait() waits both for the process to exit and for its output to be copied. Because cmd.Stdout is not an *os.File, os/exec creates a pipe and copies from it in a goroutine, and that copy only finishes when every write end of the pipe has been closed. The orphaned sleep inherited the write end as its stdout, so the pipe never closes while sleep is still running.

To kill child processes as well, we can start the process as the leader of its own process group by setting Setpgid, and then send the signal to the negative PID, which kills the top-level process together with any subprocesses still in that group.
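The pipe behaviour behind this can be seen without spawning any processes. The following is a minimal sketch, assuming a Unix-like system: it duplicates the write end of a pipe to stand in for the copy that a grandchild such as sleep would inherit, then checks what a reader sees before and after that copy is closed. The names readState and pipeStates are hypothetical helpers made up for this illustration.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"os"
	"syscall"
	"time"
)

// readState reports what a reader sees on r within the timeout:
// "eof" once every write end is closed, "blocked" if a writer still exists.
func readState(r *os.File, timeout time.Duration) string {
	_ = r.SetReadDeadline(time.Now().Add(timeout))
	buf := make([]byte, 1)
	_, err := r.Read(buf)
	switch {
	case errors.Is(err, io.EOF):
		return "eof"
	case errors.Is(err, os.ErrDeadlineExceeded):
		return "blocked"
	default:
		return fmt.Sprintf("unexpected: %v", err)
	}
}

// pipeStates returns the reader's state while a duplicated write end is
// still open, and again after that duplicate has been closed.
func pipeStates() (before, after string, err error) {
	r, w, err := os.Pipe()
	if err != nil {
		return "", "", err
	}
	defer r.Close()

	// Stand-in for the descriptor an orphaned grandchild inherits.
	inherited, err := syscall.Dup(int(w.Fd()))
	if err != nil {
		w.Close()
		return "", "", err
	}

	// Stand-in for the killed sh process: its copy of the write end goes away.
	w.Close()
	before = readState(r, 50*time.Millisecond)

	// Stand-in for sleep finally exiting: the last write end is closed.
	_ = syscall.Close(inherited)
	after = readState(r, 50*time.Millisecond)
	return before, after, nil
}

func main() {
	before, after, err := pipeStates()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("while the duplicate is open:", before)
	fmt.Println("after the duplicate is closed:", after)
}
```

Closing the original write end is not enough for the reader to reach EOF; only closing the last remaining copy is. This is the same situation the output-copying goroutine is in while the orphaned process holds its stdout open.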

Here is a drop-in replacement for exec.CommandContext I came up with that does exactly this:

// Note: Setpgid and syscall.Kill are only available on Unix-like systems.
import (
	"context"
	"os/exec"
	"syscall"
)

// Cmd wraps exec.Cmd and kills the whole process group when ctx is done.
type Cmd struct {
	ctx context.Context
	*exec.Cmd
}

// NewCommand is like exec.CommandContext but ensures that subprocesses
// are killed when the context times out, not just the top level process.
func NewCommand(ctx context.Context, command string, args ...string) *Cmd {
	return &Cmd{ctx, exec.Command(command, args...)}
}

// Start starts the command in its own process group and arranges for the
// group to be killed once the context is done.
func (c *Cmd) Start() error {
	// Force-enable setpgid bit so that we can kill child processes when the
	// context times out or is canceled.
	if c.Cmd.SysProcAttr == nil {
		c.Cmd.SysProcAttr = &syscall.SysProcAttr{}
	}
	c.Cmd.SysProcAttr.Setpgid = true
	err := c.Cmd.Start()
	if err != nil {
		return err
	}
	go func() {
		<-c.ctx.Done()
		p := c.Cmd.Process
		if p == nil {
			return
		}
		// Kill by negative PID to kill the process group, which includes
		// the top-level process we spawned as well as any subprocesses
		// it spawned.
		_ = syscall.Kill(-p.Pid, syscall.SIGKILL)
	}()
	return nil
}

// Run starts the command and waits for it to finish.
func (c *Cmd) Run() error {
	if err := c.Start(); err != nil {
		return err
	}
	return c.Wait()
}

huangapple
  • Posted on 2022-04-02 09:38:18
  • Source: https://go.coder-hub.com/71714228.html