Calling setns from Go returns EINVAL for mnt namespace

Question

The C code works fine and correctly enters the namespace, but the Go code always seems to return EINVAL from the setns call to enter the mnt namespace. I've tried a number of permutations (including embedded C code with cgo and external .so) on Go 1.2, 1.3 and the current tip.

Stepping through the code in gdb shows that both sequences are calling setns in libc the exact same way (or so it appears to me).

I have boiled what seems to be the issue down to the code below. What am I doing wrong?

Setup

I have a shell alias for starting quick busybox containers:

alias startbb='docker inspect --format "{{ .State.Pid }}" $(docker run -d busybox sleep 1000000)'

Once this alias is defined, running startbb starts a container and prints its PID.

lxc-checkconfig outputs:

Found kernel config file /boot/config-3.8.0-44-generic
--- Namespaces ---
Namespaces: enabled
Utsname namespace: enabled
Ipc namespace: enabled
Pid namespace: enabled
User namespace: missing
Network namespace: enabled
Multiple /dev/pts instances: enabled

--- Control groups ---
Cgroup: enabled
Cgroup clone_children flag: enabled
Cgroup device: enabled
Cgroup sched: enabled
Cgroup cpu account: enabled
Cgroup memory controller: missing
Cgroup cpuset: enabled

--- Misc ---
Veth pair device: enabled
Macvlan: enabled
Vlan: enabled
File capabilities: enabled

uname -a produces:

Linux gecko 3.8.0-44-generic #66~precise1-Ubuntu SMP Tue Jul 15 04:01:04 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

Working C code

The following C code works fine:

#define _GNU_SOURCE   /* needed for setns() in <sched.h> */
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>   /* geteuid(), close() */

int main(int argc, char* argv[]) {
	int i;
	char nspath[1024];
	char *namespaces[] = { "ipc", "uts", "net", "pid", "mnt" };

	if (geteuid()) { fprintf(stderr, "%s\n", "abort: you want to run this as root"); exit(1); }

	if (argc != 2) { fprintf(stderr, "%s\n", "abort: you must provide a PID as the sole argument"); exit(2); }

	for (i=0; i<5; i++) {
		snprintf(nspath, sizeof(nspath), "/proc/%s/ns/%s", argv[1], namespaces[i]);
		int fd = open(nspath, O_RDONLY);

		/* nstype 0: accept whatever namespace type fd refers to */
		if (setns(fd, 0) == -1) {
			fprintf(stderr, "setns on %s namespace failed: %s\n", namespaces[i], strerror(errno));
		} else {
			fprintf(stdout, "setns on %s namespace succeeded\n", namespaces[i]);
		}

		close(fd);
	}
	return 0;
}

After compiling with gcc -o checkns checkns.c, the output of sudo ./checkns <PID> is:

setns on ipc namespace succeeded
setns on uts namespace succeeded
setns on net namespace succeeded
setns on pid namespace succeeded
setns on mnt namespace succeeded

Failing Go code

Conversely, the following Go code (which should be identical) doesn't work quite as well:

package main

import (
	"fmt"
	"os"
	"path/filepath"
	"syscall"
)

func main() {
	if syscall.Geteuid() != 0 {
		fmt.Println("abort: you want to run this as root")
		os.Exit(1)
	}

	if len(os.Args) != 2 {
		fmt.Println("abort: you must provide a PID as the sole argument")
		os.Exit(2)
	}

	namespaces := []string{"ipc", "uts", "net", "pid", "mnt"}

	for i := range namespaces {
		fd, _ := syscall.Open(filepath.Join("/proc", os.Args[1], "ns", namespaces[i]), syscall.O_RDONLY, 0644)
		// RawSyscall returns (r1, r2, errno); 308 is the setns syscall number on x86_64
		_, _, errno := syscall.RawSyscall(308, uintptr(fd), 0, 0)

		if errno != 0 {
			fmt.Println("setns on", namespaces[i], "namespace failed:", errno)
		} else {
			fmt.Println("setns on", namespaces[i], "namespace succeeded")
		}

		syscall.Close(fd)
	}
}

Instead, running sudo go run main.go <PID> produces:

setns on ipc namespace succeeded
setns on uts namespace succeeded
setns on net namespace succeeded
setns on pid namespace succeeded
setns on mnt namespace failed: invalid argument

Answer 1

Score: 8

(There is an issue filed on the Go project)

So, the answer to this question is that you have to call setns from a single-threaded context. This makes sense since setns should join the current thread to the namespace. Since Go is multi-threaded, you need to make the setns call before the Go runtime threads start.

I think this is because the thread in which the call to syscall.RawSyscall executes is not the main thread. Even with runtime.LockOSThread, the result is not what you would expect (i.e., the goroutine is not "locked" to the main C thread, so it is not equivalent to the constructor trick explained below).
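
To see the multi-threading for yourself, here is a minimal sketch (Linux only) that counts the OS threads of the current process by listing /proc/self/task. The helper name countOSThreads is mine, not part of any library. On a typical Go build it tends to report more than one thread by the time main runs, which is the situation setns objects to for the mnt namespace.

```go
package main

import (
	"fmt"
	"os"
)

// countOSThreads returns the number of kernel threads in the current
// process by counting the entries in /proc/self/task (Linux only).
func countOSThreads() (int, error) {
	entries, err := os.ReadDir("/proc/self/task")
	if err != nil {
		return 0, err
	}
	return len(entries), nil
}

func main() {
	n, err := countOSThreads()
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot read /proc/self/task:", err)
		os.Exit(1)
	}
	fmt.Printf("OS threads at the start of main: %d\n", n)
}
```

The exact number depends on the Go version and on scheduling, so treat it as a diagnostic rather than a fixed value.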

The reply I got after filing the issue suggested using "the cgo constructor trick". I couldn't find any "proper" documentation on this "trick", but it is used in nsinit by Docker/Michael Crosby and even though I went over that code line by line, I didn't try running it this way (see below for frustration).

The "trick" is basically that you can get cgo to execute a C function prior to starting the Go runtime.

To do this, you decorate the function you want to run before Go starts up with the GCC function attribute __attribute__((constructor)):

/*
__attribute__((constructor)) void init(void) {
    // this code will execute before Go starts up
    // it runs in a single-threaded C context
    // before Go's threads start running
}
*/
import "C"

Using this as a template, I modified checkns.go like this:

/*
#define _GNU_SOURCE   // needed for setns() in <sched.h>
#include <sched.h>
#include <stdio.h>
#include <fcntl.h>

__attribute__((constructor)) void enter_namespace(void) {
   // replace <PID> with the PID of the target container
   setns(open("/proc/<PID>/ns/mnt", O_RDONLY), 0);
}
*/
import "C"

... rest of file is unchanged ...

This code works, but the PID has to be hardcoded because this constructor does not read it from the command line. Still, it illustrates the idea (and works if you provide the PID of a container started as described above).

It's frustrating because I wanted to call setns multiple times, but since this C code executes before the Go runtime starts, no Go code is available at that point.

Update: Shlepping around in the kernel mailing lists turned up a conversation that documents this. I can't seem to find it in any published man pages, but here's the quote from a patch to setns(2), confirmed by Eric Biederman:

> A process may not be reassociated with a new mount namespace if
> it is multi-threaded. Changing the mount namespace requires
> that the caller possess both CAP_SYS_CHROOT and CAP_SYS_ADMIN
> capabilities in its own user namespace and CAP_SYS_ADMIN in the
> target mount namespace.

huangapple
  • Posted on September 7, 2014, 04:56:10
  • Please keep this link when reposting: https://go.coder-hub.com/25704661.html