golang mount namespace: mounted volumes are not cleared after the process exits?
Question
In the code below, I expected that if I start a process with syscall.CLONE_NEWNS, every mount made inside the namespace would be cleared when the process exits.
But that is not what happens:
package main

import (
	"fmt"
	"os"
	"os/exec"
	"syscall"
)

var command string = "/usr/bin/bash"

// container_command runs the shell in new PID and mount namespaces.
func container_command() {
	fmt.Printf("starting container command %s\n", command)
	cmd := exec.Command(command)
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Cloneflags: syscall.CLONE_NEWPID | syscall.CLONE_NEWNS,
	}
	// Attach the shell to the current terminal.
	cmd.Stdin = os.Stdin
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	if err := cmd.Run(); err != nil {
		fmt.Println("error", err)
		os.Exit(1)
	}
}

func main() {
	fmt.Printf("starting current process %d\n", os.Getpid())
	container_command()
	fmt.Printf("command ended\n")
}
Run this and bind-mount a directory inside the shell: the mount still exists after the program exits.
[root@localhost go]# go run namespace-1.go
starting current process 7558
starting container command /usr/bin/bash
[root@ns-process go]# mount --bind /home /mnt
[root@ns-process go]# ls /mnt
vagrant
[root@ns-process go]# exit
exit
command ended
[root@localhost go]# ls /mnt
vagrant
[root@localhost go]#
If this is the desired behavior, how does proc get mounted in container implementations? If I mount proc inside the namespace, I get:
[root@ns-process go]# mount -t proc /proc
[root@ns-process go]# exit
exit
command ended
[root@localhost go]# mount
mount: failed to read mtab: No such file or directory
[root@localhost go]#
proc has to be remounted on the host to get it back.
Update:
Doing the same in C gives the same result, so I think this is intended behavior.
#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/wait.h>
#include <stdio.h>
#include <stdlib.h>   /* system() */
#include <string.h>   /* strlen() */
#include <sched.h>
#include <signal.h>
#include <unistd.h>

#define STACK_SIZE (1024 * 1024)

static char container_stack[STACK_SIZE];
char* const container_args[] = {
    "/bin/bash",
    NULL
};

/* Entry point of the cloned child: runs as PID 1 in the new namespaces. */
int container_main(void* arg)
{
    printf("Container [%5d] - inside the container!\n", getpid());
    sethostname("container", strlen("container"));
    system("mount -t proc proc /proc");
    execv(container_args[0], container_args);
    printf("Something's wrong!\n");  /* reached only if execv fails */
    return 1;
}

int main(void)
{
    printf("start a container!\n");
    /* The stack grows down, so pass the top of the buffer. */
    int container_pid = clone(container_main, container_stack+STACK_SIZE,
            CLONE_NEWUTS | CLONE_NEWPID | CLONE_NEWNS | SIGCHLD, NULL);
    if (container_pid == -1) {
        perror("clone");
        return 1;
    }
    waitpid(container_pid, NULL, 0);
    printf("container ended!\n");
    return 0;
}
Command output:
[root@localhost ~]# gcc a.c
[root@localhost ~]# ./a.out
start a container!
Container [ 1] - inside the container!
[root@container ~]# ps -ef
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 08:57 pts/0 00:00:00 /bin/bash
root 17 1 0 08:57 pts/0 00:00:00 ps -ef
[root@container ~]# exit
exit
container stopped!
[root@localhost ~]# ps -ef
Error, do this: mount -t proc proc /proc
[root@localhost ~]# cat a.c
Answer 1
Score: 2
This happens because mount events propagate between namespaces. The propagation type of your mount point is MS_SHARED.
> MS_SHARED: This mount point shares mount and unmount events with other mount points that are members of its "peer group". When a mount point is added or removed under this mount point, this change will propagate to the peer group, so that the mount or unmount will also take place under each of the peer mount points. Propagation also occurs in the reverse direction, so that mount and unmount events on a peer mount will also propagate to this mount point.
Source: https://lwn.net/Articles/689856/
The shared:N tag in /proc/self/mountinfo indicates that the mount is sharing propagation events with a peer group:
$ sudo go run namespace-1.go
[root@localhost]# mount --bind /home/andrii/test /mnt
# The propagation type is MS_SHARED
[root@localhost]# grep '/mnt' /proc/self/mountinfo
264 175 254:0 /home/andrii/test /mnt rw,noatime shared:1 - ext4 /dev/mapper/cryptroot rw,data=ordered
[root@localhost]# exit
$ ls /mnt
test_file
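To check the propagation type from a program rather than by eye, the optional fields of a mountinfo line can be parsed directly. The following is a minimal sketch, not part of the original answer: the function name propagation is hypothetical, and it assumes the field layout documented in proc(5), where the optional tags sit between the mount options (sixth field) and the "-" separator. A line with no tags is reported as private.

```go
package main

import (
	"fmt"
	"strings"
)

// propagation returns the optional propagation tags (for example "shared:1"
// or "master:2") found in one line of /proc/self/mountinfo. It returns
// "private" when the line carries no tags. The name is illustrative only.
func propagation(line string) string {
	fields := strings.Fields(line)
	var tags []string
	// Optional fields start at index 6 and end at the "-" separator.
	for i := 6; i < len(fields) && fields[i] != "-"; i++ {
		tags = append(tags, fields[i])
	}
	if len(tags) == 0 {
		return "private"
	}
	return strings.Join(tags, ",")
}

func main() {
	lines := []string{
		"264 175 254:0 /home/andrii/test /mnt rw,noatime shared:1 - ext4 /dev/mapper/cryptroot rw,data=ordered",
		"264 175 254:0 /home/andrii/test /mnt rw,noatime - ext4 /dev/mapper/cryptroot rw,data=ordered",
	}
	for _, l := range lines {
		fmt.Println(propagation(l))
	}
}
```

The two sample lines are the mountinfo entries shown in this answer, with and without the shared:1 tag.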
On most Linux distributions the default propagation type is MS_SHARED, which is set by systemd. See NOTES in man 7 mount_namespaces:
> Notwithstanding the fact that the default propagation type for new mount points is in many cases MS_PRIVATE, MS_SHARED is typically more useful. For this reason, systemd(1) automatically remounts all mount points as MS_SHARED on system startup. Thus, on most modern systems, the default propagation type is in practice MS_SHARED.
If you want a fully isolated namespace, you can make all mount points private this way:
$ sudo go run namespace-1.go
[root@localhost]# mount --make-rprivate /
[root@localhost]# mount --bind /home/andrii/test /mnt
# The propagation type is MS_PRIVATE now
[root@localhost]# grep '/mnt' /proc/self/mountinfo
264 175 254:0 /home/andrii/test /mnt rw,noatime - ext4
/dev/mapper/cryptroot rw,data=ordered
[root@localhost]# exit
$ ls /mnt