In pt_regs, why do the values of bp and sp differ so greatly, and why is bp smaller than sp?

Question

While watching process creation after typing 'ls' in a terminal, I set a breakpoint at copy_thread in arch/x86/kernel/process.c with gdb and then printed the values in pt_regs.

{bx = 0x1200011, cx = 0x0, dx = 0x0, si = 0x0, di = 0xa0f38e8, bp = 0x8266000,
  ax = 0xffffffda, ds = 0x7b, __dsh = 0x0, es = 0x7b, __esh = 0x0, fs = 0x0, __fsh = 0x0,
  gs = 0x33, __gsh = 0x0, orig_ax = 0x78, ip = 0xb7f29549, cs = 0x73, __csh = 0x0, flags = 0x206,
  sp = 0xbfab35f0, ss = 0x7b, __ssh = 0x0}

The bp of pt_regs is 0x8266000 and the sp of pt_regs is 0xbfab35f0.
I have found the places where they are assigned.
The sp of pt_regs is assigned in do_SYSENTER_32 in arch/x86/entry/common.c:

__visible noinstr long do_SYSENTER_32(struct pt_regs *regs)
{
	/* SYSENTER loses RSP, but the vDSO saved it in RBP. */
	regs->sp = regs->bp;

	/* SYSENTER clobbers EFLAGS.IF.  Assume it was set in usermode. */
	regs->flags |= X86_EFLAGS_IF;

	return do_fast_syscall_32(regs);
}

The bp of pt_regs is assigned in __do_fast_syscall_32 via get_user. The value appears to come from user space.

static noinstr bool __do_fast_syscall_32(struct pt_regs *regs)
{
	// do other stuff...

	/* Fetch EBP from where the vDSO stashed it. */
	if (IS_ENABLED(CONFIG_X86_64)) {
		/*
		 * Micro-optimization: the pointer we're following is
		 * explicitly 32 bits, so it can't be out of range.
		 */
		res = __get_user(*(u32 *)&regs->bp,
			 (u32 __user __force *)(unsigned long)(u32)regs->sp);
	} else {
		res = get_user(*(u32 *)&regs->bp,
		       (u32 __user __force *)(unsigned long)(u32)regs->sp);
	}

	// do other stuff...
	return true;
}

The backtrace shows the order of the function calls:

#0  copy_thread (clone_flags=clone_flags@entry=18874368, sp=0, arg=0, p=0xc31c0a00, tls=0)
    at arch/x86/kernel/process.c:133
#1  0xc1058722 in copy_process (pid=pid@entry=0x0, trace=trace@entry=0, node=node@entry=-1, 
    args=<optimized out>) at kernel/fork.c:2122
#2  0xc10593cc in kernel_clone (args=args@entry=0xc68e9f38) at kernel/fork.c:2500
#3  0xc1059807 in __do_sys_clone (child_tidptr=0xa0f38e8, tls=0, parent_tidptr=0x0, newsp=0, 
    clone_flags=<optimized out>) at kernel/fork.c:2617
#4  __se_sys_clone (child_tidptr=168769768, tls=0, parent_tidptr=0, newsp=0, 
    clone_flags=<optimized out>) at kernel/fork.c:2585
#5  __ia32_sys_clone (regs=<optimized out>) at kernel/fork.c:2585
#6  0xc1b04b85 in do_syscall_32_irqs_on (nr=<optimized out>, regs=0xc68e9fb4)
    at arch/x86/entry/common.c:77
#7  __do_fast_syscall_32 (regs=regs@entry=0xc68e9fb4) at arch/x86/entry/common.c:140
#8  0xc1b04c29 in do_fast_syscall_32 (regs=0xc68e9fb4) at arch/x86/entry/common.c:165
#9  0xc1b04c75 in do_SYSENTER_32 (regs=<optimized out>) at arch/x86/entry/common.c:208
#10 0xc1b0e32f in entry_SYSENTER_32 () at arch/x86/entry/entry_32.S:952
#11 0x01200011 in ?? ()
#12 0x00000000 in ?? ()

I have the following questions:

  • Why do the ebp and esp stored in pt_regs differ so greatly?
  • Why is the value of ebp stored in pt_regs smaller than the value of
    esp stored in pt_regs, since the stack grows downward?

I used a debuggable build of linux-5.12.10, and the 'ls' command is compiled from busybox.

Answer 1

Score: 3

Consider the difference in register and stack usage for the legacy INT $0x80 system call mechanism and the modern fast system call mechanism for IA32:

Register / stack     Legacy system call    Fast system call
eax                  system call number    system call number
ebx                  arg1                  arg1
ecx                  arg2                  arg2
edx                  arg3                  arg3
esi                  arg4                  arg4
edi                  arg5                  arg5
ebp                  arg6                  user stack pointer
arg on user stack    -                     arg6

For the fast system call mechanism, when entry_SYSENTER_32 constructs the struct pt_regs entry on the kernel stack, the sp member will point to the kernel stack and the bp member will point to the user stack. Therefore, the fast system call mechanism fixes up the sp and bp members for compatibility with the legacy system call mechanism. The sp member value is corrected in do_SYSENTER_32():

    /* SYSENTER loses RSP, but the vDSO saved it in RBP. */
    regs->sp = regs->bp;

The bp member value is corrected in __do_fast_syscall_32(), setting it to the arg6 value from the user stack:

    /* Fetch EBP from where the vDSO stashed it. */
    if (IS_ENABLED(CONFIG_X86_64)) {
        /*
         * Micro-optimization: the pointer we're following is
         * explicitly 32 bits, so it can't be out of range.
         */
        res = __get_user(*(u32 *)&regs->bp,
             (u32 __user __force *)(unsigned long)(u32)regs->sp);
    } else {
        res = get_user(*(u32 *)&regs->bp,
               (u32 __user __force *)(unsigned long)(u32)regs->sp);
    }

When do_syscall_32_irqs_on() is called from do_int80_syscall_32() (for the legacy system call mechanism) or from __do_fast_syscall_32() (for the fast system call mechanism), the regs->bp and regs->sp values will be as expected no matter which of the system call mechanisms was used.


Another fix-up for fast system calls occurs for regs->ip. The original value of the EIP register is lost by the sysenter instruction, which is normally executed from the __kernel_vsyscall() function in the vDSO. regs->ip is corrected in do_fast_syscall_32():

	/*
	 * Called using the internal vDSO SYSENTER/SYSCALL32 calling
	 * convention.  Adjust regs so it looks like we entered using int80.
	 */
	unsigned long landing_pad = (unsigned long)current->mm->context.vdso +
					vdso_image_32.sym_int80_landing_pad;

	/*
	 * SYSENTER loses EIP, and even SYSCALL32 needs us to skip forward
	 * so that 'regs->ip -= 2' lands back on an int $0x80 instruction.
	 * Fix it up.
	 */
	regs->ip = landing_pad;

The vDSO contains an int $0x80 instruction immediately after the sysenter instruction. The landing_pad value is the address just after that int $0x80 instruction, so that instruction will not be reached when returning from the fast system call.

The reason for the int $0x80 instruction in the vDSO is to support older CPUs that lack the sysenter and sysexit instructions. In that case, the mov %esp, %ebp; sysenter instruction sequence in __kernel_vsyscall() in the vDSO will be replaced with nop instructions and the CPU will reach the int $0x80 instruction that immediately follows that instruction sequence, effectively changing the fast system call into a legacy system call for older CPUs. That legacy system call will return to the point just after the int $0x80 instruction just like the fast system call.

Answer 2

Score: 0

Eighteen days later, I repeated the experiment and noticed something strange.
I compiled a program on a 32-bit Ubuntu system. The simplified program is:

#include <unistd.h>

int main(int argc, char* argv[]) {
	fork();
	return 0;
}

I placed the compiled a.out into my rootfs.img.gz and started qemu:

qemu-system-i386 -m 256m -kernel ./bzImage -initrd ./rootfs.img.gz -append "root=/dev/ram init=/linuxrc nokaslr" -serial file:output.txt -s -S

Then, in gdb, I set a breakpoint with break copy_thread.

I entered the command ./a.out in the shell of the Linux guest running in qemu.

Because copy_thread contains the statement *childregs = *current_pt_regs(), I can inspect the saved user-mode registers by printing childregs.

When the shell created the a.out process, the Linux kernel stopped at copy_thread. At that point I entered p/x *childregs:

(gdb) p/x *childregs
$10 = {bx = 0x1200011, cx = 0x0, dx = 0x0, si = 0x0, di = 0x9acb3e8, 
  bp = 0x8289000, ax = 0xffffffda, ds = 0x7b, __dsh = 0x0, es = 0x7b, 
  __esh = 0x0, fs = 0x0, __fsh = 0x0, gs = 0x33, __gsh = 0x0, orig_ax = 0x78, 
  ip = 0xb7f93549, cs = 0x73, __csh = 0x0, flags = 0x216, sp = 0xbfe797ec, 
  ss = 0x7b, __ssh = 0x0}

The shell's saved registers show bp = 0x8289000 and sp = 0xbfe797ec. The value of bp is very strange.

When a.out ran fork(), the Linux kernel stopped at copy_thread again. At that point I entered p/x *childregs:

(gdb) p/x *childregs
$11 = {bx = 0x1200011, cx = 0x0, dx = 0x0, si = 0x0, di = 0xb7eeb128, 
  bp = 0xbfa23818, ax = 0xffffffda, ds = 0x7b, __dsh = 0x0, es = 0x7b, 
  __esh = 0x0, fs = 0x0, __fsh = 0x0, gs = 0x33, __gsh = 0x0, orig_ax = 0x78, 
  ip = 0xb7ef0549, cs = 0x73, __csh = 0x0, flags = 0x246, sp = 0xbfa237d0, 
  ss = 0x7b, __ssh = 0x0}

The saved registers for a.out show bp = 0xbfa23818 and sp = 0xbfa237d0. The value of bp is larger than that of sp, and the difference between them is small. This is what I expected.

But when the shell created the a.out process, bp was 0x8289000, and I don't know what exactly happened at that point.

huangapple
  • Posted on July 10, 2023 at 17:02:20
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76652234.html