C/C++: Taking the address of a function imported from a shared library

Question

I recently learnt of PLT/GOT in ELF, and am now confused: how are the addresses of imported functions taken and stored into function pointers?

I tested, and two addresses of the same imported function taken in different shared libraries are indeed equal (as the standard requires), but this seems to require some magic I don't understand.

These can't be the addresses of the PLT slots, as those differ between shared libraries. They also can't be the address in the exporting library, as that is resolved (by default) only after the function is actually called.

What am I missing?

The question is phrased in a Linux context, but the same could be asked about Windows and IAT slots. Also, I'm not sure about the 'linkers' tag, but I suspect it might be relevant.

Answer 1

Score: 2


<details>
<summary>English:</summary>

Let's start by creating an example:

```
// foo.c
int foo() { return 42; }

// bar.c
#include <stdio.h>

extern int foo();
int bar()
{
  printf(" %s:%d: &foo = %p\n", __FILE__, __LINE__, &foo);
  return foo();
}

// main.c
#include <stdio.h>

extern int foo();
extern int bar();
int main()
{
  printf("%s:%d: &foo = %p\n", __FILE__, __LINE__, &foo);
  return bar();
}
```

Build it with:

```
gcc -g -fPIC -shared -o foo.so foo.c &&
gcc -g -fPIC -shared -o bar.so bar.c &&
gcc -g main.c ./bar.so ./foo.so -no-pie
```

(`-no-pie` is not necessary, but it makes debugging easier.)

```
$ ./a.out
main.c:7: &foo = 0x7f3d66a180f9
 bar.c:6: &foo = 0x7f3d66a180f9
```

It worked. Now we are ready to answer "how does the magic happen?".

First, let's examine the disassembly of `main`:

```
gdb -q ./a.out
Reading symbols from ./a.out...

(gdb) disas main
Dump of assembler code for function main:
   0x0000000000401136 <+0>:     push   %rbp
   0x0000000000401137 <+1>:     mov    %rsp,%rbp
   0x000000000040113a <+4>:     mov    0x2e9f(%rip),%rax        # 0x403fe0
   0x0000000000401141 <+11>:    mov    %rax,%rcx
   0x0000000000401144 <+14>:    mov    $0x7,%edx
   0x0000000000401149 <+19>:    lea    0xeb4(%rip),%rax        # 0x402004
   0x0000000000401150 <+26>:    mov    %rax,%rsi
   0x0000000000401153 <+29>:    lea    0xeb1(%rip),%rax        # 0x40200b
   0x000000000040115a <+36>:    mov    %rax,%rdi
   0x000000000040115d <+39>:    mov    $0x0,%eax
   0x0000000000401162 <+44>:    callq  0x401040 <printf@plt>
   0x0000000000401167 <+49>:    mov    $0x0,%eax
   0x000000000040116c <+54>:    callq  0x401030 <bar@plt>
   0x0000000000401171 <+59>:    pop    %rbp
   0x0000000000401172 <+60>:    retq
```

Here we can see that the last argument to `printf` comes from loading a value at address `0x403fe0`. What is at that address?

```
readelf -WS a.out | grep '.got'
  [22] .got              PROGBITS        0000000000403fd0 002fd0 000018 08  WA  0   0  8
  [23] .got.plt          PROGBITS        0000000000403fe8 002fe8 000028 08  WA  0   0  8
```

Apparently that address is `&.got[2]`. How does the value end up there? Back to GDB:

```
(gdb) watch *(void**)0x403fe0
Hardware watchpoint 1: *(void**)0x403fe0
(gdb) run
Starting program: /tmp/shlib/a.out

Hardware watchpoint 1: *(void**)0x403fe0

Old value = (void *) 0x0
New value = (void *) 0x7ffff7fba0f9
elf_dynamic_do_Rela (skip_ifunc=<optimized out>, lazy=<optimized out>, nrelative=<optimized out>, relsize=<optimized out>, reladdr=<optimized out>, scope=<optimized out>, map=0x7ffff7ffe2e0) at ../sysdeps/x86_64/dl-machine.h:408
408     ../sysdeps/x86_64/dl-machine.h: No such file or directory.
(gdb) bt
#0  elf_dynamic_do_Rela (skip_ifunc=<optimized out>, lazy=<optimized out>, nrelative=<optimized out>, relsize=<optimized out>, reladdr=<optimized out>, scope=<optimized out>, map=0x7ffff7ffe2e0) at ../sysdeps/x86_64/dl-machine.h:408
#1  _dl_relocate_object (l=l@entry=0x7ffff7ffe2e0, scope=<optimized out>, reloc_mode=<optimized out>, consider_profiling=<optimized out>, consider_profiling@entry=0) at ./elf/dl-reloc.c:301
#2  0x00007ffff7fe8c09 in dl_main (phdr=<optimized out>, phnum=<optimized out>, user_entry=<optimized out>, auxv=<optimized out>) at ./elf/rtld.c:2322
#3  0x00007ffff7fe519f in _dl_sysdep_start (start_argptr=start_argptr@entry=0x7fffffffd960, dl_main=dl_main@entry=0x7ffff7fe6e10 <dl_main>) at ../sysdeps/unix/sysv/linux/dl-sysdep.c:140
#4  0x00007ffff7fe6b1c in _dl_start_final (arg=<error reading variable: Cannot access memory at address 0xffffd8c8>) at ./elf/rtld.c:497
#5  _dl_start (arg=<optimized out>) at ./elf/rtld.c:584
#6  0x00007ffff7fe59c8 in _start () from /lib64/ld-linux-x86-64.so.2
```

So the runtime loader put the value there as part of relocating `a.out` (in frame #1 you can see that `l->addr == 0` and `l->name == ""`, which corresponds to the main executable).

What caused the loader to resolve `foo` without it being called?

```
readelf -Wr a.out | egrep 'foo|bar'
0000000000403fe0  0000000500000006 R_X86_64_GLOB_DAT      0000000000000000 foo + 0
0000000000404000  0000000200000007 R_X86_64_JUMP_SLOT     0000000000000000 bar + 0
```

Here you can see that calling a function (`bar` here) and taking the address of a function (`foo` here) result in _different_ relocation records.

The `JUMP_SLOT` relocation can be resolved lazily (when the function is first called), but `GLOB_DAT` cannot. The loader has to resolve all `GLOB_DAT` relocations at load time, and it does.
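The difference between the two relocation kinds can be pictured with a toy model in plain C. This is only a sketch: every name in it (`jump_slot`, `glob_dat_slot`, `lookup_foo`, `load_time_relocate` and so on) is hypothetical and none of it is glibc code. It shows a call slot that is patched on first use next to a data slot that must already hold the final address before anyone reads it.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: hypothetical names, not glibc internals. */
typedef int (*func_ptr)(void);

static int foo_impl(void) { return 42; }   /* stands in for foo in foo.so */

int resolve_count;                         /* how many symbol lookups ran */

/* Stands in for the loader's symbol lookup. */
static func_ptr lookup_foo(void)
{
    resolve_count++;
    return foo_impl;
}

static int lazy_stub(void);

/* JUMP_SLOT-like slot: initially points at a resolver stub. */
static func_ptr jump_slot = lazy_stub;

/* GLOB_DAT-like slot: empty until the "loader" fills it in. */
static func_ptr glob_dat_slot = NULL;

/* First call through the jump slot: resolve, patch the slot, then call. */
static int lazy_stub(void)
{
    jump_slot = lookup_foo();
    return jump_slot();
}

/* Eager relocation, done once before any user code runs. */
void load_time_relocate(void)
{
    glob_dat_slot = lookup_foo();
}

/* A call goes through the jump slot, as a call via the PLT would. */
int call_foo(void)
{
    assert(jump_slot != NULL);
    return jump_slot();
}

/* Taking the address just reads the data slot, so it must already be filled:
   there is no call on which a lazy resolver could run. */
func_ptr address_of_foo(void)
{
    assert(glob_dat_slot != NULL);
    return glob_dat_slot;
}
```

In this model, calling `load_time_relocate()` once and then `address_of_foo()` yields the final function address without `foo` ever having been called, while the first `call_foo()` triggers one extra lookup and later calls trigger none.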

Likewise, in `bar.so` we have:

```
gdb -q ./bar.so
(gdb) disas bar
   0x0000000000001109 <+0>:     push   %rbp
   0x000000000000110a <+1>:     mov    %rsp,%rbp
   0x000000000000110d <+4>:     mov    0x2ebc(%rip),%rax        # 0x3fd0
   0x0000000000001114 <+11>:    mov    %rax,%rcx
   0x0000000000001117 <+14>:    mov    $0x6,%edx
   0x000000000000111c <+19>:    lea    0xedd(%rip),%rax        # 0x2000
...
```

```
readelf -Wr bar.so | grep foo
0000000000003fd0  0000000400000006 R_X86_64_GLOB_DAT      0000000000000000 foo + 0
```

```
readelf -WS bar.so | grep '.got'
  [11] .plt.got          PROGBITS        0000000000001040 001040 000010 08  AX  0   0  8
  [20] .got              PROGBITS        0000000000003fc0 002fc0 000028 08  WA  0   0  8
  [21] .got.plt          PROGBITS        0000000000003fe8 002fe8 000020 08  WA  0   0  8
```

So `&foo` is filled into `&bar.so:.got[2]` at load time as well.
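The addresses and slot numbers quoted so far follow from simple arithmetic on the listings, which the small sketch below reproduces. The helper names `rip_relative_target` and `got_slot_index` are made up for illustration. The 8-byte entry size is taken from the `08` column of the `readelf -WS` output.

```c
#include <assert.h>
#include <stdint.h>

/* Target of a RIP-relative operand: the displacement is added to the
   address of the *next* instruction, not the current one. */
uint64_t rip_relative_target(uint64_t next_insn_addr, int32_t disp)
{
    return next_insn_addr + (uint64_t)(int64_t)disp;
}

/* Index of an 8-byte entry inside a section starting at section_addr. */
uint64_t got_slot_index(uint64_t entry_addr, uint64_t section_addr)
{
    assert(entry_addr >= section_addr);
    assert((entry_addr - section_addr) % 8 == 0);
    return (entry_addr - section_addr) / 8;
}
```

For `a.out`, the `mov 0x2e9f(%rip),%rax` at `0x40113a` is followed by an instruction at `0x401141`, so `rip_relative_target(0x401141, 0x2e9f)` gives `0x403fe0`, and `got_slot_index(0x403fe0, 0x403fd0)` gives 2. The same computation for `bar.so` uses `0x1114`, `0x2ebc` and a `.got` base of `0x3fc0`.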

P.S. We can also look at the output of `readelf -Wr a.out bar.so` to see what other relocations are present and why `&foo` is filled into the third slot of the GOT:

```
File: a.out

Relocation section '.rela.dyn' at offset 0x4f0 contains 3 entries:
    Offset             Info             Type               Symbol's Value  Symbol's Name + Addend
0000000000403fd0  0000000100000006 R_X86_64_GLOB_DAT      0000000000000000 __libc_start_main@GLIBC_2.34 + 0
0000000000403fd8  0000000400000006 R_X86_64_GLOB_DAT      0000000000000000 __gmon_start__ + 0
0000000000403fe0  0000000500000006 R_X86_64_GLOB_DAT      0000000000000000 foo + 0
...

File: bar.so

Relocation section '.rela.dyn' at offset 0x3f8 contains 8 entries:
    Offset             Info             Type               Symbol's Value  Symbol's Name + Addend
0000000000003df0  0000000000000008 R_X86_64_RELATIVE                         1100
0000000000003df8  0000000000000008 R_X86_64_RELATIVE                         10c0
0000000000004008  0000000000000008 R_X86_64_RELATIVE                         4008
0000000000003fc0  0000000100000006 R_X86_64_GLOB_DAT      0000000000000000 _ITM_deregisterTMCloneTable + 0
0000000000003fc8  0000000300000006 R_X86_64_GLOB_DAT      0000000000000000 __gmon_start__ + 0
0000000000003fd0  0000000400000006 R_X86_64_GLOB_DAT      0000000000000000 foo + 0
0000000000003fd8  0000000500000006 R_X86_64_GLOB_DAT      0000000000000000 _ITM_registerTMCloneTable + 0
0000000000003fe0  0000000600000006 R_X86_64_GLOB_DAT      0000000000000000 __cxa_finalize@GLIBC_2.2.5 + 0
```

The fact that both relocation records happen to address the third slot in `.got` is a coincidence; the slots could easily be different.
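The `Info` column in these listings packs two fields into one 64-bit value: the symbol table index in the upper 32 bits and the relocation type in the lower 32 bits. As a rough sketch (the function names are hypothetical, while the numeric type values 6, 7 and 8 are the ones the x86-64 ABI assigns to `R_X86_64_GLOB_DAT`, `R_X86_64_JUMP_SLOT` and `R_X86_64_RELATIVE`), the column can be decoded like this:

```c
#include <assert.h>
#include <stdint.h>

/* Relocation type numbers from the x86-64 psABI. */
enum {
    RELOC_GLOB_DAT  = 6,   /* R_X86_64_GLOB_DAT  */
    RELOC_JUMP_SLOT = 7,   /* R_X86_64_JUMP_SLOT */
    RELOC_RELATIVE  = 8    /* R_X86_64_RELATIVE  */
};

/* Upper 32 bits of r_info: index into the dynamic symbol table. */
uint32_t reloc_sym_index(uint64_t r_info)
{
    return (uint32_t)(r_info >> 32);
}

/* Lower 32 bits of r_info: relocation type. */
uint32_t reloc_type(uint64_t r_info)
{
    return (uint32_t)(r_info & 0xffffffffu);
}

/* Of the three types seen above, only JUMP_SLOT may be bound lazily. */
int reloc_may_be_lazy(uint64_t r_info)
{
    return reloc_type(r_info) == RELOC_JUMP_SLOT;
}

/* Self-check against the rows shown in the readelf listings. */
void check_readelf_rows(void)
{
    assert(reloc_sym_index(0x0000000500000006ULL) == 5);            /* foo in a.out */
    assert(reloc_type(0x0000000500000006ULL) == RELOC_GLOB_DAT);
    assert(reloc_type(0x0000000200000007ULL) == RELOC_JUMP_SLOT);   /* bar in a.out */
    assert(reloc_type(0x0000000000000008ULL) == RELOC_RELATIVE);    /* no symbol */
}
```

On Linux the same decoding is available through the `ELF64_R_SYM` and `ELF64_R_TYPE` macros in `<elf.h>`; the hand-written versions here only keep the example free of platform headers.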

</details>



huangapple
  • Published on May 13, 2023 at 22:43:45
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76243294.html