_cgo_topofstack@@Base in a stripped binary

Question

What does _cgo_topofstack@@Base mean in the context of a stripped binary coming from Go?

$ cat simple.go
package main
import (
	"net"
	"strconv"
	"time"
)

func main() {
	tcpAddr, _ := net.ResolveTCPAddr("tcp4", ":7777")
	listener, _ := net.ListenTCP("tcp", tcpAddr)
	conn, _ := listener.Accept()
	daytime := time.Now().String() + strconv.Itoa(0xdeadface)
	conn.Write([]byte(daytime))
}

The code is supposed to be stripped - what does _cgo_topofstack@@Base mean?

$ go build -gcflags=-l -ldflags "-s -w" -o simple_wo_symbols simple.go
$ objdump -D -S simple_wo_symbols > simple_wo_symbols.human
$ sed -n "198899,198904p" simple_wo_symbols.human
  4b9860:	e8 db c1 fb ff       	callq  475a40 <_cgo_topofstack@@Base+0xe4c0>
  4b9865:	48 8b 44 24 18       	mov    0x18(%rsp),%rax
  4b986a:	48 89 44 24 70       	mov    %rax,0x70(%rsp)
  4b986f:	48 8b 4c 24 20       	mov    0x20(%rsp),%rcx
  4b9874:	48 89 4c 24 40       	mov    %rcx,0x40(%rsp)
  4b9879:	ba ce fa ad de       	mov    $0xdeadface,%edx

EDIT (better specification of the question):

  • why does this symbol exist in a stripped binary?
  • confirm Peter Cordes's claim that the called function is completely unrelated to the function at _cgo_topofstack@@Base, and that adding this (irrelevant and redundant) info is just an objdump quirk (a weird one?)
  • maybe related to this(?): is there a Go-idiomatic way of stripping?!

Answer 1

Score: 3

_cgo_topofstack@@Base is a symbol that, for some reason, still exists in your stripped binary. Your call targets an address 0xe4c0 bytes beyond it; whatever function lives there is completely unrelated to the actual _cgo_topofstack code.
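
The call in the question can be checked by hand. Opcode e8 is x86-64's "call rel32": the four bytes after it are a signed little-endian displacement measured from the next instruction. So e8 db c1 fb ff at 0x4b9860 lands at 0x4b9865 - 0x43e25 = 0x475a40, and subtracting the printed +0xe4c0 places _cgo_topofstack itself at 0x467580, about 57 KiB below the callee. A small Go sketch of that decoding (callTarget is a made-up helper, not part of any library):

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// callTarget decodes an x86-64 "call rel32" instruction (opcode 0xe8) located
// at addr. The 32-bit displacement is little-endian, signed, and relative to
// the address of the next instruction (addr + 5).
func callTarget(addr uint64, insn []byte) (uint64, error) {
	if len(insn) != 5 || insn[0] != 0xe8 {
		return 0, fmt.Errorf("not a 5-byte call rel32: % x", insn)
	}
	rel := int32(binary.LittleEndian.Uint32(insn[1:]))
	return addr + 5 + uint64(int64(rel)), nil
}

func main() {
	// Address and bytes copied from the objdump line in the question.
	target, err := callTarget(0x4b9860, []byte{0xe8, 0xdb, 0xc1, 0xfb, 0xff})
	if err != nil {
		panic(err)
	}
	fmt.Printf("call target:     %#x\n", target) // 0x475a40
	// objdump printed the target as _cgo_topofstack@@Base+0xe4c0.
	fmt.Printf("_cgo_topofstack: %#x\n", target-0xe4c0) // 0x467580
}
```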

It's normal for disassemblers to describe addresses as symbol+offset.

That style makes sense for data arrays (e.g. compiling something like x = global_array[10] into a load from global_array+40, if the symbol for global_array is still around), and for jumps within functions. It's usually not helpful for cases like this, other than to let you see what's nearby, and to have smaller numbers to look at.
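
The array case is plain arithmetic: element 10 of an array of 4-byte elements sits 10 * 4 = 40 bytes past the start, which is why a load of x = global_array[10] can be printed as global_array+40. A Go sketch, assuming 4-byte elements (modelled with a hypothetical globalArray [16]int32 standing in for global_array):

```go
package main

import (
	"fmt"
	"unsafe"
)

// globalArray is a hypothetical stand-in for global_array; int32 models a
// 4-byte element type such as a C int.
var globalArray [16]int32

// elemOffset returns how far globalArray[i] lies from the array's first byte,
// i.e. the "+offset" a disassembler would append to the array's symbol.
func elemOffset(i int) uintptr {
	base := uintptr(unsafe.Pointer(&globalArray[0]))
	return uintptr(unsafe.Pointer(&globalArray[i])) - base
}

func main() {
	fmt.Println(elemOffset(10)) // 40 = 10 elements * 4 bytes
}
```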

Rather than implementing fancy logic to decide whether a symbol+offset version of an address is worth printing instead of just the numeric absolute address, it's much easier (and carries no risk of being wrong) for disassemblers to just always do it: search backward from the address and take the first symbol found, or, for addresses before the first symbol in a section, print as foo - 0x.... It's up to humans to use judgement and experience to make sense of the output, especially when looking at disassembly of stripped binaries.
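
The backward search can be sketched in Go with a sorted symbol list and sort.Search. Sym and symbolize are hypothetical names, not objdump's code; real disassemblers also track sections, symbol sizes and types, which this sketch ignores. Fed only the one symbol that survived in the question's binary, it reproduces the annotation objdump printed:

```go
package main

import (
	"fmt"
	"sort"
)

// Sym is a hypothetical, minimal symbol-table entry.
type Sym struct {
	Name string
	Addr uint64
}

// symbolize finds the nearest symbol at or below addr and prints
// "name+0xoff" (or just "name" on an exact hit). Addresses below every
// symbol are printed relative to the first one, as "name-0xoff".
// syms must be non-empty and sorted by Addr.
func symbolize(syms []Sym, addr uint64) string {
	// i is the index of the first symbol that starts above addr.
	i := sort.Search(len(syms), func(i int) bool { return syms[i].Addr > addr })
	if i == 0 {
		return fmt.Sprintf("%s-%#x", syms[0].Name, syms[0].Addr-addr)
	}
	s := syms[i-1]
	if s.Addr == addr {
		return s.Name
	}
	return fmt.Sprintf("%s+%#x", s.Name, addr-s.Addr)
}

func main() {
	// A stripped binary may keep only a few symbols, so a distant one gets reused.
	syms := []Sym{{Name: "_cgo_topofstack@@Base", Addr: 0x467580}}
	fmt.Println(symbolize(syms, 0x475a40)) // _cgo_topofstack@@Base+0xe4c0
}
```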

(There isn't a flag a disassembler can look at to detect a stripped binary or not; detecting this would be a matter of a heuristic like noticing that most direct call targets are to addresses without their own symbol.)
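
A sketch of such a heuristic, purely illustrative: looksStripped and its threshold are hypothetical, and no real disassembler is claimed to work this way.

```go
package main

import "fmt"

// looksStripped is a hypothetical heuristic: it reports whether more than
// threshold (a fraction such as 0.5) of the direct call targets have no
// symbol starting exactly at that address.
func looksStripped(callTargets []uint64, symAddrs map[uint64]bool, threshold float64) bool {
	if len(callTargets) == 0 {
		return false // nothing to judge from
	}
	unnamed := 0
	for _, t := range callTargets {
		if !symAddrs[t] {
			unnamed++
		}
	}
	return float64(unnamed)/float64(len(callTargets)) > threshold
}

func main() {
	syms := map[uint64]bool{0x467580: true}
	calls := []uint64{0x475a40, 0x401000, 0x467580}
	fmt.Println(looksStripped(calls, syms, 0.5)) // true: 2 of 3 targets have no symbol
}
```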

AFAIK, GNU Binutils objdump doesn't have an option not to print symbolic versions of addresses. --no-addresses does something different.


I'm not sure what the @@Base is about. It doesn't seem to be unique to Go, though. On my x86-64 Arch GNU/Linux system, objdump -d /bin/ls (which is a stripped PIE executable) shows a lot of addresses as things like 22d60 <_obstack_memory_used@@Base+0xc2a0>. So that's the symbol that happened to be last before the bulk of the code for that program.

Other cases of @@ include glibc symbol ABI versioning in that same binary, e.g. 23298 <optarg@@GLIBC_2.2.5>. This Arch Linux binary was compiled on an up-to-date Arch Linux system, not actually linked against an ancient glibc 2.2.5, but I think that means optarg's type or something hasn't changed since glibc 2.2.5. And probably not since earlier, but 2.2.5 might have been when glibc started naming symbols this way. Take this paragraph with a big grain of salt because I don't really know how libc.so arranges for ld to substitute symbol names like stderr with these @@ versioned names, or the history of this.

Answer 2

Score: 2

Regarding what _cgo_topofstack is about, you can see it introduced in its current form in Go 1.4, under its original name cgo_topofstack:

(But, as noted by Peter Cordes in the comments, this does not explain why that symbol would still be present in a stripped binary)

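// Called from cgo wrappers, this function returns g->m->curg.stack.hi.
// Must obey the gcc calling convention.
TEXT cgo_topofstack(SB),NOSPLIT,$0
	get_tls(CX)				// CX = thread-local storage base
	MOVL	g(CX), AX			// AX = g currently running on this thread
	MOVL	g_m(AX), AX			// AX = g->m
	MOVL	m_curg(AX), AX			// AX = m->curg, the user goroutine
	MOVL	(g_stack+stack_hi)(AX), AX	// AX = curg->stack.hi (returned in AX)
	RET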
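
The point of the m->curg hop is that the answer does not depend on which g the thread is running at that moment (presumably, during a cgo call, not the user goroutine itself but the one owning the thread's system stack). A Go model of the pointer chain; G, M and Stack are hypothetical mirrors holding only the fields the assembly reads, NOT the runtime's real definitions:

```go
package main

import "fmt"

// Stack, G and M are hypothetical, trimmed-down mirrors of runtime types.
type Stack struct{ lo, hi uintptr }

type G struct {
	stack Stack
	m     *M
}

type M struct {
	curg *G // the user goroutine currently bound to this M
}

// topOfStack follows the same chain as the assembly: g -> m -> curg -> stack.hi.
func topOfStack(cur *G) uintptr {
	return cur.m.curg.stack.hi
}

func main() {
	mach := &M{}
	user := &G{stack: Stack{lo: 0x10000, hi: 0x18000}, m: mach}
	mach.curg = user
	// A second g on the same M, standing in for the one running C code.
	sys := &G{stack: Stack{lo: 0x80000, hi: 0x90000}, m: mach}
	fmt.Printf("%#x %#x\n", topOfStack(user), topOfStack(sys)) // 0x18000 0x18000
}
```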

It was added to fix golang/go issue 8771:

> ## cmd/cgo: C functions that return values fail if they call a Go callback that copies the stack

> Cgo uses a wrapper function that calls C code, passing the address of the stack frame.
This wrapper function is compiled by GCC, and it calls the real function written by the user.
>
> The user's function is permitted to call Go callbacks.
Those Go callbacks will run on the stack of the original caller.
They may cause a stack copy.
>
> If the stack gets copied during a Go callback, then the caller of the GCC-compiled wrapper is running in a different location.
The stack frame pointer used by the GCC-compiled wrapper is not updated, since of course the stack copier knows nothing about GCC-compiled code.
I don't think this is a problem for the arguments to the function; they have already been copied out of the stack frame when the wrapper calls the real function.
>
> However, it is a problem for C functions that return a value.
The wrapper will take the value returned by the C function, and store it using its pointer to the stack frame. That pointer will not have been updated if a stack copy occurs.
In other words, the wrapper may store the return value on the old stack, not the new one.
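
A toy model of this failure in Go, with slices standing in for stack memory. It is only an analogy: in the real bug the stale pointer lives in GCC-compiled code that the stack copier cannot see, and simulateStaleWrite is a hypothetical name.

```go
package main

import "fmt"

// simulateStaleWrite models issue 8771 with slices; nothing here touches the
// real Go stack. It returns what ends up in the old return slot and in the
// corresponding slot of the copied stack after the "wrapper" stores v
// through its saved pointer.
func simulateStaleWrite(v int64) (oldSlot, newSlot int64) {
	oldStack := make([]int64, 4)
	retSlot := &oldStack[3] // the wrapper's saved pointer to the return-value slot

	// A Go callback grows the stack: the contents are copied to a larger
	// allocation, each slot keeping its distance from the top.
	newStack := make([]int64, 8)
	copy(newStack[4:], oldStack)

	// The wrapper, unaware of the copy, stores through its old pointer.
	*retSlot = v

	return oldStack[3], newStack[7]
}

func main() {
	o, n := simulateStaleWrite(42)
	fmt.Println(o, n) // 42 0: the caller, now on the new stack, never sees 42
}
```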

CL 144130043 adds:

> ## cgo: adjust return value location to account for stack copies.

> During a cgo call, the stack can be copied.
This copy invalidates the pointer that cgo has into the return value area.
>
> To fix this problem, pass the address of the location containing the stack
top value (which is in the G struct).
For cgo functions which return values, read the stktop before and after the cgo call to compute the adjustment necessary to write the return value.
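
The adjustment boils down to one subtraction. A Go sketch of the arithmetic only (the real fix lives in the cgo-generated C wrapper; adjustRetAddr and the addresses are hypothetical), assuming the copy keeps every slot at the same distance from the stack top:

```go
package main

import "fmt"

// adjustRetAddr rebases a pointer into the old stack onto the copied stack.
// Assumption: the copy preserves each slot's distance from the stack top, so
// the shift is topAfter - topBefore. uintptr arithmetic wraps around, so the
// same expression works whether the stack moved up or down.
func adjustRetAddr(retAddr, topBefore, topAfter uintptr) uintptr {
	return retAddr + (topAfter - topBefore)
}

func main() {
	// Hypothetical addresses: the return slot sits 0x10 bytes below the old top.
	fmt.Printf("%#x\n", adjustRetAddr(0x1000f0, 0x100100, 0x200200)) // 0x2001f0
}
```

If the stack top did not change, the shift is zero and the original pointer is used unchanged.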

It was amended with commit e1364a6.


The '@@' part is objdump's notation for symbol version information; it is documented under objdump's --symbols option:

> Displays the entries in symbol table section of the file, if it has one.
If a symbol has version information associated with it then this is displayed as well.
>
> The version string is displayed as a suffix to the symbol name, preceded by an @ character. For example foo@VER_1.
>
> If the version is the default version to be used when resolving unversioned references to the symbol then it is displayed as a suffix preceded by two @ characters. For example foo@@VER_2.
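
Following that documented convention, a name printed by objdump can be split into base name, version, and whether the version is the default. A Go sketch (splitVersion is a hypothetical helper; strip any "+0x..." offset from a disassembly annotation before calling it):

```go
package main

import (
	"fmt"
	"strings"
)

// splitVersion splits an objdump-style symbol name into its base name and
// version: "@@" marks the default version, a single "@" a non-default one.
func splitVersion(sym string) (name, version string, isDefault bool) {
	if i := strings.Index(sym, "@@"); i >= 0 {
		return sym[:i], sym[i+2:], true
	}
	if i := strings.Index(sym, "@"); i >= 0 {
		return sym[:i], sym[i+1:], false
	}
	return sym, "", false
}

func main() {
	for _, s := range []string{"_cgo_topofstack@@Base", "optarg@@GLIBC_2.2.5", "foo@VER_1", "main"} {
		name, ver, def := splitVersion(s)
		fmt.Printf("%-22s name=%q version=%q default=%v\n", s, name, ver, def)
	}
}
```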

huangapple
  • Posted on July 8, 2021 at 15:34:15
  • When reposting, please keep the link to this article: https://go.coder-hub.com/68297387.html