January 3, 2020 23:29:34

What are .LX routine tags for in x86-64 asm?

Question

I've tried searching for this online but I haven't found any answers for it. I'm studying the assembly conditional jumps and am working with this C routine:

long absdiff (long x, long y) {
	long result;

	if (x > y)
		result = x-y;
	else
		result = y-x;

	return result;
}

My notes say that it compiles to assembly code like this:

absdiff:
	cmpq %rsi, %rdi
	jle  .L4
	movq %rdi, %rax
	subq %rsi, %rax
	ret
.L4:
	movq %rsi, %rax
	subq %rdi, %rax
	ret

As I understand it, the routine would jump to .L4 if x <= y, then return to the instruction after that jump and continue until ret, which I know is wrong. Since %rax is written in .L4, I assume its ret works for the whole routine, not just for the block that was jumped to. But I've also seen the code look more like this when debugging the C routine with gdb:

0x1119 <absdiff>     mov   %rdi,%rax
0x111c <absdiff+3>   cmp   %rsi,%rdi
0x111f <absdiff+6>   jle   0x1125 <absdiff+12>
0x1121 <absdiff+8>   sub   %rsi,%rax
0x1124 <absdiff+11>  retq
0x1125 <absdiff+12>  sub   %rdi,%rsi
0x1128 <absdiff+15>  mov   %rsi,%rax
0x112b <absdiff+18>  retq

Here I understand that the routine returns at different points, just as you could write several return statements in a C routine. So my question is: what do the .LX routine tags mean in assembly language, and what relationship do they have with the routine they are jumped to from?

Answer 1

Score: 4

The jle instruction performs a jump rather than a call. This transfers control directly, without pushing a return address onto the stack: it is like a goto in C, rather than like a call. That means that the following ret returns to absdiff's caller, since that's still the top return address on the stack.
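
As a rough illustration of the "goto, not call" point, the assembly from the question can be mirrored in C with an explicit goto. This is only a sketch: the function name absdiff_goto and the label else_branch are made up for the example, and the instruction comments refer to the listing in the question's notes, not to what any particular compiler will emit for this source.

```c
/* Sketch: absdiff written with an explicit goto, mirroring the jle in the
   question's assembly. absdiff_goto and else_branch are illustrative names. */
long absdiff_goto(long x, long y)
{
    long result;

    if (x <= y)            /* cmpq %rsi, %rdi ; jle .L4 */
        goto else_branch;  /* a plain jump: no return address is pushed */

    result = x - y;        /* movq %rdi, %rax ; subq %rsi, %rax */
    return result;         /* ret: returns to absdiff_goto's caller */

else_branch:               /* plays the role of .L4 */
    result = y - x;        /* movq %rsi, %rax ; subq %rdi, %rax */
    return result;         /* ret: also returns to absdiff_goto's caller */
}
```

Whichever path runs, the return lands in the caller, because the only return address involved is the one pushed by the original call to the function.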

Answer 2

Score: 2

Label names like .L4 are auto-numbered by the compiler every time it needs a branch target.

Clang numbers its labels by counting basic blocks (so the 4th basic block in the first function will have a label name like .LBB0_3), but I think GCC only increments its label counter when it emits the (first) jump instruction that jumps there.

That's why the labels themselves aren't in strictly increasing numerical order within a function, only overall within a file.

GCC never jumps across function boundaries to these internal labels.

.Lname labels are local labels that don't go into the symbol table of the object file / executable. That's why you don't see them in the debugger, just the function names.

> I assume its ret works for the whole routine, not the one jumped to,

Yes. ret isn't magic: it is effectively just pop %rip. jle doesn't push a return address, so it's not a function call, just a normal branch.

BTW, having two ways out of a function is called "tail duplication" optimization. Instead of having one path jump to the other, they both just do whatever cleanup and ret. Execution will go through one or the other, not both.
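
To make the two shapes concrete, here is a hedged C sketch of the same function with a single shared exit and with the exit duplicated into each path. The function and label names are invented for the example, and the source shape does not dictate what an optimizing compiler emits; it only shows the two control-flow layouts being contrasted.

```c
/* Single exit: the "then" path has to jump over the "else" path to reach
   the one shared return. Names are illustrative only. */
long absdiff_single_exit(long x, long y)
{
    long result;

    if (x <= y)
        goto else_branch;
    result = x - y;
    goto done;             /* extra jump to the shared exit */
else_branch:
    result = y - x;
done:
    return result;         /* the only return in the function */
}

/* Two exits: the return is duplicated into each path, so neither path
   needs to jump to the other. This is the shape of the question's asm. */
long absdiff_two_exits(long x, long y)
{
    if (x > y)
        return x - y;      /* first exit */
    return y - x;          /* second exit */
}
```

Both versions compute the same value; only one of the two exits runs on any given call.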

> but I've also seen this code more like this when debugging the C routine with gdb:

"but"? That's just what you get from assembling + linking the compiler-generated asm.

The symbolic branch target is replaced (by the assembler in this case) with a numeric target address. (It is actually encoded as a relative displacement, as in jcc rel8.) The assembler can do this without waiting for link time because the jump is in the same file as the target, and because it's relative.
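
The relative encoding can be checked by hand against the gdb listing in the question. The sketch below assumes the short two-byte form of the conditional jump (one opcode byte plus one rel8 byte), which matches the listing, where jle sits at 0x111f and the next instruction at 0x1121. The helper names are hypothetical.

```c
#include <stdint.h>

/* Length of a short conditional jump: 1 opcode byte + 1 displacement byte. */
#define JCC_REL8_LEN 2u

/* Target of a jcc rel8: the displacement is added to the address of the
   instruction that follows the jump. */
uint64_t jcc_rel8_target(uint64_t insn_addr, int8_t rel8)
{
    return insn_addr + JCC_REL8_LEN + (uint64_t)(int64_t)rel8;
}

/* Displacement an assembler would need to encode to reach a target.
   Assumes the result fits in a signed 8-bit field; no range check here. */
int64_t jcc_rel8_displacement(uint64_t insn_addr, uint64_t target)
{
    return (int64_t)(target - (insn_addr + JCC_REL8_LEN));
}
```

For the listing's jle at 0x111f targeting 0x1125, the displacement works out to 0x1125 - 0x1121 = 4, which is why no symbol name is needed once the code is assembled.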

Answer 3

Score: -1

one:
    b .L77
    nop
    nop
.L77:
    b two
    nop
    nop
two:
    b .three
    nop
    nop
    nop
.three:
    nop
    nop
    


Disassembly of section .text:

00000000 <one>:
   0:	ea000001 	b	c <one+0xc>
   4:	e1a00000 	nop			; (mov r0, r0)
   8:	e1a00000 	nop			; (mov r0, r0)
   c:	ea000001 	b	18 <two>
  10:	e1a00000 	nop			; (mov r0, r0)
  14:	e1a00000 	nop			; (mov r0, r0)

00000018 <two>:
  18:	ea000002 	b	28 <.three>
  1c:	e1a00000 	nop			; (mov r0, r0)
  20:	e1a00000 	nop			; (mov r0, r0)
  24:	e1a00000 	nop			; (mov r0, r0)

00000028 <.three>:
  28:	e1a00000 	nop			; (mov r0, r0)
  2c:	e1a00000 	nop			; (mov r0, r0)

The compiler generates assembly; the assembly is fed to the assembler and turned into an object. The compiler needs to generate labels independent of the labels you created (function names, etc.), so this particular one uses .Ln, where n is a number chosen so that the label is unique within that assembly language program/module/file.

This assembler clearly keeps the other, non-.Ln labels in the binary/object but discards the .Ln labels. You then use a separate tool, a disassembler, which chooses how it wants to represent the machine code. In this case we get an absolute address (b c means b 0xC) as well as a helper: 0xC is at offset 0xC from one, the nearest label. Clearly, simply putting a dot in front of a label is not how to make it disappear.

But this:

one:
    b .L77
    nop
    nop
.L77:
    b two
    nop
    nop
two:
    b .Lthree
    nop
    nop
    nop
.Lthree:
    nop
    nop
    

00000000 <one>:
   0:	ea000001 	b	c <one+0xc>
   4:	e1a00000 	nop			; (mov r0, r0)
   8:	e1a00000 	nop			; (mov r0, r0)
   c:	ea000001 	b	18 <two>
  10:	e1a00000 	nop			; (mov r0, r0)
  14:	e1a00000 	nop			; (mov r0, r0)

00000018 <two>:
  18:	ea000002 	b	28 <two+0x10>
  1c:	e1a00000 	nop			; (mov r0, r0)
  20:	e1a00000 	nop			; (mov r0, r0)
  24:	e1a00000 	nop			; (mov r0, r0)
  28:	e1a00000 	nop			; (mov r0, r0)
  2c:	e1a00000 	nop			; (mov r0, r0)

does make it disappear, so one would assume that .Lx is a valid label name but that the assembler does not put it in the symbol table of the output binary. The code is correct; it just doesn't have all the labels the assembly language had, which is fine: the machine code has no labels, they are only a human-readable aid. This mechanism allows the toolchain to easily generate intermediate labels per file without having to magically figure out how to avoid conflicts (which wouldn't be possible).
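
To see that the machine code really carries only an offset and no label, the branch words in the listings above can be decoded by hand. The sketch below assumes the classic 32-bit ARM (A32) encoding of B: a signed 24-bit word offset in the low bits, applied relative to the instruction's address plus 8. The function name arm_branch_target is made up for the example.

```c
#include <stdint.h>

/* Decode the target of a 32-bit ARM (A32) B instruction.
   Bits [23:0] hold a signed offset counted in 4-byte words, and the base
   is the address of the branch plus 8 (the architectural PC value).
   Arithmetic is done modulo 2^32, like the address space itself. */
uint32_t arm_branch_target(uint32_t insn_addr, uint32_t insn)
{
    uint32_t imm24 = insn & 0x00FFFFFFu;
    int32_t words;

    /* Sign-extend the 24-bit field without relying on shift behaviour. */
    if (imm24 & 0x00800000u)
        words = (int32_t)imm24 - 0x01000000;
    else
        words = (int32_t)imm24;

    return insn_addr + 8u + (uint32_t)words * 4u;
}
```

With the words from the listing, ea000001 at address 0 decodes to 0xc and ea000002 at 0x18 decodes to 0x28, matching what the disassembler printed; the names one, two and .three come only from the symbol table.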

This assembler family (the GNU assembler, gas) also has this feature, which is not used by the compilers but is used by some lazy coders:

1:
    b 1f
    b 1b
    b 2f
1:
    nop
    nop
2:


00000000 <.text>:
   0:	ea000001 	b	c <.text+0xc>
   4:	eafffffd 	b	0 <.text>
   8:	ea000001 	b	14 <.text+0x14>
   c:	e1a00000 	nop			; (mov r0, r0)
  10:	e1a00000 	nop			; (mov r0, r0)

1f means label 1: forward in the code; 1b means label 1: backward in the code (the first occurrence in that direction). You can use the same label name 1:, or a small number of them (1: 2: 3:), all through your code for the same purpose as .Lx, but you don't even have to keep the labels unique. Perhaps this works for something other than numbers; I have not tried.
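
One place these numeric labels turn up often is inline assembly in C, where the same asm text may be emitted several times and a named label would clash with itself. The following is only a sketch under stated assumptions: it targets x86-64 with a GCC-compatible compiler using the default AT&T syntax (unlike the ARM listings above), the function name count_iterations is hypothetical, and on any other target it falls back to plain C.

```c
/* Counts how many times a loop runs while decrementing n to zero, using
   GNU as numeric local labels (1:, 2:) inside GCC-style inline assembly.
   Assumption: x86-64 with GCC or Clang; otherwise a plain C loop is used. */
long count_iterations(long n)
{
    long count = 0;

#if defined(__x86_64__) && defined(__GNUC__)
    __asm__ volatile(
        "test %[n], %[n]\n\t"
        "jle 2f\n"               /* 2f: the next "2:" further down */
        "1:\n\t"
        "add $1, %[count]\n\t"
        "sub $1, %[n]\n\t"
        "jnz 1b\n"               /* 1b: the nearest "1:" above */
        "2:\n"
        : [count] "+r"(count), [n] "+r"(n)
        :
        : "cc");
#else
    /* Portable fallback with the same behaviour. */
    while (n > 0) {
        count++;
        n--;
    }
#endif

    return count;
}
```

Because 1b and 2f resolve to the nearest matching label in the given direction, the same asm text can be expanded repeatedly in one file without any label colliding.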
