What are .LX routine tags for in x86-64 asm?

Question

I've tried searching for this online but I haven't found any answers for it. I'm studying the assembly conditional jumps and am working with this C routine:

long absdiff (long x, long y) {
	long result;

	if (x > y)
		result = x-y;
	else
		result = y-x;

	return result;
}

My notes say that it compiles to asm code like this:

absdiff:
	cmpq %rsi, %rdi
	jle  .L4
	movq %rdi, %rax
	subq %rsi, %rax
	ret
.L4:
	movq %rsi, %rax
	subq %rdi, %rax
	ret

As I understand it, the routine would jump to .L4 if x <= y, then return to the instruction after that jump and continue until ret, which I know is wrong. Since %rax is written in .L4, I assume its ret works for the whole routine, not the one jumped to, but I've also seen this code more like this when debugging the C routine with gdb:

0x1119 <absdiff>     mov   %rdi,%rax
0x111c <absdiff+3>   cmp   %rsi,%rdi
0x111f <absdiff+6>   jle   0x1125 <absdiff+12>
0x1121 <absdiff+8>   sub   %rsi,%rax
0x1124 <absdiff+11>  retq
0x1125 <absdiff+12>  sub   %rdi,%rsi
0x1128 <absdiff+15>  mov   %rsi,%rax
0x112b <absdiff+18>  retq

Here I understand that the routine returns at different points, just like writing several return statements in a C routine. So my question is: what's the meaning of .LX routine tags in assembly language, and what relationship do they have with the routine they are jumped to from?

Answer 1

Score: 4

The jle instruction performs a jump rather than a call. This transfers control directly, without pushing a return address onto the stack: it is like a goto in C, rather than like a call. That means that the following ret returns to absdiff's caller, since that's still the top return address on the stack.
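To make the goto analogy concrete, here is a hand-written C sketch that mirrors the compiled asm from the question line for line. The name absdiff_goto and the label L4 are illustrative only; a compiler does not literally rewrite your C this way, but the control flow is the same: jle .L4 becomes a conditional goto, and each path ends in its own return, just as each asm path ends in its own ret.

```c
/* Hand-written C mirroring the control flow of the compiled absdiff:
 *     cmpq %rsi, %rdi ; jle .L4 ; movq/subq ; ret ; .L4: movq/subq ; ret
 * absdiff_goto and L4 are illustrative names, not compiler output. */
long absdiff_goto(long x, long y)
{
    long result;              /* plays the role of %rax */

    if (x <= y)               /* cmpq %rsi, %rdi ; jle .L4 */
        goto L4;              /* a plain jump: no return address is saved */

    result = x;               /* movq %rdi, %rax */
    result -= y;              /* subq %rsi, %rax */
    return result;            /* ret: back to absdiff's caller */

L4:
    result = y;               /* movq %rsi, %rax */
    result -= x;              /* subq %rdi, %rax */
    return result;            /* ret: also back to absdiff's caller */
}
```

Calling absdiff_goto(3, 5) takes the goto and returns 2 from the L4 path, while absdiff_goto(5, 3) falls through and returns 2 from the first path. Either way the function returns exactly once, to its caller; nothing ever "returns" to the goto.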

Answer 2

Score: 2

The label names like .L4 are auto-numbered by the compiler, every time it wants a branch target.

Clang numbers its labels by counting basic blocks (so the 4th basic block in the first function will have a label name like .LBB0_3), but I think GCC only increments its label counter when it emits the (first) jump instruction that jumps there.

That's why the labels themselves aren't in strictly increasing numerical order within a function, only overall within a file.
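A toy model of the two naming schemes, assuming one counter per assembly file for GCC-style .L<n> names and a (function, basic block) pair for Clang-style .LBB<f>_<b> names. label_counter, gcc_style_label and clang_style_label are hypothetical names; this only illustrates why the numbers are unique per file, not how either compiler is actually implemented.

```c
#include <stdio.h>

/* GCC-style naming (hypothetical model): one counter per assembly file.
 * Each new branch target takes the next number, so names are unique across
 * the whole file, whichever function they belong to. */
typedef struct {
    int next;                          /* next number to hand out */
} label_counter;

/* Writes ".L<n>" into buf and advances the file-wide counter. */
void gcc_style_label(label_counter *c, char *buf, size_t len)
{
    snprintf(buf, len, ".L%d", c->next++);
}

/* Clang-style naming (hypothetical model): ".LBB<function>_<block>", both
 * counted from 0, so the 4th basic block of the 1st function is ".LBB0_3". */
void clang_style_label(int func_index, int block_index, char *buf, size_t len)
{
    snprintf(buf, len, ".LBB%d_%d", func_index, block_index);
}
```

Starting from an arbitrary counter value of 4, two successive gcc_style_label calls produce .L4 and .L5 even if the two targets are in different functions, and clang_style_label(0, 3, ...) produces .LBB0_3.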

GCC never jumps across function boundaries to these internal labels.


.Lname labels are local labels that don't go into the symbol table of the object file / executable. That's why you don't see them in the debugger, just the function names.
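As a simplified model of that rule, assuming an ELF target such as x86-64 Linux (where the GNU assembler treats names beginning with .L as assembler-local), the decision comes down to a prefix test. would_be_in_symtab is a hypothetical helper, not part of any real tool, and other object formats may use a different local prefix.

```c
#include <stdbool.h>
#include <string.h>

/* Simplified model (ELF assumption): a label whose name starts with ".L"
 * is assembler-local and is left out of the object file's symbol table,
 * so debuggers and disassemblers never see it. Any other label, such as a
 * function name, is kept. */
bool would_be_in_symtab(const char *label)
{
    return strncmp(label, ".L", 2) != 0;
}
```

With this model, "absdiff" is kept and ".L4" is dropped, which matches what gdb shows: the jump target appears only as <absdiff+12>, an offset from the nearest symbol that survived.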

> I assume its ret works for the whole routine, not the one jumped to,

Yes. ret isn't magic. ret is just pop %rip. jle doesn't push a return address, so it's not a function call, just a normal branch.
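To see why a ret reached after a jump goes back to absdiff's caller, here is a toy interpreter (a sketch, not a model of real x86 encoding) in which call pushes a return address, a jump pushes nothing, and ret pops whatever address is on top of the stack into the program counter. The opcodes CALL/JMP/RET/HALT and the run function are hypothetical.

```c
#include <assert.h>

/* Toy instruction set: only control flow matters here. */
enum op { CALL, JMP, RET, HALT };

struct insn {
    enum op op;
    int target;      /* instruction index for CALL/JMP, unused otherwise */
};

#define STACK_MAX 16

/* Runs prog from index 0 and returns the index of the HALT it stops at.
 * *depth_out receives the number of return addresses still on the stack. */
int run(const struct insn *prog, int *depth_out)
{
    int stack[STACK_MAX];
    int sp = 0;          /* number of return addresses on the stack */
    int pc = 0;          /* plays the role of %rip */

    for (;;) {
        const struct insn *in = &prog[pc];
        switch (in->op) {
        case CALL:                       /* push return address, then jump */
            assert(sp < STACK_MAX);
            stack[sp++] = pc + 1;
            pc = in->target;
            break;
        case JMP:                        /* jump only: stack untouched */
            pc = in->target;
            break;
        case RET:                        /* ret is just "pop %rip" */
            assert(sp > 0);
            pc = stack[--sp];
            break;
        case HALT:
            *depth_out = sp;
            return pc;
        }
    }
}
```

For the program { {CALL, 2}, {HALT, 0}, {JMP, 4}, {RET, 0}, {RET, 0} }, where index 2 stands in for a taken jle .L4 and index 4 for the ret under .L4, run stops at index 1 (the caller's next instruction) with an empty stack: the only return address ever pushed was the one from CALL.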

BTW, having two ways out of a function is called "tail duplication" optimization. Instead of having one path jump to the other, they both just do whatever cleanup and ret. Execution will go through one or the other, not both.
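A source-level analogy for tail duplication, using hypothetical function names: in the first version, one path jumps to a shared exit (like a single ret at the end of the function); in the second, the exit is copied into each path (like the two rets in absdiff). Compilers apply this transformation to their internal control-flow graph, not to your C source, and both versions compute the same result.

```c
/* Single exit: the x > y path jumps to a shared epilogue at done. */
long absdiff_single_exit(long x, long y)
{
    long result;

    if (x <= y)
        goto else_path;
    result = x - y;
    goto done;               /* extra jump to the one shared exit */
else_path:
    result = y - x;
done:
    return result;           /* the only return (think: the only ret) */
}

/* Tail-duplicated: the exit is copied into each path, so neither path has
 * to jump to the other; each one finishes with its own return. */
long absdiff_tail_dup(long x, long y)
{
    if (x <= y)
        return y - x;        /* first copy of the exit */
    return x - y;            /* second copy of the exit */
}
```

The duplicated version trades a little code size for one fewer taken jump on the x > y path, which is why it pays off when the duplicated tail is as short as a single ret.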

> but I've also seen this code more like this when debugging the C routine with gdb:

"but"? That's just what you get from assembling + linking the compiler-generated asm.

The symbolic branch target is replaced (by the assembler, in this case) with a numeric target address. (It's actually encoded as a relative displacement, like jcc rel8.) The assembler can do this without waiting for link time because the jump is in the same file as the target, and because it's relative.
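The arithmetic for the jle in the gdb listing can be sketched as follows. A rel8 displacement is measured from the end of the jump instruction, i.e. the address of the next instruction: jle is 2 bytes at 0x111f, so the displacement to 0x1125 is 0x1125 - 0x1121 = 4. The function name encode_jcc_rel8 is hypothetical; 0x7E is the x86 opcode byte for jle rel8.

```c
#include <stdbool.h>
#include <stdint.h>

#define JLE_REL8 0x7E   /* x86 opcode byte for jle with an 8-bit displacement */

/* Encodes a 2-byte short conditional jump (opcode + rel8) located at
 * insn_addr that should land on target. The displacement is relative to
 * the next instruction (insn_addr + 2). Returns false if the target is
 * outside the -128..+127 byte range of rel8. */
bool encode_jcc_rel8(uint8_t opcode, uint64_t insn_addr, uint64_t target,
                     uint8_t out[2])
{
    int64_t disp = (int64_t)(target - (insn_addr + 2));

    if (disp < INT8_MIN || disp > INT8_MAX)
        return false;            /* would need the longer rel32 form */
    out[0] = opcode;
    out[1] = (uint8_t)(int8_t)disp;
    return true;
}
```

For the listing above, encode_jcc_rel8(JLE_REL8, 0x111f, 0x1125, buf) yields the bytes 7e 04; a backward jump gets a negative displacement stored in two's complement. Because only the distance is encoded, the bytes stay valid wherever the function ends up being loaded.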

Answer 3

Score: -1

one:
    b .L77
    nop
    nop
.L77:
    b two
    nop
    nop
two:
    b .three
    nop
    nop
    nop
.three:
    nop
    nop
    


Disassembly of section .text:

00000000 <one>:
   0:	ea000001 	b	c <one+0xc>
   4:	e1a00000 	nop			; (mov r0, r0)
   8:	e1a00000 	nop			; (mov r0, r0)
   c:	ea000001 	b	18 <two>
  10:	e1a00000 	nop			; (mov r0, r0)
  14:	e1a00000 	nop			; (mov r0, r0)

00000018 <two>:
  18:	ea000002 	b	28 <.three>
  1c:	e1a00000 	nop			; (mov r0, r0)
  20:	e1a00000 	nop			; (mov r0, r0)
  24:	e1a00000 	nop			; (mov r0, r0)

00000028 <.three>:
  28:	e1a00000 	nop			; (mov r0, r0)
  2c:	e1a00000 	nop			; (mov r0, r0)

The compiler generates assembly, and the assembly is fed to the assembler and turned into an object file. The compiler needs to generate labels that are independent of the labels you created (function names, etc.), so this particular one uses .Ln, where n is a number chosen so that each label is unique within that assembly language program/module/file.

This assembler clearly keeps the other, non-.Ln labels in the binary/object but discards the .Ln labels. Then you use a separate tool, a disassembler, which chooses how it wants to represent the machine code. In this case we get an absolute address: b c means b 0xC, along with a helper annotation, <one+0xc>, because 0xC is at offset 0xC from one, the nearest label. Clearly, simply putting a dot in front of the label (.three) is not how to make it disappear.

But this

one:
    b .L77
    nop
    nop
.L77:
    b two
    nop
    nop
two:
    b .Lthree
    nop
    nop
    nop
.Lthree:
    nop
    nop
    

00000000 <one>:
   0:	ea000001 	b	c <one+0xc>
   4:	e1a00000 	nop			; (mov r0, r0)
   8:	e1a00000 	nop			; (mov r0, r0)
   c:	ea000001 	b	18 <two>
  10:	e1a00000 	nop			; (mov r0, r0)
  14:	e1a00000 	nop			; (mov r0, r0)

00000018 <two>:
  18:	ea000002 	b	28 <two+0x10>
  1c:	e1a00000 	nop			; (mov r0, r0)
  20:	e1a00000 	nop			; (mov r0, r0)
  24:	e1a00000 	nop			; (mov r0, r0)
  28:	e1a00000 	nop			; (mov r0, r0)
  2c:	e1a00000 	nop			; (mov r0, r0)

does make it disappear, so one would assume that .Lx is a valid label name but that the assembler does not put it in the symbol table of the output binary. The code is correct; it just doesn't have all the labels the assembly language had, which is fine: the machine code has no labels, they are only a human-readable thing. This mechanism allows the toolchain to easily generate intermediate labels per file without having to magically figure out how to avoid conflicts (which wouldn't be possible).
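Because machine code has no labels, a branch only stores a distance. A sketch that decodes the A32 (32-bit ARM) B instructions from the listings above makes that visible: the low 24 bits are a signed word offset, and the target is the branch's own address + 8 + offset * 4 (the + 8 is the PC value A32 defines for an executing instruction). decode_arm_b is a hypothetical helper name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decodes an A32 "B" (branch, no link) instruction located at addr.
 * Returns false if word is not an unconditional B. Otherwise stores the
 * absolute target address: addr + 8 + sign-extended imm24 * 4. */
bool decode_arm_b(uint32_t addr, uint32_t word, uint32_t *target)
{
    /* top byte 0xEA: condition AL (always), 101 = branch, L = 0 (no link) */
    if ((word >> 24) != 0xEA)
        return false;

    int32_t imm24 = (int32_t)(word & 0x00FFFFFF);
    if (imm24 & 0x00800000)          /* sign-extend the 24-bit field */
        imm24 -= 0x01000000;

    /* unsigned arithmetic wraps, so negative offsets work too */
    *target = addr + 8 + (uint32_t)(imm24 * 4);
    return true;
}
```

The same word ea000001 branches to 0xC when it sits at 0x0 and to 0x18 when it sits at 0xC, and eafffffd at 0x4 (offset -3) branches back to 0x0. That is why the disassembler has to reconstruct a target address and pick the nearest surviving symbol to annotate it.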

This assembler family (the GNU assembler, gas) also has the following feature, which is not used by compilers but is used by some lazy coders:

1:
    b 1f
    b 1b
    b 2f
1:
    nop
    nop
2:


00000000 <.text>:
   0:	ea000001 	b	c <.text+0xc>
   4:	eafffffd 	b	0 <.text>
   8:	ea000001 	b	14 <.text+0x14>
   c:	e1a00000 	nop			; (mov r0, r0)
  10:	e1a00000 	nop			; (mov r0, r0)

1f means label 1: forward in the code, and 1b means label 1: backward in the code (the first occurrence in that direction). You can use the same label name 1:, or a small number of them (1:, 2:, 3:), all through your code for the same purpose as .Lx, but these labels don't even have to be unique. Perhaps this works for something other than numbers; I have not tried, though the GNU as manual documents these local labels only in the numeric form N:.
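How an assembler might resolve 1f and 1b can be sketched as a search over the label definitions in source order: Nb picks the closest definition of N before the reference, Nf the closest one after it. struct label_def and resolve_local are hypothetical names; this is an illustration of the lookup rule, not gas's implementation.

```c
#include <stdbool.h>

struct label_def {
    int name;        /* numeric label, e.g. 1 for "1:" */
    int line;        /* position in the source; decides "before"/"after" */
    unsigned addr;   /* address the label ends up at */
};

/* Resolves a reference "Nf" (forward = true) or "Nb" (forward = false)
 * appearing on source line ref_line. On success, stores the address of the
 * nearest definition of N in that direction and returns true. */
bool resolve_local(const struct label_def *defs, int ndefs, int name,
                   int ref_line, bool forward, unsigned *addr)
{
    int best = -1;

    for (int i = 0; i < ndefs; i++) {
        if (defs[i].name != name)
            continue;
        if (forward && defs[i].line > ref_line &&
            (best < 0 || defs[i].line < defs[best].line))
            best = i;               /* closer definition after the reference */
        if (!forward && defs[i].line < ref_line &&
            (best < 0 || defs[i].line > defs[best].line))
            best = i;               /* closer definition before the reference */
    }
    if (best < 0)
        return false;
    *addr = defs[best].addr;
    return true;
}
```

Numbering the example's source lines from 0 (1: on line 0 at 0x0, the three branches on lines 1 to 3, 1: on line 4 at 0xc, 2: on line 7 at 0x14), the references b 1f, b 1b and b 2f resolve to 0xc, 0x0 and 0x14, matching the disassembly.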

huangapple
  • Posted on January 3, 2020, 23:29:34
  • When reposting, please keep the link to this article: https://go.coder-hub.com/59581206.html