Can gcc omit reserving data on the stack?

Question

I'm using gcc 12.2.0 on x86_64, compiling 64-bit code. I've run into an odd issue that is causing me problems and have reduced it to a minimal reproducer:

#include <stdint.h>
#include <stdbool.h>

struct foobar_t {
    uint8_t data[512];
};

void my_memset(void *target) {
#if 1
    for (int i = 0; i < 256; i++) {
        ((uint16_t*)target)[i] = 0xabcd;
    }
#else
    for (int i = 0; i < 512; i++) {
        ((uint8_t*)target)[i] = 0xab;
    }
#endif
}

int main() {
    struct foobar_t foobar;
    my_memset(&foobar);
    if (foobar.data[123] == 0) {
        volatile int x = 0;
    }
    return 0;
}

When the #if 1 path is taken, I get a compiler warning:

$ gcc -O3 -fno-stack-protector -Wall -c -o x.o x.c
[...]
x.c:46:24: warning: ‘foobar’ is used uninitialized [-Wuninitialized]
   46 |         if (foobar.data[123] == 0) {

That warning disappears completely when I use the second code path (#if 0). The only difference is that the first path stores 256 16-bit words while the second stores 512 bytes.

In the case that I get the warning, the generated assembly also looks wrong:

0000000000000000 <my_memset>:
   0:  f3 0f 1e fa           endbr64
   4:  66 0f 6f 05 00 00 00  movdqa 0x0(%rip),%xmm0        # c <my_memset+0xc>
   c:  48 8d 87 00 02 00 00  lea    0x200(%rdi),%rax
  13:  0f 1f 44 00 00        nopl   0x0(%rax,%rax,1)
  18:  0f 11 07              movups %xmm0,(%rdi)
  1b:  48 83 c7 10           add    $0x10,%rdi
  1f:  48 39 f8              cmp    %rdi,%rax
  22:  75 f4                 jne    18 <my_memset+0x18>
  24:  c3                    ret

0000000000000030 <main>:
  30:  f3 0f 1e fa           endbr64
  34:  48 81 ec a0 01 00 00  sub    $0x1a0,%rsp
  3b:  66 0f 6f 05 00 00 00  movdqa 0x0(%rip),%xmm0        # 43 <main+0x13>
  43:  48 8d 44 24 98        lea    -0x68(%rsp),%rax
  48:  48 8d 94 24 98 01 00  lea    0x198(%rsp),%rdx
  50:  0f 29 00              movaps %xmm0,(%rax)
  53:  48 83 c0 10           add    $0x10,%rax
  57:  48 39 c2              cmp    %rax,%rdx
  5a:  75 f4                 jne    50 <main+0x20>
  5c:  80 7c 24 13 00        cmpb   $0x0,0x13(%rsp)
  61:  75 08                 jne    6b <main+0x3b>
  63:  c7 44 24 94 00 00 00  movl   $0x0,-0x6c(%rsp)
  6b:  31 c0                 xor    %eax,%eax
  6d:  48 81 c4 a0 01 00 00  add    $0x1a0,%rsp
  74:  c3                    ret

This only reserves 0x1a0 bytes on the stack, 416 bytes. That does not fit the structure! How can that be? What is the reason for this happening?

I've tried removing as much code as possible while still retaining the warning. If I disable optimization, the warning also goes away.

Answer 1

Score: 0

Your #if 1 code has undefined behavior because it violates the strict aliasing rule. Very roughly speaking, and subject to certain narrow exceptions, you must not access the same memory through pointers to two different types. Here, an object declared as an array of uint8_t is written through a uint16_t pointer.

As such, the compiler is entitled to assume that accesses to memory through one pointer type aren't seen by accesses through another pointer type. So it's not surprising that it would think that foobar is uninitialized, since it doesn't consider the possibility that an access to a uint16_t object could touch it.

There is an exception in the standard for character types, precisely so that you can implement things like memset and memcpy using character pointers. So your #else code is legal, and in fact the compiler is able to recognize that the my_memset code does initialize foobar, and so you don't get the warning. (Strictly speaking your code ought to use unsigned char instead of uint8_t - they are typedef'd the same on most compilers, but the language standard does not guarantee that to be the case.)
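If the 16-bit fill pattern from the #if 1 path is still wanted, one way to stay inside the character-type exception is to copy each value in with memcpy instead of dereferencing a uint16_t pointer. The following is only a sketch: fill_pattern16 and its parameters are hypothetical names, not part of the original question, and the byte order of the pattern in memory depends on the target's endianness.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the question): fill `count` 16-bit slots
 * with `pattern` without ever dereferencing a uint16_t pointer. */
void fill_pattern16(void *target, size_t count, uint16_t pattern) {
    assert(target != NULL);
    unsigned char *bytes = target; /* character type: may alias any object */
    for (size_t i = 0; i < count; i++) {
        /* memcpy copies the object representation byte by byte, so the
         * store is defined whatever the declared type of *target is,
         * and it does not require 2-byte alignment either. */
        memcpy(bytes + i * sizeof pattern, &pattern, sizeof pattern);
    }
}
```

Calling fill_pattern16(&foobar, 256, 0xabcd) in place of my_memset(&foobar) would write the same 512 bytes. With optimization enabled, compilers commonly turn such fixed-size memcpy calls into plain stores, though that is not guaranteed.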


The "insufficient stack" observation is actually normal and not a problem. The object foobar occupies the stack from rsp-0x68 up to rsp+0x198, which is exactly 0x200 = 512 bytes, just as it should be. It may look strange that part of it is below the stack pointer, but this is fine: those 0x68 = 104 bytes lie within the 128-byte red zone that the x86-64 System V ABI reserves below rsp.
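The offsets can be checked with a little arithmetic. The sketch below hard-codes the numbers read off the disassembly of main in the question; the constant and function names are made up for illustration.

```c
#include <assert.h>

/* Offsets relative to rsp after `sub $0x1a0,%rsp`, taken from the listing. */
enum {
    FOOBAR_START = -0x68, /* lea -0x68(%rsp),%rax : first byte of foobar  */
    FOOBAR_END   = 0x198, /* lea 0x198(%rsp),%rdx : one past the last byte */
    RED_ZONE     = 128    /* red zone size in the x86-64 System V ABI      */
};

/* Compile-time check that the filled region is the whole struct. */
static_assert(FOOBAR_END - FOOBAR_START == 512,
              "foobar spans exactly 512 bytes");

/* Size of the region the inlined loop fills. */
long foobar_size(void) { return FOOBAR_END - FOOBAR_START; }

/* rsp-relative offset of foobar.data[index]. */
long data_offset(long index) { return FOOBAR_START + index; }

/* Number of bytes of foobar that sit below rsp, i.e. in the red zone. */
long bytes_below_rsp(void) { return -FOOBAR_START; }
```

data_offset(123) evaluates to 0x13, which matches the `cmpb $0x0,0x13(%rsp)` instruction that tests foobar.data[123]. The 104 bytes below rsp, plus the 4-byte volatile int stored at -0x6c(%rsp), all fit inside the 128-byte red zone.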

The red zone is only usable in leaf functions (i.e. those which don't call other functions), so it can only be used in main if the call to my_memset is inlined. This isn't done when optimizations are off, so you don't see the red zone used in that case.

Using the red zone doesn't really accomplish much in this example. Its main benefit is in functions where, by using the red zone, you avoid having to adjust the stack pointer at all. Here the stack pointer has to be adjusted anyway, so nothing is gained compared with the more natural implementation of subtracting a full 512 bytes from the stack pointer. The code that uses the red zone is still perfectly valid and equivalent in terms of performance; it just looks funny. This is simply a slightly odd quirk of the compiler's stack layout algorithm.

huangapple
  • Posted on June 8, 2023, 23:03:55
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76433221.html