Can gcc omit reserving data on the stack?
Question
I'm using gcc 12.2.0 on x86_64 and compiling x64 code natively. I've run into an odd issue that is causing me problems and have reduced it to a minimal reproducer:
```c
#include <stdint.h>
#include <stdbool.h>

struct foobar_t {
    uint8_t data[512];
};

void my_memset(void *target) {
#if 1
    for (int i = 0; i < 256; i++) {
        ((uint16_t*)target)[i] = 0xabcd;
    }
#else
    for (int i = 0; i < 512; i++) {
        ((uint8_t*)target)[i] = 0xab;
    }
#endif
}

int main() {
    struct foobar_t foobar;
    my_memset(&foobar);
    if (foobar.data[123] == 0) {
        volatile int x = 0;
    }
    return 0;
}
```
When the `#if 1` path is taken, I get a compiler warning:
```
$ gcc -O3 -fno-stack-protector -Wall -c -o x.o x.c
[...]
x.c:46:24: warning: ‘foobar’ is used uninitialized [-Wuninitialized]
46 | if (foobar.data[123] == 0) {
```
That warning completely disappears when I use the second code path (`#if 0`), where the only difference is that the first sets 256 16-bit words while the second sets 512 bytes.
In the case that I get the warning, the generated assembly also looks wrong:
```
0000000000000000 <my_memset>:
   0: f3 0f 1e fa             endbr64
   4: 66 0f 6f 05 00 00 00    movdqa 0x0(%rip),%xmm0   # c <my_memset+0xc>
   c: 48 8d 87 00 02 00 00    lea    0x200(%rdi),%rax
  13: 0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
  18: 0f 11 07                movups %xmm0,(%rdi)
  1b: 48 83 c7 10             add    $0x10,%rdi
  1f: 48 39 f8                cmp    %rdi,%rax
  22: 75 f4                   jne    18 <my_memset+0x18>
  24: c3                      ret

0000000000000030 <main>:
  30: f3 0f 1e fa             endbr64
  34: 48 81 ec a0 01 00 00    sub    $0x1a0,%rsp
  3b: 66 0f 6f 05 00 00 00    movdqa 0x0(%rip),%xmm0   # 43 <main+0x13>
  43: 48 8d 44 24 98          lea    -0x68(%rsp),%rax
  48: 48 8d 94 24 98 01 00    lea    0x198(%rsp),%rdx
  50: 0f 29 00                movaps %xmm0,(%rax)
  53: 48 83 c0 10             add    $0x10,%rax
  57: 48 39 c2                cmp    %rax,%rdx
  5a: 75 f4                   jne    50 <main+0x20>
  5c: 80 7c 24 13 00          cmpb   $0x0,0x13(%rsp)
  61: 75 08                   jne    6b <main+0x3b>
  63: c7 44 24 94 00 00 00    movl   $0x0,-0x6c(%rsp)
  6b: 31 c0                   xor    %eax,%eax
  6d: 48 81 c4 a0 01 00 00    add    $0x1a0,%rsp
  74: c3                      ret
```
This reserves only 0x1a0 (416) bytes on the stack, which is less than the 512-byte structure! How can that be? What is the reason for this?
I've tried removing as much code as possible while still retaining the warning. If I disable optimization, the warning also goes away.
Answer 1
Score: 0
Your `#if 1` code is illegal (it has undefined behavior) because it violates the strict aliasing rule. Very roughly speaking, subject to certain narrow exceptions, you must not access the same memory through pointers to two different types.
As such, the compiler is entitled to assume that accesses to memory through one pointer type aren't seen by accesses through another pointer type. So it's not surprising that it thinks `foobar` is uninitialized: it doesn't consider the possibility that a store to a `uint16_t` object could touch it.
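If you are tied to GCC (Clang accepts the same attribute), one way to make 16-bit stores legal here is GCC's `may_alias` type attribute, which exempts accesses through that type from type-based alias analysis, much like a character type. This is a minimal sketch, not the original poster's code: the typedef `u16_alias` and the function `fill_u16_may_alias` are hypothetical names, and the attribute is a compiler extension, not ISO C. The destination must still be suitably aligned for `uint16_t`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical typedef: a uint16_t that GCC/Clang treat as able to alias
 * any object (GCC `may_alias` extension, not standard C). */
typedef uint16_t __attribute__((__may_alias__)) u16_alias;

/* Store `count` copies of `value` as 16-bit words starting at `target`.
 * Because the stores go through u16_alias, the compiler must assume they
 * may modify whatever object `target` points into.
 * Assumption: `target` is aligned for uint16_t. */
void fill_u16_may_alias(void *target, size_t count, uint16_t value) {
    assert(target != NULL || count == 0);
    u16_alias *p = target;
    for (size_t i = 0; i < count; i++) {
        p[i] = value;
    }
}
```

In the question's code this would be called as `fill_u16_may_alias(&foobar, 256, 0xabcd)`; the alignment caveat is the main reason to prefer a `memcpy`-based version when portability matters.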
There is an exception in the standard for character types, precisely so that you can implement things like `memset` and `memcpy` using character pointers. So your `#else` code is legal, and in fact the compiler is able to recognize that the `my_memset` code does initialize `foobar`, and so you don't get the warning. (Strictly speaking your code ought to use `unsigned char` instead of `uint8_t`; they are typedef'd the same on most compilers, but the language standard does not guarantee that to be the case.)
The thing about "insufficient stack" is actually normal and not a problem. The object `foobar` is located on the stack from offset `rsp-0x68` to `rsp+0x198`, which is 0x68 + 0x198 = 0x200, precisely 512 bytes, just as it should be. It may look strange that part of it is below the stack pointer, but this is okay because those 0x68 (104) bytes are within the 128-byte red zone.
The red zone can only hold a function's data in leaf functions (i.e. those which don't call other functions), because a call would overwrite it with the return address and the callee's frame. So it can only be used in `main` if the call to `my_memset` is inlined. This isn't done when optimizations are off, so you don't see the red zone used in that case.
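To look at the red zone in isolation, a small leaf function is enough. This is a hedged sketch: `leaf_sum` is a hypothetical name, and whether GCC actually places `buf` below `%rsp` without a `sub` depends on the version and flags. Compiling it with `gcc -O2 -S` and again with `-mno-red-zone` (the GCC x86 option that forbids red-zone use) and comparing the prologues is one way to check.

```c
/* Hypothetical leaf function: it calls nothing, so under the x86-64 System V
 * ABI GCC may keep `buf` in the 128-byte red zone below %rsp instead of
 * emitting `sub ..., %rsp`. `volatile` keeps the buffer in memory so the
 * optimizer cannot delete it. */
unsigned leaf_sum(unsigned n) {
    volatile unsigned char buf[64];
    if (n > sizeof buf) {
        n = sizeof buf; /* clamp instead of asserting: assert() could add a call */
    }
    for (unsigned i = 0; i < n; i++) {
        buf[i] = (unsigned char)i;
    }
    unsigned sum = 0;
    for (unsigned i = 0; i < n; i++) {
        sum += buf[i];
    }
    return sum;
}
```

The clamp is deliberate: adding any function call, even a failure path, would make the function non-leaf and could change whether the red zone is used.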
Using the red zone doesn't really accomplish much in this example. The main benefit is in functions where, by using the red zone, you avoid having to adjust the stack pointer at all. Here, the stack pointer has to be adjusted anyway, so nothing is gained compared with the more natural implementation of subtracting a full 512 bytes from the stack pointer. But the code that uses the red zone is still perfectly valid and performs the same; it just looks funny. So this is just a slightly odd quirk of the compiler's stack layout algorithm.