How to understand the following paragraph
Question
I read this on a blog:
> Violating Type Rules: It is undefined behavior to cast an int* to a float* and dereference it (accessing the "int" as if it were a "float"). C requires that these sorts of type conversions happen through memcpy: using pointer casts is not correct and undefined behavior results. The rules for this are quite nuanced and I don't want to go into the details here (there is an exception for char*, vectors have special properties, unions change things, etc). This behavior enables an analysis known as "Type-Based Alias Analysis" (TBAA) which is used by a broad range of memory access optimizations in the compiler, and can significantly improve performance of the generated code. For example, this rule allows clang to optimize this function:
How can you use the `memcpy` function for type coercion? And what about the exception for `char*`?
I don't understand how to use the `memcpy` function for type coercion.
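The TBAA point in the quoted paragraph is easier to see with a concrete function. The sketch below is not the function from the blog post (that code is not reproduced in the question); `store_and_reload` is a hypothetical name chosen for illustration. Because an `int` object and a `float` object are not allowed to overlap, an optimizing compiler may assume that the store through `fp` leaves `*ip` unchanged, and return the constant 1 without reloading it from memory.

```c
/* Hypothetical example (not from the blog post). Under the strict
   aliasing rule an int object and a float object cannot overlap, so
   the compiler may assume the store through fp does not modify *ip. */
int store_and_reload(int *ip, float *fp)
{
    *ip = 1;
    *fp = 2.0f;    /* assumed not to change *ip */
    return *ip;    /* an optimizer may emit "return 1" here */
}
```

If a caller passed `(float *)&i` as `fp` for the same `int i`, the two pointers would alias, the behavior would be undefined, and with optimization enabled the function could still return 1 even though the bytes of `i` were overwritten.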
Answer 1
Score: 4
Suppose you have the `float` value 1.25. And suppose you want to confirm that its actual IEEE-754 representation in hexadecimal is `3fa00000`. There are at least four different ways you might try to do this:
(1) Take a `float` pointer, cast it to an integer pointer, and indirect on it:
```c
float f = 1.25;
printf("%08x\n", *(uint32_t *)&f);   /* reads a float object through a uint32_t lvalue */
```
(This fragment quietly assumes a 32-bit `int`, because `%x` expects an `unsigned int`. For better portability, you could use `printf("%08" PRIx32 "\n", *(uint32_t *)&f);`, with `PRIx32` from `<inttypes.h>`.)
(2) Use a union:
```c
union { float f; uint32_t i; } u;
u.f = f;
printf("%08x\n", u.i);
```
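Here is method (2) wrapped in a small helper, as a sketch that assumes `float` is IEEE-754 binary32 and the same size as `uint32_t` (the `static_assert` checks only the size). For what it's worth, C11 describes reading a union member other than the one last stored as reinterpreting the stored bytes in the new type (a footnote to 6.5.2.3 calls this "type punning"); C++ gives no such guarantee. The function names are illustrative, not from the answer.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

static_assert(sizeof(float) == sizeof(uint32_t), "assumes a 32-bit float");

/* Method (2): store through one union member, read through the other. */
uint32_t float_bits_union(float f)
{
    union { float f; uint32_t i; } u;
    u.f = f;
    return u.i;
}

/* PRIx32 (from <inttypes.h>) is the matching conversion for uint32_t,
   so this does not depend on int being 32 bits. */
void print_float_bits(float f)
{
    printf("%08" PRIx32 "\n", float_bits_union(f));
}
```

On an IEEE-754 platform, `print_float_bits(1.25f);` should print `3fa00000`.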
(3) Use a `char` pointer, and iterate/index:
```c
unsigned char *p = (unsigned char *)&f;
for (int i = 3; i >= 0; i--) printf("%02x", p[i]);   /* highest-addressed byte first */
```
(Note that this code fragment assumes little-endian.)
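The fragment in method (3) hard-codes little-endian byte order and a 4-byte `float`. A sketch that drops the byte-order assumption (but still assumes 8-bit bytes, a 4-byte `float`, a host that is either big- or little-endian, and `float` using the same byte order as integers) can use the character-type exception twice: once to detect the byte order, and once to read the bytes of the `float`. The helper names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

static_assert(CHAR_BIT == 8, "assumes 8-bit bytes");
static_assert(sizeof(float) == 4, "assumes a 4-byte float");

/* Inspecting the first byte of a uint32_t holding 1 through an
   unsigned char pointer is allowed by the character-type exception. */
int host_is_little_endian(void)
{
    uint32_t one = 1;
    return *(const unsigned char *)&one == 1;
}

/* Method (3) without hard-coding the byte order: visit the bytes of f
   from most to least significant and shift them into a uint32_t. */
uint32_t float_bits_bytes(float f)
{
    const unsigned char *p = (const unsigned char *)&f;
    int little = host_is_little_endian();
    uint32_t x = 0;
    for (size_t k = 0; k < sizeof f; k++) {
        size_t i = little ? sizeof f - 1 - k : k;
        x = (x << 8) | p[i];
    }
    return x;
}
```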
(4) Use `memcpy`:
```c
uint32_t x;
memcpy(&x, &f, sizeof x);   /* copy the bytes of f into x */
printf("%08x\n", x);
```
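Method (4) is also the direct answer to "how do I use `memcpy` for this": copy the bytes into an object of the target type, then read that object normally, so each object is only ever accessed through its own type. A sketch of two helpers (hypothetical names), again assuming a 32-bit IEEE-754 `float`:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t), "assumes a 32-bit float");

/* Copy the object representation of a float into a uint32_t. */
uint32_t float_to_bits(float f)
{
    uint32_t x;
    memcpy(&x, &f, sizeof x);
    return x;
}

/* The reverse direction works the same way. */
float bits_to_float(uint32_t x)
{
    float f;
    memcpy(&f, &x, sizeof f);
    return f;
}
```

Because the size is a compile-time constant, optimizing compilers typically replace these `memcpy` calls with a plain register move, so helpers like these usually cost nothing at run time.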
Now, the take-home lesson is that not all of these methods work reliably any more, because of the strict aliasing rule.
In particular, method (1) is flatly illegal. It's a textbook example of what the strict aliasing rule disallows.
I think you're still allowed to use a union as in method 2, but you may have to put on a language lawyer hat to convince yourself of it. (See also the comments on this answer below.)
Methods (3) and (4), however, continue to work, because they take advantage of an explicit exception to the strict aliasing rule, namely that you are allowed to access the bits of an object using a punned pointer of the "wrong" type, as long as the "wrong type" is specifically a character pointer.
So I think this is clear, but in answer to your specific questions:
> How can you use the `memcpy` function for type coercion?
As in method (4).
> And what about the exception for `char *`?
That's the explicit exception in the strict aliasing rule that allows method (3) to work.
The rules, by the way, are significantly different here in C than in C++. Strictly speaking, I believe, in C++ not even method (3) is legal, and the only way you're allowed to do this sort of thing any more is with method (4) and an explicit call to `memcpy`. (However, I'm told that optimizing compilers tend to treat calls to `memcpy` very specially these days, not only replacing explicit function calls with inline register moves, but sometimes even optimizing out the copy altogether and doing something like method 1 or 2 internally, if they know they can get away with it.)
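For reference, the value `3fa00000` that all four methods are trying to display can be derived by hand, assuming the IEEE-754 binary32 layout (1 sign bit $s$, 8 exponent bits $e$ biased by 127, 23 fraction bits $m$):

$$
1.25 = 1.01_2 \times 2^{0}
\quad\Rightarrow\quad
s = 0,\qquad e = 0 + 127 = 01111111_2,\qquad m = 0100\,0000\,0000\,0000\,0000\,000_2
$$

$$
\underbrace{0}_{s}\;\underbrace{01111111}_{e}\;\underbrace{0100\,0000\,0000\,0000\,0000\,000}_{m}
= 0011\;1111\;1010\;0000\;0000\;0000\;0000\;0000_2
= \mathtt{0x3FA00000}
$$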