Inconsistent results when type punning uint64_t with union and bit-field

Question

I am using an anonymous struct inside a union as follows:

```cpp
#include <cstdint>

using time64_t = uint64_t;
using bucket_t = uint64_t;
union clock_test {
    time64_t _time64;
    struct {
        bucket_t _bucket5 : 10; // bucket:5 1024
        bucket_t _bucket4 : 8;  // bucket:4 256
        bucket_t _bucket3 : 6;  // bucket:3 64
        bucket_t _bucket2 : 6;  // bucket:2 64
        bucket_t _bucket1 : 6;  // bucket:1 64
        bucket_t _bucket0 : 6;  // bucket:0 64
    };
};
```

If bucket_t = uint64_t, it works as expected, but with bucket_t = uint16_t or uint32_t, I get puzzling results.

I use the same test code for all cases:

```cpp
clock_test clk;
clk._time64 = 168839113046;
```

For bucket_t = uint64_t, clk is:

```
_bucket5  342  // unsigned __int64
_bucket4   26  // unsigned __int64
_bucket3   38  // unsigned __int64
_bucket2   15  // unsigned __int64
_bucket1   29  // unsigned __int64
_bucket0    2  // unsigned __int64
```

For bucket_t = uint32_t, clk is:

```
_bucket5  342  // unsigned int
_bucket4   26  // unsigned int
_bucket3   38  // unsigned int
_bucket2   15  // unsigned int
_bucket1   39  // unsigned int
_bucket0    0  // unsigned int
```

For bucket_t = uint16_t, clk is:

```
_bucket5  342  // unsigned short
_bucket4  152  // unsigned short
_bucket3   15  // unsigned short
_bucket2   39  // unsigned short
_bucket1    0  // unsigned short
_bucket0    0  // unsigned short
```

...
I'm using VS Code + Clang, where this issue can be seen clearly.

Answer 1

Score: 3

The reason you get inconsistent results is that bit-field members are normally not packed tightly across the storage units of their declared type.
The type of the member therefore matters, and may introduce padding:

```
// 168839113046 in binary (bit 39 on the left, bit 0 on the right)
// type punned with bucket_t = unsigned short (assuming 16-bit)
00100111 01001111 10011000 01101001 01010110

bits  0-9    01 01010110  // _bucket5 = 342
bits 10-15   ######       // padding to 16-bit bounds
bits 16-23   10011000     // _bucket4 = 152
bits 24-29   001111       // _bucket3 = 15
bits 30-31   ##           // padding to 16-bit bounds
bits 32-37   100111       // _bucket2 = 39
...                       // _bucket1 = 0
                          // _bucket0 = 0
```

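Whenever the compiler can't fit the next bit-field member into the bits left in the current 16-bit object, it inserts padding and puts the member into the next one.
This changes the values you read, because you're reading bits at different positions. The same happens with uint32_t, only later: _bucket5 through _bucket2 need 30 bits and fit into the first 32-bit object, but _bucket1 does not, so it starts at bit 32, which is why only _bucket1 and _bucket0 differ. If your bit-field members all had a 64-bit type, this wouldn't happen, because all six fields (10 + 8 + 6 + 6 + 6 + 6 = 42 bits) fit into a single 64-bit object.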
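The padding rule above can be checked with plain arithmetic, without relying on any real bit-field layout. The sketch below models it under an assumption: fields are allocated from the least significant bit upward, and a field that would straddle a boundary of its `unit_bits`-wide storage unit is moved to the start of the next unit (the behavior that produced the results in the question). `kWidths`, `bit_offsets` and `read_fields` are hypothetical names, and the model is only meant for unit sizes of 16, 32 and 64 bits.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Field widths in declaration order: _bucket5, _bucket4, ..., _bucket0.
constexpr std::array<unsigned, 6> kWidths = {{10, 8, 6, 6, 6, 6}};

// Simplified layout model (an assumption, not something the standard
// guarantees): fields are allocated from bit 0 upward, and a field that would
// straddle a boundary of its unit_bits-wide storage unit starts in the next unit.
std::array<unsigned, 6> bit_offsets(unsigned unit_bits) {
    std::array<unsigned, 6> offsets{};
    unsigned pos = 0;
    for (std::size_t i = 0; i < kWidths.size(); ++i) {
        if (pos % unit_bits + kWidths[i] > unit_bits) {
            pos = (pos / unit_bits + 1) * unit_bits;  // skip the padding bits
        }
        offsets[i] = pos;
        pos += kWidths[i];
    }
    return offsets;
}

// Reads each field of `value` at the offsets predicted by bit_offsets().
std::array<std::uint64_t, 6> read_fields(std::uint64_t value, unsigned unit_bits) {
    const std::array<unsigned, 6> offs = bit_offsets(unit_bits);
    std::array<std::uint64_t, 6> fields{};
    for (std::size_t i = 0; i < fields.size(); ++i) {
        const std::uint64_t mask = (std::uint64_t{1} << kWidths[i]) - 1;
        fields[i] = (value >> offs[i]) & mask;
    }
    return fields;
}
```

Calling `read_fields(168839113046, 16)` gives {342, 152, 15, 39, 0, 0}, `read_fields(168839113046, 32)` gives {342, 26, 38, 15, 39, 0}, and `read_fields(168839113046, 64)` gives {342, 26, 38, 15, 29, 2}, i.e. the three sets of values from the question.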

Non-Standard and Undefined Behavior

That being said, your code is just not valid standard C++:

  1. anonymous structs are not standard C++; they only work because of a compiler extension (supported by GCC, Clang and MSVC)
  2. the layout and alignment of bit-field members is implementation-defined, so you might not get the same results with different compilers or target ABIs
  3. using a union for type punning like this is undefined behavior in C++; you may only read the active member of a union (the one last written), with a few exceptions such as the common initial sequence of standard-layout structs
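If you still want to keep a bit-field struct, one commonly used way to avoid reading an inactive union member (point 3) is to give the struct a name (point 1) and copy the bytes with `std::memcpy`. This is only a sketch: `Buckets` and `to_buckets` are hypothetical names, and it does nothing about point 2, so which bits end up in which field is still up to the compiler and ABI.

```cpp
#include <cstdint>
#include <cstring>

// Named struct instead of an anonymous one inside a union.
struct Buckets {
    std::uint64_t _bucket5 : 10;
    std::uint64_t _bucket4 : 8;
    std::uint64_t _bucket3 : 6;
    std::uint64_t _bucket2 : 6;
    std::uint64_t _bucket1 : 6;
    std::uint64_t _bucket0 : 6;
};

// The size match holds on common ABIs but is not guaranteed by the standard,
// so check it at compile time before copying bytes.
static_assert(sizeof(Buckets) == sizeof(std::uint64_t), "unexpected Buckets size");

// Copies the object representation of `t` into a Buckets object. This avoids
// the inactive-member read, but the field layout is still implementation-defined.
Buckets to_buckets(std::uint64_t t) {
    Buckets b;
    std::memcpy(&b, &t, sizeof b);
    return b;
}
```

On a typical x86-64 GCC or Clang build, `to_buckets(168839113046)._bucket5` is 342 and `_bucket0` is 2, but a different ABI may place the fields elsewhere, which is why shifts and masks remain the portable option.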

To get consistent results, use shifts and masks:

```cpp
#include <cstdint>
using time64_t = std::uint64_t; // as in the question

time64_t data = 168839113046;
auto bucket5 = (data >> 0)  & ((1u << 10) - 1); // = 342
auto bucket4 = (data >> 10) & ((1u << 8) - 1);  // = 26
auto bucket3 = (data >> 18) & ((1u << 6) - 1);  // = 38
// ...
```

This will give you consistent results everywhere. It works by shifting data to the right so that the field starts at bit 0, and then using the bitwise AND operator with a mask of N one-bits to keep only the lowest N bits.
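The same shift-and-mask idea can be wrapped in a small helper and applied to all six buckets. This is a sketch: `extract_bits`, `Field`, `kBucketFields` and `split_buckets` are hypothetical names, and the offsets assume the fields are packed back to back from bit 0, which is the layout the question expects (the bucket_t = uint64_t results).

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Returns `width` bits of `value` starting at bit `offset` (bit 0 is the least
// significant bit). Assumes 0 < width < 64 and offset + width <= 64.
constexpr std::uint64_t extract_bits(std::uint64_t value, unsigned offset, unsigned width) {
    return (value >> offset) & ((std::uint64_t{1} << width) - 1);
}

// Hypothetical field table: offset and width of _bucket5 ... _bucket0,
// packed back to back from bit 0.
struct Field {
    unsigned offset;
    unsigned width;
};
constexpr std::array<Field, 6> kBucketFields = {{
    {0, 10},  // _bucket5
    {10, 8},  // _bucket4
    {18, 6},  // _bucket3
    {24, 6},  // _bucket2
    {30, 6},  // _bucket1
    {36, 6},  // _bucket0
}};

// Compile-time sanity check against the value from the question.
static_assert(extract_bits(168839113046, 0, 10) == 342, "_bucket5 should be 342");

// Splits a time value into its six bucket indices, _bucket5 first.
std::array<std::uint64_t, 6> split_buckets(std::uint64_t t) {
    std::array<std::uint64_t, 6> out{};
    for (std::size_t i = 0; i < kBucketFields.size(); ++i) {
        out[i] = extract_bits(t, kBucketFields[i].offset, kBucketFields[i].width);
    }
    return out;
}
```

With this, `split_buckets(168839113046)` returns {342, 26, 38, 15, 29, 2} regardless of compiler. Building the mask from `std::uint64_t{1}` rather than `1u` keeps it correct for fields of 32 bits or more, where `1u << width` would be undefined for a 32-bit `unsigned`.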

huangapple
  • This article was published on 3 July 2023 at 22:00:23
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76605488.html