Inconsistent results when type punning uint64_t with union and bit-field
Question
I am using an anonymous struct in a union as follows:
#include <cstdint>

using time64_t = uint64_t;
using bucket_t = uint64_t;

union clock_test {
    time64_t _time64;
    struct {
        bucket_t _bucket5 : 10; // bucket:5 1024
        bucket_t _bucket4 : 8;  // bucket:4 256
        bucket_t _bucket3 : 6;  // bucket:3 64
        bucket_t _bucket2 : 6;  // bucket:2 64
        bucket_t _bucket1 : 6;  // bucket:1 64
        bucket_t _bucket0 : 6;  // bucket:0 64
    };
};
If bucket_t = uint64_t, it works as expected, but with using bucket_t = uint16_t or uint32_t, I get puzzling results.
I use the same test code for all cases:
clock_test clk;
clk._time64 = 168839113046;
For bucket_t = uint64_t, clk is:
_bucket5 342 // unsigned __int64
_bucket4 26 // unsigned __int64
_bucket3 38 // unsigned __int64
_bucket2 15 // unsigned __int64
_bucket1 29 // unsigned __int64
_bucket0 2 // unsigned __int64
For bucket_t = uint32_t, clk is:
_bucket5 342 // unsigned int
_bucket4 26 // unsigned int
_bucket3 38 // unsigned int
_bucket2 15 // unsigned int
_bucket1 39 // unsigned int
_bucket0 0 // unsigned int
For bucket_t = uint16_t, clk is:
_bucket5 342 // unsigned short
_bucket4 152 // unsigned short
_bucket3 15 // unsigned short
_bucket2 39 // unsigned short
_bucket1 0 // unsigned short
_bucket0 0 // unsigned short
Answer 1
Score: 3
The reason you get inconsistent results is that bit-field members are normally not packed across the boundaries of their declared type.
The type of the member matters, and may impact padding:
// 168839113046 in binary
// type punned with bucket_t = unsigned short (assuming 16-bit)
00100111 01001111 10011000 01101001 01010110
  |    |   |    | |      |       |         |
  |    |   |    | |      |       01 01010110 // _bucket5 = 342
  |    |   |    | |      | ######            // padding to 16-bit bounds
  |    |   |    | 10011000                   // _bucket4 = 152
  |    |   001111                            // _bucket3 = 15
  |    | ##                                  // padding to 16-bit bounds
  100111                                     // _bucket2 = 39
...                                          // _bucket1 = 0
                                             // _bucket0 = 0
Whenever the compiler can't fit another bit-field member into the same 16-bit object, it inserts padding and puts it into the next one.
This changes the values you read, because you're reading bits at different positions. If your bit-field members all had a 64-bit type, this wouldn't happen.
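The allocation rule described above can be modelled without any bit-fields, which makes it easier to see where each value comes from. The following is only a sketch: `kWidths`, `kRaw`, and `simulated_field` are hypothetical names, and the rule it encodes (fill from the least significant bit, never let a field straddle a storage unit of its declared type) is an assumption that matches the results in the question, not something the standard guarantees.

```cpp
#include <cstddef>
#include <cstdint>

// Widths of _bucket5 .. _bucket0, in declaration order.
constexpr unsigned kWidths[] = {10, 8, 6, 6, 6, 6};

// The test value from the question.
constexpr std::uint64_t kRaw = 168839113046ULL;

// Hypothetical model (not a compiler API): returns the value that field
// number `index` would hold if `raw` were viewed through bit-fields whose
// declared type is `unit_bits` wide. Assumes index < 6 and that every
// width is at most unit_bits.
constexpr std::uint64_t simulated_field(std::uint64_t raw, unsigned unit_bits,
                                        std::size_t index)
{
    unsigned offset = 0; // bit position where the current field starts
    for (std::size_t i = 0; i <= index; ++i) {
        const unsigned used = offset % unit_bits;
        if (used + kWidths[i] > unit_bits) {
            offset += unit_bits - used; // skip padding up to the next unit
        }
        if (i < index) {
            offset += kWidths[i]; // move past the fields before `index`
        }
    }
    const std::uint64_t mask = (std::uint64_t{1} << kWidths[index]) - 1;
    // A field that starts beyond the 64 bits of raw reads as zero here.
    return offset < 64 ? (raw >> offset) & mask : 0;
}

// Spot checks against the values reported in the question.
static_assert(simulated_field(kRaw, 64, 4) == 29, "uint64_t: _bucket1");
static_assert(simulated_field(kRaw, 32, 4) == 39, "uint32_t: _bucket1");
static_assert(simulated_field(kRaw, 16, 1) == 152, "uint16_t: _bucket4");
```

With `unit_bits` set to 32, the first four fields use 30 bits, so `_bucket1` skips 2 padding bits and starts at bit 32. That is why only the last two values differ from the 64-bit case.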
Non-Standard and Undefined Behavior
That being said, your code is not valid standard C++.
- anonymous structs are not standard C++; they only work because of compiler extensions (GCC, Clang, and MSVC each provide one)
- the layout and alignment of bit-field members are implementation-defined, so you might not get the same results with different compilers
- using union for type punning like this is undefined behavior; you can only access the active member of the union, with some exceptions
To get consistent results, use shifts and masks:
time64_t data = 168839113046;
(data >> 0) & ((1u << 10) - 1) // = 342
(data >> 10) & ((1u << 8) - 1) // = 26
(data >> 18) & ((1u << 6) - 1) // = 38
// ...
This will give you consistent results everywhere. It works by shifting data to the right and then using the bitwise AND operator to keep only the lowest N bits.
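For completeness, here is one way the shift-and-mask approach could be wrapped up for all six buckets. This is a minimal sketch: `extract_bits`, `buckets`, and `decode` are hypothetical names, and the offsets assume the same tightly packed layout that the 64-bit bit-field version produced. The mask is built from a 64-bit one rather than `1u`, so that widths of 32 or more would also work where `unsigned int` is 32 bits wide.

```cpp
#include <cstdint>

// Hypothetical helper: the `width` bits of `value` that start at bit
// `offset`. Assumes 0 < width < 64 and offset < 64.
constexpr std::uint64_t extract_bits(std::uint64_t value, unsigned offset,
                                     unsigned width)
{
    return (value >> offset) & ((std::uint64_t{1} << width) - 1);
}

// Plain members instead of bit-fields, so the layout no longer matters.
struct buckets {
    std::uint64_t bucket5; // 10 bits, starting at bit 0
    std::uint64_t bucket4; //  8 bits, starting at bit 10
    std::uint64_t bucket3; //  6 bits, starting at bit 18
    std::uint64_t bucket2; //  6 bits, starting at bit 24
    std::uint64_t bucket1; //  6 bits, starting at bit 30
    std::uint64_t bucket0; //  6 bits, starting at bit 36
};

constexpr buckets decode(std::uint64_t time64)
{
    return buckets{
        extract_bits(time64, 0, 10),
        extract_bits(time64, 10, 8),
        extract_bits(time64, 18, 6),
        extract_bits(time64, 24, 6),
        extract_bits(time64, 30, 6),
        extract_bits(time64, 36, 6),
    };
}

// The value from the question decodes to the same numbers as the
// uint64_t bit-field version.
static_assert(decode(168839113046ULL).bucket5 == 342, "bucket5");
static_assert(decode(168839113046ULL).bucket1 == 29, "bucket1");
static_assert(decode(168839113046ULL).bucket0 == 2, "bucket0");
```

Because the result depends only on arithmetic on a `uint64_t`, it does not change with the compiler, the declared member type, or the byte order of the machine.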