July 3, 2023, 22:00:23

Inconsistent results when type punning uint64_t with union and bit-field

Question

I am using an anonymous struct in a union as follows:

#include <cstdint>

using time64_t = std::uint64_t;
using bucket_t = std::uint64_t;
union clock_test {
    time64_t _time64;
    struct {
        bucket_t _bucket5 : 10;     // bucket 5: 1024 values
        bucket_t _bucket4 : 8;      // bucket 4: 256 values
        bucket_t _bucket3 : 6;      // bucket 3: 64 values
        bucket_t _bucket2 : 6;      // bucket 2: 64 values
        bucket_t _bucket1 : 6;      // bucket 1: 64 values
        bucket_t _bucket0 : 6;      // bucket 0: 64 values
    };
};

With bucket_t = uint64_t it works as expected, but with bucket_t = uint16_t or uint32_t I get puzzling results.

I use the same test code for all cases:

clock_test clk;
clk._time64 = 168839113046;

For bucket_t = uint64_t, clk is:

_bucket5   342	// unsigned __int64
_bucket4	26	// unsigned __int64
_bucket3	38	// unsigned __int64
_bucket2	15	// unsigned __int64
_bucket1	29	// unsigned __int64
_bucket0	 2	// unsigned __int64

For bucket_t = uint32_t, clk is:

_bucket5   342	// unsigned int
_bucket4	26	// unsigned int
_bucket3	38	// unsigned int
_bucket2	15	// unsigned int
_bucket1	39	// unsigned int
_bucket0	 0	// unsigned int

For bucket_t = uint16_t, clk is:

_bucket5    342	// unsigned short
_bucket4	152	// unsigned short
_bucket3	 15	// unsigned short
_bucket2	 39	// unsigned short
_bucket1	  0	// unsigned short
_bucket0	  0	// unsigned short

...
I use VS Code with clang, where this issue is clearly visible.

Answer 1

Score: 3

You get inconsistent results because bit-field members are normally not packed across the boundaries of their declared type.
The type of the member matters and may affect padding:

// 168839113046 in binary
// type punned with bucket_t = unsigned short (assuming 16-bit)
00100111 01001111 10011000 01101001 01010110
  |    |   |    | |      |       |         |
  |    |   |    | |      |       01 01010110 // _bucket5 = 342
  |    |   |    | |      | ######            // padding to 16-bit bounds
  |    |   |    | 10011000                   // _bucket4 = 152
  |    |   001111                            // _bucket3 = 15
  |    | ##                                  // padding to 16-bit bounds
  100111                                     // _bucket2 = 39
...                                          // _bucket1 = 0
                                             // _bucket0 = 0

Whenever the compiler can't fit the next bit-field member into the remaining bits of the current 16-bit object, it inserts padding and places that member in the next object.
This changes the values you read, because you're reading bits at different positions. With a 64-bit type, all six members (42 bits in total) fit into a single object, so no padding is inserted.
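
The allocation rule described above can be modelled portably with plain shifts. This is only a sketch under stated assumptions, not how any particular compiler is implemented: `simulate_bitfields` and its parameters are hypothetical names, fields are assumed to be allocated starting from the least significant bit, and a field that does not fit into the remaining bits of the current `unit_bits`-wide unit is assumed to start at the next unit.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of the allocation rule described above; real compilers
// are free to lay out bit-fields differently.
std::vector<std::uint64_t> simulate_bitfields(std::uint64_t value,
                                              unsigned unit_bits,
                                              const std::vector<unsigned>& widths) {
    std::vector<std::uint64_t> fields;
    unsigned offset = 0;  // bit position where the next field would start
    for (unsigned width : widths) {
        assert(width > 0 && width <= unit_bits && width < 64);
        // If the field would straddle a unit boundary, skip the padding bits.
        unsigned used = offset % unit_bits;
        if (used + width > unit_bits) {
            offset += unit_bits - used;
        }
        // Bits beyond the 64-bit value read as zero.
        std::uint64_t shifted = offset < 64 ? (value >> offset) : 0;
        fields.push_back(shifted & ((std::uint64_t{1} << width) - 1));
        offset += width;
    }
    return fields;
}
```

Called with the value 168839113046, the widths {10, 8, 6, 6, 6, 6} and a unit size of 64, 32 or 16 bits, this model yields the three sets of values reported in the question.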

Non-Standard and Undefined Behavior

That being said, your code is not valid standard C++:

Anonymous structs are not standard C++; they only work because of compiler extensions (GCC, Clang, and MSVC support them).
The layout and alignment of bit-field members is completely implementation-defined, so you might not get the same results with different compilers.
Using a union for type punning like this is undefined behavior in C++; you may only read the active member of a union, with a few exceptions.
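
Regarding the last point, one well-defined way to reinterpret the bytes of an object is `std::memcpy` (C++20 also offers `std::bit_cast`). The following is a minimal sketch with hypothetical helper names `to_words` and `from_words`. The order of the resulting 16-bit words depends on the platform's endianness, so it does not by itself give portable field values.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Copy the object representation of a 64-bit value into four 16-bit words.
// Which word holds the low bits depends on the platform's endianness.
std::array<std::uint16_t, 4> to_words(std::uint64_t value) {
    std::array<std::uint16_t, 4> words{};
    static_assert(sizeof(words) == sizeof(value), "sizes must match");
    std::memcpy(words.data(), &value, sizeof(value));
    return words;
}

// Inverse operation: rebuild the 64-bit value from the four words.
std::uint64_t from_words(const std::array<std::uint16_t, 4>& words) {
    std::uint64_t value = 0;
    static_assert(sizeof(words) == sizeof(value), "sizes must match");
    std::memcpy(&value, words.data(), sizeof(value));
    return value;
}
```

Unlike reading an inactive union member, copying bytes this way is defined behavior, but only the round trip is portable, not the individual word values.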

To get consistent results, use shifts and masks:

time64_t data = 168839113046;
(data >>  0) & ((1u << 10) - 1)     // = 342
(data >> 10) & ((1u <<  8) - 1)     // = 26
(data >> 18) & ((1u <<  6) - 1)     // = 38
// ...

This will give you consistent results everywhere. It works by shifting the data to the right and then using the bitwise AND operator to keep only the lowest N bits.
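
A complete version of this approach might look like the sketch below. The names `extract_bits`, `Buckets` and `decode` are hypothetical and not part of the original code. The field offsets assume the tightly packed layout the question expects (10, 8, 6, 6, 6, 6 bits, starting at the least significant bit). The mask is built from a 64-bit one, because `1u << width` is only valid for widths smaller than the width of `unsigned`.

```cpp
#include <cassert>
#include <cstdint>

// Extract `width` bits of `value`, starting at bit `offset` (bit 0 is the
// least significant bit). Requires offset < 64 and 0 < width < 64.
inline std::uint64_t extract_bits(std::uint64_t value, unsigned offset, unsigned width) {
    assert(offset < 64 && width > 0 && width < 64);
    return (value >> offset) & ((std::uint64_t{1} << width) - 1);
}

// Hypothetical plain struct holding the decoded fields.
struct Buckets {
    std::uint64_t bucket5, bucket4, bucket3, bucket2, bucket1, bucket0;
};

// Decode all six fields; offsets are the running sums of the field widths.
inline Buckets decode(std::uint64_t time) {
    return Buckets{
        extract_bits(time,  0, 10),  // bucket5
        extract_bits(time, 10,  8),  // bucket4
        extract_bits(time, 18,  6),  // bucket3
        extract_bits(time, 24,  6),  // bucket2
        extract_bits(time, 30,  6),  // bucket1
        extract_bits(time, 36,  6),  // bucket0
    };
}
```

For the value 168839113046, `decode` returns 342, 26, 38, 15, 29 and 2, matching the `uint64_t` results in the question regardless of compiler or field type.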
