Get an n-bit mask avoiding UB when n is equal to the bit width of the type?

Question


I have a small positive integer n, and I use an unsigned integral type to store a mask containing n bits. Often, there is a need to construct a mask with all n bits set. For example, if n is 5, then the mask would be 0b11111u.

The typical way to construct such a mask is to do the following (this example assumes that the mask uses unsigned, but it is possible to write something for any unsigned integral type):

```cpp
unsigned all_set_mask = (1u << n) - 1u;
```

However, `1u << n` is undefined behaviour if n is exactly equal to the bit width of the unsigned integral type, as seen from [expr.shift#1]:

> The operands shall be of integral or unscoped enumeration type and integral promotions are performed. The type of the result is that of the promoted left operand. The behavior is undefined if the right operand is negative, or greater than or equal to the width of the promoted left operand.

A reasonable interpretation of "construct a mask with all n bits set" arguably should permit the case when we have exactly as many bits as the bit width of the underlying integral type, and so the typical implementation does not support all reasonable inputs.

Furthermore, on some modern processors the left shift instruction is a no-op when shifting by the bit width (x86, for example, masks a 32-bit shift count to its low 5 bits). In that case `1u << n` yields 1, so all_set_mask might end up being 0, which isn't the expected answer in any case.

Is there a standard-compliant way to rewrite it without resorting to an if statement or complex bit twiddling? I looked into `<bit>` but did not see anything useful there.

Answer 1

Score: 2

A simple way is the following:

```cpp
n == 0 ? 0 : 0xffffffff >> (32 - n)
```

We can rewrite this slightly to save one instruction on typical architectures, where shift counts are taken modulo the register width:

```cpp
n == 0 ? 0 : 0xffffffff >> (-n & 31)
```

and then to get rid of the conditional:

```cpp
(unsigned)(-(signed)n >> 31) >> (-n & 31)
```

This should compile to just three instructions on typical CPUs (one of the rare cases where assembly is more readable than C++).
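
As a quick sanity check, here is a minimal sketch that wraps the three expressions above in hypothetical `constexpr` functions (`mask_cond`, `mask_cond_mod`, `mask_branchless`) and compares each against a reference for every n from 0 to 32 at compile time. Like the constants in the answer, it assumes a 32-bit `unsigned int` and 0 <= n <= 32.

```cpp
#include <limits>

// The constants 0xffffffff, 31 and 32 below assume a 32-bit unsigned int.
static_assert(std::numeric_limits<unsigned>::digits == 32, "needs 32-bit unsigned");

// First form: explicit test for n == 0, then shift the unwanted bits out.
constexpr unsigned mask_cond(unsigned n) {
    return n == 0 ? 0u : 0xffffffffu >> (32 - n);
}

// Second form: -n & 31 equals 32 - n for 1 <= n <= 31 and 0 for n == 32.
constexpr unsigned mask_cond_mod(unsigned n) {
    return n == 0 ? 0u : 0xffffffffu >> (-n & 31);
}

// Branchless form: -(int)n >> 31 is -1 (all bits set) for n > 0 and 0 for
// n == 0, relying on arithmetic right shift of negative values.
constexpr unsigned mask_branchless(unsigned n) {
    return static_cast<unsigned>(-static_cast<int>(n) >> 31) >> (-n & 31);
}

// Reference result that never evaluates a shift by 32.
constexpr unsigned mask_reference(unsigned n) {
    return n == 32 ? ~0u : (1u << n) - 1u;
}

// True if all three forms match the reference for every n in [0, 32].
constexpr bool all_forms_agree() {
    for (unsigned n = 0; n <= 32; ++n) {
        const unsigned expected = mask_reference(n);
        if (mask_cond(n) != expected || mask_cond_mod(n) != expected ||
            mask_branchless(n) != expected)
            return false;
    }
    return true;
}

static_assert(all_forms_agree(), "the three forms disagree");
```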

Note that this assumes that right-shifting a negative signed value is an arithmetic shift. That is guaranteed only from C++20 on; before C++20 the result is implementation-defined.
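
If relying on C++20's arithmetic-shift guarantee is undesirable, one alternative (not part of the original answer) is to produce the all-ones-or-zero value with unsigned wraparound instead: `0u - (n != 0)` is 0xffffffff when n > 0 and 0 when n == 0, which is well-defined in every C++ standard. A sketch under the same assumptions (32-bit `unsigned int`, 0 <= n <= 32); the name `mask_portable` is hypothetical.

```cpp
#include <limits>

static_assert(std::numeric_limits<unsigned>::digits == 32,
              "the constant 31 below assumes a 32-bit unsigned int");

// Low n bits set, 0 <= n <= 32, with no branch and no signed shift.
constexpr unsigned mask_portable(unsigned n) {
    // Unsigned wraparound: 0u - 1u is 0xffffffff, 0u - 0u is 0.
    const unsigned all_or_none = 0u - static_cast<unsigned>(n != 0);
    // Shift count is 32 - n for 1 <= n <= 31 and 0 for n == 0 or n == 32,
    // so it always stays below the bit width.
    return all_or_none >> (-n & 31);
}

static_assert(mask_portable(0) == 0u, "n == 0");
static_assert(mask_portable(5) == 0x1fu, "n == 5");
static_assert(mask_portable(32) == 0xffffffffu, "n == 32");
```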

Answer 2

Score: 1

You can do this unconditionally with any unsigned type (for masks wider than `unsigned int`, replace `1u` with a one of the mask type, such as `1ull`):

```cpp
(((1u << (n + 1) / 2) - 1u) << n / 2) | ((1u << n / 2) - 1u);
```

An example with a 16-bit unsigned result:

```cpp
#include <cstdint>
#include <iostream>

int main() {
  for (int n = 0; n <= 16; ++n) {
    const std::uint16_t r = (((1u << (n + 1) / 2) - 1u) << n / 2) | ((1u << n / 2) - 1u);
    std::cout << std::hex << r << " ";
  }
}

// Output: 0 1 3 7 f 1f 3f 7f ff 1ff 3ff 7ff fff 1fff 3fff 7fff ffff
```
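
Note that the demo shifts `1u`, an `unsigned int`, so for 16-bit results it never reaches the problematic shift count; the split only matters once n can equal the width of the type being shifted. A minimal generic sketch, assuming 0 <= n <= the bit width of T (the name `all_set_mask_split` is hypothetical), builds the one from T itself so the same expression also covers `std::uint64_t` with n == 64:

```cpp
#include <cstdint>
#include <type_traits>

// Low n bits set for any unsigned type T, assuming 0 <= n <= bit width of T.
// The upper ceil(n/2) ones are shifted up by floor(n/2) and OR-ed with the
// lower floor(n/2) ones, so no shift count ever reaches the width of T.
template <class T>
constexpr T all_set_mask_split(unsigned n) {
    static_assert(std::is_unsigned_v<T>, "T must be an unsigned integer type");
    // Types narrower than int are promoted before shifting; the casts bring
    // each intermediate value back to T.
    const T upper = static_cast<T>((T{1} << (n + 1) / 2) - 1u);
    const T lower = static_cast<T>((T{1} << n / 2) - 1u);
    return static_cast<T>((upper << n / 2) | lower);
}

// Edge cases where n equals the width of the type.
static_assert(all_set_mask_split<std::uint8_t>(8) == 0xffu, "uint8_t");
static_assert(all_set_mask_split<std::uint32_t>(32) == 0xffffffffu, "uint32_t");
static_assert(all_set_mask_split<std::uint64_t>(64) == ~std::uint64_t{0}, "uint64_t");
```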

Answer 3

Score: 1

I suggest setting all bits and right-shifting away the unwanted bits.

You still need to test for the edge case n == 0, though; otherwise the shift count `bits - n` equals the full bit width, which is undefined behavior for `unsigned` and wider types.

```cpp
#include <climits>      // CHAR_BIT
#include <type_traits>  // std::is_unsigned_v

template <class T>
    requires std::is_unsigned_v<T>
constexpr T all_set_mask(unsigned n) {
    if(n == 0) return T{};
    
    constexpr unsigned  bits = sizeof(T) * CHAR_BIT;
    // extra test for too many bits if needed:
    //if(n > bits) return static_cast<T>(-1);
    
    return static_cast<T>(-1) >> (bits - n);
}
```


Answer 4

Score: 0

On x86-64 with the BMI1 extension, `_bextr_u64(UINT64_C(-1), 0, n)`.
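
A sketch of how that intrinsic might be wrapped. `_bextr_u64(src, start, len)` extracts `len` bits of `src` starting at bit `start`, and a length of 64 yields the whole operand, so n == 64 needs no special case. It requires BMI1 (enabled in GCC and Clang with `-mbmi`, which defines `__BMI__`), so the sketch falls back to the second answer's split-shift expression otherwise; the function name `low_mask64` is hypothetical.

```cpp
#include <cstdint>
#if defined(__x86_64__) && defined(__BMI__)
#include <x86intrin.h>  // GCC/Clang header declaring the BMI intrinsics
#endif

// Returns a 64-bit mask with the low n bits set, for 0 <= n <= 64.
inline std::uint64_t low_mask64(unsigned n) {
#if defined(__x86_64__) && defined(__BMI__)
    // BEXTR with start 0 and length n; a length of 64 extracts all 64 bits.
    return _bextr_u64(~std::uint64_t{0}, 0, n);
#else
    // Portable fallback: split the shift so no count ever reaches 64.
    const std::uint64_t one = 1;
    return (((one << (n + 1) / 2) - 1) << n / 2) | ((one << n / 2) - 1);
#endif
}
```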

  • Posted on June 18, 2023 at 23:01:44
  • Link: https://go.coder-hub.com/76501163.html