英文:
Branchless way to add two UINT while avoiding overflow?
问题
我想知道是否有一种无分支的方法来执行此操作,或者只是一种通常更好的方法:
```c
uint16_t add_with_no_overflow(uint16_t num, uint16_t delta)
{
if (UINT16_MAX - delta < num)
{
return UINT16_MAX;
}
return num + delta;
}
我尝试使用min
/max
来思考,但即使我尝试使用(明确定义的)uint溢出,我能想到的最好方法也只是max(num + delta, num)
,如果溢出则返回原始数字a,否则返回结果。
但我想不出一种使用min/max/clamp从这个分割点到所需行为的方法(无需分支)。
<details>
<summary>英文:</summary>
I'm curious if there's a branchless way to do this, or perhaps just a generally better way:
```c
uint16_t add_with_no_overflow(uint16_t num, uint16_t delta)
{
if (UINT16_MAX - delta < num)
{
return UINT16_MAX;
}
return num + delta;
}
I tried thinking about it with min
/max
, but even if I try to use (well-defined) uint overflow, the best I can come up with is max(num + delta, num)
which would return the original number a if overflown, or the result otherwise.
But I can't think of a way with min/max/clamp to get from this split to the desired behaviour (without branching)
答案1
得分: 3
停止胡乱尝试,保持简单,依赖于优化器。
当然,除非您的优化器不够聪明。那么请继续胡乱尝试。
英文:
Stop fiddling around, keep it simple and rely on the optimizer.
Unless your optimizer is not up to sniff of course. Then by all means continue fiddling around.
答案2
得分: 1
EDIT 2: 我检查了我的具体平台(Xtensa-Esp32)。似乎所有版本都包含一个分支,即使是简单的布尔比较(a > b
)。所以在我的情况下,标题中最简单的if
语句可能是最好的选择,即使不同的方法本身也很有趣。
EDIT 1: 在这个问题上与ChatGPT玩耍后,也许可以更好地实现?
uint16_t add_no_overflow(uint16_t num, uint16_t delta) {
uint16_t sum = num + delta;
uint16_t did_overflow = (sum > num)-1; // 如果溢出则为0xFFFF,否则为0x0000
return (sum & ~did_overflow) | (UINT16_MAX & did_overflow);
}
在发布后立刻考虑到了这一点。橡皮鸭的魔力!
使用min
/max
的实现(不一定是无分支)
uint16_t add_with_no_overflow(uint16_t num, uint16_t delta)
{
uint16_t final_delta = min(UINT_16_T_MAX-num, delta);
return num + final_delta;
}
英文:
EDIT 2: I checked my specific platform (Xtensa-Esp32). Seems that all versions contain a branch, even simple boolean comparisons (a > b
)
So in my case the simplest if
statement in the title might be the best choice, even if the different methods are interesting on their own.
EDIT 1: Played with ChatGPT on this - after coaxing out the bugs maybe a better implementation?
uint16_t add_no_overflow(uint16_t num, uint16_t delta) {
uint16_t sum = num + delta;
uint16_t did_overflow = (sum > num)-1; // 0xFFFF if overflown, 0x0000 otherwise
return (sum & ~did_overflow) | (UINT16_MAX & did_overflow);
}
Thought about this immediately after posting. The magic of rubber ducking!
Implementation with min
/max
(not necessarily branchless)
uint16_t add_with_no_overflow(uint16_t num, uint16_t delta)
{
uint16_t final_delta = min(UINT_16_T_MAX-num, delta);
return num + final_delta;
}
答案3
得分: 0
这是我的尝试:
uint16_t add(uint16_t a, uint16_t b) {
uint32_t i = a + b;
return i & 0xffff | -(i >> 16);
}
单次加法最多只能溢出一位。因此,如果我们将它们作为32位数字相加并且溢出,结果的高16位将包含值1。
因此,我们将其向右移动16位,以获取0或1。然后我们取反它以获取0或-1(这会转换为无符号的最大值)。然后我们将结果与加法产生的16位进行or
运算。如果是0,那不会影响结果。如果它是转换为无符号的-1,那么所有位都将被设置,因此当我们将其与前一个结果进行or
运算时,仍然会得到所有位都被设置(这是无符号的最大值)。
对于ESP32,gcc 11.2生成以下代码:
add(unsigned short, unsigned short):
entry sp, 32
extui a2, a2, 0, 16
extui a3, a3, 0, 16
add.n a8, a2, a3
extui a2, a8, 16, 16
neg a2, a2
or a2, a2, a8
extui a2, a2, 0, 16
retw.n
视野中唯一的分支就是返回语句...
链接:https://godbolt.org/z/ezhxY56qx
当然,使用不同的编译器,或者可能甚至是相同编译器的不同标志,可能会生成分支。但至少在对半打多种不同处理器的几十种不同编译器进行快速测试时,我没有看到生成任何分支(尽管在6502编译器上编译确实失败了,显然根本不支持unsigned long
)。
英文:
Here's my attempt at it:
uint16_t add(uint16_t a, uint16_t b) {
uint32_t i = a + b;
return i & 0xffff | -(i >> 16);
}
A single addition can only overflow by at most one bit. So, if we add them as 32-bit numbers and they overflow, the upper 16 bits of the result will contain the value 1.
So, we shift that right 16 places, to get either 0 or 1. Then we negate it to get 0 or -1 (which converts to the maximum value as an unsigned). Then we or
the result with the 16 bits produced by the addition. If it's 0, that won't affect the result. If it was -1 converted to unsigned, that'll have all the bits set, so when we or
it with the previous result, we still get all bits set (which is the maximum value for an unsigned).
For the ESP32, gcc 11.2 produces the following:
add(unsigned short, unsigned short):
entry sp, 32
extui a2, a2, 0, 16
extui a3, a3, 0, 16
add.n a8, a2, a3
extui a2, a8, 16, 16
neg a2, a2
or a2, a2, a8
extui a2, a2, 0, 16
retw.n
The only branch in sight is the return statement...
https://godbolt.org/z/ezhxY56qx
Of course, with a different compiler, or possibly even a different set of flags to the same compiler, it could generate a branch. But at least in a quick test on a couple dozen or so different compilers for half a dozen or so processors, I didn't see any branches generated (though compilation did simply fail on the 6502 compiler, which apparently doesn't support an unsigned long
at all).
答案4
得分: 0
为什么不只是显而易见的呢
uint16_t add_no_over(uint16_t a, uint16_t b)
{
uint32_t sum = a + b;
return sum < 65535 ? sum : 65535;
}
它在esp32上使用gcc-11生成了以下汇编代码:
entry sp, 32
l32r a8, .LC0
extui a3, a3, 0, 16
extui a2, a2, 0, 16
add.n a2, a2, a3
minu a2, a2, a8
extui a2, a2, 0, 16
retw.n
<details>
<summary>英文:</summary>
Why not just the obvious
uint16_t add_no_over(uint16_t a, uint16_t b)
{
uint32_t sum = a + b;
return sum < 65535 ? sum : 65535;
}
It gives [the following assembly code][1] with gcc-11 for esp32 from godbolt:
entry sp, 32
l32r a8, .LC0
extui a3, a3, 0, 16
extui a2, a2, 0, 16
add.n a2, a2, a3
minu a2, a2, a8
extui a2, a2, 0, 16
retw.n
[1]: https://godbolt.org/z/sb4js1cTq
</details>
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论