2023年1月9日 05:13:48go评论77阅读模式

英文:

Why does the compiler not automatically remove branches?

问题

以下是翻译的部分：

当已知它会导致 CPU 减速时，为什么在第三个函数中插入了条件检查和跳转？
为什么这三个函数的编译结果不同？
什么阻止编译器将前两个函数编译成第三个函数？
前两个函数的字节码在执行 i == 1 的条件检查之前就执行了 j == 4 的条件检查，为什么？这不是在 i == 1 时浪费了计算吗？
如果编译器在跳转之前对 i == 1 进行了检查，为什么它在跳转之后再次进行检查？
为什么 testConditionsNoBranch 在 -O0 优化标志下仍然具有分支？按照代码编写，这段代码中没有分支。

请注意，以上问题是对给定代码的技术性分析，需要更深入的编译器和优化知识来回答。

英文:

I was curious about whether the compiler automatically removed branches when it could so I set up an easy case:

bool testConditionsIf(int i, int j) {
    if (i == 1){
        if (j == 2){
            return true;
        }
        return j == 3;
    }
    return  j == 4;
}

bool testConditionsTernary(int i, int j) {
    return i == 1 ? (j == 2 ? true : j == 3) : j == 4;
}

bool testConditionsNoBranch(int i, int j) {
    return (i == 1 &amp;&amp; (j == 2 || j == 3)) || (i != 1 &amp;&amp; j == 4);
}

Each function is semantically equivalent to the next. The second rewrites the first with ternaries instead of if statements. The third rewrites the second by using boolean logic to handle both sides of the if statement. Interestingly enough, with gcc the first two produce the same code, but the third does not. The first two are assembled with conditional checks and jumps, much like the code is written. The third has a single conditional check on 1 and then returns the boolean computation of each branch. So I have a few questions:

Why was a condition check + jump inserted into the third function when that is known to be a CPU slowdown?
Why dont all three compile to the same thing?
What is stopping the compiler from compiling the first two to the third?
The byte code for the first two does a condition check for j == 4 before doing a condition check for i == 1, why? Doesnt that seem like wasted work if i == 1?
If the compiler does a check for i == 1 before the jump, why is it doing that check again after the jump?
Why does testConditionsNoBranch have branches in it with -O0 optimization flag? The code has no branches in it as written.

https://www.godbolt.org/z/P69TGMsja

答案1

得分: 2

三元运算符与 `if` 相同（在编译后的代码方面），生成的代码可能包含分支。

> 使用 -O0 优化标志

禁用优化来测试代码质量是没有意义的。

你需要将其转换为一个算术表达式，很可能会被编译成没有任何分支的形式。

bool testConditionsNoBranch1(int i, int j) {
return (i == 1) * (j == 2) + (i == 1) * (j == 3) + (i != 1) * (j == 4);
}


它将被编译（使用 -Ofast）为

testConditionsNoBranch1(int, int):
cmp edi, 1
sete cl
xor eax, eax
cmp esi, 2
sete al
xor edx, edx
and eax, ecx
cmp esi, 3
sete dl
and edx, ecx
add eax, edx
cmp edi, 1
setne cl
xor edx, edx
cmp esi, 4
sete dl
and edx, ecx
add eax, edx
setne al
ret

英文:

The ternary operator is the same as if (resulting compiled code wise) and the code generated can contain branches.

> it with -O0 optimization flag

Testing code quality with optimizations disabled makes no sense.

You need to convert it into an arithmetic expression which is very likely to be compiled without any branches.

bool testConditionsNoBranch1(int i, int j) {
    return (i == 1) * (j == 2) + (i == 1) * (j == 3) + (i != 1) * (j == 4);
}

and it will be compiled (using -Ofast) to

testConditionsNoBranch1(int, int):
        cmp     edi, 1
        sete    cl
        xor     eax, eax
        cmp     esi, 2
        sete    al
        xor     edx, edx
        and     eax, ecx
        cmp     esi, 3
        sete    dl
        and     edx, ecx
        add     eax, edx
        cmp     edi, 1
        setne   cl
        xor     edx, edx
        cmp     esi, 4
        sete    dl
        and     edx, ecx
        add     eax, edx
        setne   al
        ret

答案2

得分: 0

当已知这会导致 CPU 减速时，为什么在第三个函数中插入了条件检查和跳转？

这个问题我无法回答。
为什么这三个函数不都编译成相同的内容？

这个问题很简单。这些编译器由不同的人/团队编写，它们必须遵循标准，但标准允许一些解释的余地（例如，如果我没记错，GCC 和 Clang 在作用域解析上存在分歧），而且编译器由于编写方式的不同存在一些限制。虽然它们都以相同的基本方式工作，但实现方式是相当不同的。
或者简而言之，它们之间的差异是因为它们不同。
是什么阻止编译器将前两者编译成第三者？

错失的优化机会？
可能是缺失的优化？
也有可能编译器考虑了第三者并决定不采用它。
Clang 在其优化过程中提供了很好的调试信息，那将是我首先查看的地方。
前两者的字节码在进行 i == 1 的条件检查之前会进行 j == 4 的条件检查，为什么？如果 i == 1，那不是看起来像是浪费的工作吗？

显然这是一种猜测，但编译器可能假设 i != 1。毕竟，它是一个整数。它有 1/4,294,967,295 的机会等于 1。
如果编译器在跳转之前进行了 i == 1 的检查，为什么在跳转后还要再次进行该检查？

跳转之后实际上是在检查是否等于 3。只是巧合的是 3 - 2 == 1。
为什么在 -O0 优化标志下，testConditionsNoBranch 中会有分支？根据写入的代码，该代码中没有分支。

逻辑运算符（|| 和 &&）需要评估多个条件。

bool b = (i == 1 && j ==2)
在语义上与

bool b = false;
if (i == 1)
if (j == 2)
b = true;
相同。

英文:

I can certainly take a stab at some of these

> 1. Why was a condition check + jump inserted into the third function when that is known to be a CPU slowdown?

This one I can't answer.
> 2. Why dont all three compile to the same thing?

This one is simple enough. The compilers are written by different people/teams, they have to follow the standard, but the standard allows some room for interpretation (for instance, GCC and Clang disagree on scope resolution if memory serves), and then there are limitations on what the compilers can do because of how they are written. While they all work in the same basic way, the implementations are quite different.
Or to put it shortly, they differ because they are different.
> 3. What is stopping the compiler from compiling the first two to the third?

Missed optimization opportunities?
Maybe missing optimizations?
Its also possible the compilers considered the 3rd and decided against it.
Clang offers great debugging info on its optimization passes, and that would be the first place i would look.
> 4. The byte code for the first two does a condition check for j == 4 before doing a condition check for i == 1, why? Doesnt that seem like
> wasted work if i == 1?

Obviously this is speculation, but the compiler is probably assuming i != 1. It is an int, Afterall. It has a 1 in 4,294,967,295 chance of being 1.
> 5. If the compiler does a check for i == 1 before the jump, why is it doing that check again after the jump?

After the jump its actually checking for 3. Its just coincidence that 3 - 2 == 1
> 6. Why does testConditionsNoBranch have branches in it with -O0 optimization flag? The code has no branches in it as written.

Logical operators (|| and &&) requires evaluating multiple conditions.

bool b = (i == 1 &amp;&amp; j ==2)

is semantically the same as

bool b = false;
if (i == 1)
    if (j == 2)
        b = true;

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

为什么编译器不会自动移除分支？

问题

答案1

答案2

为什么 C++20 的 `std::popcount` 只限于无符号类型？

在C++中可以使用popen()而不调用shell吗？

比特掩码检查是否比比较数字更高效？

SVT-AV1 Emscripten构建

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

发表评论