英文:
Why does the compiler not automatically remove branches?
问题
以下是翻译的部分:
- 当已知它会导致 CPU 减速时,为什么在第三个函数中插入了条件检查和跳转?
- 为什么这三个函数的编译结果不同?
- 什么阻止编译器将前两个函数编译成第三个函数?
- 前两个函数的字节码在执行 i == 1 的条件检查之前就执行了 j == 4 的条件检查,为什么?这不是在 i == 1 时浪费了计算吗?
- 如果编译器在跳转之前对 i == 1 进行了检查,为什么它在跳转之后再次进行检查?
- 为什么 testConditionsNoBranch 在 -O0 优化标志下仍然具有分支?按照代码编写,这段代码中没有分支。
请注意,以上问题是对给定代码的技术性分析,需要更深入的编译器和优化知识来回答。
英文:
I was curious about whether the compiler automatically removed branches when it could so I set up an easy case:
bool testConditionsIf(int i, int j) {
if (i == 1){
if (j == 2){
return true;
}
return j == 3;
}
return j == 4;
}
bool testConditionsTernary(int i, int j) {
return i == 1 ? (j == 2 ? true : j == 3) : j == 4;
}
bool testConditionsNoBranch(int i, int j) {
return (i == 1 && (j == 2 || j == 3)) || (i != 1 && j == 4);
}
Each function is semantically equivalent to the next. The second rewrites the first with ternaries instead of if statements. The third rewrites the second by using boolean logic to handle both sides of the if statement. Interestingly enough, with gcc the first two produce the same code, but the third does not. The first two are assembled with conditional checks and jumps, much like the code is written. The third has a single conditional check on 1 and then returns the boolean computation of each branch. So I have a few questions:
- Why was a condition check + jump inserted into the third function when that is known to be a CPU slowdown?
- Why dont all three compile to the same thing?
- What is stopping the compiler from compiling the first two to the third?
- The byte code for the first two does a condition check for j == 4 before doing a condition check for i == 1, why? Doesnt that seem like wasted work if i == 1?
- If the compiler does a check for i == 1 before the jump, why is it doing that check again after the jump?
- Why does testConditionsNoBranch have branches in it with -O0 optimization flag? The code has no branches in it as written.
答案1
得分: 2
三元运算符与 `if` 相同(在编译后的代码方面),生成的代码可能包含分支。
> 使用 -O0 优化标志
禁用优化来测试代码质量是没有意义的。
你需要将其转换为一个算术表达式,很可能会被编译成没有任何分支的形式。
bool testConditionsNoBranch1(int i, int j) {
return (i == 1) * (j == 2) + (i == 1) * (j == 3) + (i != 1) * (j == 4);
}
它将被编译(使用 -Ofast)为
testConditionsNoBranch1(int, int):
cmp edi, 1
sete cl
xor eax, eax
cmp esi, 2
sete al
xor edx, edx
and eax, ecx
cmp esi, 3
sete dl
and edx, ecx
add eax, edx
cmp edi, 1
setne cl
xor edx, edx
cmp esi, 4
sete dl
and edx, ecx
add eax, edx
setne al
ret
英文:
The ternary operator is the same as if
(resulting compiled code wise) and the code generated can contain branches.
> it with -O0 optimization flag
Testing code quality with optimizations disabled makes no sense.
You need to convert it into an arithmetic expression which is very likely to be compiled without any branches.
bool testConditionsNoBranch1(int i, int j) {
return (i == 1) * (j == 2) + (i == 1) * (j == 3) + (i != 1) * (j == 4);
}
and it will be compiled (using -Ofast) to
testConditionsNoBranch1(int, int):
cmp edi, 1
sete cl
xor eax, eax
cmp esi, 2
sete al
xor edx, edx
and eax, ecx
cmp esi, 3
sete dl
and edx, ecx
add eax, edx
cmp edi, 1
setne cl
xor edx, edx
cmp esi, 4
sete dl
and edx, ecx
add eax, edx
setne al
ret
答案2
得分: 0
-
当已知这会导致 CPU 减速时,为什么在第三个函数中插入了条件检查和跳转?
这个问题我无法回答。
-
为什么这三个函数不都编译成相同的内容?
这个问题很简单。这些编译器由不同的人/团队编写,它们必须遵循标准,但标准允许一些解释的余地(例如,如果我没记错,GCC 和 Clang 在作用域解析上存在分歧),而且编译器由于编写方式的不同存在一些限制。虽然它们都以相同的基本方式工作,但实现方式是相当不同的。
或者简而言之,它们之间的差异是因为它们不同。 -
是什么阻止编译器将前两者编译成第三者?
错失的优化机会?
可能是缺失的优化?
也有可能编译器考虑了第三者并决定不采用它。
Clang 在其优化过程中提供了很好的调试信息,那将是我首先查看的地方。 -
前两者的字节码在进行 i == 1 的条件检查之前会进行 j == 4 的条件检查,为什么?如果 i == 1,那不是看起来像是浪费的工作吗?
显然这是一种猜测,但编译器可能假设
i != 1
。毕竟,它是一个整数。它有 1/4,294,967,295 的机会等于 1。 -
如果编译器在跳转之前进行了 i == 1 的检查,为什么在跳转后还要再次进行该检查?
跳转之后实际上是在检查是否等于 3。只是巧合的是
3 - 2 == 1
。 -
为什么在 -O0 优化标志下,testConditionsNoBranch 中会有分支?根据写入的代码,该代码中没有分支。
逻辑运算符(
||
和&&
)需要评估多个条件。bool b = (i == 1 && j ==2)
在语义上与bool b = false;
if (i == 1)
if (j == 2)
b = true;
相同。
英文:
I can certainly take a stab at some of these
> 1. Why was a condition check + jump inserted into the third function when that is known to be a CPU slowdown?
This one I can't answer.
> 2. Why dont all three compile to the same thing?
This one is simple enough. The compilers are written by different people/teams, they have to follow the standard, but the standard allows some room for interpretation (for instance, GCC and Clang disagree on scope resolution if memory serves), and then there are limitations on what the compilers can do because of how they are written. While they all work in the same basic way, the implementations are quite different.
Or to put it shortly, they differ because they are different.
> 3. What is stopping the compiler from compiling the first two to the third?
Missed optimization opportunities?
Maybe missing optimizations?
Its also possible the compilers considered the 3rd and decided against it.
Clang offers great debugging info on its optimization passes, and that would be the first place i would look.
> 4. The byte code for the first two does a condition check for j == 4 before doing a condition check for i == 1, why? Doesnt that seem like
> wasted work if i == 1?
Obviously this is speculation, but the compiler is probably assuming i != 1
. It is an int, Afterall. It has a 1 in 4,294,967,295 chance of being 1.
> 5. If the compiler does a check for i == 1 before the jump, why is it doing that check again after the jump?
After the jump its actually checking for 3. Its just coincidence that 3 - 2 == 1
> 6. Why does testConditionsNoBranch have branches in it with -O0 optimization flag? The code has no branches in it as written.
Logical operators (||
and &&
) requires evaluating multiple conditions.
bool b = (i == 1 && j ==2)
is semantically the same as
bool b = false;
if (i == 1)
if (j == 2)
b = true;
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论