Why does LLVM allocate a redundant variable?
Question
%1 corresponds to a temporary integer variable that is allocated to store the return value of the main function, which in this case is zero. This temporary variable is used to hold the return value before it is returned by the ret instruction.
Here's a simple C file with an enum definition and a main function:
enum days {MON, TUE, WED, THU};

int main() {
    enum days d;
    d = WED;
    return 0;
}
It compiles to the following LLVM IR:
define dso_local i32 @main() #0 {
  %1 = alloca i32, align 4
  %2 = alloca i32, align 4
  store i32 0, i32* %1, align 4
  store i32 2, i32* %2, align 4
  ret i32 0
}
%2 is evidently the d variable, which gets 2 assigned to it. What does %1 correspond to if zero is returned directly?
Answer 1
Score: 5
This %1 register was generated by Clang to handle multiple return statements in a function. Imagine you were writing a function to compute an integer's factorial. Instead of this:
int factorial(int n) {
    int result;
    if (n < 2)
        result = 1;
    else {
        result = n * factorial(n - 1);
    }
    return result;
}
You'd probably do this:
int factorial(int n) {
    if (n < 2)
        return 1;
    return n * factorial(n - 1);
}
Why? Because Clang will insert that result variable, the one that holds the return value, for you. That's the reason for the %1 variable. Look at the IR for a slightly modified version of your code.
Modified code:
enum days {MON, TUE, WED, THU};

int main() {
    enum days d;
    d = WED;
    if (d) return 1;
    return 0;
}
IR:
define dso_local i32 @main() #0 !dbg !15 {
  %1 = alloca i32, align 4
  %2 = alloca i32, align 4
  store i32 0, i32* %1, align 4
  store i32 2, i32* %2, align 4, !dbg !22
  %3 = load i32, i32* %2, align 4, !dbg !23
  %4 = icmp ne i32 %3, 0, !dbg !23
  br i1 %4, label %5, label %6, !dbg !25

5:                                                ; preds = %0
  store i32 1, i32* %1, align 4, !dbg !26
  br label %7, !dbg !26

6:                                                ; preds = %0
  store i32 0, i32* %1, align 4, !dbg !27
  br label %7, !dbg !27

7:                                                ; preds = %6, %5
  %8 = load i32, i32* %1, align 4, !dbg !28
  ret i32 %8, !dbg !28
}
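Now you see %1 making itself useful, huh? Both return statements store into %1, and the single ret at the end loads from it. In most functions with a single return statement, this variable is stripped by one of LLVM's passes once optimization is enabled.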
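To make the shape of that IR easier to see, here is a rough sketch of the same lowering written back in C by hand. The function name factorial_lowered, the retval variable, and the exit_block label are made up for illustration. Clang emits IR rather than C, so this is an analogy, not what the compiler literally produces.

```c
/* Hand-written C that mirrors the shape of the unoptimized IR:
   one return slot, one exit block, and every `return` rewritten
   as "store to the slot, then jump to the exit".
   All names here are hypothetical, chosen only for illustration. */
int factorial_lowered(int n) {
    int retval;                 /* plays the role of %1 (the alloca) */

    if (n < 2) {
        retval = 1;             /* was: return 1; */
        goto exit_block;
    }
    retval = n * factorial_lowered(n - 1);  /* was: return n * factorial(n-1); */
    goto exit_block;

exit_block:
    return retval;              /* the single load + ret at the end */
}
```

With a single return slot, the frontend never has to decide which ret instruction a given return statement maps to. It always stores and branches, and the cleanup is left to later passes.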
Answer 2
Score: 1
Why does this matter? What's the actual problem?
I think the deeper answer you're looking for might be: LLVM's architecture is based around fairly simple frontends and many passes. The frontends have to generate correct code, but it doesn't have to be good code. They can do the simplest thing that works.
In this case, Clang generates a couple of instructions that turn out not to be used for anything. That's generally not a problem, because some part of LLVM will get rid of superfluous instructions. Clang trusts that to happen. Clang doesn't need to avoid emitting dead code; its implementation may focus on correctness, simplicity, testability, etc.
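As a loose illustration of that division of labor, the two functions below stand in for "what the frontend emits" and "what is left after cleanup" for the question's main. The names main_as_emitted and main_after_cleanup are invented for this sketch, and the comments map lines to the question's IR only approximately.

```c
/* Roughly what the frontend emits for the question's main():
   a return slot and a slot for `d`, both stored to, neither read.
   Function and variable names are hypothetical. */
int main_as_emitted(void) {
    int retval;       /* stands in for %1 */
    int d;            /* stands in for %2 */
    retval = 0;       /* store i32 0, i32* %1 */
    d = 2;            /* store i32 2, i32* %2 (WED == 2) */
    (void)retval;     /* silence set-but-unused warnings */
    (void)d;
    return 0;         /* ret i32 0 */
}

/* What remains once the dead stores and their slots are removed. */
int main_after_cleanup(void) {
    return 0;
}
```

Both functions have the same observable behavior, which is why the frontend can afford to emit the first form and rely on later passes to arrive at the second.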
Answer 3
Score: 1
Because Clang is done with syntax analysis but LLVM hasn't even started with optimization.
The Clang front end has generated IR (Intermediate Representation), not machine code. Those variables are SSA (Static Single Assignment) values; they haven't been bound to registers yet and, after optimization, never will be, because they are redundant.
That code is a somewhat literal representation of the source. It is what clang hands to LLVM for optimization. Basically, LLVM starts with that and optimizes from there. Indeed, for version 10 and x86_64, llc -O2 will eventually generate:
main:                                   # @main
        xor     eax, eax
        ret