Why is the 'auto' keyword useful for compiler writers in C?
Question
I'm currently reading "Expert C Programming - Deep C Secrets", and just came across this:
> The storage class specifier `auto` is never needed. It is mostly meaningful to a compiler-writer making an entry in a symbol table - it says "this storage is automatically allocated on entering the block" (as opposed to statically allocated at compiletime, or dynamically allocated on the heap). `auto` is pretty much meaningless to all other programmers, since it can only be used inside a function, but data declarations in a function have this attribute by default.
I saw that someone asked about the same thing here, but that question has no answers, and the link given in the comments only explains why there is such a keyword in C (inherited from B) and how it differs from `auto` in C++11 and pre-C++11.
I'm posting anyway to focus on the part stating that the `auto` keyword is somehow useful in compiler writing. What is the idea, and what is the connection with a symbol table?
I want to stress that I am asking only about a potential use when writing a compiler in C (not when writing a C compiler).
To clarify, I asked this question because I'd like to know whether there is an example of code where `auto` can be justified, since the author suggested there would be one when writing compilers.
I think I have understood `auto` itself (inherited from B, where it was mandatory, but useless in C). What I can't imagine is any example where using it is useful, or at least not useless.
It really seems that there isn't any reason at all to use `auto`, but is there any old source code or something similar that corresponds to the quoted statement?
Answer 1
Score: 61
Author answer:
I just emailed Mr Van der Linden, and here is what he said:
> Yes, I agree with the people who answered on stack overflow. I don't know for certain, because I never used the language B, but it seems highly plausible to me that "auto" ended up in C because it was in B.
>
> Even when I was professionally kernel and compiler programming in C in the 1980's, I never saw any code that I can recall that used "auto".
>
> The key takeaway is that the auto keyword doesn't add any extra information, and thus is redundant and unneeded. It was a mistake to bring it into C!
I also asked him to explain what he meant about compiler writing and the symbol table. Here is his response:
> Say you are writing a compiler that will translate C source code into linker objects (object files that can be linked).
>
> Whenever your lexer (front end of the compiler) finds a sequence of characters that form a user-defined symbol (might be a variable, might be a function name, might be a constant, etc), the compiler will store that name in a table called the "symbol table". It will also store everything else it knows about the symbol - if it is a variable, it will store its type, if a constant it will store the value, if a function it will note that it can be invoked, etc etc. It will also store the scope of the name (the lines of code in which this symbol is known). The symbol table is one of the core data structures of a compiler, and some of it is carried forward into the object file. The object file needs to know any names that are to be addressable by external code objects, so the linker can associate the use of a name with the object in which it is stored.
>
> Then later, when the compiler comes across the same name, the compiler looks in the symbol table to see if it knows all about the name already. One of the useful items to store about a name is "where the compiler will allocate storage for it". That storage has to be maintained as long as the symbol remains in scope. So it is useful for the symbol table to know where it should allocate the storage at runtime. I gave 3 examples of different places where a variable might be stored. The "auto" keyword tells the compiler "this is a variable, and you should store this on the stack and its scope is the function it is declared in".
>
> Only, the compiler doesn't need to be told this, because this is already true for all variables declared within a function.
I hope this explanation makes sense.
I guess I completely misunderstood his statement. I thought that `auto` might have some use when writing a compiler in C, in the code dealing with the symbol table. It seems he meant that `auto` is useless, but that C compiler writers must still handle and understand it.
I nevertheless asked him to confirm my mistake, and it was indeed a misunderstanding of mine:
> Perhaps the best way to think about this is:
> 1. "auto" has no semantic effect in C
> 2. we think it came over from B, but don't know for sure.
> 3. It conveys info to someone writing a compiler for C code.
> 4. But that info is a duplicate of other info that the compiler writer has.
> 5. So a compiler writer can take note of either piece of info to update the symbol table
> 6. Or indeed, they can check that the two pieces of info are consistent, and if not, issue an error message.
Answer 2
Score: 39
As far as I can tell from 40+ years of C programming, including compiler work, the `auto` keyword has been completely useless in C for 50 years.
To answer your precise question, "Why is the `auto` keyword useful for compiler-writers in C?": it isn't useful at all. C compiler writers are just required to parse it as a keyword and implement its semantics as a storage class specifier.
It seems to be a leftover from B, the predecessor to the C language, developed by Ken Thompson and Dennis Ritchie at Bell Labs in the late sixties and early seventies. I have never used B and I doubt Peter, whom I met in 1984 at Inria, has either.
Before C23, `auto` could only be used to specify automatic storage class for definitions in the scope of a function. This is the default, so `auto` is fully redundant: as long as the type or another qualifier is specified, `auto` can be removed. There has never been a case where it was needed, so its inclusion in the C Standard is only rooted in the early history of the C language.
`auto` has been used in C++ since C++11 to enable type inference in variable definitions, with or without automatic storage, where the compiler detects the type from that of the initializer.
With the current trend pushing for convergence on a common subset for the C and C++ languages, new semantics have been attached to this keyword in C23 modelled after the C++ semantics, but more restricted:
> 6.7.1 Storage-class specifiers
>
> `auto` may appear with all the others except `typedef`;
>
> `auto` shall only appear in the declaration specifiers of an identifier with file scope or along with other storage class specifiers if the type is to be inferred from an initializer.
>
> If `auto` appears with another storage-class specifier, or if it appears in a declaration at file scope, it is ignored for the purposes of determining a storage duration or linkage. It then only indicates that the declared type may be inferred.
Type inference is specified as:
> 6.7.9 Type inference
>
> Constraints
>
> 1 A declaration for which the type is inferred shall contain the storage-class specifier `auto`.
>
> Description
>
> 2 For such a declaration that is the definition of an object the init-declarator shall have one of the forms
>
> direct-declarator = assignment-expression<br>
> direct-declarator = { assignment-expression }<br>
> direct-declarator = { assignment-expression , }<br>
>
> The declared type is the type of the assignment expression after lvalue, array to pointer or function to pointer conversion, additionally qualified by qualifiers and amended by attributes as they appear in the declaration specifiers, if any. If the direct declarator is not of the form identifier attribute-specifier-sequence<sub>opt</sub>, possibly enclosed in balanced pairs of parentheses, the behavior is undefined.
Type inference is very useful in C++ because types can be very complex and almost impossible to specify in variable definitions, especially with templates. Conversely, using it in C is probably counterproductive, lessening code readability and encouraging laziness and error-prone practices. It was already bad enough to hide pointers behind typedefs; now you can hide them completely with the `auto` keyword.
To finish on a less serious note, I remember seeing it used in tricky interview tests, where the candidate is asked to find why this code does not compile:
    #include <stdio.h>
    #include <string.h>

    int main(void) {
        char word[80];
        int auto = 0;
        while (scanf("%79s", word) == 1) {
            if (!strcmp(word, "car")
            ||  !strcmp(word, "auto")
            ||  !strcmp(word, "automobile"))
                auto++;
        }
        printf("cars: %d\n", auto);
        return 0;
    }
Answer 3
Score: 19
The `auto` keyword originates from the B language, where it was actually very useful: it allowed the compiler to distinguish local names from non-local names (marked with the `extrn` keyword):
    main()
    {
        extrn printf;
        auto x;
        x = 25;
        printf('%d', x);
    }
When the B language evolved into C, it preserved a high degree of backward compatibility. In B there was basically only a single "cell" type, so C introduced type annotations as an optional feature. In C89 and prior, `auto` was used for the same purpose of introducing local names:
    main()
    {
        extern printf();
        auto x; /* type is int by default */
        x = 42;
        printf("%d", x);
    }
After the language's focus shifted towards enforcing type safety, the need for the `auto` specifier evaporated completely, since the presence of a type annotation is enough to identify a local name declaration.
Answer 4
Score: 12
First of all, `auto` is one of 4 or 5 storage-class specifiers: `auto`, `register`, `static`, `extern`, and from C11 on `_Thread_local`. Every variable in C has one associated storage-class specifier from the above list, with `auto` being the default if not specified.
From a user's perspective, because `auto` is the default, it is rarely<sup>1</sup> necessary to specify it, and arguably doing so is just noise: the other specifiers stand out more if no specifier is generally used.
From a compiler writer's perspective, however, since every variable has a storage-class specifier, the concept of `auto` is paramount. Putting yourself in their shoes, you can imagine that somewhere there is an `enum` listing the 4 (or 5) different specifiers, and that each variable declaration has one of the enum values attached.
The fact that it appears in the compiler does not require that it appear in the language, but it does provide an argument for it: regularity. The concept exists whether or not it is directly exposed, and there is little cost in exposing it, so might as well, no?
<sup>1</sup> @BenVoigt mentioned that it may be useful in macros, where the type is user-provided, as it prevents the user from specifying another storage specifier such as `static`, since the compiler will not accept two storage specifiers.
Answer 5
Score: 0
The auto keyword in C is not very useful to most programmers, and in practice it gives compiler writers nothing they do not already have.
The symbol table is a data structure that the compiler uses to keep track of all the variables and functions in a program. When the compiler sees an auto declaration, it records that the variable has automatic storage duration, which typically means a stack slot or a register. It records exactly the same thing for a local declaration without auto, so the keyword does not enable any extra optimization.
For example, consider the following function:

    void soso(int x) {
        int y = x * 2;
        // y is automatic whether or not it is declared with auto.
        int z = y + 3;
    }

The compiler already knows that y and z are automatic, so it is free to keep them in registers or on the stack as it sees fit. Writing auto int y would not change the generated code.
Here are some additional details about the auto keyword:
- The auto keyword is not necessary in C. Any variable declared inside a function without a storage-class specifier has automatic storage duration by default.
- Before C23, auto cannot be used to declare variables outside of functions; doing so is a constraint violation. C23 allows it at file scope only to request type inference.
- The auto keyword is a reserved word in every conforming C compiler.