March 1, 2023 15:13:31

Why are strings considered tokens in C while arrays aren't?

Question

This is quite a basic theoretical question. I started learning the C language and came across the topic of tokens in C.

Quoting from geeksforgeeks.org,

> A token is the smallest element of a program that is meaningful to the compiler. Tokens can be classified as follows:
>
> 1. Keywords
> 2. Identifiers
> 3. Constants
> 4. Strings
> 5. Special Symbols
> 6. Operators

Why are strings considered tokens while arrays aren't?

Answer 1

Score: 4

Geeksforgeeks is almost as bad a source for learning as ChatGPT.

It is true that strings in C consist of null-terminated character arrays. But what the quote means is string literals, "these things". That is, a constant used to initialize a character array or used as a read-only string.

Similarly, "constants" does not refer to things like const int x=1; but rather just the number part 1 - this is what formal C means when it refers to an integer constant (sometimes also called "integer literal" although that term is strictly speaking not correct).

Note that tokens are mostly a concept that matters when writing macros; it's not something beginners usually have to worry about. The formal grammar (C17 6.4, "Lexical elements") is divided into these sub-chapters:

Keywords
Identifiers
Universal character names
Constants
String literals
Punctuators
Header names
Preprocessing numbers
Comments

Answer 2

Score: 4

A token is an indivisible parsing unit.

; is a token.
+ is a token.
== is a token.
Decimal numeric literal 4 is a token.
Decimal numeric literal 12 is a token.
String literal "abc" is a token.
Identifier foo is a token.
Keyword int is a token.

However,

Strings aren't tokens because strings aren't pieces of code. (But see string literals above.)
Arrays aren't tokens because arrays aren't pieces of code.
Array declarations (e.g. int a[4];) aren't tokens because they are made of multiple other tokens.
Array initializers (e.g. { 4, 5, i+2 }) aren't tokens because they are made of multiple other tokens.

You can generally put spaces between tokens, but never within.

12 is not the same as 1 2.
"abc" is not the same as "a b c".
foo is not the same as f o o.
i+2 is the same as i + 2.
{4,5,i+2} is the same as { 4, 5, i + 2 }.

Answer 3

Score: 3

When the compiler processes source code, it first splits it into tokens. Example:

printf("%d", 4 << 2);

This is turned into the following tokens:

printf
(
"%d" -- a string literal
,
4
<<
2
)
;

An array declaration like int a[] = {1, 2, 3}; consists of multiple tokens, therefore it's not a token itself. The a here is a token, though not specifically an array token: it is more generally an identifier ("variable name").

Side note on printf(): That function itself also kind of tokenizes the string it receives as its first argument. The only distinction is whether a character is part of a % placeholder or not, so it's much simpler. The principle stays the same though.

Answer 4

Score: 0

It is often better to go directly to the source. The C language syntax is described semi-formally in the C Standard (C17), Annex A. The first paragraph (A.1.1) states:

  token:
      keyword
      identifier
      constant
      string-literal
      punctuator

Notice that "string-literal" is specially mentioned as a token.

As to why: the C language is built up from the same layers as a natural language: letters, words and sentences. When the compiler reads a program file, it reads the characters of the file and groups them into tokens the same way we group letters into words when reading a book. It then interprets the program as a sequence of tokens the same way as we interpret a text as a sequence of words.

A string-literal being a token is simply a decision taken by the designers of the C language because it makes sense when describing the syntax and semantics of the language.
