Why are string literals stored in different places depending on declaration as `char *` or `char[]`?
Question
I had an embarrassing struggle with this simple thing, as this was segfaulting:
    #include <stdio.h>

    int main()
    {
        char *test = "this is a string";
        test[0] = 'q';
        printf(test);
        return 0;
    }
but this was not:
    #include <stdio.h>

    int main()
    {
        char test[] = "this is a string";
        test[0] = 'q';
        printf(test);
        return 0;
    }
After looking at the assembly I noticed that in the first case, the literal "this is a string" was declared in .rodata, so that explains the segfault. But in the second case, the string wasn't in the assembly at all, so I assume it was being linked via the .data section as writable. Why this difference in behavior? Is this obvious and I'm being stupid?
Answer 1
Score: 5
Is this obvious? It should be, because it is an important characteristic of the C language: even though, for historical reasons, a string literal is not declared const, it is not mutable. Let us look carefully:

    char *test = "this is a string";

This (incorrectly) declares a non-const pointer to a character array that must be treated as const. From that point on, changing a character through that pointer explicitly invokes undefined behaviour. Here you got a segfault, but a compiler is free to ignore the attempt, exit the program without any message, or even <type your worst nightmare here>...
A decent compiler can warn you about that (GCC, for example, does so with `-Wwrite-strings`, and warnings are not to be ignored...). The correct declaration is `const char *test = "this is a string";`.
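If you need a changed version of the text, one option is to keep the pointer const and modify a copy instead. The following is only a sketch: the helper name `modified_copy` and its parameters are invented for illustration, and it assumes the caller provides a buffer large enough for the text plus the terminating null character.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies the literal into caller-provided storage and
   changes the first character of the copy. The literal itself is only read. */
char *modified_copy(char *dst, size_t dst_size)
{
    /* const-qualified: the compiler now rejects `test[0] = 'q';` */
    const char *test = "this is a string";

    assert(dst != NULL);
    assert(dst_size > strlen(test));   /* room for the text plus '\0' */

    strcpy(dst, test);                 /* the copy lives in writable memory */
    dst[0] = 'q';                      /* legal: only the copy is modified */
    return dst;
}
```

Called with something like `char buf[32]; modified_copy(buf, sizeof buf);`, the buffer ends up holding the modified text while the literal stays untouched.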

    char test[] = "this is a string";

This (correctly) declares a character array and initializes its contents with a copy of the string literal. The language is kind enough to let you leave the size empty when you provide a string literal as the initializer; the array then takes the size of the initializer, including the terminating null character. From that point on, changing a character of the array is a legal operation.
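To make the deduced size concrete, here is a small sketch. The function name `deduced_array_size` is made up for this example; the only thing it relies on is that the literal has 16 characters.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the size the compiler deduced for the array. */
size_t deduced_array_size(void)
{
    char test[] = "this is a string";   /* 16 characters + terminating '\0' */

    test[0] = 'q';                      /* fine: test is our own writable copy */

    /* The array is exactly as large as its initializer, '\0' included. */
    assert(strlen(test) + 1 == sizeof test);
    return sizeof test;
}
```

Because `test` is an automatic array here, every call starts again from a fresh copy of the initializer.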
What you should remember:
- string literals must be treated as const: modifying one is undefined behaviour
- arrays and pointers are different animals. An array holds some data, while a pointer just points to data existing elsewhere. An array simply decays to a pointer to its first element when you use it as a value (the correct wording here is rvalue...).
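The array/pointer difference shows up in `sizeof`, among other places. The sketch below uses made-up function names (`size_seen_by_callee`, `size_seen_by_owner`, `decays_to_first_element`) purely for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* The callee only receives a pointer: the caller's array has decayed. */
size_t size_seen_by_callee(const char *s)
{
    (void)s;            /* sizeof does not evaluate its operand */
    return sizeof s;    /* size of a pointer, not of the array */
}

/* The function that owns the array still sees its full size. */
size_t size_seen_by_owner(void)
{
    char test[] = "this is a string";

    /* Passing test as an argument converts it to a pointer. */
    assert(size_seen_by_callee(test) == sizeof(char *));
    return sizeof test; /* whole array: 16 characters + '\0' */
}

/* Used as a value, the array yields the address of its first element. */
int decays_to_first_element(void)
{
    char test[] = "this is a string";
    char *p = test;     /* implicit array-to-pointer conversion */
    return p == &test[0];
}
```

The owner reports the size of the whole array, while the callee can only ever report the size of a pointer, whatever array was passed in.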