Why do we need a null terminator only in strings in C?
Question
I'm taking CS50x and I'm on week 2 now. My question is: why do we need a null character '\0' in strings (that is, null-terminated char arrays) to mark the end, when we don't need one in ordinary char arrays or in arrays of other types, such as an int array? For example, when printing both kinds of array (a null-terminated char array and an int array), what marks the end of the second one?
I tried to demonstrate how strings are implemented for myself with some code:
This code worked, printing "hi!" in the terminal.
This also worked, printing the three scores.
Why did the first code need an additional place in the array for the null character? Couldn't we have used i < 3 instead, as we did in the second code? A character array, like any other array, has a specific length, so what changed when we decided to treat a string as a character array?
Answer 1
Score: 4
The truth is that you don't need null terminators. They're just the convention that C's string literals and standard library use to mark the end of a string.
For some purposes, it's a terrible choice. One example: when strings might contain null characters. Another: when string length must be computed often, since the only way is to traverse the whole (potentially very long) string.
A method without these problems would be to represent a string as a char array (not null terminated) and an explicit length paired with it:
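#include <stddef.h>  /* size_t */

typedef struct string_s {
    char *text;   /* the characters, not null terminated */
    size_t len;   /* number of characters in text */
} STRING;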
And in fact you'll find systems written in C that do this.
The downside is that they can't use the standard library's string functions for concatenation, I/O, etc.; they need to supply their own. Also, size_t is up to 8 bytes on common platforms, while a terminating null is only one. When C was invented, that difference was a fairly big deal. In some applications (like very small embedded processors), it still is.
Answer 2
Score: 3
> Why do we need a null terminator
To indicate the length.
When using functions on 1) a string or 2) an array, the function cannot receive the string or the array itself. It can only receive a pointer to it, namely a pointer to the first element of the string or array.
Now how does the function know how long the string or array is?
With strings, the function finds the length by inspecting the data: when it detects a null character, it knows that is the end of the string. No additional parameter needs to be sent to the function.
foo_string(pointer_to_string_beginning);
With arrays, the caller needs to send the element count of the array to the function in addition to the pointer (in either prescribed order). The function cannot use the data of the array to find the end, as no value is reserved to indicate the "end".
foo_array(element_count_of_the_array, pointer_to_array_beginning);
If sending two parameters is acceptable, use an array and a size. Otherwise, for text, use a single parameter for a string.
For text, null-terminated strings have been the common approach since the 1970s.
How to return a string or an array from a function is the next concern, not addressed here.
Answer 3
Score: 0
Short answer: To be able to store short strings in a bigger array.
Explanation:
Assume you have (one way or another) allocated a memory area capable of holding M characters and you want to store a string into that memory.
If the string has exactly M characters you can print it like:
for (i = 0; i < M; ++i) putchar(str[i]);
In principle that's no problem. You know the value M from the size of the memory area (note: this is only true in some cases, but for now let's assume that).
But what if you want to store and later print a string with N (N < M) characters in that memory?
When printing it, you could of course do:
for (i = 0; i < N; ++i) putchar(str[i]);
But from where do you get the value N?
Sometimes N is 5 (e.g. the string "Hello"), sometimes N is 13 (e.g. the string "stackoverflow"), and so on.
One solution would be to keep N in a separate variable that you update whenever you change the string.
Another solution would be to use a sentinel value to indicate "End of string" and store that special value as part of the string.
There are pros and cons of both solutions.
The designers of C decided to go with the second solution. Consequently, we must always make sure to include the sentinel (the NUL) when dealing with strings in C.
The print can now be written:
for (i = 0; str[i] != '\0'; ++i) putchar(str[i]);
and it will work no matter what length the string has.
BTW:
Interesting read: https://stackoverflow.com/a/1258577/4386427