Appending Characters to an Empty String in C
# Question
I'm relatively new to C, so any help understanding what's going on would be awesome!

I have a struct called `Token` that is as follows:

```c
// Token struct
struct Token {
    char type[16];
    char value[1024];
};
```

I am trying to read from a file and append characters read from the file into `Token.value` like so:

```c
struct Token newToken;
char ch;
ch = fgetc(file);
strncat(newToken.value, &ch, 1);
```

This works!

My problem is that `Token.value` begins with several values I don't understand, preceding the characters that I appended. When I print the result of `newToken.value` to the console, I get `@�����TheCharactersIWantedToAppend`. I could probably figure out a band-aid solution to retroactively remove or work around these characters, but I'd rather not if I don't have to.

In analyzing the `�` characters, I see them as (in order, from index 1 to 5): `\330`, `\377`, `\377`, `\377`, `\177`. I read that `\377` is a special character for `EOF` in C, but it is also 255 in decimal. Do these values make up a memory address? Am I adding the address to `newToken.value` by using `&ch` in `strncat`? If so, how can I keep them from getting into `newToken.value`?

Note: I get a segmentation fault if I use `strncat(newToken.value, ch, 1)` instead of `strncat(newToken.value, &ch, 1)` (`ch` vs. `&ch`).
# Answer 1
**Score**: 2
I'll try to consolidate the answers already given in the comments.
This version of the code uses `strncat()`, like yours, but fixes the problems noted by Nick (the target must be initialized) and Dúthomhas (the second argument to `strncat()` should be a string, not a pointer to a single char). Yes, a "string" is actually a `char[]` and the value passed to the function is a `char*`, but here it points to an array *of at least two chars*, the last one containing a `'\0'`. Strictly speaking, `strncat()` reads at most N chars from the source, so `&ch` with N = 1 does not read past `ch`; the uninitialized target is what produces the garbage at the start of the output.

Please be aware that `strncat()`, `strncpy()` and related functions are tricky. Both copy at most N chars from the source. But `strncpy()` only adds the final `'\0'` to the target string when the source has ***fewer*** than N chars, whereas `strncat()` ***always*** adds it, even if the source has exactly N chars or more, so it can write up to N + 1 chars (edited; thanks, @Clifford).
```c
#include <stdio.h>
#include <string.h>

int main(void) {
    FILE* file = stdin; // fopen("test.txt", "r");
    if (file) {
        struct Token {
            char type[16];
            char value[1024];
        };
        struct Token newToken;
        newToken.value[0] = '\0'; // A '\0' at the first position means "empty"
        int aux;
        char source[2] = ""; // A literal "" has a single char with value '\0', but this syntax fills the entire array with '\0's
        // Keep reading only while there is room for one more char plus the final '\0'
        while (strlen(newToken.value) < sizeof newToken.value - 1 && (aux = fgetc(file)) != EOF) {
            source[0] = (char)aux;
            strncat(newToken.value, source, 1); // This appends AT MOST 1 CHAR (and always adds a final '\0')
        }
        strncat(newToken.value, "", 1); // As the source string is empty, it just adds a final '\0' (superfluous in this case)
        printf("%s", newToken.value); // Print through "%s" so file contents are never used as the format string
    }
    return 0;
}
```
This other version uses an `index` variable and writes each single char directly into the "current" position of the target string, without using `strncat()`. I think it is simpler and safer, because it doesn't mix the confusing semantics of single chars and strings.
```c
#include <stdio.h>

int main(void) {
    FILE* file = stdin; // fopen("test.txt", "r");
    if (file) {
        struct Token {
            size_t index; // Next free position in value (C has no default member initializers)
            char type[16];
            char value[1024]; // Max size is 1023 chars + '\0'
        };
        struct Token newToken;
        newToken.index = 0;
        newToken.value[0] = '\0'; // A '\0' at the first position means "empty". This is not really necessary anymore
        int aux;
        while ((aux = fgetc(file)) != EOF)
            // Index stops at 1023 (value[1022] is the last "real" char, leaving space for a final '\0')
            if (newToken.index < sizeof newToken.value - 1)
                newToken.value[newToken.index++] = (char)aux;
        newToken.value[newToken.index] = '\0';
        printf("%s", newToken.value); // Print through "%s" so file contents are never used as the format string
    }
    return 0;
}
```
**Edited**: `fgetc()` returns an `int` and we should check for `EOF` *before* casting it to a `char` (thanks, @chqrlie).
# Answer 2
**Score**: 1
You are appending to a string that is not initialised, so it can contain anything. The end of a string is indicated by a NUL (0) character, and in your example there happened to be one after 6 bytes, but there need not be one anywhere within the `value` array, so the code is seriously flawed and results in undefined (in practice, non-deterministic) behaviour.

You need to initialise the `newToken` instance to an empty string. For example:

```c
struct Token newToken = { "", "" };
```

or, to zero-initialise the whole structure:

```c
struct Token newToken = { 0 };
```

The point is that C does not initialise non-static objects without an explicit initialiser.
Furthermore, using `strncat()` here is very inefficient: every call has to scan the destination to find its end, so the execution time depends on the length of the destination string (see https://www.joelonsoftware.com/2001/12/11/back-to-basics/).

In this case you would do better to maintain a count of the number of characters added, and write the character and terminator directly to the array. For example:

```c
size_t index = 0;
int ch = 0;
do
{
    ch = fgetc(file);
    if (ch != EOF)
    {
        newToken.value[index] = (char)ch;
        index++;
        newToken.value[index] = '\0';
    }
} while (ch != EOF &&
         index < sizeof(newToken.value) - 1);
```