Invalid uninitialized jump or move memory error while trying to split a char32_t string into tokens manually
Question
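Your program has a memory error caused by reading heap memory that was never initialized. malloc does not initialize the memory it returns, so make sure the memory is initialized after it is allocated. In the sp function, you can use memset to zero an allocated block, as shown here:
tokens[i] = (char32_t *)malloc(sizeof(char32_t) * (tok_len + 1));
if (tokens[i] == NULL) {
    exit(112);
}
memset(tokens[i], 0, (tok_len + 1) * sizeof(char32_t));
This zeroes the allocated block. Note, however, that each token buffer is already fully written by the memcpy and the terminator assignment that follow, and the Valgrind trace points at the pointer array allocated in sp, so that array needs to be initialized as well for the errors to go away.
Also make sure that allocated memory is released before the program ends, to avoid leaks. Your driver code already frees the blocks, which is the right thing to do.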
I am trying to split a char32_t string into tokens separated by a delimiter. I am not using strtok or any other standard library function because it is guaranteed that the input string and the delimiter will be multibyte Unicode strings.
Here is the function I have written:
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <uchar.h>
#include <wchar.h>
char32_t **sp(char32_t *str, char32_t delim, int *len) {
    *len = 1;
    char32_t *s = str;
    while (*s != U'\0') {
        if (*s == delim) {
            (*len)++;
        }
        s++;
    }
    char32_t **tokens = (char32_t **)malloc((*len) * sizeof(char32_t *));
    if (tokens == NULL) {
        exit(111);
    }
    char32_t * p = str;
    int i = 0;
    while (*p != U'\0') {
        int tok_len = 0;
        while (p[tok_len] != U'\0' && p[tok_len] != delim) {
            tok_len++;
        }
        tokens[i] = (char32_t *)malloc(sizeof(char32_t) * (tok_len + 1));
        if (tokens[i] == NULL) {
            exit(112);
        }
        memcpy(tokens[i], p, tok_len * sizeof(char32_t));
        tokens[i][tok_len] = U'\0';
        p += tok_len + 1;
        i++;
    }
    return tokens;
}
And here is the driver code:
int main() {
    char32_t *str = U"Hello,World,mango,hey,";
    char32_t delim = U',';
    int len = 0;
    char32_t ** tokens = sp(str, delim, &len);
    wprintf(L"len -> %d\n", len);
    for (int i = 0; i < len; i++) {
        if (tokens[i]) {
            wprintf(L"[%d] %ls\n" , i , tokens[i]);
        }
        free(tokens[i]);
    }
    free(tokens);
}
Here is the output:
len -> 5
[0] Hello
[1] World
[2] mango
[3] hey
[4] (null)
But when I check the program with valgrind, it shows multiple memory errors:
valgrind -s --leak-check=full --track-origins=yes ./x3
==7703== Memcheck, a memory error detector
==7703== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==7703== Using Valgrind-3.20.0 and LibVEX; rerun with -h for copyright info
==7703== Command: ./x3
==7703==
tok -> 5
tok -> 5
tok -> 5
tok -> 3
len -> 5
[0] Hello
[1] World
[2] mango
[3] hey
==7703== Conditional jump or move depends on uninitialised value(s)
==7703== at 0x48FDAF8: __wprintf_buffer (vfprintf-process-arg.c:396)
==7703== by 0x48FF421: __vfwprintf_internal (vfprintf-internal.c:1459)
==7703== by 0x490CFAE: wprintf (wprintf.c:32)
==7703== by 0x1093C9: main (main.c:51)
==7703== Uninitialised value was created by a heap allocation
==7703== at 0x4841888: malloc (vg_replace_malloc.c:393)
==7703== by 0x1091FC: sp (main.c:17)
==7703== by 0x109384: main (main.c:47)
==7703==
[4] (null)
==7703== Conditional jump or move depends on uninitialised value(s)
==7703== at 0x4844225: free (vg_replace_malloc.c:884)
==7703== by 0x1093DA: main (main.c:52)
==7703== Uninitialised value was created by a heap allocation
==7703== at 0x4841888: malloc (vg_replace_malloc.c:393)
==7703== by 0x1091FC: sp (main.c:17)
==7703== by 0x109384: main (main.c:47)
==7703==
==7703==
==7703== HEAP SUMMARY:
==7703== in use at exit: 0 bytes in 0 blocks
==7703== total heap usage: 7 allocs, 7 frees, 5,248 bytes allocated
==7703==
==7703== All heap blocks were freed -- no leaks are possible
==7703==
==7703== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)
==7703==
==7703== 1 errors in context 1 of 2:
==7703== Conditional jump or move depends on uninitialised value(s)
==7703== at 0x4844225: free (vg_replace_malloc.c:884)
==7703== by 0x1093DA: main (main.c:52)
==7703== Uninitialised value was created by a heap allocation
==7703== at 0x4841888: malloc (vg_replace_malloc.c:393)
==7703== by 0x1091FC: sp (main.c:17)
==7703== by 0x109384: main (main.c:47)
==7703==
==7703==
==7703== 1 errors in context 2 of 2:
==7703== Conditional jump or move depends on uninitialised value(s)
==7703== at 0x48FDAF8: __wprintf_buffer (vfprintf-process-arg.c:396)
==7703== by 0x48FF421: __vfwprintf_internal (vfprintf-internal.c:1459)
==7703== by 0x490CFAE: wprintf (wprintf.c:32)
==7703== by 0x1093C9: main (main.c:51)
==7703== Uninitialised value was created by a heap allocation
==7703== at 0x4841888: malloc (vg_replace_malloc.c:393)
==7703== by 0x1091FC: sp (main.c:17)
==7703== by 0x109384: main (main.c:47)
==7703==
==7703== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)
I am unable to figure out what the problem is. Any help will be appreciated.
I have also tried with Unicode strings, and the same error occurs.
Answer 1
Score: 3
valgrind is giving those errors because your program is accessing uninitialised memory in the last iteration of this for loop in main() (i.e. while accessing tokens[4], when len is 5):
for (int i = 0; i < len; i++) {
    if (tokens[i]) {
        wprintf(L"[%d] %ls\n" , i , tokens[i]);
    }
    free(tokens[i]);
}
The malloc function allocates memory and leaves it uninitialised. Here, in sp(), the memory your program allocates for the pointer array is uninitialised:
char32_t **tokens = (char32_t **)malloc((*len) * sizeof(char32_t *));
The while loop of sp() allocates a buffer and copies a token into it for every member of the tokens array except the last one, which it leaves uninitialised. In main(), your program accesses that uninitialised member, and hence valgrind reports the errors.
To fix the problem, in sp(), after allocating memory for tokens, do one of the following.
Either set the last pointer in the tokens array to NULL:
// this is the bare minimum change required to fix the problem
tokens[*len - 1] = NULL;
Or set all the pointers to NULL:
for (int i = 0; i < *len; ++i) {
    tokens[i] = NULL;
}
Or use calloc to allocate the tokens array, which ensures that all the allocated pointers are initialised to NULL:
char32_t **tokens = calloc((*len), sizeof(char32_t *));
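The calloc variant can be sketched as a pair of small helpers. This is only an illustration: the names alloc_token_slots and free_tokens are hypothetical and do not appear in the original code, and the sketch assumes a platform where an all-zero bit pattern is a null pointer (true on mainstream systems).

```c
#include <stdlib.h>
#include <uchar.h>

/* Hypothetical helper: allocate `count` token slots, all starting as NULL.
   calloc zero-fills the block; this sketch assumes all-bits-zero is a
   null pointer, which holds on mainstream platforms. */
char32_t **alloc_token_slots(int count) {
    if (count <= 0) {
        return NULL;
    }
    return calloc((size_t)count, sizeof(char32_t *));
}

/* Hypothetical helper: free every token, then the array itself.
   free(NULL) is a no-op, so slots that were never filled are safe. */
void free_tokens(char32_t **tokens, int count) {
    if (tokens == NULL) {
        return;
    }
    for (int i = 0; i < count; i++) {
        free(tokens[i]);
    }
    free(tokens);
}
```

With slots that start as NULL, the if (tokens[i]) check and the free(tokens[i]) call in main() both read a defined value even for a slot the loop in sp() never filled.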
With any of the above solutions, the valgrind output is clean:
# valgrind -s --leak-check=full --track-origins=yes ./a.out
==9761== Memcheck, a memory error detector
==9761== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==9761== Using Valgrind-3.17.0 and LibVEX; rerun with -h for copyright info
==9761== Command: ./a.out
==9761==
len -> 5
[0] Hello
[1] World
[2] mango
[3] hey
==9761==
==9761== HEAP SUMMARY:
==9761== in use at exit: 0 bytes in 0 blocks
==9761== total heap usage: 7 allocs, 7 frees, 5,248 bytes allocated
==9761==
==9761== All heap blocks were freed -- no leaks are possible
==9761==
==9761== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
I found one more problem in your code: when the input string does not end with the delimiter character, your program ends up accessing the input string beyond its length, which results in undefined behaviour. Look at this statement in the while loop of sp():
p += tok_len + 1;
Assume the input string is U"Hello,World,mango,hey" (note that the last character of the string is not the delimiter ,). While iterating over the input string, the nested while loop condition becomes false when p[tok_len] equals U'\0', and the statement p += tok_len + 1; then makes p point just past the terminating null character, outside the string. The outer while loop condition then dereferences p, which leads to undefined behaviour.
Replace this statement in the while loop of sp():
p += tok_len + 1;
with this:
p += tok_len;
p += (*p != '\0') ? 1 : 0;
This first makes p point one character past the end of the current token in the input string; 1 is then added to p only if that character is not the terminating null character.
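The same advance step can be isolated into a helper to make the boundary behaviour easy to check. This is a minimal sketch; token_length and skip_token are hypothetical names introduced here for illustration only.

```c
#include <stddef.h>
#include <uchar.h>

/* Hypothetical helper: number of code points before the next
   delimiter or the terminating null character. */
size_t token_length(const char32_t *p, char32_t delim) {
    size_t n = 0;
    while (p[n] != U'\0' && p[n] != delim) {
        n++;
    }
    return n;
}

/* Hypothetical helper: move past the current token and at most one
   delimiter. The pointer never moves beyond the terminating null. */
const char32_t *skip_token(const char32_t *p, char32_t delim) {
    p += token_length(p, delim);
    if (*p != U'\0') {
        p++; /* step over the delimiter only */
    }
    return p;
}
```

Calling skip_token on a pointer that already sits on the terminator returns the same pointer, so a loop of the form while (*p != U'\0') stops without reading past the end of the string.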
The while loop body can be implemented in a much better way, and can also be made to handle other scenarios, for example spaces between the words in the input string, or an input string that contains only delimiters. I am leaving it up to you to improve the implementation and take care of those scenarios.
EDIT:
This is your requirement: if the last character of the input string is the delimiter, then the last member of the tokens array should point to an empty string instead of being NULL.
You don't need to handle this as a special case after the loop, as you have shown in your comment. You can handle it in the loop body that processes the input string and extracts the tokens from it, like this:
char32_t **sp(const char32_t *str, const char32_t delim, int *len) {
    *len = 1;
    for (int i = 0; str[i] != U'\0'; ++i) {
        if (str[i] == delim) (*len)++;
    }
    char32_t **tokens = malloc((*len) * sizeof(char32_t *));
    if (tokens == NULL) {
        exit(111);
    }
    int start = 0, end = 0, i = 0;
    do {
        if ((str[end] == delim) || (str[end] == U'\0')) {
            tokens[i] = malloc(sizeof(char32_t) * (end - start + 1));
            if (tokens[i] == NULL) {
                exit(112);
            }
            memcpy(tokens[i], &str[start], sizeof(char32_t) * (end - start));
            tokens[i][end - start] = U'\0';
            start = end + 1;
            i++;
        }
    } while (str[end++] != U'\0');
    return tokens;
}
A few test cases:
Input string:
char32_t *str = U"Hello,World,mango,hey,";
Output:
# ./a.out
len -> 5
[0] Hello
[1] World
[2] mango
[3] hey
[4]
Input string:
char32_t *str = U"Hello,World,mango,hey";
Output:
# ./a.out
len -> 4
[0] Hello
[1] World
[2] mango
[3] hey
Input string:
char32_t *str = U",,, , u";
Output:
# ./a.out
len -> 5
[0]
[1]
[2]
[3]
[4] u
Input string:
char32_t *str = U" ";
Output:
# ./a.out
len -> 1
[0]
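The len values in these test cases follow directly from the counting pass at the top of sp(): one token more than the number of delimiters. A small sketch of that pass on its own, using the hypothetical name count_tokens:

```c
#include <uchar.h>

/* Hypothetical helper mirroring the first loop of sp(): the token
   count is one more than the number of delimiters in the string. */
int count_tokens(const char32_t *str, char32_t delim) {
    int count = 1;
    for (int i = 0; str[i] != U'\0'; ++i) {
        if (str[i] == delim) {
            count++;
        }
    }
    return count;
}
```

Because the do/while loop in the edited sp() also emits a token at the terminating null character, every slot counted here is filled, including an empty token after a trailing delimiter.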