segmentation fault when accessing pointer element


Question

I have been trying to parse a string into separate tokens for the command-line interface of a project of mine. I wrote this function to do it:

char **string_parser(char *input) {
    char **output = (char **) malloc(sizeof(input));
    int word_num = 0;
    int word_index = 0;

    for(int i = 0; i < strlen(input); i++) {
        if(input[i] == ' ') {
            output[word_num][word_index] = '\0';
            word_index = 0;
            word_num++;
            continue;
        }

        if(input[i] == '\0') {
            output[word_num][word_index] = '\0';
            break;
        }

        output[word_num][word_index] = input[i];
        word_index++;
    }

    return output;
}

but it segfaults after one iteration.

I have been calling the function on:
char *input = "this is a parser test.";
Any help is greatly appreciated.


Answer 1

Score: 1

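This memory allocation

    char **output = (char **) malloc(sizeof(input));

is incorrect. `sizeof(input)` is the size of the pointer `input`, not the length of the string, so the call allocates memory for only one object of type `char *`.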
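To see why the request is too small, here is a minimal sketch. The helper names `bytes_requested` and `chars_in_string` are made up for illustration and are not part of the original code. It shows that `sizeof` applied to a `char *` parameter yields the pointer size, whatever the string contains:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: the byte count the original malloc call asks for. */
size_t bytes_requested(const char *input)
{
    /* sizeof is applied to the pointer variable, not to the text it
       points at, so the result never depends on the string. */
    return sizeof(input);
}

/* Hypothetical helper: the number of characters before the terminator. */
size_t chars_in_string(const char *input)
{
    return strlen(input);
}
```

On a typical 64-bit platform `bytes_requested("this is a parser test.")` returns 8, while `chars_in_string` returns 22 for the same string.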

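Moreover, the allocated memory is uninitialized, so `output[word_num]` is an indeterminate pointer. At least this statement (and similar statements)

    output[word_num][word_index] = '\0';

invokes undefined behavior.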

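You need to allocate as many pointers as there are words in the source string. For each word, you also need to allocate a character array to store the extracted word.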
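One way to picture those two levels of allocation is a count-first sketch. This is only an illustration under my own assumptions: the helpers `count_words`, `alloc_word_slots` and `copy_word` are hypothetical names, and space and tab are assumed to be the only delimiters.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: counts runs of non-delimiter characters. */
size_t count_words(const char *input)
{
    const char *delim = " \t";
    size_t count = 0;

    input += strspn(input, delim);        /* skip leading delimiters */
    while (*input != '\0') {
        ++count;
        input += strcspn(input, delim);   /* skip the word itself */
        input += strspn(input, delim);    /* skip delimiters after it */
    }
    return count;
}

/* Hypothetical helper: one pointer per word plus a terminating NULL.
   Every slot is set to NULL so nothing is left indeterminate. */
char **alloc_word_slots(size_t count)
{
    char **slots = malloc((count + 1) * sizeof *slots);
    if (slots != NULL) {
        for (size_t i = 0; i <= count; ++i)
            slots[i] = NULL;
    }
    return slots;
}

/* Hypothetical helper: a separate character array for one word. */
char *copy_word(const char *start, size_t n)
{
    char *word = malloc(n + 1);           /* +1 for the terminator */
    if (word != NULL) {
        memcpy(word, start, n);
        word[n] = '\0';
    }
    return word;
}
```

The pointer array and each word are separate allocations, so the caller has to free every word and then the array itself.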

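Calling the function `strlen` in the condition of the for loop

    for(int i = 0; i < strlen(input); i++) {

is redundant and inefficient, because the length may be recomputed on every iteration.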
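A loop can instead stop at the terminating null character. The sketch below uses a made-up helper, `count_spaces`, purely to show the loop shape. It is not a replacement for the parser.

```c
#include <stddef.h>

/* Hypothetical helper used only to illustrate the loop condition. */
size_t count_spaces(const char *input)
{
    size_t spaces = 0;

    /* Testing for the terminator replaces the strlen call. */
    for (size_t i = 0; input[i] != '\0'; i++) {
        if (input[i] == ' ')
            spaces++;
    }
    return spaces;
}
```

Note also that with the condition `i < strlen(input)` the `input[i] == '\0'` branch in the question's loop can never be taken, so the last word would never be terminated there.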

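Note that the function parameter should be declared with the `const` qualifier, because the source string is not changed within the function.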

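Here is a demonstration program that shows one approach to implementing the function. The function has a drawback: it does not check whether the memory allocations succeeded. You will need to add those checks yourself. The last pointer in the allocated array of pointers is set to `NULL` so that the caller can determine the number of extracted words.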

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char ** string_parser( const char *input ) 
{
    char **output = NULL;
    size_t word_num = 0;

    for (const char *delim = " \t"; *input != '\0'; )
    {
        input += strspn( input, delim );

        if (*input)
        {
            size_t n = strcspn( input, delim );

            output = realloc( output, ( word_num + 1 ) * sizeof( char * ) );

            output[word_num] = malloc( n + 1 );

            memcpy( output[word_num], input, n );
            output[word_num][n] = '\0';
            ++word_num;

            input += n;
        }
    }

    output = realloc( output, ( word_num + 1 ) * sizeof( char * ) );

    output[word_num] = NULL;

    return output;
}

int main( void )
{
    const char *input = "this is a parser test.";

    char **output = string_parser( input );

    for (char **p = output; *p != NULL; ++p)
    {
        puts( *p );
    }

    for (char **p = output; *p != NULL; ++p)
    {
        free( *p );
    }
    free( output );
}
```

The program output is

this
is
a
parser
test.
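As for the missing allocation checks, one possible way to add them is sketched below. The names `string_parser_checked` and `free_words` are hypothetical, and returning `NULL` on any failure is my own design choice rather than something the answer prescribes.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: frees a NULL-terminated array of words. */
void free_words(char **words)
{
    if (words == NULL)
        return;
    for (char **p = words; *p != NULL; ++p)
        free(*p);
    free(words);
}

/* Hypothetical variant of string_parser: returns NULL if any
   allocation fails, after releasing what was already allocated. */
char **string_parser_checked(const char *input)
{
    const char *delim = " \t";
    size_t word_num = 0;

    /* Start with room for the terminating NULL only. */
    char **output = malloc(sizeof *output);
    if (output == NULL)
        return NULL;
    output[0] = NULL;

    for (;;) {
        input += strspn(input, delim);
        if (*input == '\0')
            break;

        size_t n = strcspn(input, delim);

        /* Grow through a temporary so the old block is not lost
           if realloc fails. */
        char **tmp = realloc(output, (word_num + 2) * sizeof *tmp);
        if (tmp == NULL) {
            free_words(output);
            return NULL;
        }
        output = tmp;

        output[word_num] = malloc(n + 1);
        if (output[word_num] == NULL) {
            /* The failed slot is NULL, so the array is still terminated. */
            free_words(output);
            return NULL;
        }

        memcpy(output[word_num], input, n);
        output[word_num][n] = '\0';
        output[++word_num] = NULL;        /* keep the array terminated */

        input += n;
    }

    return output;
}
```

A caller would test the result against `NULL` before iterating and release it with a single `free_words(words)` call.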

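Answer 2

Score: 0

First of all, I suggest that you look at the common approaches to parsing command-line arguments in C programs, for example by searching StackOverflow: https://stackoverflow.com/questions/9642732/parsing-command-line-arguments-in-c

Next, concerning the bugs in your code: you `malloc()` some bytes and then treat them as pointers to memory when writing through them at the line

    output[word_num][word_index] = '\0';

But the memory allocated by `malloc()` contains indeterminate bytes. Interpreted as pointers, they form arbitrary addresses, and an attempt to write to those addresses causes the segmentation fault.

I guess the idea behind your code is to split the string into words. To do that, I suggest using a standard function. If you absolutely must implement your own tokenizer, think carefully about how the calling code will free the memory that the tokenizer allocates for the tokens.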
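The answer does not name the standard function. Assuming `strtok` is a reasonable candidate, a sketch could look like this. The wrapper `split_in_place` and its parameters are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits buf in place on spaces and tabs and
   stores up to max_tokens pointers into tokens.
   Returns the number of tokens stored. */
size_t split_in_place(char *buf, char *tokens[], size_t max_tokens)
{
    size_t count = 0;

    /* strtok overwrites delimiters in buf with '\0' and returns
       pointers into buf, so no per-token allocation is needed. */
    for (char *tok = strtok(buf, " \t");
         tok != NULL && count < max_tokens;
         tok = strtok(NULL, " \t")) {
        tokens[count++] = tok;
    }
    return count;
}
```

Because `strtok` modifies its argument, the input has to be a writable buffer such as `char line[] = "this is a parser test.";` and not a string literal. The tokens point into that buffer, so there is nothing to free per token, but the buffer must outlive them.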

huangapple
  • Published on June 19, 2023 at 03:11:12
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76502148.html