strtok() sometimes(??) causing stack smashing?

Question

Using Kubuntu 22.04 LTS, Kate v22.04.3, and gcc v11.3.0, I have developed a small program to investigate the use of strtok() for tokenising strings, which is shown below.

#include <stdio.h>
#include <string.h>

int main(void)
{
   char inString[] = "";         // string read in from keyboard.
   char * token    = "";         // A word (token) from the input string.
   char delimiters[] = " ,";     // Items that separate words (tokens).
   
   // explain nature of program.
   printf("This program reads in a string from the keyboard"
          "\nand breaks it into separate words (tokens) which"
          "\nare then output one token per line.\n");
   printf("\nEnter a string: ");
   scanf("%s", inString);
   
   /* get the first token */
   token = strtok(inString, delimiters);
   
   /* Walk through other tokens. */
   while (token != NULL)
   {
      printf("%s", token);
      printf("\n");
      
      // Get next token.
      token = strtok(NULL, delimiters);
   }
   return 0;
}

From the various web pages that I have viewed, it would seem that I have formatted the strtok() function call correctly. On the first run, the program produces the following output.

$ ./ex6_2
This program reads in a string from the keyboard
and breaks it into separate words (tokens) which
are then output one token per line.

Enter a string: fred ,  steve ,   nick
f
ed

On the second run, it produced the following output.

$ ./ex6_2
This program reads in a string from the keyboard
and breaks it into separate words (tokens) which
are then output one token per line.

Enter a string: steve ,  barney ,   nick
s
eve
*** stack smashing detected ***: terminated
Aborted (core dumped)

Subsequent runs showed that the program sort of ran, as in the first case above, if the first word/token contained only four characters. However, if the first word/token contained five or more characters then stack smashing occurred.

Given that "char *" is used to access the tokens, why:

a) is the first token (in each case) split at the second character?

b) are the subsequent tokens (in each case) not output?

c) does a first word/token of more than four characters cause stack smashing?

Stuart

Answer 1

Score: 4

The declaration

char inString[] = "";

is equivalent to:

char inString[1] = "";

This means that you are allocating an array of only a single element, so it only has space for storing a single character.
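A quick way to check this is to ask the compiler for the array sizes. The sketch below is illustrative only; the function names empty_init_size and sized_array_size are made up for this example and are not part of the question's program.

```c
#include <stddef.h>

/* Size of an array whose length is deduced from an empty-string initializer. */
size_t empty_init_size(void)
{
    char inString[] = "";    /* the initializer holds only the '\0' terminator */
    return sizeof inString;  /* sizeof(char) is 1, so this is the element count */
}

/* Size of an array whose length is given explicitly. */
size_t sized_array_size(void)
{
    char inString[200] = ""; /* 200 elements, all initialized to '\0' */
    return sizeof inString;
}
```

The first function reports a single element, which is why there is no room for anything beyond the terminator.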

The function call

scanf("%s", inString);

requires that the function argument inString points to a memory buffer that is sufficiently large to store the matched input. Your program is violating this requirement, as the memory buffer has only space for a single character (the terminating null character). It can therefore only store strings with a length of zero.

By violating the requirement, your program is invoking undefined behavior, which means that anything can happen, including the strange behavior that you observed. The function scanf is probably overflowing the buffer inString, overwriting other important data on your program's stack, causing it to misbehave. This is called "stack smashing".

To fix this, you should give the array inString more space, for example by changing the line

char inString[] = "";

to:

char inString[200] = "";

However, in that case, if the user enters 200 or more characters of input as a single word, then you will have the same problem again, and your program may crash. Therefore, you may want to additionally limit the number of characters matched by scanf to 199 characters (200 including the terminating null character). That way, the user cannot overflow the buffer with an overly long word.

You can add such a limit like this:

scanf("%199s", inString);
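To see the effect of a field width without typing at the keyboard, the same idea can be tried with sscanf, which parses from a string instead of standard input. This is only a sketch: WORD_CAP and first_word are hypothetical names, and the buffer is deliberately tiny so that the truncation is easy to observe.

```c
#include <stdio.h>

enum { WORD_CAP = 8 };  /* hypothetical capacity: 7 characters plus '\0' */

/* Stores the first whitespace-delimited word of input in out, truncated
   to WORD_CAP - 1 characters. Returns 1 if a word was stored, 0 otherwise. */
int first_word(const char *input, char out[WORD_CAP])
{
    /* The width 7 must be written as a literal and equals WORD_CAP - 1. */
    return sscanf(input, "%7s", out) == 1;
}
```

Given the input "barney-rubble nick", only "barney-" is stored; the rest of the word is left unread rather than written past the end of the array.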

Note, however, that the %s specifier will only match a single word. If you want to read an entire line of input, you may want to use the function fgets instead of scanf.
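A minimal sketch of the fgets approach follows, assuming line-oriented input. The helper name read_line is hypothetical; it reads at most cap - 1 characters and strips the trailing newline that fgets keeps in the buffer.

```c
#include <stdio.h>
#include <string.h>

/* Reads one line from stream into buf, which has room for cap characters.
   Removes the trailing newline if one was read.
   Returns buf on success, or NULL on end of file or read error. */
char *read_line(FILE *stream, char *buf, int cap)
{
    if (fgets(buf, cap, stream) == NULL)
        return NULL;

    /* strcspn returns the index of the first '\n', or the string length
       if there is none, so this assignment is always within bounds. */
    buf[strcspn(buf, "\n")] = '\0';
    return buf;
}
```

In the question's program, a call such as read_line(stdin, inString, (int)sizeof inString) could take the place of the scanf call once inString has been given a real size. If the input line is longer than the buffer, the remainder stays in the stream for the next read.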

Answer 2

Score: 3

This declaration of a character array

char inString[] = "";

is equivalent to

char inString[1] = { '\0' };

That is, it declares an array with only one element, which can store only an empty string. So any attempt to read a string into this character array using this call of scanf

scanf("%s", inString);

invokes undefined behavior.

You need to specify a much greater number of elements. For example

enum { N = 100 };
char inString[N] = "";

This initialization of a pointer

char * token    = "";

does not make much sense. It is better to write, for example

char * token = NULL;

This call of scanf

scanf("%s", inString);

can read only one word, that is, a sequence of characters delimited by white space.

Instead write, for example

scanf(" %99[^\n]", inString);
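This format can be tried out on an in-memory string with sscanf, which accepts the same conversion specifications as scanf. In the sketch below, first_line is a hypothetical helper, and N matches the array size suggested above.

```c
#include <stdio.h>

enum { N = 100 };

/* Skips leading white space, then stores up to N - 1 characters of input,
   stopping at the first newline. Returns 1 on success, 0 otherwise. */
int first_line(const char *input, char out[N])
{
    /* The leading space in the format skips white space, including newlines.
       %99[^\n] matches a run of up to 99 characters that are not '\n'. */
    return sscanf(input, " %99[^\n]", out) == 1;
}
```

Unlike %s, the scanset keeps the embedded spaces and commas, so the whole line is available for strtok to split.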

It makes sense to include the tab character '\t' in the list of delimiters

const char *delimiters = " \t,";
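With the tab included, the tokenising loop from the question can be wrapped in a small helper to check which tokens come out. The name split_tokens and its parameters are hypothetical and chosen for this sketch; they are not part of either answer.

```c
#include <stddef.h>
#include <string.h>

/* Splits line in place on any character in delimiters and stores pointers
   to at most max_tokens tokens in tokens[]. Returns the number stored.
   line must be a writable array, because strtok overwrites delimiters
   with '\0' as it goes. */
size_t split_tokens(char *line, const char *delimiters,
                    char *tokens[], size_t max_tokens)
{
    size_t count = 0;
    char *token = strtok(line, delimiters);

    while (token != NULL && count < max_tokens) {
        tokens[count++] = token;
        token = strtok(NULL, delimiters);  /* continue in the same string */
    }
    return count;
}
```

Runs of consecutive delimiters are treated as a single separator, so an input like "fred ,  steve" yields no empty tokens.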

Instead of these calls of printf

  printf("%s", token);
  printf("\n");

it is simpler to write

puts(token);

huangapple
  • Posted on April 19, 2023, 21:25:38
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76055089.html