argc/argv random data/behavior
Question
Here is my minimal reproducible example:
#include <stdio.h>

int main(int argc, char *argv[])
{
    printf(" this is the contents of argc:%d\n", argc);
    int i;
    for (i = 0; i < argc; i++) {
        printf(" argv = %d = %s\n", i, argv[i]);
    }
    return 0;
}
When I change argc in the for loop to a number, let's say 10, the code crashes before it reaches 10:
$ ./argc one two three
this is the contents of argc:4
argv = 0 = ./argc
argv = 1 = one
argv = 2 = two
argv = 3 = three
argv = 4 = (null)
argv = 5 = SHELL=/bin/bash
argv = 6 = SESSION_MANAGER=local/wajih:@/tmp/.ICE-unix/1230,unix/wajih:/tmp/.ICE-unix/1230
argv = 7 = QT_ACCESSIBILITY=1
argv = 8 = COLORTERM=truecolor
argv = 9 = XDG_CONFIG_DIRS=/etc/xdg/xdg-ubuntu:/etc/xdg
If I change argc in the for loop to, for example, 100, I get a very long output, which ends with this:
argv = 54 = GDMSESSION=ubuntu
argv = 55 = DBUS_SESSION_BUS_ADDRESS=unix:path=/run/user/1000/bus
argv = 56 = LC_NUMERIC=ar_AE.UTF-8
argv = 57 = _=./argc
argv = 58 = OLDPWD=/home/wajih
argv = 59 = (null)
Segmentation fault (core dumped).
I want to understand the reason this happens.
Answer 1
Score: 5
It might be easier to understand what's going on here with an analogy.
Suppose I live in a long, narrow house. The house is divided into 10 rooms, but they're all the same size and they're all arranged in a straight line.
Suppose I'm interested in robotics. Suppose I build a little robot to drive around inside my house, taking pictures of each room. Because my house's rooms are all laid out in a straight line, the robot's navigation task is pretty simple.
Once I've got the robot's software working perfectly, I ask the robot to make a complete photographic survey of all 20 rooms in my house. (Oops, I made a mistake, there.) And the robot starts driving along the main axis of the house taking pictures of each room in turn.
After it takes pictures of the first 10 rooms, there's a crashing sound as the robot drives through the end wall of the house. Its pictures of the "11th room" are of splintered wood and plaster. Its pictures of the "12th room" are of the garden outside the end of my house. But then there's another crashing sound, and the robot keeps taking pictures, and somehow, remarkably, they look like the insides of a house again!
It turns out that's because the robot has driven into my neighbor's house and is now taking pictures there.
From this silly little story we can learn two things:
- If there are 10 rooms in my house, and I ask my simpleminded robot to take pictures of 20 rooms, something strange, unpredictable, and wrong is probably going to happen.
- Even though what happens is going to be strange, unpredictable, and wrong, little bits of it can seem to make some kind of sense, depending on circumstances. In this case, my robot's picture of the "15th room" of my house looked just like a bedroom, although it didn't look like any bedroom in my house, and what the two people were doing in bed there didn't look like anything that happens in my house, either...
But the other important aspect of the analogy is that you obviously can't depend on any of it, because too many of the circumstances are outside of your control. The robot might have damaged itself so badly driving through walls that it couldn't continue taking pictures. If there happened to be a street just past the garden at the end of my house, the robot might have gotten run over by a truck. If there happened to be a cliff just past the garden at the end of my house, the robot might have fallen into the ocean. Etc.
C, like the simpleminded robot in my story, does not have any built-in protections against running off the end of arrays. If you try to access the 15th element of a 10-element array, what you don't typically get is an error message saying "Array bounds exceeded." What you get instead is something strange, unpredictable, and wrong — except that, depending on circumstances, there might seem to be some kind of hidden meaning, which might lead you to waste time trying to figure it out, or asking about it on Stack Overflow. But rather than doing that, you might want to spend your time working on a better obstacle detection or collision avoidance algorithm for the robot, instead.
See also these previous SO questions on the topic of exceeding the bounds of arrays: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, and 14.
Answer 2
Score: 3
The argv pointer has a very specific location in the program's memory.
When you run a binary, there is always some entry point. In C, from the programmer's point of view, that is the main() function. But in order to prepare the environment for the binary to start at that location, the OS has to do some things first.
It has to copy over environment variables, request and offset memory from the OS, etc. Because this process is completely deterministic (per OS), you can actually expect to read the environment variables just after these arguments.
This principle is fundamental to computer security. If an attacker manages to leak a pointer in this segment of memory, they can overwrite some environment variable (e.g. PATH) so that it points to their own binary first. HackMD has a really nice example of this: HackMD: Environment variables attack.
Image source: COMPILER, ASSEMBLER, LINKER AND LOADER: A BRIEF STORY
Answer 3
Score: 2
You are invoking undefined behaviour. The C Standard says that argv[argc] will be a null pointer, and that trying to access argv[i] for i < 0 or i > argc is undefined behaviour.
"Undefined behaviour" means anything can happen. If you ask for an explanation, there is none other than "it is undefined behaviour". It is legal for the compiler to produce code that completely erases your hard drive after sending all your money to my bank account. Don't do it. You are doing things that you are not allowed to do, and that's the complete answer.
Answer 4
Score: 1
Going past the end of an array will give you undefined behavior in C. The results you would get would vary depending on the compiler, the operating system, the shell you use, and a lot of other factors.
In this specific case, you are listing environment variables, because your main function is passed not just the arguments in argv but also a list of environment variables in envp, and just by coincidence, those values are placed right after the argv array. Just remember that you can never trust that to be true.
int main(int argc, char *argv[], char *envp[]);
In summary, don't go past the end of the array. It will lead to Bad Things™.
If your program needs to use the values of environment variables, you must do so through the envp array, and not abuse undefined behavior through the argv array.
Answer 5
Score: 0
Most Unix systems provide a third argument to the main function:
int main(int argc, char *argv[], char *envp[]);
It holds the environment variables.
In the above case, the program prints the contents of the third argument, envp. But it will not always behave the same way: reading argv past index argc is undefined behavior.