Little endian and overwriting memory location

Question

First of all I’ll make it clear that I know this piece of code is UB. Still I am not sure how this works.

#include <stdio.h>

int x[] = {1, 2, 3};
char s[] = "hello";

int main()
{
    puts(s);
    x[3] = ('e' << 16) + ('y' << 8) + 'b';
    puts(s);
}

So this prints hello and then bye. Certainly it is UB but I would like to know what is happening here.
I thought I understood how little endian and big endian worked but obviously that’s not the case since I’ve been looking at this for the past 30 minutes and still don’t get what’s going on.

I would REALLY appreciate a diagram if possible…

My understanding:

  • ‘x’ and ‘s’ are laid out sequentially in memory with this output.
  • overwriting x[3] overwrites s.

I don’t get:

  • how and why ‘bye’ is printed. What is happening to the memory. Let’s assume it’s not UB…
  • How and why a null terminator is inserted after ‘e’ in ‘bye’.
  • what endianness has to do with all of this.

Thanks.

Answer 1

Score: 5

This expression:

('e' << 16) + ('y' << 8) + 'b'

has type int with a value (assuming ASCII) in hex of 0x00657962. You then store it using an lvalue of type int. Assuming int is 32 bits wide and stored in little-endian byte order, the value is laid out in memory least significant byte first, i.e. 0x62 0x79 0x65 0x00. When those bytes are interpreted as a string, you have the characters bye followed by a null terminating byte, which reflects what is printed. The terminator is not inserted separately: it is the all-zero top byte of the int, which lands on s[3].

Answer 2

Score: 4

Assuming there is no padding and that x and s are adjacent in memory, x[3] names the slot where a fourth element of the three-element array would sit, which is exactly where s begins. Under those assumptions, the memory looks like this:

|x[0]|x[1]|x[2]|s[0]|s[1]|s[2]|s[3]|......
            or |       x[3]        | <- doesn't really exist, UB to write to it

That also assumes int values are 4 bytes. Note that the diagram is not to scale: x[0] through x[2] each occupy 4 bytes, while each s[i] occupies 1 byte.

In little-endian format, the code

x[3] = ('e' << 16) + ('y' << 8) + 'b';

sets the bytes of x[3] to

| 'b' | 'y' | 'e' | 0  |

or, lined up again:

|x[0]|x[1]|x[2]|s[0]|s[1]|s[2]|s[3]|...
               | 'b'| 'y'| 'e'| 0  |
