How does the compiler know how to correctly handle bitfields when reading binary data from a file in C?


Question



I'm trying to learn C by writing a postgres clone and struggling to understand how the compiler handles bitfields.

Specifically, my question surrounds the bitfields in postgres' line pointer struct:

typedef struct ItemIdData {
  unsigned  lp_off:15,	 /* offset to tuple (from start of page) */
	    lp_flags:2,	 /* state of line pointer, see below */
	    lp_len:15;	 /* byte length of tuple */
} ItemIdData;

In my clone, I'm able to set these fields to correct values and write them to disk, but I don't understand how the compiler is able to know which of the 15 bits are relevant when converting to an integer type. Here's a full example showing what I'm doing:

#include <stdlib.h>
#include <stdio.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

typedef struct ItemIdData {
  unsigned  lp_off:15,	 /* offset to tuple (from start of page) */
	    lp_flags:2,	 /* state of line pointer, see below */
	    lp_len:15;	 /* byte length of tuple */
} ItemIdData;

int main() {
  ItemIdData* lp = malloc(sizeof(ItemIdData));

  lp->lp_off = 205;
  lp->lp_len = 51;

  int fd = open("line_pointer.data",
                O_RDWR |    // Read/Write mode
                  O_CREAT,  // Create file if it doesn't exist
                S_IWUSR |   // User write permission
                  S_IRUSR   // User read permission
  );

  write(fd, lp, 4);

  close(fd);
  free(lp);

  ItemIdData* lp2 = malloc(sizeof(ItemIdData));

  int fd2 = open("line_pointer.data",
                O_RDWR |    // Read/Write mode
                  O_CREAT,  // Create file if it doesn't exist
                S_IWUSR |   // User write permission
                  S_IRUSR   // User read permission
  );

  read(fd2, lp2, 4);

  printf("lp_off: %d\n", lp2->lp_off);
  printf("lp_len: %d\n", lp2->lp_len);

  close(fd2);
  free(lp2);

  return EXIT_SUCCESS;
}

Compiling and running the program, I get:

$ gcc -o lp main.c && ./lp
lp_off: 205
lp_len: 51

Using xxd -b, I inspected the binary contents of the data file and see these four bytes:

11001101 00000000 01100110 00000000

lp_off is the first 15 bits: 11001101 0000000, which somehow correctly converts to 205 in decimal

lp_len is the last 15 bits: 1100110 00000000, which somehow correctly converts to 51 in decimal

This is what I'm failing to understand: how does the compiler know that the trailing 0-bits are not part of the value when converting to an int in the print statements above?

I'm coding this on Ubuntu-20.04 running inside WSL on a Windows 10 machine, if that matters.

Answer 1

Score: 1


The bitfield allocates 15 bits for lp_off, in which 205 is stored, and 15 bits for lp_len, in which 51 is stored. 205 in binary is 11001101. The extra 0 bits you are seeing are just the higher-order bits, not set because the stored value is not big enough.

Given that you are on a little-endian machine, the 8 least significant bits are in the first byte and the remaining 7 are in the next byte. If you try storing 32767 in lp_off, you will see all 1 bits in the first 15 bits.

Answer 2

Score: 1


The 32 bits in the struct are stored in little endian as user16217248 says, but it is not true that the first 15 bits in the file will be all ones if offset stores 32767.

Since the struct is 15 bits + 2 bits + 15 bits it may be defined logically like this:

ooooooooooooooo ff lllllllllllllll

Where o is offset, f is flag, and l is len.
But since it is little endian it will be stored in memory and written to disk like this:

oooooooo fooooooo lllllllf llllllll

So, for instance, writing 32767 to both offset and len, with flag set to 0, will yield:

11111111 01111111 11111110 11111111

because the least significant bit of the flag is stored first in the byte (2nd byte of struct) containing it, and the most significant bit of the flag is stored last in the byte (3rd byte of struct) containing it.

So, if flag = 1 it would have been:

11111111 11111111 11111110 11111111

and flag = 2,

11111111 01111111 11111111 11111111

huangapple
  • Published on 2023-05-22 01:15:35
  • Please retain this link when reposting: https://go.coder-hub.com/76301065.html