How does the compiler know how to correctly handle bitfields when reading binary data from a file in C?


Question



I'm trying to learn C by writing a postgres clone and struggling to understand how the compiler handles bitfields.

Specifically, my question surrounds the bitfields in postgres' line pointer struct:

typedef struct ItemIdData {
  unsigned  lp_off:15,	 /* offset to tuple (from start of page) */
	    lp_flags:2,	 /* state of line pointer, see below */
	    lp_len:15;	 /* byte length of tuple */
} ItemIdData;

In my clone, I'm able to set these fields to correct values and write them to disk, but I don't understand how the compiler is able to know which of the 15 bits are relevant when converting to an integer type. Here's a full example showing what I'm doing:

#include <stdlib.h>
#include <stdio.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

typedef struct ItemIdData {
  unsigned  lp_off:15,	 /* offset to tuple (from start of page) */
	    lp_flags:2,	 /* state of line pointer, see below */
	    lp_len:15;	 /* byte length of tuple */
} ItemIdData;

int main() {
  ItemIdData* lp = malloc(sizeof(ItemIdData));

  lp->lp_off = 205;
  lp->lp_len = 51;

  int fd = open("line_pointer.data",
                O_RDWR |    // Read/Write mode
                  O_CREAT,  // Create file if it doesn't exist
                S_IWUSR |   // User write permission
                  S_IRUSR   // User read permission
  );

  write(fd, lp, 4);

  close(fd);
  free(lp);

  ItemIdData* lp2 = malloc(sizeof(ItemIdData));

  int fd2 = open("line_pointer.data",
                O_RDWR |    // Read/Write mode
                  O_CREAT,  // Create file if it doesn't exist
                S_IWUSR |   // User write permission
                  S_IRUSR   // User read permission
  );

  read(fd2, lp2, 4);

  printf("lp_off: %d\n", lp2->lp_off);
  printf("lp_len: %d\n", lp2->lp_len);

  close(fd2);
  free(lp2);

  return EXIT_SUCCESS;
}

Compiling and running the program, I get:

$ gcc -o lp main.c && ./lp
lp_off: 205
lp_len: 51

Using xxd -b, I inspected the binary contents of the data file and see these four bytes:

11001101 00000000 01100110 00000000

lp_off is the first 15 bits: 11001101 0000000, which somehow correctly converts to 205 in decimal

lp_len is the last 15 bits: 1100110 00000000, which somehow correctly converts to 51 in decimal

This is what I'm failing to understand: how does the compiler know that the trailing 0-bits are not part of the value when converting to an int in the print statements above?

I'm coding this on Ubuntu-20.04 running inside WSL on a Windows 10 machine, if that matters.

Answer 1

Score: 1


The bitfield allocates 15 bits for lp_off, in which 205 is stored, and 15 bits for lp_len, in which 51 is stored. 205 in binary is 11001101. The extra 0 bits you are seeing are just the higher-order bits, not set because the stored value is not big enough.

Given that you are on a little-endian machine, the 8 least significant bits are in the first byte and the remaining 7 are in the next byte. If you try storing 32767 in lp_off, you will see all 1 bits in the first 15 bits.

Answer 2

Score: 1


The 32 bits in the struct are stored in little endian as user16217248 says, but it is not true that the first 15 bits in the file will be all ones if offset stores 32767.

Since the struct is 15 bits + 2 bits + 15 bits it may be defined logically like this:

ooooooooooooooo ff lllllllllllllll

Where o is offset, f is flag, and l is len.
But since it is little endian it will be stored in memory and written to disk like this:

oooooooo fooooooo lllllllf llllllll

So, for instance, writing 32767 to both offset and len, with flag set to 0, will yield:

11111111 01111111 11111110 11111111

because the least significant bit of the flag is stored first in the byte (2nd byte of struct) containing it, and the most significant bit of the flag is stored last in the byte (3rd byte of struct) containing it.

So, if flag = 1 it would have been:

11111111 11111111 11111110 11111111

and flag = 2,

11111111 01111111 11111111 11111111

huangapple
  • Published on 2023-05-22 01:15:35
  • Please retain this link when reposting: https://go.coder-hub.com/76301065.html