How does the compiler know how to correctly handle bitfields when reading binary data from a file in C?
Question
I'm trying to learn C by writing a postgres clone and struggling to understand how the compiler handles bitfields.
Specifically, my question concerns the bitfields in postgres' line pointer struct:
typedef struct ItemIdData {
    unsigned lp_off:15,   /* offset to tuple (from start of page) */
             lp_flags:2,  /* state of line pointer, see below */
             lp_len:15;   /* byte length of tuple */
} ItemIdData;
In my clone, I'm able to set these fields to the correct values and write them to disk, but I don't understand how the compiler is able to know which of the 15 bits are relevant when converting to an integer type. Here's a full example showing what I'm doing:
#include <stdlib.h>
#include <stdio.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>
typedef struct ItemIdData {
    unsigned lp_off:15,   /* offset to tuple (from start of page) */
             lp_flags:2,  /* state of line pointer, see below */
             lp_len:15;   /* byte length of tuple */
} ItemIdData;

int main() {
    ItemIdData* lp = malloc(sizeof(ItemIdData));
    lp->lp_off = 205;
    lp->lp_len = 51;

    int fd = open("line_pointer.data",
                  O_RDWR |  // Read/Write mode
                  O_CREAT,  // Create file if it doesn't exist
                  S_IWUSR | // User write permission
                  S_IRUSR   // User read permission
    );
    write(fd, lp, 4);
    close(fd);
    free(lp);

    ItemIdData* lp2 = malloc(sizeof(ItemIdData));
    int fd2 = open("line_pointer.data",
                   O_RDWR |  // Read/Write mode
                   O_CREAT,  // Create file if it doesn't exist
                   S_IWUSR | // User write permission
                   S_IRUSR   // User read permission
    );
    read(fd2, lp2, 4);

    printf("lp_off: %d\n", lp2->lp_off);
    printf("lp_len: %d\n", lp2->lp_len);

    close(fd2);
    free(lp2);

    return EXIT_SUCCESS;
}
Compiling and running the program, I get:
$ gcc -o lp main.c && ./lp
lp_off: 205
lp_len: 51
Using xxd -b, I inspected the binary contents of the data file and saw these four bytes:
11001101 00000000 01100110 00000000
lp_off is the first 15 bits: 11001101 0000000, which somehow correctly converts to 205 in decimal.
lp_len is the last 15 bits: 1100110 00000000, which somehow correctly converts to 51 in decimal.
This is what I'm failing to understand: how does the compiler know that the trailing 0-bits are not part of the value when converting to an int in the print statements above?
I'm coding this on Ubuntu-20.04 running inside WSL on a Windows 10 machine, if that matters.
Answer 1
Score: 1
The bit field allocates 15 bits for lp_off, in which 205 is stored, and 15 bits for lp_len, in which 51 is stored. 205 in binary is 11001101. The extra 0 bits you are seeing are just the higher-order bits, which are not set because the stored value is not big enough to need them.

Given that you are on a little-endian machine, the 8 least significant bits are in the first byte and the remaining 7 are in the next byte. If you try storing 32767 in lp_off, you will see all 1 bits for the first 15 bits.
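One way to try that experiment directly, without going through a file, is to copy the struct into a byte buffer and print each byte in binary. This is only a sketch added for illustration, not part of the original answer; it assumes GCC on a little-endian x86-64 machine, since the C standard leaves bit-field layout up to the implementation:

#include <stdio.h>
#include <string.h>

/* Same struct as in the question. */
typedef struct ItemIdData {
    unsigned lp_off:15,
             lp_flags:2,
             lp_len:15;
} ItemIdData;

/* Print one byte as 8 binary digits, most significant bit first
   (the same order xxd -b uses). */
static void print_byte(unsigned char b) {
    for (int i = 7; i >= 0; i--)
        putchar(((b >> i) & 1) ? '1' : '0');
    putchar(' ');
}

int main(void) {
    ItemIdData lp = {0};
    lp.lp_off = 32767;   /* all 15 offset bits set */
    lp.lp_len = 51;

    /* View the struct as raw bytes without writing it to a file. */
    unsigned char bytes[sizeof lp];
    memcpy(bytes, &lp, sizeof lp);

    for (size_t i = 0; i < sizeof bytes; i++)
        print_byte(bytes[i]);
    putchar('\n');
    return 0;
}

On the setup assumed here this prints 11111111 01111111 01100110 00000000: the 15 bits belonging to lp_off are all set, and the top bit of the second byte is the low bit of lp_flags.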
Answer 2
Score: 1
The 32 bits in the struct are stored in little-endian order, as user16217248 says, but it is not true that the first 15 bits in the file will be all ones if offset stores 32767.

Since the struct is 15 bits + 2 bits + 15 bits, it may be defined logically like this:

oooooooo ooooooof flllllll llllllll

where o is offset, f is flag, and l is len. But since it is little endian, it will be stored in memory, and written to disk, like this:

oooooooo fooooooo lllllllf llllllll

So, for instance, writing 32767 to both offset and length, with flag set to 0, will yield:
11111111 01111111 11111110 11111111
because the least significant bit of the flag sits in the second byte of the struct (as that byte's highest-order bit), and the most significant bit of the flag sits in the third byte (as that byte's lowest-order bit).
So, if flag = 1 it would have been:
11111111 11111111 11111110 11111111
and flag = 2,
11111111 01111111 11111111 11111111
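For what it's worth, these three patterns can be reproduced with a short test program. This is a sketch added for illustration rather than part of the original answer, and it assumes the same GCC/little-endian layout discussed above; another compiler is free to lay the bit-fields out differently:

#include <stdio.h>
#include <string.h>

typedef struct ItemIdData {
    unsigned lp_off:15,
             lp_flags:2,
             lp_len:15;
} ItemIdData;

/* Dump the struct's bytes in binary, first byte first,
   most significant bit of each byte first (like xxd -b). */
static void dump_bits(const ItemIdData *lp) {
    unsigned char bytes[sizeof *lp];
    memcpy(bytes, lp, sizeof *lp);
    for (size_t i = 0; i < sizeof bytes; i++) {
        for (int bit = 7; bit >= 0; bit--)
            putchar(((bytes[i] >> bit) & 1) ? '1' : '0');
        putchar(' ');
    }
    putchar('\n');
}

int main(void) {
    for (unsigned flags = 0; flags <= 2; flags++) {
        ItemIdData lp = {0};
        lp.lp_off = 32767;    /* all 15 offset bits set */
        lp.lp_len = 32767;    /* all 15 length bits set */
        lp.lp_flags = flags;  /* 0, 1, or 2 */
        printf("flags = %u: ", flags);
        dump_bits(&lp);
    }
    return 0;
}

On such a setup the three lines of output should match the three dumps shown above.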