Missing optimization: mov al, [mem] to bitfield-insert a new low byte into an integer
Question
I want to replace the lowest byte in an integer. On x86 this is exactly `mov al, [mem]`, but I can't seem to get compilers to output this. Am I missing an obvious code pattern that is recognized, am I misunderstanding something, or is this simply a missed optimization?
```c
unsigned insert_1(const unsigned* a, const unsigned char* b)
{
    return (*a & ~255) | *b;
}

unsigned insert_2(const unsigned* a, const unsigned char* b)
{
    return *a >> 8 << 8 | *b;
}
```
GCC actually uses `al`, but just for zeroing:
```asm
mov eax, DWORD PTR [rdi]
movzx edx, BYTE PTR [rsi]
xor al, al
or eax, edx
ret
```
Clang compiles both practically verbatim:
```asm
mov ecx, -256
and ecx, dword ptr [rdi]
movzx eax, byte ptr [rsi]
or eax, ecx
ret
```
Answer 1
Score: 7
> On x86 this is exactly `mov al, [mem]` but I can't seem to get compilers to output this.
Try this one, arithmetic-free:
```c
unsigned insert_4(const unsigned* a, const unsigned char* b)
{
    unsigned int t = *a;
    unsigned char *tcp = (unsigned char *)&t;
    tcp[0] = *b;
    return t;
}
```
```asm
insert_4(unsigned int const*, unsigned char const*):
        mov eax, DWORD PTR [rdi]
        mov al, BYTE PTR [rsi]
        ret
```
It's a bit screwy, I know, but compilers are good at removing the indirection and the address-taking of local variables (it took a couple of tries, though).
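A related arithmetic-free spelling, not part of the original answer, copies the byte with `memcpy` instead of going through a cast pointer. Like the `unsigned char *` access above, this is a permitted way to touch an object's bytes. The name `insert_memcpy` is made up for this sketch, and whether a given compiler version turns it into the same `mov al, [mem]` has not been checked here.

```c
#include <string.h>

/* Hypothetical variant (insert_memcpy is not a name from the original post).
   Overwrites the byte at the lowest address of a copy of *a, which is the
   least significant byte on little-endian targets such as x86. */
unsigned insert_memcpy(const unsigned *a, const unsigned char *b)
{
    unsigned t = *a;
    memcpy(&t, b, 1);   /* copy exactly one byte into the first byte of t */
    return t;
}
```

On a little-endian machine, calling it with `*a == 0x12345678` and `*b == 0xAB` should give `0x123456AB`.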
An alternative using a union:
```c
unsigned insert_5(const unsigned* a, const unsigned char* b)
{
    union {
        unsigned int ui;
        unsigned char uc;
    } u;
    u.ui = *a;
    u.uc = *b;
    return u.ui;
}
```
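Note that these solutions are endian-specific: they overwrite the byte at the lowest address, which is the low byte only on little-endian targets such as x86. That seems to be what you're looking for, and they can be adjusted for the other byte order as needed.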
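One possible way to make that adjustment, as a rough sketch rather than anything from the original answer: probe at run time where the least significant byte lives and store there. The names `low_byte_offset` and `insert_portable` are hypothetical, and the code assumes `unsigned` has no padding bits and that the target is either plain little-endian or plain big-endian. The mask-based `insert_1` from the question works on values and so does not depend on byte order in the first place.

```c
#include <stddef.h>

/* Hypothetical helper: offset, within an unsigned, of its least significant
   byte. Assumes a plain little- or big-endian layout with no padding bits:
   0 on little-endian targets, sizeof(unsigned) - 1 on big-endian ones. */
static size_t low_byte_offset(void)
{
    const unsigned probe = 1u;
    return *(const unsigned char *)&probe == 1 ? 0 : sizeof probe - 1;
}

/* Hypothetical byte-order-aware version of insert_4. */
unsigned insert_portable(const unsigned *a, const unsigned char *b)
{
    unsigned t = *a;
    unsigned char *tcp = (unsigned char *)&t;
    tcp[low_byte_offset()] = *b;   /* store into the low byte, wherever it is */
    return t;
}
```

An optimizing compiler can usually fold the probe to a constant, but the generated code for this version has not been inspected here.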