Missing optimization: mov al, [mem] to bitfield-insert a new low byte into an integer

Question

I want to replace the lowest byte in an integer. On x86 this is exactly `mov al, [mem]`, but I can't seem to get compilers to output this. Am I missing an obvious code pattern that is recognized, am I misunderstanding something, or is this simply a missed optimization?

```c
unsigned insert_1(const unsigned* a, const unsigned char* b)
{
    return (*a & ~255) | *b;
}
unsigned insert_2(const unsigned* a, const unsigned char* b)
{
    return *a >> 8 << 8 | *b;
}
```

GCC does use `al`, but only to zero the low byte:

```asm
        mov     eax, DWORD PTR [rdi]
        movzx   edx, BYTE PTR [rsi]
        xor     al, al
        or      eax, edx
        ret
```

Clang compiles both practically verbatim:

```asm
        mov     ecx, -256
        and     ecx, dword ptr [rdi]
        movzx   eax, byte ptr [rsi]
        or      eax, ecx
        ret
```
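
To make the intended semantics concrete, here is a minimal by-value sketch of the operation both functions in the question compute. The helper name `replace_low_byte` is hypothetical and does not appear in the question; it only spells out the mask-and-or step.

```c
#include <limits.h>

/* Hypothetical helper (not from the question): returns a with its
   least significant byte replaced by b. */
unsigned replace_low_byte(unsigned a, unsigned char b)
{
    /* Clear the bits covered by an unsigned char (bits 0..7 with 8-bit bytes). */
    unsigned high = a & ~(unsigned)UCHAR_MAX;
    /* Drop the new byte into the cleared position. */
    return high | b;
}
```

For example, with a 32-bit `unsigned`, `replace_low_byte(0x12345678u, 0xAB)` yields `0x123456AB`: the upper three bytes are untouched and only the low byte changes.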

Answer 1

Score: 7

> On x86 this is exactly mov al, [mem] but I can't seem to get compilers to output this.

Try this arithmetic-free version:

```c
unsigned insert_4(const unsigned* a, const unsigned char* b)
{
    unsigned int t = *a;
    unsigned char *tcp = (unsigned char *) & t;
    tcp[0] = *b;
    return t;
}
```

```asm
insert_4(unsigned int const*, unsigned char const*):
        mov     eax, DWORD PTR [rdi]
        mov     al, BYTE PTR [rsi]
        ret
```

A bit screwy, I know, but compilers are good at eliminating the indirection and the address-taking of local variables (it took me a couple of tries, though).

Godbolt: x86-64 GCC 13.1, -O3
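
The same "store one byte into a local copy" idea can also be spelled with `memcpy`. This is a sketch and not part of the original answer; the name `insert_memcpy` is made up for illustration. Like `insert_4`, it writes the first byte in memory, which is the low byte only on little-endian targets.

```c
#include <string.h>

/* Hypothetical variant (not from the answer): overwrite the first byte
   in memory of a local copy of *a with *b. */
unsigned insert_memcpy(const unsigned *a, const unsigned char *b)
{
    unsigned t = *a;
    /* Copy exactly one byte; on little-endian this is the low byte of t. */
    memcpy(&t, b, 1);
    return t;
}
```

Compilers usually treat a small fixed-size `memcpy` like a plain load and store, so this may compile to the same two instructions, but that is worth checking on Godbolt for the compiler and flags in use rather than assuming.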


An alternative using a union:

```c
unsigned insert_5(const unsigned* a, const unsigned char* b)
{
    union {
        unsigned int ui;
        unsigned char uc;
    } u;
    u.ui = *a;
    u.uc = *b;
    return u.ui;
}
```

Godbolt: x86-64 GCC 13.1, -O3


Note that these solutions are endian-specific: they assume little-endian byte order, as on x86. That seems to be what you're looking for, and they can be adjusted for big-endian targets as needed.
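
One way such an adjustment could look is sketched below. It assumes the GCC/Clang predefined macros `__BYTE_ORDER__` and `__ORDER_BIG_ENDIAN__`; other compilers may need a different check. The names `LOW_BYTE_INDEX` and `insert_low_byte` are hypothetical and not part of the answer.

```c
/* Position in memory of the least significant byte of an unsigned.
   __BYTE_ORDER__ / __ORDER_BIG_ENDIAN__ are GCC/Clang predefined macros;
   if they are missing, this sketch assumes little-endian. */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define LOW_BYTE_INDEX (sizeof(unsigned) - 1)
#else
#define LOW_BYTE_INDEX 0
#endif

/* Hypothetical name: same approach as insert_4, with the byte index
   chosen per target instead of hard-coded to 0. */
unsigned insert_low_byte(const unsigned *a, const unsigned char *b)
{
    unsigned t = *a;
    unsigned char *tcp = (unsigned char *)&t;
    tcp[LOW_BYTE_INDEX] = *b;   /* overwrite only the least significant byte */
    return t;
}
```

On x86 the index is 0, so this is the same source pattern as `insert_4`; on a big-endian target the store goes to the last byte of `t` instead.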

huangapple
  • Posted on June 22, 2023, 18:38:00
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76531041.html