Store signed 32-bit in unsigned 64-bit int

Question

Basically, what I want is to "store" a signed 32-bit int in the 32 rightmost bits of an unsigned 64-bit int, since I want to use the 32 leftmost bits for other purposes.

What I'm doing right now is a simple cast and mask:

#define packInt32(X) ((uint64_t)X | INT_MASK)

But this approach has an obvious issue: if X is a positive int (the sign bit is not set), everything works fine. If it's negative, it becomes messy: all of the upper 32 bits end up set.
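A minimal sketch of the problem, assuming a hypothetical INT_MASK that carries a tag in the upper 32 bits (the question never shows its value); upper_half is likewise a helper added only for illustration:

```c
#include <stdint.h>

/* Hypothetical value: the question never shows INT_MASK, so assume a tag
   in the upper 32 bits and zeros in the lower 32 bits. */
#define INT_MASK UINT64_C(0x0000ABCD00000000)

/* The macro exactly as posted in the question. */
#define packInt32(X) ((uint64_t)X | INT_MASK)

/* Hypothetical helper: extract the upper 32 bits, i.e. the part that is
   supposed to hold the INT_MASK tag. */
static uint32_t upper_half(uint64_t packed)
{
    return (uint32_t)(packed >> 32);
}
```

With these definitions, upper_half(packInt32(5)) should be 0x0000ABCD, while upper_half(packInt32(-5)) comes out as 0xFFFFFFFF: converting -5 to uint64_t yields 0xFFFFFFFFFFFFFFFB, so the OR with INT_MASK has nothing left to set.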


The question is:

How can I achieve the above, also supporting negative numbers, in the fastest and most efficient way?

Answer 1

Score: 9

The "mess" you mention happens because you cast a small signed type to a larger unsigned type.
Converting a negative value to uint64_t yields that value plus 2^64, which has exactly the bits you would get from sign extension: all 32 upper bits come out set. This is what causes your trouble.

You can cast the (signed) integer to an unsigned type of the same size first. The following conversion to 64 bits then zero-extends instead of sign-extending:

#define packInt32(X) ((uint64_t)(uint32_t)(X) | INT_MASK)
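A self-contained sketch of this macro in use. The INT_MASK value, the unpackInt32 inverse and the packedTag accessor are all hypothetical additions, not part of the answer:

```c
#include <stdint.h>

/* Hypothetical tag for the upper 32 bits; INT_MASK is not shown in the
   question, so this value is only an assumption for illustration. */
#define INT_MASK UINT64_C(0x0000ABCD00000000)

/* Answer 1: go through uint32_t first, so the widening to 64 bits
   zero-extends instead of sign-extending. */
#define packInt32(X) ((uint64_t)(uint32_t)(X) | INT_MASK)

/* Hypothetical inverse: keep the low 32 bits and reinterpret them as
   signed. uint32_t -> int32_t for values above INT32_MAX is
   implementation-defined in C; GCC and Clang reduce modulo 2^32. */
static inline int32_t unpackInt32(uint64_t packed)
{
    return (int32_t)(uint32_t)packed;
}

/* Hypothetical accessor for the upper half (the tag). */
static inline uint32_t packedTag(uint64_t packed)
{
    return (uint32_t)(packed >> 32);
}
```

Under these assumptions packInt32(-5) should give 0x0000ABCDFFFFFFFB: the tag survives in the upper half, and unpackInt32 recovers -5 from the lower half.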

Answer 2

Score: 5

You need to mask out any bits besides the low-order 32 bits. You can do that with a bitwise AND:

#define packInt32(X) (((uint64_t)(X) & 0xFFFFFFFF) | INT_MASK)
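A quick spot check, with hypothetical helper names, that this mask produces the same bit pattern as the double cast from Answer 1:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Answer 2: widen first (sign-extending), then clear the upper 32 bits. */
static inline uint64_t low32_by_mask(int32_t x)
{
    return (uint64_t)x & 0xFFFFFFFFu;
}

/* Answer 1, for comparison: narrow to uint32_t, then zero-extend. */
static inline uint64_t low32_by_cast(int32_t x)
{
    return (uint64_t)(uint32_t)x;
}

/* Compare both forms on a few edge cases (a spot check, not a proof). */
static bool approaches_agree(void)
{
    static const int32_t samples[] = { 0, 1, -1, 42, -42, INT32_MAX, INT32_MIN };
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; ++i) {
        if (low32_by_mask(samples[i]) != low32_by_cast(samples[i]))
            return false;
    }
    return true;
}
```

In both forms a negative input keeps its 32-bit pattern in the low half and leaves the high half zero, so -1 becomes 0x00000000FFFFFFFF.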

Answer 3

Score: 1

A negative 32-bit integer will get sign-extended to 64 bits:

#include <stdint.h>
uint64_t movsx(int32_t X) { return X; }

movsx on x86-64:

movsx:
        movsx   rax, edi
        ret

Masking out the upper 32 bits makes it zero-extended instead:

#include <stdint.h>
uint64_t mov(int32_t X) { return (uint64_t)X & 0xFFFFFFFF; }
// or uint64_t mov(int32_t X) { return (uint64_t)(uint32_t)X; }

mov on x86-64:

mov:
        mov     eax, edi
        ret

Compiler Explorer demo: https://gcc.godbolt.org/z/fihCmt

Neither method loses any information from the lower 32 bits, so either one is a valid way of storing a 32-bit integer in a 64-bit one.
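A small sketch of that claim, with hypothetical function names: both widenings keep the same low 32 bits, so narrowing either one gives back the original value; only the upper half differs.

```c
#include <stdint.h>

/* Sign-extending widen (the movsx version above). */
static uint64_t widen_sx(int32_t x) { return (uint64_t)x; }

/* Zero-extending widen (the mov version above). */
static uint64_t widen_zx(int32_t x) { return (uint64_t)(uint32_t)x; }

/* Recover the original value from the low 32 bits. The uint32_t -> int32_t
   step is implementation-defined for values above INT32_MAX; GCC and
   Clang reduce modulo 2^32, which gives back the original value. */
static int32_t narrow(uint64_t v) { return (int32_t)(uint32_t)v; }
```

For example, widen_sx(-7) is 0xFFFFFFFFFFFFFFF9 and widen_zx(-7) is 0x00000000FFFFFFF9, yet narrow returns -7 for both.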

The x86-64 code for a plain mov is one byte shorter (2 bytes vs 3 for the instruction itself, or 3 vs 4 counting the ret). I don't think there should be much of a speed difference, but if there is one, I'd expect the plain mov to win by a tiny bit.

Answer 4

Score: 1

One option is to untangle the sign extension from the upper value when it is read back, but that can get messy.
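One possible reading of this first option, sketched under assumptions that are not from the answer (the function names and the use of addition are illustrative): store the sign-extended value, add the tag shifted into the upper half, and on read-back subtract the recovered value again to isolate the tag.

```c
#include <stdint.h>

/* Hypothetical packer: store x sign-extended and add the tag on top.
   The shifted tag has zero low bits, so the low half of the sum is x. */
static uint64_t pack_add(int32_t x, uint32_t tag)
{
    return ((uint64_t)tag << 32) + (uint64_t)x;   /* (uint64_t)x sign-extends */
}

/* Read x back from the low 32 bits (uint32_t -> int32_t is
   implementation-defined above INT32_MAX; GCC and Clang wrap modulo 2^32). */
static int32_t unpack_value(uint64_t packed)
{
    return (int32_t)(uint32_t)packed;
}

/* Untangle the tag: subtract the sign-extended value again, then shift. */
static uint32_t unpack_tag(uint64_t packed)
{
    return (uint32_t)((packed - (uint64_t)unpack_value(packed)) >> 32);
}
```

For x = -5 and tag 0xABCD the packed word is 0x0000ABCCFFFFFFFB: the sign extension borrows one from the visible upper half, which is the tangle that unpack_tag undoes.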

Another option is to declare a union containing a bit-packed struct. This defers the problem to the compiler to optimise:

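#include <stdint.h>

/* Needs C11 anonymous structs and a compiler that accepts int64_t
   bit-fields (GCC and Clang do); which field lands in the low 32 bits
   of merged is implementation-defined. */
typedef union {
  int64_t merged;
  struct {
     int64_t field1:32,
             field2:32;
  };
} packed64;  /* hypothetical type name, added so the declaration is usable */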

A third option is to deal with the sign bit yourself: store a 31-bit absolute value and a 1-bit sign (which cannot represent INT32_MIN). It is not super-efficient, but it is more likely to be legal if you should ever encounter a non-two's-complement processor, where negative signed values can't be safely cast to unsigned. Such processors are as rare as hen's teeth, so I wouldn't worry about this myself.
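A sketch of this sign-and-magnitude layout, assuming the sign goes in bit 31 and the magnitude in bits 0 to 30 (the answer does not fix a bit order); the function names are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* Pack x as sign (bit 31) plus magnitude (bits 0-30); the upper 32 bits
   stay zero. Returns false for INT32_MIN, whose magnitude needs 32 bits. */
static bool pack_sign_magnitude(int32_t x, uint64_t *out)
{
    if (x == INT32_MIN)
        return false;
    uint32_t magnitude = (uint32_t)(x < 0 ? -x : x);   /* never negative here */
    uint32_t sign = x < 0 ? UINT32_C(1) << 31 : 0;
    *out = (uint64_t)(sign | magnitude);
    return true;
}

/* Rebuild the value from the low 32 bits only, ignoring the upper half. */
static int32_t unpack_sign_magnitude(uint64_t packed)
{
    int32_t magnitude = (int32_t)(packed & 0x7FFFFFFFu);
    return (packed & (UINT64_C(1) << 31)) ? -magnitude : magnitude;
}
```

With this layout pack_sign_magnitude(-5, &p) should store 0x80000005 in p. The boolean result exists because INT32_MIN has no 31-bit magnitude.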

Answer 5

Score: 0

Assuming that the only operation on the 64-bit value will be to convert it back to 32 bits (and possibly to store or display it), there is no need to apply a mask. The compiler sign-extends the 32-bit value when casting it to 64 bits, and keeps the lowest 32 bits when casting the 64-bit value back to 32 bits.

#define packInt32(X) ((uint64_t)(X))
#define unpackInt32(X) ((int)(X))

Or better, using (inline) functions:

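#include <stdint.h>
/* static avoids C99 link errors from an inline function with no external definition */
static inline uint64_t packInt32(int x) { return (uint64_t)x; }
static inline int unpackInt32(uint64_t x) { return (int)x; }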
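Because the question wants to reuse the upper 32 bits, it may help to check what this approach leaves there. A sketch with fixed-width types, assuming int is 32 bits as the answer implies; upperHalf is a hypothetical helper:

```c
#include <stdint.h>

/* Answer 5's functions with int32_t in place of int (assumes a 32-bit int). */
static inline uint64_t packInt32(int32_t x) { return (uint64_t)x; }   /* sign-extends */

/* uint64_t -> int32_t keeps the low 32 bits on GCC and Clang
   (implementation-defined in standard C for out-of-range values). */
static inline int32_t unpackInt32(uint64_t x) { return (int32_t)x; }

/* Hypothetical helper: the upper half that the question wants to reuse. */
static inline uint32_t upperHalf(uint64_t x) { return (uint32_t)(x >> 32); }
```

unpackInt32(packInt32(-123)) gives back -123, but upperHalf(packInt32(-123)) is 0xFFFFFFFF, so this approach only fits the stated assumption that the upper bits carry nothing else.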

huangapple
  • Published on January 6, 2020, 21:16:26
  • When reposting, please keep the link to this article: https://go.coder-hub.com/59612828.html