How to check if a register contains a zero byte without SIMD instructions
Question
Given a 64-bit general-purpose register (not an xmm register) on the x64 architecture, filled with one-byte unsigned values, how can I check all of the bytes for a zero value simultaneously without using SSE instructions?
Is there a way to do this in parallel, without iterating over the register in 4-bit steps?
I tried comparing it against certain 64-bit masks, but that did not work.
Answer 1
Score: 1
[1]: https://godbolt.org/z/v6e5cvsc8
Technically, you could do something like this:
<!-- language-all: lang-cpp -->
```cpp
#include <cstdint>

// True if any of the 8 bytes in the integer is 0
bool anyZeroByte( uint64_t v )
{
    // Compute the bitwise OR of the 8 bits in each byte.
    // Each shift is masked, so bits never leak in from the neighbouring byte.
    v |= ( v >> 4 ) & 0x0F0F0F0F0F0F0F0Full;
    v |= ( v >> 2 ) & 0x0303030303030303ull;
    constexpr uint64_t lowMask = 0x0101010101010101ull;
    v |= ( v >> 1 ) & lowMask;
    // Isolate the lowest bit of each byte
    v &= lowMask;
    // Now these bits are 0 for zero bytes, 1 for non-zero;
    // invert that bit
    v ^= lowMask;
    // Now these bits are 1 for zero bytes, 0 for non-zero
    // Compute the result
    return 0 != v;
}
```
However, SIMD is going to be faster. SSE is a baseline requirement of the x64 architecture: every AMD64 processor is required to support SSE1 and SSE2. Here's an SSE2 version:
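```cpp
#include <cstdint>
#include <emmintrin.h> // SSE2 intrinsics

bool anyZeroByteSse2( uint64_t v )
{
    // Move the integer into the low 8 bytes of a vector; the high 8 bytes are zeroed
    __m128i vec = _mm_cvtsi64_si128( (int64_t)v );
    __m128i zero = _mm_setzero_si128();
    // Compare each byte with 0; equal bytes become 0xFF
    __m128i eq = _mm_cmpeq_epi8( vec, zero );
    // Keep only the low 8 bits of the mask, because the zeroed high bytes always compare equal
    return 0 != ( _mm_movemask_epi8( eq ) & 0xFF );
}
```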
That's 6 instructions instead of 16: [link][1].
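For readers who want to stay within general-purpose registers, as the question asks, there is also a shorter scalar trick. It is not part of the answer above; it is the widely circulated "has zero byte" expression from the Bit Twiddling Hacks collection, widened here to 64 bits. The function name `hasZeroByte` is a hypothetical one chosen for this sketch. Subtracting 1 from every byte sets a byte's top bit when that byte was 0 (or above 0x80), and `& ~v` then rules out the bytes whose top bit was already set.

```cpp
#include <cstdint>

// Hypothetical helper name, not from the answer above.
// True if any of the 8 bytes in v is 0.
constexpr bool hasZeroByte( std::uint64_t v )
{
    // (v - 0x01..01) borrows through every zero byte and sets its top bit;
    // ~v clears the top bit of bytes that already had it set;
    // 0x80..80 keeps only the top bit of each byte.
    return ( ( v - 0x0101010101010101ull ) & ~v & 0x8080808080808080ull ) != 0;
}

// Compile-time checks on a few sample values
static_assert( hasZeroByte( 0x1122334455007788ull ), "one zero byte in the middle" );
static_assert( hasZeroByte( 0 ), "all bytes zero" );
static_assert( !hasZeroByte( 0x0101010101010101ull ), "smallest non-zero bytes" );
static_assert( !hasZeroByte( 0xFFFFFFFFFFFFFFFFull ), "largest bytes" );
```

The result is reliable as a yes/no answer, but the individual bits of the intermediate value should not be used to locate the zero byte: a borrow out of a zero byte can also flag a 0x01 byte directly above it.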