How to check if a register contains a zero byte without SIMD instructions

Question

Given a 64-bit general-purpose register (not an xmm register) on the x64 architecture, filled with one-byte unsigned values, how can I check all of the bytes for a zero value simultaneously, without using SSE instructions?

Is there a way to do this in parallel, without iterating over the register in 4-bit steps?

I tried comparing it against certain 64-bit masks, but that did not work.

Answer 1

Score: 1

[1]: https://godbolt.org/z/v6e5cvsc8

Technically, you could do something like this:
<!-- language-all: lang-cpp -->

	#include <cstdint>

	// True if any of the 8 bytes in the integer is 0
	bool anyZeroByte( uint64_t v )
	{
		// Compute the bitwise OR of the 8 bits in each byte, folded into that byte's lowest bit
		v |= ( v >> 4 ) & 0x0F0F0F0F0F0F0F0Full;
		v |= ( v >> 2 ) & 0x0303030303030303ull;
		constexpr uint64_t lowMask = 0x0101010101010101ull;
		v |= ( v >> 1 ) & lowMask;
		// Isolate the lowest bit of each byte
		v &= lowMask;
		// Now these bits are 0 for zero bytes, 1 for non-zero bytes;
		// invert them
		v ^= lowMask;
		// Now these bits are 1 for zero bytes, 0 for non-zero bytes
		return 0 != v;
	}
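One way to convince yourself that the folding is right is to compare it against a plain byte-by-byte loop. The sketch below is only illustrative and is not part of the original answer: `anyZeroByteNaive` and `spotCheckFold` are hypothetical helper names, and `anyZeroByte` is repeated (in a slightly condensed form) so that the snippet compiles on its own.

```cpp
#include <cassert>
#include <cstdint>

// Same folding routine as above, repeated so this snippet is self-contained
bool anyZeroByte( uint64_t v )
{
	constexpr uint64_t lowMask = 0x0101010101010101ull;
	v |= ( v >> 4 ) & 0x0F0F0F0F0F0F0F0Full;
	v |= ( v >> 2 ) & 0x0303030303030303ull;
	v |= ( v >> 1 ) & lowMask;
	// The lowest bit of each byte is now 1 for non-zero bytes; flip it and test
	return 0 != ( ( v & lowMask ) ^ lowMask );
}

// Hypothetical reference version: inspect the 8 bytes one at a time
bool anyZeroByteNaive( uint64_t v )
{
	for( int i = 0; i < 8; i++ )
		if( 0 == ( ( v >> ( i * 8 ) ) & 0xFF ) )
			return true;
	return false;
}

// Hypothetical spot check over a few hand-picked values
void spotCheckFold()
{
	const uint64_t samples[] = {
		0ull,                    // every byte is zero
		0x1122334455007788ull,   // one zero byte in the middle
		0x00FFFFFFFFFFFFFFull,   // zero byte at the top
		0xFFFFFFFFFFFFFF00ull,   // zero byte at the bottom
		0x0101010101010101ull,   // every byte is 1
		0x8040201008040201ull,   // exactly one bit set in each byte
	};
	for( uint64_t s : samples )
		assert( anyZeroByte( s ) == anyZeroByteNaive( s ) );
}
```

Calling `spotCheckFold()` from a test or from `main` should pass silently; the last two samples are the interesting ones, because each byte is non-zero yet has only a single bit set, so every folding step matters.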

However, SIMD is likely to be considerably faster. SSE is a baseline requirement of the x64 architecture: every AMD64 processor is required to support SSE1 and SSE2. Here's an SSE2 version:

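	#include <cstdint>
	#include <emmintrin.h> // SSE2 intrinsics

	bool anyZeroByteSse2( uint64_t v )
	{
		// Move the integer into the low 8 bytes of a vector; the upper 8 bytes are zeroed
		__m128i vec = _mm_cvtsi64_si128( (int64_t)v );
		__m128i zero = _mm_setzero_si128();
		// Each byte becomes 0xFF where the input byte is 0, and 0x00 otherwise
		__m128i eq = _mm_cmpeq_epi8( vec, zero );
		// Collect the top bit of every byte; keep only the bits for the low 8 bytes
		return 0 != ( _mm_movemask_epi8( eq ) & 0xFF );
	}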

That's 6 instructions instead of 16: [link][1].
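If SIMD really is off the table, there is also a shorter scalar test, widely known as the `haszero` trick from the "Bit Twiddling Hacks" collection. To be clear, this is a different technique from the folding code in the answer, not the answer author's method, and the names `anyZeroByteSub` and `spotCheckSub` are made up for this sketch. The idea is that subtracting 1 from every byte makes a zero byte wrap around to 0xFF, which sets its top bit; masking with `~v` then discards bytes whose top bit was already set in the input.

```cpp
#include <cassert>
#include <cstdint>

// Alternative scalar test (not from the answer above):
// returns true if and only if at least one byte of v is 0
bool anyZeroByteSub( uint64_t v )
{
	constexpr uint64_t ones  = 0x0101010101010101ull;
	constexpr uint64_t highs = 0x8080808080808080ull;
	// v - ones : a 0x00 byte wraps to 0xFF, so its bit 7 becomes set;
	//            bytes 0x81..0xFF also keep bit 7 set after the subtraction
	// & ~v     : drops every byte whose bit 7 was already set in the input
	// & highs  : keeps only bit 7 of each byte
	// A borrow may also flag a byte sitting directly above a zero byte,
	// which is harmless when the question is only "is any byte zero?"
	return 0 != ( ( v - ones ) & ~v & highs );
}

// Hypothetical spot check over a few hand-picked values
void spotCheckSub()
{
	assert( anyZeroByteSub( 0x1122334455007788ull ) );   // one zero byte
	assert( !anyZeroByteSub( 0x1122334455667788ull ) );  // no zero byte
	assert( !anyZeroByteSub( 0x8080808080808080ull ) );  // only top bits set
}
```

This is a sketch of the any-byte test only. The individual flag bits it produces are not a reliable per-byte map, and no speed comparison against the SSE2 version is claimed here.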

huangapple
  • Posted on June 1, 2023, 20:56:30
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76382122.html