The fastest way to convert a UInt64 hex string to a UInt32 value preserving as many leading digits as possible, i.e. truncation


Question

I'm looking for the fastest way to parse a hex string representing a ulong into a uint keeping as many leading digits as a uint can handle and discarding the rest. For example,

string hex = "0xab54a9a1df8a0edb"; // 12345678991234567899
Should output: uint result = 1234567899;

I can do this by simply parsing the hex into a ulong, getting the digits using ToString, and then taking as many of them as would fit into a uint without overflowing, but I need something much faster. Thanks. C# code preferred, but any would do.
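
For reference, the slow baseline described above can be sketched roughly as follows. The function name `LeadingDigitsBaseline` is made up for illustration, and the sketch assumes the input is a valid hex string with an optional "0x" prefix.

```csharp
using System;
using System.Globalization;

// Baseline from the question: parse, format as decimal, then keep the longest
// leading run of digits that still fits in a uint.
static uint LeadingDigitsBaseline(string hex)
{
    // NumberStyles.HexNumber does not accept a "0x" prefix, so strip it first.
    string digits = hex.StartsWith("0x", StringComparison.OrdinalIgnoreCase) ? hex[2..] : hex;
    ulong value = ulong.Parse(digits, NumberStyles.HexNumber, CultureInfo.InvariantCulture);

    string dec = value.ToString(CultureInfo.InvariantCulture);
    // uint.MaxValue is 4294967295 (10 digits): try 10 leading digits first.
    string head = dec.Length > 10 ? dec[..10] : dec;
    ulong ten = ulong.Parse(head, CultureInfo.InvariantCulture);
    if (ten <= uint.MaxValue)
        return (uint)ten;

    // Anything above uint.MaxValue has 10 digits here, so 9 always fit.
    return uint.Parse(head[..9], CultureInfo.InvariantCulture);
}

Console.WriteLine(LeadingDigitsBaseline("0xab54a9a1df8a0edb")); // 1234567899
```

The string round trip (ToString plus a second parse) is the part a faster approach would want to avoid.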

Answer 1

Score: 2


For decimal truncation, all the high bits of the hex string affect the low 9 or 10 decimal digits, so you need to convert the whole thing. https://stackoverflow.com/questions/67054154/is-there-an-algorithm-to-convert-massive-hex-string-to-bytes-stream-quickly-asm/67169220#67169220 has C++ with SSE intrinsics. I commented there with some possible improvements to that code, and to https://github.com/zbjornson/fast-hex. This could be especially good if you're using SIMD to find numeric literals in larger buffers, since you might already have the hex string in a SIMD register. (Not sure if SIMDJSON does that.)

Hex-string to 64-bit integer is something SIMD certainly can speed up: map each digit to a 0-15 integer, combine pairs of bytes to pack nibbles (e.g. with x86 pmaddubsw), then shuffle those 8-bit chunks to the bottom of a register (e.g. with packuswb or pshufb). x86 at least has an efficient SIMD to GP-integer move, movq rax, xmm0, although the ARM equivalent is slow on some ARM CPUs.

(Getting a speedup from SIMD for ASCII hex -> uint is much easier if your strings are fixed-length, and probably if you don't need to check for invalid characters that aren't hex digits.)
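
As a scalar stand-in for the "map each digit to a 0-15 integer, then pack nibbles" steps, here is a minimal sketch. It is not SIMD code, the name `ParseHex64Unchecked` is hypothetical, and it assumes a "0x" prefix followed by at most 16 valid hex digits; nothing in it rejects invalid characters.

```csharp
using System;

// Scalar sketch of the per-digit mapping and nibble packing.
// Assumes "0x" + up to 16 valid hex digits; invalid input gives garbage.
static ulong ParseHex64Unchecked(ReadOnlySpan<char> hex)
{
    ReadOnlySpan<char> digits = hex[2..];   // skip the "0x" prefix
    ulong value = 0;
    foreach (char c in digits)
    {
        // '0'..'9' (0x30..0x39) -> 0..9
        // 'A'..'F' (0x41..0x46) and 'a'..'f' (0x61..0x66) -> 10..15
        // No branches: the letter ranges have bit 6 set, the digits do not.
        uint nibble = (uint)((c & 0xF) + 9 * (c >> 6));
        value = (value << 4) | nibble;
    }
    return value;
}

Console.WriteLine(ParseHex64Unchecked("0xab54a9a1df8a0edb")); // 12345678991234567899
```

A SIMD version would apply the same mapping to all 16 characters at once instead of looping.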


Decimal truncation of u64 (C# ulong) to fit in u32 (C# uint)

Modulo by a power of 10 truncates to some number of decimal digits.

(uint)(x % 10000000000) works for some numbers, but 10000000000 (1e10 = one followed by 10 zeros) is larger than 2^32-1. Consider an input like 0x2540be3ff (9999999999). We'd get (uint)9999999999 producing 1410065407 = 0x540be3ff (keeping the low 32 bits of that 34-bit number.)
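
The wrap-around in that example can be reproduced with a few lines. This is only a demonstration of the failure case, with `unchecked` written out so the narrowing cast is explicit.

```csharp
using System;

ulong x = 0x2540be3ffUL;                       // 9999999999
ulong tenDigits = x % 10_000_000_000UL;        // unchanged: x is already below 1e10
uint wrapped = unchecked((uint)tenDigits);     // keeps only the low 32 bits

Console.WriteLine(tenDigits);                  // 9999999999
Console.WriteLine(wrapped);                    // 1410065407
Console.WriteLine($"0x{wrapped:x}");           // 0x540be3ff
```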

So perhaps try modulo 1e10, but if it's too big for u32 then modulo 1e9.

  ulong tendigit = x % 10000000000;  // 1e10
  uint truncated = tendigit <= (ulong)0xffffffff ? (uint)tendigit : (uint)(x % 1000000000);  // % 1e9 keeps 9 decimal digits

If this isn't correct C# syntax or the literals need some decoration to make them ulong (like C 10000000000uLL for good measure), please let me know.

It's probably at least as efficient to just modulo the original number two different ways as to try to get the leading decimal digit of x % 1e10 and subtract it or whatever. The asm is going to need two 64-bit multiplicative inverse constants, and starting from the original number again keeps critical-path latency shorter for out-of-order exec if branch prediction predicts that it needs to calculate the nine-digit truncation.
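
To make the two options concrete, here is a sketch of both as functions. The names `TruncateIndependent` and `TruncateDependent` are invented for this example, and no timing was measured; which one is faster, if either, depends on the JIT and the CPU.

```csharp
using System;

// Both remainders are computed from the original x, so the second
// one does not have to wait for the first.
static uint TruncateIndependent(ulong x)
{
    ulong tenDigits = x % 10_000_000_000UL;               // low 10 decimal digits
    return tenDigits <= uint.MaxValue
        ? (uint)tenDigits
        : (uint)(x % 1_000_000_000UL);                    // low 9 decimal digits
}

// Dependent variant: reduce the 10-digit remainder again. Same result,
// because 1e9 divides 1e10, but the second % waits on the first.
static uint TruncateDependent(ulong x)
{
    ulong tenDigits = x % 10_000_000_000UL;
    return tenDigits <= uint.MaxValue
        ? (uint)tenDigits
        : (uint)(tenDigits % 1_000_000_000UL);
}

Console.WriteLine(TruncateIndependent(0xab54a9a1df8a0edbUL)); // 1234567899
Console.WriteLine(TruncateIndependent(9_999_999_999UL));      // 999999999
Console.WriteLine(TruncateDependent(9_999_999_999UL));        // 999999999
```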


Binary truncation

@Matthew Whited deleted his answer (due to a bug in the decimal truncation part), but his binary truncation part based on substrings of the original hex input could perhaps be more efficient in some cases than doing the full conversion and then casting to a narrower type or masking with AND.

> If you want the last 8 hex digits of the string
>
> uint.Parse(hex[^8..], NumberStyles.HexNumber)
>
> If you want the first 8 hex digits
>
> uint.Parse(hex[2..10], NumberStyles.HexNumber)
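
A small check that the substring approach agrees with a full conversion followed by a cast or shift. It assumes the fixed layout from the question: "0x" followed by exactly 16 hex digits. Each slice covers eight hex characters, which is 32 bits.

```csharp
using System;
using System.Globalization;

string hex = "0xab54a9a1df8a0edb";

// Parse only the substring that is needed (HexNumber does not accept "0x").
uint low = uint.Parse(hex[^8..], NumberStyles.HexNumber);     // last 8 hex digits
uint high = uint.Parse(hex[2..10], NumberStyles.HexNumber);   // first 8 hex digits

// Full conversion, then narrow.
ulong full = ulong.Parse(hex[2..], NumberStyles.HexNumber);
uint lowFromFull = unchecked((uint)full);
uint highFromFull = (uint)(full >> 32);

Console.WriteLine($"{low:x8} {high:x8}");                     // df8a0edb ab54a9a1
Console.WriteLine(low == lowFromFull && high == highFromFull); // True
```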

huangapple
  • Posted on June 1, 2023, 12:27:20
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76378661.html