寻找百万位数中连续1位数的最快方法

huangapple go评论70阅读模式
英文:

Fast way to find max number of consecutive 1-bits in million-bit numbers

问题

例如,二进制中的123456789是111010110111100110100010101。连续1位的最大数量是4。我对高效解决非常大的数字(百万位甚至更多)的问题产生了兴趣,我想出了这个方法:

def onebits(n):
    ctr = 0
    while n:
        n &= n >> 1
        ctr += 1
    return ctr

n &= n >> 1 同时截断了每个连续1位的顶部1位。我重复这个过程,直到每个连续1位的串都消失,计算了需要多少步。例如(都是二进制):

   11101011(起始值)
->  1100001
->   100000
->        0

这需要三个步骤,因为最长的连续1位串有三个1位。

对于随机数字,其中连续1位串很短,这是相当快的(在该基准测试中为Kelly3)。但对于具有长连续1位串的数字,它需要高达O(b^2)时间,其中b是n的位数。我们能做得更好吗?

英文:

For example, 123456789 in binary is 111010110111100110100010101. The max number of consecutive 1-bits is 4. I got interested in solving that efficiently for very large numbers (million bits or even more). I came up with this:

def onebits(n):
    ctr = 0
    while n:
        n &= n >> 1
        ctr += 1
    return ctr

The n &= n >> 1 simultaneously cuts off the top 1-bit of every streak of consecutive 1-bits. I repeat that until each streak is gone, counting how many steps it took. For example (all binary):

   11101011 (start value)
->  1100001
->   100000
->        0

It took three steps because the longest streak had three 1-bits.

For random numbers, where streaks are short, this is pretty fast (Kelly3 in that benchmark). But for numbers with long streaks, it takes up to O(b²) time, where b is the size of n in bits. Can we do better?

答案1

得分: 3

是的,我们可以通过进一步发展这个想法以O(b log b)的时间来完成。使用指数搜索

注意,通过截断每个连续1比特序列的最高1比特,也会扩大连续1比特序列之间的间隙。最初,连续1比特序列之间至少被一个0比特分隔开。在截断每个连续1比特序列的第一个1比特之后,连续1比特序列现在至少被两个0比特分隔开。

然后,我们可以执行 n &= n >> 2 来截断所有剩余连续1比特序列的前两个1比特。这也会将间隙扩大至至少四个0比特。

我们继续从每个连续1比特序列的开头截断4、8、16、32等1比特,只要我们仍然有任何1比特序列存在。

假设当尝试截断32时,我们发现没有连续1比特序列剩下。在这一点上,我们切换到反向模式。尝试截断16,然后8、4、2,最后是1。但只保留仍然让我们拥有连续1比特序列的切割。

由于我们只保留了仍然留下一些连续1比特序列的切割,所以最终我们会得到一个长度为1的连续1比特序列(或连续1比特序列),所以我们将其添加到总数中。

这是代码部分:

def onebits_linearithmic(n):
    if n == 0:
        return 0
    total = 0
    cut = 1
    while m := n & (n >> cut):
        n = m
        total += cut
        cut *= 2
    while cut := cut // 2:
        if m := n & (n >> cut):
            n = m
            total += cut
    return total + 1

使用随机的1000000比特数字进行基准测试:

  0.60 ± 0.06 ms  onebits_linearithmic
  1.13 ± 0.08 ms  onebits_quadratic
 48.57 ± 1.25 ms  onebits_linear

我还包括了一个使用字符串的线性时间解决方案,但其隐藏的常数要高得多,所以它仍然慢得多。

接下来,使用随机的100000比特数字,其中有50000比特的1比特序列

  0.13 ± 0.05 ms  onebits_linearithmic
  2.37 ± 0.60 ms  onebits_linear
176.23 ± 7.82 ms  onebits_quadratic

显然,二次方解法变得慢得多。其他两个保持较快,所以让我们尝试使用随机的1000000比特数字,其中有500000比特的1比特序列

  1.36 ± 0.06 ms  onebits_linearithmic
 24.69 ± 0.84 ms  onebits_linear

完整的代码可以在[此处尝试](https://ato.pxeger.com/run?1=jVXLjtMwFN2xyFdchDSxh_SRMiAUTWaDBD_Arqoit7FnoomdYDvMjEq_hA0b2PMLfAZfw7WdtJlOQURqkt7HOfdxrHz90T7Ym0Z9-_a9s2Ly9vezDyUX0Ci-rqwpPnWs1MxWG6JoFgFeG6shh7l_v7upag4qONyl4CzH29UVpHubS3iZ9wbNbaeVs0XRmKeuFGe6sjdyRFUJxMqR7EDQ5wd621hW74vZdBYGmlCYhMxVcwbEl4QBdFxqDnL_L0BhmRh0qBwRz3NYjCCdKfNRMJvB4gCHtf6T7inlSdq-v96O3ZyY0n4-fbBk90SyltRcJSDircrWu3hq2rqyJJ7HlNIoEp3aGKR_stfk5AqOrVFUybbRFjRTZSMjoRsJtpK8sjB4eMuZDR5jEdoguhm8kjMsztiSf46iF_Cu0ZpvrOLGRKLROJlKwfIc0a85Sed40QTOSWCbXnPr3lw93knB5RQu55BB6cpNhd-3CIyd-o6X8xVOC80uQbiEYE6z1V5iAiPged5nDhtrdaVwfIJVdRbjWKdFoZjkRUF9gF-L5cYSV1UCqpNrrhPoVGWxzw2reQJ1o64LYzVnt0kg9otzsm7s2BtIA6WIQ9ewdchZspvgs8c3WezoeW34f6fAHa40eGazxeAMvNAISCc-CYER0a3UyWQrMliujqa2wwDXttuuIaIXoXXxSzwooWufY12Ox1qK1fJ1tlqN9YoSdXIgFsfxZroQO_j1E7ZeG9524W1bN8odxFG_vUfr7rndgTqhEXejw3qfjNmlfcmBkBQuL4H0g6EUJpDSkekiQDyewQDhFCYrRYLqSc3kumSZ19Ighjw8EHjWW4bcfjBT1rZclcTSaKxQg-eFl8TTJXDLH3I_cDre-LCCY2EGL-J5aaYFngv3SyBNIJYGhZzyVwm8Z6igQZJ97OnIj7o7

英文:

Yes, we can do it in O(b log b) time by developing that idea further.
With an exponential search.

Note that by cutting off each streak's top 1-bit, that also widens the gaps between streaks. Originally, streaks were separated by at least one 0-bit. After cutting off each streak's first 1-bit, streaks are now separated by at least two 0-bits.

We can then do n &= n >> 2 to cut off the first two 1-bits of all remaining streaks. Which also widens the gaps to at least four 0-bits.

We continue cutting off 4, 8, 16, 32, etc 1-bits from the start of each streak, as long as we still have any 1-bit streaks remaining.

Let's say when trying to cut off 32, we find that we have no streak left. At this point we switch to reverse mode. Try cutting off 16 instead. Then 8, 4, 2, and finally 1. But only keep the cuts that still leave us a streak.

Since we only kept cuts that left some streak, we end up with a streak (or streaks) of length 1, so we add that to the total.

The code:

def onebits_linearithmic(n):
    if n == 0:
        return 0
    total = 0
    cut = 1
    while m := n & (n >> cut):
        n = m
        total += cut
        cut *= 2
    while cut := cut // 2:
        if m := n & (n >> cut):
            n = m
            total += cut
    return total + 1

Benchmark with random 1,000,000-bit numbers:

  0.60 ± 0.06 ms  onebits_linearithmic
  1.13 ± 0.08 ms  onebits_quadratic
 48.57 ± 1.25 ms  onebits_linear

I included a linear-time solution using strings, but its hidden constant is much higher, so it's still much slower.

Next, random 100,000-bit numbers with 50,000-bit streak of 1-bits:

  0.13 ± 0.05 ms  onebits_linearithmic
  2.37 ± 0.60 ms  onebits_linear
176.23 ± 7.82 ms  onebits_quadratic

The quadratic solution indeed became much slower. The other two remain fast, so let's try them with random 1,000,000-bit numbers with 500,000-bit streak of 1-bits:

  1.36 ± 0.06 ms  onebits_linearithmic
 24.69 ± 0.84 ms  onebits_linear

Full code (Attempt This Online!):

def onebits_quadratic(n):
    ctr = 0
    while n:
        n &= n >> 1
        ctr += 1
    return ctr

def onebits_linearithmic(n):
    if n == 0:
        return 0
    total = 0
    cut = 1
    while m := n & (n >> cut):
        n = m
        total += cut
        cut *= 2
    while cut := cut // 2:
        if m := n & (n >> cut):
            n = m
            total += cut
    return total + 1

def onebits_linear(n):
    return max(map(len, f'{n:b}'.split('0')))

funcs = onebits_quadratic, onebits_linearithmic, onebits_linear

import random
from timeit import repeat
from statistics import mean, stdev

# Correctness
for n in [*range(10000), *(random.getrandbits(1000) for _ in range(1000))]:
  expect = funcs[0](n)
  for f in funcs[1:]:
    if f(n) != expect:
      print('fail:', f.__name__)
    
def test(bits, number, unit, scale, long_streak, funcs):
  if not long_streak:
    print(f'random {bits:,}-bit numbers:')
  else:
    print(f'random {bits:,}-bit numbers with {bits//2:,}-bit streak of 1-bits:')

  times = {f: [] for f in funcs}
  def stats(f):
    ts = [t * scale for t in times[f][5:]]
    return f'{mean(ts):6.2f} ± {stdev(ts):4.2f} {unit} '

  for _ in range(10):
    n = random.getrandbits(bits)
    if long_streak:
      n |= ((1 << (bits//2)) - 1) << (bits//4)
    for f in funcs:
      t = min(repeat(lambda: f(n), number=number)) / number
      times[f].append(t)

  for f in sorted(funcs, key=stats):
    print(stats(f), f.__name__)
  print()

test(1_000_000, 1, 'ms', 1e3, False, funcs)
test(100_000, 1, 'ms', 1e3, True, funcs)
test(1_000_000, 1, 'ms', 1e3, True, funcs[1:])

答案2

得分: 2

以下是您要翻译的内容:

Found your problem interesting.

I checked simple, loop-based approach. It is O(b). Because of excessive use of loops, had to use numba to achieve similar results in terms of timings.

import math
import numba

def loop_based(n: int):
    bytes_array = (n).to_bytes(int(math.log2(n+1) // 8 + 1), "big")
    return optimized_part(bytes_array)

@numba.njit
def optimized_part(bytes_array):
    bits = [1, 2, 4, 8, 16, 32, 64, 128]
    longest = 0
    current = 0
    idx = len(bytes_array) - 1
    while idx >= 0:
        current_byte = bytes_array[idx]
        for bit in bits:
            if bit & current_byte:
                current += 1
            else:
                if current > longest:
                    longest = current
                current = 0
        idx -= 1
    if current > longest:
        longest = current
    return longest

Results

random 1,000,000-bit numbers:
  0.38 ± 0.01 ms  onebits_linearithmic
  0.66 ± 0.01 ms  loop_based
  0.77 ± 0.08 ms  onebits_quadratic
 26.89 ± 0.40 ms  onebits_linear
 51.06 ± 0.43 ms  loop_based (without numba)

random 100,000-bit numbers with 50,000-bit streak of 1-bits:
  0.07 ± 0.00 ms  onebits_linearithmic
  0.07 ± 0.02 ms  loop_based
  0.99 ± 0.02 ms  onebits_linear
  4.99 ± 0.06 ms  loop_based (without numba)
109.81 ± 0.43 ms  onebits_quadratic

random 1,000,000-bit numbers with 500,000-bit streak of 1-bits:
  0.65 ± 0.01 ms  loop_based
  0.82 ± 0.02 ms  onebits_linearithmic
 13.45 ± 0.09 ms  onebits_linear
 49.56 ± 0.56 ms  loop_based (without numba)

It could be optimized for random bits case by:

  • jumping n_current_longest_bits_streak ahead,
  • iterating backward
  • breaking this backward loop when 0 found.
英文:

Found your problem interesting.

I checked simple, loop-based approach. It is O(b).
Becasue of excesive use of loops, had to use numba to achieve similar results in terms of timings.

import math
import numba
def loop_based(n: int):
bytes_array = (n).to_bytes(int(math.log2(n+1) // 8 + 1), "big")
return optimized_part(bytes_array)
@numba.njit
def optimized_part(bytes_array):
bits = [1, 2, 4, 8, 16, 32, 64, 128]
longest = 0
current = 0
idx = len(bytes_array) - 1
while idx >= 0:
current_byte = bytes_array[idx]
for bit in bits:
if bit & current_byte:
current += 1
else:
if current > longest:
longest = current
current = 0
idx -= 1
if current > longest:
longest = current
return longest

Results

random 1,000,000-bit numbers:
0.38 ± 0.01 ms  onebits_linearithmic
0.66 ± 0.01 ms  loop_based
0.77 ± 0.08 ms  onebits_quadratic
26.89 ± 0.40 ms  onebits_linear
51.06 ± 0.43 ms  loop_based (without numba)
random 100,000-bit numbers with 50,000-bit streak of 1-bits:
0.07 ± 0.00 ms  onebits_linearithmic
0.07 ± 0.02 ms  loop_based
0.99 ± 0.02 ms  onebits_linear
4.99 ± 0.06 ms  loop_based (without numba)
109.81 ± 0.43 ms  onebits_quadratic
random 1,000,000-bit numbers with 500,000-bit streak of 1-bits:
0.65 ± 0.01 ms  loop_based
0.82 ± 0.02 ms  onebits_linearithmic
13.45 ± 0.09 ms  onebits_linear
49.56 ± 0.56 ms  loop_based (without numba)

It could be optimized for random bits case, by:

  • jumping n_current_longest_bits_streak ahead,
  • iterating backwards
  • breaking this backwards loop when 0 found.

huangapple
  • 本文由 发表于 2023年3月4日 04:00:05
  • 转载请务必保留本文链接:https://go.coder-hub.com/75631407.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定