Partition numpy array in-place by condition.

huangapple go评论71阅读模式
英文:

Partition numpy array in-place by condition

问题

I have a 1d array of u64 ints. I need to partition it based on given bit inplace.

In pure python it's easy two pointer problem (similar to partitioning in quicksort), but I wonder if it possible to do it efficiently using numpy.

Lets say I have:

arr = np.arange(720, dtype=np.uint64)  # a lot of 64bit unsigned ints
np.random.shuffle(arr)  # in unknown order
left = arr[arr & 32 == 0]  # all elements having 5th bit 0 in any order
right = arr[arr & 32 == 32]  # all elements having 5th bit 1 in any order
arr[:] = np.concatenate([left, right])

I would also need to know the index of the partition (aka. len(left) above).

英文:

I have a 1d array of u64 ints. I need to partition it based on given bit inplace.

In pure python it's easy two pointer problem (similar to partitioning in quicksort), but I wonder if it possible to do it efficiently using numpy.

Lets say I have:

arr = np.arange(720, dtype=np.uint64)  # a lot of 64bit unsigned ints
np.random.shuffle(arr)  # in unknown order
left = arr[arr & 32 == 0]  # all elements having 5th bit 0 in any order
right = arr[arr & 32 == 32]  # all elements having 5th bit 1 in any order
arr[:] = np.concatenate([left, right])

I would also need to know the index of the partition (aka. len(left) above).

答案1

得分: 2

你可以使用argsort来为数组分配重新排序后的索引我使用最后一位而不是&32以便更容易理解偶数/奇数的结果):

    import numpy as np
    
    arr = np.arange(20)  # 大量的64位无符号整数
    np.random.shuffle(arr)  # 以未知顺序
    
    print(arr)
    # [ 2  6 15 10 12  5 11  0 14 18 19  3  7  4  1 17  9  8 16 13]
    arr[:] = arr[np.argsort(arr&1)]
    print(arr)
    # [ 2  8  4 16 14  0 18 12 10  6  5 19  3  7 15  1 17  9 11 13]

没有排序的情况下你可以设置一个左右位置的掩码通过计算掩码中的`True`值来测量左侧分区的大小然后使用掩码和反向掩码部分来分配下标

    mask = arr&1 == 0
    left = np.sum(mask)
    arr[:left],arr[left:] = arr[mask],arr[~mask].copy()

*请注意在第二部分中你必须使用.copy()因为在第二次赋值发生之前arr的内容将会发生变化*
英文:

You could use argsort to assign the array with the reordered indexes (I'm using the last bit instead of &32 to make the result easier to understand with evens/odds):

import numpy as np

arr = np.arange(20)  # a lot of 64bit unsigned ints
np.random.shuffle(arr)  # in unknown order

print(arr)
# [ 2  6 15 10 12  5 11  0 14 18 19  3  7  4  1 17  9  8 16 13]
arr[:] = arr[np.argsort(arr&1)]
print(arr)
# [ 2  8  4 16 14  0 18 12 10  6  5 19  3  7 15  1 17  9 11 13]

Without sorting, you could setup a mask of the left and right positions, measure the size of the left partition by counting the True values in the mask, then assign subscripts with the masked and inverse mask parts.

mask = arr&1 == 0
left = np.sum(mask)
arr[:left],arr[left:] = arr[mask],arr[~mask].copy()

note that you have to use .copy() for the second part because the content of arr will have changed before the second assignment takes place.

答案2

得分: 1

你可以使用 np.argsort:

result = arr[np.argsort(arr & 32)]

arr & 32 进行排序相当于对两个值进行排序:0 和 32,这正是你需要的分区。如果你需要分区的索引,

(result & 32).argmax()

这将返回数组中最大值的第一个索引,也就是排序后数据中零的数量。

英文:

You can use np.argsort:

result = arr[np.argsort(arr & 32)]

Sorting arr & 32 is sorting two values: 0 and 32, which is the partitioning you require. If you need the partition index,

(result & 32).argmax()

That will return the first index of the maximum value of the array, which is the number of zeros in the sorted data.

答案3

得分: 1

以下是要翻译的内容:

有很多来自社区的聪明解决方案(谢谢大家!),但看起来使用纯NumPy无法在O(1)的空间内实现。

我决定实现自己的分区例程并使用Numba JIT来接近NumPy的性能。

import numpy as np
from numba import njit

@njit
def partition_by_bit(nparr, bit_to_partition_on):
    n = nparr.shape[0]
    i = 0
    j = n - 1
    mask = 1 << bit_to_partition_on

    while True:
        while i < n and (nparr[i] & mask == 0):
            i += 1
        while j >= 0 and (nparr[j] & mask == mask):
            j -= 1
        if i >= j:
            break
        nparr[i], nparr[j] = nparr[j], nparr[i]

    return i

arr = np.arange(720, dtype=np.uint64)
np.random.shuffle(arr)

partition_index = partition_by_bit(arr, 5)  # O(1)空间,O(n)时间
英文:

There is a lot of cool solutions from the community (thank you all!) but looks like it is not possible to stay within O(1) space using pure numpy.

I decided to implement own partition routine and use numba jit to get closer to numpy performance.

import numpy as np
from numba import njit

@njit
def partition_by_bit(nparr, bit_to_partition_on):
    n = nparr.shape[0]
    i = 0
    j = n - 1
    mask = 1 &lt;&lt; bit_to_partition_on

    while True:
        while i &lt; n and (nparr[i] &amp; mask == 0):
            i += 1
        while j &gt;= 0 and (nparr[j] &amp; mask == mask):
            j -= 1
        if i &gt;= j:
            break
        nparr[i], nparr[j] = nparr[j], nparr[i]

    return i

arr = np.arange(720, dtype=np.uint64)
np.random.shuffle(arr)

partition_index = partition_by_bit(arr, 5)  # O(1) space, O(n) time

huangapple
  • 本文由 发表于 2023年4月6日 23:02:31
  • 转载请务必保留本文链接:https://go.coder-hub.com/75951004.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定