2023年4月6日 23:02:31go评论78阅读模式

英文:

Partition numpy array in-place by condition

问题

I have a 1d array of u64 ints. I need to partition it based on given bit inplace.

In pure python it's easy two pointer problem (similar to partitioning in quicksort), but I wonder if it possible to do it efficiently using numpy.

Lets say I have:

arr = np.arange(720, dtype=np.uint64)  # a lot of 64bit unsigned ints
np.random.shuffle(arr)  # in unknown order
left = arr[arr & 32 == 0]  # all elements having 5th bit 0 in any order
right = arr[arr & 32 == 32]  # all elements having 5th bit 1 in any order
arr[:] = np.concatenate([left, right])

I would also need to know the index of the partition (aka. len(left) above).

英文:

I have a 1d array of u64 ints. I need to partition it based on given bit inplace.

In pure python it's easy two pointer problem (similar to partitioning in quicksort), but I wonder if it possible to do it efficiently using numpy.

Lets say I have:

arr = np.arange(720, dtype=np.uint64)  # a lot of 64bit unsigned ints
np.random.shuffle(arr)  # in unknown order
left = arr[arr &amp; 32 == 0]  # all elements having 5th bit 0 in any order
right = arr[arr &amp; 32 == 32]  # all elements having 5th bit 1 in any order
arr[:] = np.concatenate([left, right])

I would also need to know the index of the partition (aka. len(left) above).

答案1

得分: 2

你可以使用argsort来为数组分配重新排序后的索引（我使用最后一位而不是&amp;32，以便更容易理解偶数/奇数的结果）：

    import numpy as np
    
    arr = np.arange(20)  # 大量的64位无符号整数
    np.random.shuffle(arr)  # 以未知顺序
    
    print(arr)
    # [ 2  6 15 10 12  5 11  0 14 18 19  3  7  4  1 17  9  8 16 13]
    arr[:] = arr[np.argsort(arr&amp;1)]
    print(arr)
    # [ 2  8  4 16 14  0 18 12 10  6  5 19  3  7 15  1 17  9 11 13]

没有排序的情况下，你可以设置一个左右位置的掩码，通过计算掩码中的`True`值来测量左侧分区的大小，然后使用掩码和反向掩码部分来分配下标。

    mask = arr&amp;1 == 0
    left = np.sum(mask)
    arr[:left],arr[left:] = arr[mask],arr[~mask].copy()

*请注意，在第二部分中你必须使用.copy()，因为在第二次赋值发生之前，arr的内容将会发生变化。*

英文:

You could use argsort to assign the array with the reordered indexes (I'm using the last bit instead of &32 to make the result easier to understand with evens/odds):

import numpy as np

arr = np.arange(20)  # a lot of 64bit unsigned ints
np.random.shuffle(arr)  # in unknown order

print(arr)
# [ 2  6 15 10 12  5 11  0 14 18 19  3  7  4  1 17  9  8 16 13]
arr[:] = arr[np.argsort(arr&amp;1)]
print(arr)
# [ 2  8  4 16 14  0 18 12 10  6  5 19  3  7 15  1 17  9 11 13]

Without sorting, you could setup a mask of the left and right positions, measure the size of the left partition by counting the True values in the mask, then assign subscripts with the masked and inverse mask parts.

mask = arr&amp;1 == 0
left = np.sum(mask)
arr[:left],arr[left:] = arr[mask],arr[~mask].copy()

note that you have to use .copy() for the second part because the content of arr will have changed before the second assignment takes place.

答案2

得分: 1

你可以使用 np.argsort:

result = arr[np.argsort(arr & 32)]

对 arr & 32 进行排序相当于对两个值进行排序：0 和 32，这正是你需要的分区。如果你需要分区的索引，

(result & 32).argmax()

这将返回数组中最大值的第一个索引，也就是排序后数据中零的数量。

英文:

You can use np.argsort:

result = arr[np.argsort(arr &amp; 32)]

Sorting arr & 32 is sorting two values: 0 and 32, which is the partitioning you require. If you need the partition index,

(result &amp; 32).argmax()

That will return the first index of the maximum value of the array, which is the number of zeros in the sorted data.

答案3

得分: 1

以下是要翻译的内容：

有很多来自社区的聪明解决方案（谢谢大家！），但看起来使用纯NumPy无法在O(1)的空间内实现。

我决定实现自己的分区例程并使用Numba JIT来接近NumPy的性能。

import numpy as np
from numba import njit

@njit
def partition_by_bit(nparr, bit_to_partition_on):
    n = nparr.shape[0]
    i = 0
    j = n - 1
    mask = 1 << bit_to_partition_on

    while True:
        while i < n and (nparr[i] & mask == 0):
            i += 1
        while j >= 0 and (nparr[j] & mask == mask):
            j -= 1
        if i >= j:
            break
        nparr[i], nparr[j] = nparr[j], nparr[i]

    return i

arr = np.arange(720, dtype=np.uint64)
np.random.shuffle(arr)

partition_index = partition_by_bit(arr, 5)  # O(1)空间，O(n)时间

英文:

There is a lot of cool solutions from the community (thank you all!) but looks like it is not possible to stay within O(1) space using pure numpy.

I decided to implement own partition routine and use numba jit to get closer to numpy performance.

import numpy as np
from numba import njit

@njit
def partition_by_bit(nparr, bit_to_partition_on):
    n = nparr.shape[0]
    i = 0
    j = n - 1
    mask = 1 &lt;&lt; bit_to_partition_on

    while True:
        while i &lt; n and (nparr[i] &amp; mask == 0):
            i += 1
        while j &gt;= 0 and (nparr[j] &amp; mask == mask):
            j -= 1
        if i &gt;= j:
            break
        nparr[i], nparr[j] = nparr[j], nparr[i]

    return i

arr = np.arange(720, dtype=np.uint64)
np.random.shuffle(arr)

partition_index = partition_by_bit(arr, 5)  # O(1) space, O(n) time

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

Partition numpy array in-place by condition.

问题

答案1

答案2

答案3

如何解决迭代中的 KeyError？

为什么应该引发 AttributeError 异常？

如何在 pandas 中获取姓和名，当姓是多个名字时。

将一列垂直拆分以创建新列。

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

发表评论