英文:
Partition numpy array in-place by condition
问题
I have a 1d array of u64 ints. I need to partition it based on given bit inplace.
In pure python it's easy two pointer problem (similar to partitioning in quicksort), but I wonder if it possible to do it efficiently using numpy.
Lets say I have:
arr = np.arange(720, dtype=np.uint64) # a lot of 64bit unsigned ints
np.random.shuffle(arr) # in unknown order
left = arr[arr & 32 == 0] # all elements having 5th bit 0 in any order
right = arr[arr & 32 == 32] # all elements having 5th bit 1 in any order
arr[:] = np.concatenate([left, right])
I would also need to know the index of the partition (aka. len(left)
above).
英文:
I have a 1d array of u64 ints. I need to partition it based on given bit inplace.
In pure python it's easy two pointer problem (similar to partitioning in quicksort), but I wonder if it possible to do it efficiently using numpy.
Lets say I have:
arr = np.arange(720, dtype=np.uint64) # a lot of 64bit unsigned ints
np.random.shuffle(arr) # in unknown order
left = arr[arr & 32 == 0] # all elements having 5th bit 0 in any order
right = arr[arr & 32 == 32] # all elements having 5th bit 1 in any order
arr[:] = np.concatenate([left, right])
I would also need to know the index of the partition (aka. len(left)
above).
答案1
得分: 2
你可以使用argsort来为数组分配重新排序后的索引(我使用最后一位而不是&32,以便更容易理解偶数/奇数的结果):
import numpy as np
arr = np.arange(20) # 大量的64位无符号整数
np.random.shuffle(arr) # 以未知顺序
print(arr)
# [ 2 6 15 10 12 5 11 0 14 18 19 3 7 4 1 17 9 8 16 13]
arr[:] = arr[np.argsort(arr&1)]
print(arr)
# [ 2 8 4 16 14 0 18 12 10 6 5 19 3 7 15 1 17 9 11 13]
没有排序的情况下,你可以设置一个左右位置的掩码,通过计算掩码中的`True`值来测量左侧分区的大小,然后使用掩码和反向掩码部分来分配下标。
mask = arr&1 == 0
left = np.sum(mask)
arr[:left],arr[left:] = arr[mask],arr[~mask].copy()
*请注意,在第二部分中你必须使用.copy(),因为在第二次赋值发生之前,arr的内容将会发生变化。*
英文:
You could use argsort to assign the array with the reordered indexes (I'm using the last bit instead of &32 to make the result easier to understand with evens/odds):
import numpy as np
arr = np.arange(20) # a lot of 64bit unsigned ints
np.random.shuffle(arr) # in unknown order
print(arr)
# [ 2 6 15 10 12 5 11 0 14 18 19 3 7 4 1 17 9 8 16 13]
arr[:] = arr[np.argsort(arr&1)]
print(arr)
# [ 2 8 4 16 14 0 18 12 10 6 5 19 3 7 15 1 17 9 11 13]
Without sorting, you could setup a mask of the left and right positions, measure the size of the left partition by counting the True
values in the mask, then assign subscripts with the masked and inverse mask parts.
mask = arr&1 == 0
left = np.sum(mask)
arr[:left],arr[left:] = arr[mask],arr[~mask].copy()
note that you have to use .copy() for the second part because the content of arr will have changed before the second assignment takes place.
答案2
得分: 1
你可以使用 np.argsort
:
result = arr[np.argsort(arr & 32)]
对 arr & 32
进行排序相当于对两个值进行排序:0 和 32,这正是你需要的分区。如果你需要分区的索引,
(result & 32).argmax()
这将返回数组中最大值的第一个索引,也就是排序后数据中零的数量。
英文:
You can use np.argsort
:
result = arr[np.argsort(arr & 32)]
Sorting arr & 32
is sorting two values: 0 and 32, which is the partitioning you require. If you need the partition index,
(result & 32).argmax()
That will return the first index of the maximum value of the array, which is the number of zeros in the sorted data.
答案3
得分: 1
以下是要翻译的内容:
有很多来自社区的聪明解决方案(谢谢大家!),但看起来使用纯NumPy无法在O(1)的空间内实现。
我决定实现自己的分区例程并使用Numba JIT来接近NumPy的性能。
import numpy as np
from numba import njit
@njit
def partition_by_bit(nparr, bit_to_partition_on):
n = nparr.shape[0]
i = 0
j = n - 1
mask = 1 << bit_to_partition_on
while True:
while i < n and (nparr[i] & mask == 0):
i += 1
while j >= 0 and (nparr[j] & mask == mask):
j -= 1
if i >= j:
break
nparr[i], nparr[j] = nparr[j], nparr[i]
return i
arr = np.arange(720, dtype=np.uint64)
np.random.shuffle(arr)
partition_index = partition_by_bit(arr, 5) # O(1)空间,O(n)时间
英文:
There is a lot of cool solutions from the community (thank you all!) but looks like it is not possible to stay within O(1) space using pure numpy.
I decided to implement own partition routine and use numba jit to get closer to numpy performance.
import numpy as np
from numba import njit
@njit
def partition_by_bit(nparr, bit_to_partition_on):
n = nparr.shape[0]
i = 0
j = n - 1
mask = 1 << bit_to_partition_on
while True:
while i < n and (nparr[i] & mask == 0):
i += 1
while j >= 0 and (nparr[j] & mask == mask):
j -= 1
if i >= j:
break
nparr[i], nparr[j] = nparr[j], nparr[i]
return i
arr = np.arange(720, dtype=np.uint64)
np.random.shuffle(arr)
partition_index = partition_by_bit(arr, 5) # O(1) space, O(n) time
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论