Best way to check if a NumPy array is all non-negative

Question

This works, but it is not algorithmically optimal, since I don't need the minimum value to be stored while the function is scanning the array:

    import numpy as np

    def is_non_negative(m):
        return np.min(m) >= 0

Edit: Depending on the data, an optimal function could indeed save a lot of time, because it would terminate at the first negative value it encounters. If only one negative value is expected, the time will on average be cut by a factor of two. However, implementing the optimal algorithm outside the NumPy library would come at a huge cost (Python code vs. C++ code).

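The factor-of-two estimate can be checked with a short simulation. The sketch below is plain Python written for clarity, not speed, and `elements_scanned` and `average_scan_fraction` are hypothetical helpers. It assumes exactly one negative value placed uniformly at random: an early-exit scan then reads (n + 1) / 2 elements on average, i.e. a fraction (n + 1) / (2n) of the array, whereas np.min always reads all n. When the array contains no negative value, the early-exit scan also has to read all n elements.

```python
def elements_scanned(values):
    """Count how many elements an early-exit scan inspects before answering."""
    count = 0
    for v in values:
        count += 1
        if v < 0:
            break  # first negative found: the answer is already known
    return count


def average_scan_fraction(n):
    """Average fraction of an n-element array scanned when exactly one
    negative value sits at a uniformly random position."""
    total = 0
    for pos in range(n):
        values = [1.0] * n
        values[pos] = -1.0
        total += elements_scanned(values)
    return total / (n * n)


for n in (10, 100, 1000):
    print(n, average_scan_fraction(n))
```

The fraction approaches 0.5 as n grows, which is the "factor of two" mentioned above.
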
Answer 1

Score: 1

A possible solution is to use a function in C:

    #include <stddef.h>

    /* Returns 1 as soon as a negative element is found, 0 otherwise. */
    int is_negative(const double* data, size_t num_elems) {
        for (size_t i = 0; i < num_elems; i++) {
            if (data[i] < 0) {
                return 1;  /* early exit on the first negative value */
            }
        }
        return 0;
    }

Compile with (optimizations enabled):

    gcc -O2 -c -fPIC is_negative.c -o is_negative.o

And link with:

    gcc -shared is_negative.o -o libis_negative.so

And then, in Python:

    import ctypes

    import numpy as np

    lib = ctypes.cdll.LoadLibrary('/tmp/libis_negative.so')

    # Declare the C signature. flags='C_CONTIGUOUS' makes ctypes reject
    # strided views instead of passing a pointer to non-contiguous memory.
    lib.is_negative.restype = ctypes.c_int
    lib.is_negative.argtypes = [
        np.ctypeslib.ndpointer(dtype=np.float64, flags='C_CONTIGUOUS'),
        ctypes.c_size_t,
    ]

    a = np.array([1.0, 2.0, -3.0, 4.0])
    result = lib.is_negative(a, a.size)

    if result:
        print("It has negative elements")
    else:
        print("It does not have negative elements")

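The C routine only accepts a contiguous float64 buffer. A minimal sketch of a Python wrapper for other inputs (integer dtypes, multi-dimensional or strided arrays) could look like the following. `is_non_negative_c` is a hypothetical name, and the loaded ctypes function is passed in as a parameter so the sketch does not assume a library path. Converting integers to float64 may round very large values, but it never changes their sign.

```python
import numpy as np


def is_non_negative_c(m, c_is_negative):
    """Hypothetical wrapper around the C is_negative routine.

    `c_is_negative` is the loaded ctypes function (e.g. lib.is_negative).
    The input is turned into a flat, C-contiguous float64 array; this
    copies only when the input is not already in that layout.
    """
    a = np.ascontiguousarray(m, dtype=np.float64).ravel()
    return not c_is_negative(a, a.size)
```

With the library loaded as above, the call would be `is_non_negative_c(m, lib.is_negative)`. When a conversion copy is needed, it costs a full pass over the data and eats into the early-exit benefit.
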

Answer 2

Score: 1

One pure-NumPy solution is to use a chunk-based strategy:

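    import numpy as np

    def is_non_negative(m):
        m = np.ravel(m)  # flatten (a view when possible) so 1-D slicing works
        chunkSize = max(min(65536, m.size // 8), 4096)  # Auto-tuning
        for i in range(0, m.size, chunkSize):
            if np.min(m[i:i+chunkSize]) < 0:
                return False  # early exit: remaining chunks are skipped
        return True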

This solution is only efficient if the arrays are big and the chunks are both big enough for the NumPy call overhead to be small and small enough to split the global array into many parts (so as to benefit from the early exit). The chunk size needs to be fairly large to amortize the relatively large overhead of np.min on small arrays.

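To see the early-exit effect of the chunking, the sketch below counts how many np.min calls the loop makes before returning. `auto_chunk_size` reproduces the answer's chunk-size heuristic and `chunks_scanned` is a hypothetical instrumented copy of the loop; the array size and the position of the negative value are arbitrary choices for illustration.

```python
import numpy as np


def auto_chunk_size(size):
    """The answer's heuristic: size // 8, clamped to [4096, 65536]."""
    return max(min(65536, size // 8), 4096)


def chunks_scanned(m, chunk_size):
    """Number of np.min calls the chunked check makes before returning."""
    m = np.ravel(m)
    calls = 0
    for i in range(0, m.size, chunk_size):
        calls += 1
        if np.min(m[i:i + chunk_size]) < 0:
            break  # early exit: the remaining chunks are never read
    return calls


m = np.zeros(1_000_000)
m[10] = -1.0  # a negative value near the start of the array
size = auto_chunk_size(m.size)
print(size, chunks_scanned(m, size), chunks_scanned(np.zeros(1_000_000), size))
```

For this 1,000,000-element array the heuristic picks 65,536-element chunks: with the negative value at index 10 the loop stops after the first chunk, while an all-zero array needs all 16 chunks.
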

Here is a Numba solution:

    import numba as nb

    # Eagerly compiled function for some mainstream data types
    # (contiguous 1-D arrays only, as required by the [::1] signatures).
    @nb.njit(['(float32[::1],)', '(float64[::1],)', '(int_[::1],)'])
    def is_non_negative_nb(m):
        for e in m:
            if e < 0:
                return False  # stop at the first negative value
        return True

It turns out this is faster than using np.min on my machine, although the code is not well auto-vectorized (i.e. it does not use SIMD instructions) by LLVM-Lite (Numba's JIT backend).

For even faster code, you need to write C/C++ code using a chunk-based, SIMD-friendly loop, and possibly SIMD intrinsics if the compiler does not generate efficient code, which is unfortunately rather frequent in this case.

  • Posted by huangapple on February 24, 2023, 14:22:54
  • When reposting, please keep this link: https://go.coder-hub.com/75553212.html