February 24, 2023 14:22:54

Best way to check if a NumPy array is all non-negative

Question

This works, but it is not algorithmically optimal: np.min computes the minimum over the whole array, whereas I don't need the min value at all while the function scans the array:

    import numpy as np

    def is_non_negative(m):
        return np.min(m) >= 0

Edit: Depending on the data, an optimal function could indeed save a lot of time, because it would terminate at the first negative value it encounters. If only one negative value is expected, at a uniformly random position, the time would be cut by a factor of two on average. However, building the optimal algorithm outside the NumPy library comes at a huge cost (Python code vs. C++ code).
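For reference, the early-exit algorithm described in the edit can be written in plain Python roughly as below. This is a minimal sketch, and the name `is_non_negative_py` is hypothetical: it stops at the first negative value, but every element it visits goes through the Python interpreter, which is the cost the edit refers to.

```python
import numpy as np


def is_non_negative_py(m):
    """Return True if no element of m is negative, stopping at the first negative one."""
    # m.flat walks every element in C (row-major) order, whatever the shape or strides.
    for e in m.flat:
        if e < 0:
            return False  # early exit: the rest of the array is never read
    return True


a = np.array([[1.0, 2.0], [-3.0, 4.0]])
print(is_non_negative_py(a))          # False
print(is_non_negative_py(np.abs(a)))  # True
```

Unlike np.min, this loop can stop early, but the per-element interpreter overhead typically makes it far slower than np.min unless the negative value sits very near the start of the array.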

Answer 1

Score: 1

A possible solution is to use a function in C:

    /* Returns 1 as soon as a negative element is found, 0 otherwise.
       The scan stops at the first negative value (early exit). */
    int is_negative(const double* data, int num_elems) {
        for (int i = 0; i < num_elems; i++) {
            if (data[i] < 0) {
                return 1;
            }
        }
        return 0;
    }

Compile with:

    gcc -c -fPIC is_negative.c -o is_negative.o

And link with:

    gcc -shared is_negative.o -o libis_negative.so

And then, in Python:

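    import ctypes

    import numpy as np

    # Adjust the path to wherever libis_negative.so was built.
    lib = ctypes.cdll.LoadLibrary('/tmp/libis_negative.so')

    # Declare the C signature. The array must be float64, and C_CONTIGUOUS is
    # required because the C loop walks the buffer linearly (a strided view
    # such as a[::2] would otherwise be read incorrectly).
    lib.is_negative.restype = ctypes.c_int
    lib.is_negative.argtypes = [
        np.ctypeslib.ndpointer(dtype=np.float64, flags='C_CONTIGUOUS'),
        ctypes.c_int,
    ]

    a = np.array([1.0, 2.0, -3.0, 4.0])
    result = lib.is_negative(a, a.size)

    if result:
        print("It has negative elements")
    else:
        print("It does not have negative elements")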

Answer 2

Score: 1

One pure-NumPy solution is to use a chunk-based strategy:

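    import numpy as np

    def is_non_negative(m):
        m = m.reshape(-1)  # flatten (a view when possible) so slices walk elements, not rows
        # Auto-tuning: chunk size is size/8 clamped to [4096, 65536];
        # integer division is needed because range() requires an int step.
        chunkSize = max(min(65536, m.size // 8), 4096)
        for i in range(0, m.size, chunkSize):
            if np.min(m[i:i+chunkSize]) < 0:
                return False
        return True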

This solution is only efficient if the arrays are big: chunks must be large enough for the per-call NumPy overhead to be small, yet small enough to split the whole array into many parts (so as to benefit from the early exit). The chunk size needs to be fairly big to offset the relatively large overhead of np.min on small arrays.
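One way to check this trade-off on your own hardware is to time the chunked version against a single np.min call in the two extreme cases: a negative value at the very start (the early exit pays off) and no negative value at all (every chunk is scanned). The sketch below repeats the chunked function under the hypothetical name `is_non_negative_chunked` so that it runs standalone; the array size and repeat count are arbitrary assumptions, and no timings are claimed here.

```python
import timeit

import numpy as np


def is_non_negative_min(m):
    # Baseline: np.min always scans the whole array.
    return np.min(m) >= 0


def is_non_negative_chunked(m):
    # Chunk-based early exit, same logic as the answer above.
    m = m.reshape(-1)
    chunk = max(min(65536, m.size // 8), 4096)
    for i in range(0, m.size, chunk):
        if np.min(m[i:i + chunk]) < 0:
            return False
    return True


n = 4_000_000  # arbitrary size; chunking only has a chance to help on large arrays
no_negative = np.random.rand(n)   # values in [0, 1): every chunk gets scanned
negative_first = no_negative.copy()
negative_first[0] = -1.0          # the first chunk already triggers the early exit

for label, arr in [("negative first", negative_first), ("no negative", no_negative)]:
    for func in (is_non_negative_min, is_non_negative_chunked):
        t = timeit.timeit(lambda: func(arr), number=10)
        print(f"{label:15} {func.__name__:25} {t / 10 * 1e3:.3f} ms per call")
```

The "no negative" case is the one to watch: the chunked version does the same total work as np.min plus one Python-level call per chunk, so it can only lose there.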

Here is a Numba solution:

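    import numba as nb

    # Eagerly compiled function for some mainstream data types.
    # The [::1] signatures only match 1-D C-contiguous arrays of these dtypes;
    # for other shapes, pass m.ravel() (which copies if m is not contiguous).
    @nb.njit(['(float32[::1],)', '(float64[::1],)', '(int_[::1],)'])
    def is_non_negative_nb(m):
        for e in m:
            if e < 0:
                return False  # early exit at the first negative value
        return True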

It turns out this is faster than np.min on my machine, even though the loop is not well auto-vectorized (i.e., it does not use SIMD instructions) by llvmlite (the LLVM binding Numba uses for JIT compilation).

For even faster code, you need to write C/C++ with a chunk-based, SIMD-friendly loop, and possibly use SIMD intrinsics if the compiler does not generate efficient code, which is unfortunately rather frequent in this case.
