Mapping a function on a multi-dimensional array

Question


I have a large array of dimensions MxNxK. I want to go through axis M and apply a function to each of the NxK arrays, but I want to avoid a for loop.

To illustrate the point, I build a 10x5x4 array.

import numpy as np

data = []
for i in range(1, 11):
    a = np.full(shape=(5, 4), fill_value=i)
    data.append(a)

data = np.array(data)

Now, I loop through the first dimension and apply a simple sum/mean to each of the 5x4 arrays.

means = []
sums = []

for i in data:
    means.append(i.sum())
    sums.append(i.mean())

In reality, I have a 5000x200x20 array and I want to apply a function on each of the 200x20 arrays, but I want to avoid the loop. "apply_along_axis" doesn't seem to address the issue. Please help.
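To see why, here is a minimal sketch (on a small stand-in array) showing that `np.apply_along_axis` hands the callback 1-D slices along a single axis, never a whole NxK subarray:

```python
import numpy as np

data = np.arange(24).reshape(2, 3, 4)

# record the shape of every slice that apply_along_axis passes in
seen_shapes = []

def record_and_sum(vec):
    seen_shapes.append(vec.shape)
    return vec.sum()

out = np.apply_along_axis(record_and_sum, -1, data)
# every slice is 1-D of length 4; the function never sees a 3x4 block
```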

Edit
In response to @chrslg's answer, adding some more information. In my case I want to apply a probability distribution (pdf) and an Nth percentile to each of the M subarrays in the MxNxK array. I noticed that with numba the first function call indeed takes too long (7 s), but every subsequent execution takes merely 0.77 s, whereas the conventional loop over the M subarrays takes 3.9 s.

Answer 1

Score: 1


In this case

# keeping the same mean/sum logic inversion as in your code
means = data.sum(axis=(1, 2))
sums = data.mean(axis=(1, 2))

should work.
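A quick sanity check on the question's 10x5x4 example, comparing the axis reduction with the explicit loop (the sum/mean swap from the question is kept on purpose):

```python
import numpy as np

data = np.array([np.full((5, 4), i) for i in range(1, 11)])

# axis=(1, 2) reduces over the N and K axes, one value per M subarray
sums_direct = data.sum(axis=(1, 2))
means_direct = data.mean(axis=(1, 2))

# reference: the explicit loop from the question
sums_loop = np.array([sub.sum() for sub in data])
means_loop = np.array([sub.mean() for sub in data])
```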

Generally speaking, there are ways to avoid an explicit for loop when applying a function to all the data: np.apply_along_axis, for example, or np.vectorize.

def mymean(subarr):
    return subarr.mean()

mymeanvec = np.vectorize(mymean, signature='(m,n)->()')

mymeanvec(data)

is one way.
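A quick check (on the question's small example) that the vectorized wrapper agrees with a direct axis reduction:

```python
import numpy as np

def mymean(subarr):
    return subarr.mean()

# signature='(m,n)->()' makes vectorize pass whole 2-D subarrays to mymean
mymeanvec = np.vectorize(mymean, signature='(m,n)->()')

data = np.array([np.full((5, 4), i) for i in range(1, 11)])
result = mymeanvec(data)
```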

But that is not advisable. Those functions only avoid the for loop itself, not the M calls to a Python function.
That is why numpy's documentation clearly states that these functions are not intended for performance.

The good way to really vectorize is to think vectorized. Most numpy operations can work directly on whole arrays and do the for loop themselves.
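For instance, the Nth percentile mentioned in the question's edit also has a direct vectorized form; `np.percentile` accepts a tuple of axes (95 below is a hypothetical percentile value, and the array is a smaller stand-in for the 5000x200x20 one):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.random((50, 20, 4))  # stand-in for the real 5000x200x20 array

# one percentile per M subarray, no Python loop over M
p95 = np.percentile(data, 95, axis=(1, 2))

# equivalent formulation: flatten each subarray, reduce along axis 1
p95_flat = np.percentile(data.reshape(len(data), -1), 95, axis=1)
```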

Another way, when that happens to be impossible because what you are doing is too exotic to be written as a combination of numpy operations, is to use numba.

from numba import jit

@jit(nopython=True)
def myNumbaMean(data):
    # It is always better to have the correctly shaped data ready, and avoid `append`. Especially with numpy (pure python append is efficient)
    means = np.empty((len(data),))
    sums = np.empty((len(data),))
    for i in range(len(data)):
        # still the same inversion
        means[i] = data[i].sum()
        sums[i] = data[i].mean()
    return means, sums

The first call takes some time, because numba compiles the function. After that, it is as fast as C code.

In fact, with numba you do not even have to rely on .sum() and .mean() if you want to do something more exotic. Even this apparently naive code would be quite efficient:

@jit(nopython=True)
def myNaiveMean(data):
    means = np.empty((len(data),))
    sums = np.zeros((len(data),))
    m,n,p=data.shape
    # this time without the inversion
    for i in range(m):
        for j in range(n):
            for k in range(p):
                sums[i] += data[i,j,k]
        means[i] = sums[i]/n/p
    return means,sums

Timings

Some timing considerations
On your data shape (5000,200,20), timings are as follows:

Method                           Timing
python (yours)                   133 ms
Vectorization                    119 ms
Numba                             65 ms
Direct numpy (my first answer)    54 ms
Numba naive                       35 ms

As you can see, numba is often quite disorienting: what we are used to considering the worst solution in Python often happens to be the best one, because it just does the job, without subcalls and without unnecessary intermediate results (which the first numba version, calling np.sum and np.mean, still has).

And, which is even a bit annoying, that naive version even beats the pure numpy one (data.sum(axis=(1,2))). That being said, I would not advise using numba when a pure numpy version exists. It wins here only because the pure numpy code does nothing smarter than my code (there is no room to be smart here: it is just a sum, and you cannot avoid iterating over all elements and adding them; the only thing that makes numpy faster than your code is compiled C code versus interpreted Python code, whereas against numba it is compiled code versus compiled code). But for many computations, the authors of numpy have implemented more efficient algorithms than the first one I would come up with. So rewriting everything in numba is not only a waste of time; more often than not, it leads to functions that are not even as fast as numpy.

So, generally speaking, I would always try to "think numpy" first, as I did in my first answer. Then, if that is not possible, write numba code. Then, if that is not possible either (for example because numba is not available), try to "map a function", but be aware that this only saves the time spent in the for loop itself (the iteration), not in the body of the loop, which, most of the time, is where the CPU time is spent.

Last remark: in this case, my timings are not that impressive: a ×4 gain at most. That may already seem a lot, but it is nothing compared to what we often see when suppressing Python for loops, where ×100 or ×1000 gain factors are quite common. This is because your naive code is not that slow: only one loop, the outer one, is done in Python; the two implicit inner loops happen inside the compiled sum and mean code. Altogether you have 5000 + 5000×200 + 5000×200×20 iterations, and the 5000×200 + 5000×200×20 ones are already vectorized in numpy. So all we saved here is 5000 iterations and 5000 Python calls.

huangapple · Published 2023-06-15 21:06:30 · Original link: https://go.coder-hub.com/76482805.html