Merge a Python Array Along an Axis

Question

I've been trying to populate a training data set for use in a Keras model. Using numpy's `append` function everything *works* fine, but it is **incredibly slow**. Here's what I'm doing right now:

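```python
import numpy as np

def populateData():
    images = np.zeros([1, 4, 512, 512, 6])
    for m in range(2):
        for n in range(4):
            batch_np = np.zeros([4, 512, 512, 6])

            # Doing stuff with the batch...
            # ...

            # np.append needs matching dimensions, so give the batch a leading axis
            images = np.append(images, batch_np[np.newaxis], axis=0)
    return images
```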

As the size of the array grows with each pass, the amount of time numpy takes to append new data increases pretty much exponentially. So, for instance, the first pass takes around ~1 second, the third takes just over ~3 seconds. By the time I've done a dozen or more, each append operation takes many minutes(!). Based on the current pace of things, it could take days to complete.
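To put rough numbers on the slowdown: assuming each `np.append` call writes out the existing data plus the new batch, the work per call grows linearly with the number of batches already stored, so the total grows quadratically rather than exponentially. The sketch below only counts elements copied; `n_batches` is a made-up illustration value and the initial array is ignored for simplicity.

```python
# Count elements copied when growing an array with repeated appends,
# versus collecting the batches and joining them once at the end.
batch_elems = 4 * 512 * 512 * 6   # elements in one batch (shape from the question)
n_batches = 12                    # hypothetical number of appends

def elements_copied_by_repeated_append(batch_elems, n_batches):
    total = 0
    stored = 0
    for _ in range(n_batches):
        stored += batch_elems   # the new array holds the old data plus the new batch
        total += stored         # every element of the new array is written once
    return total

def elements_copied_by_single_join(batch_elems, n_batches):
    return batch_elems * n_batches  # each batch is copied exactly once

repeated = elements_copied_by_repeated_append(batch_elems, n_batches)
single = elements_copied_by_single_join(batch_elems, n_batches)
print(repeated / single)  # 6.5
```

With a dozen batches the repeated-append approach already moves 6.5 times as much data, and that ratio keeps growing with the number of batches.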

I'd like to be able to get my training data set populated sometime before the next ice age. Beyond "getting better hardware", what can I do to speed up np.append(...)? My understanding is that Numpy's append function copies the entire array each time this function gets called. Is there an equivalent function that does not perform a copy each time? Or that uses a reference value instead, and just modifies that?
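For what it's worth, that understanding can be checked directly: `np.append` does not work in place, it returns a newly allocated array. A minimal check, using small placeholder arrays rather than the real image batches:

```python
import numpy as np

a = np.zeros((2, 3))
b = np.append(a, np.ones((1, 3)), axis=0)

print(a.shape)                 # (2, 3): the original array is unchanged
print(b.shape)                 # (3, 3)
print(np.shares_memory(a, b))  # False: `b` owns a fresh buffer
```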

I've attempted to rewrite some of this using Python's built-in list append function, but it doesn't provide axis support like numpy's append function. So, while that appears to be much much faster, it doesn't quite work for this multi-dimensional setup.


TL;DR: Is there a way to specify an axis when appending to a Python list? If not, is there a more efficient way to append to an N-D array along a specified axis, or to speed up `numpy.append`?

Answer 1

Score: 1

You can collect the batches in a Python list and call `np.stack` once at the end:

```python
import numpy as np

images = []
for m in range(2):
    for n in range(4):
        batch_np = np.zeros([4, 512, 512, 6])
        ...
        images.append(batch_np)
images = np.stack(images, axis=0)
```

Output:

```python
>>> images.shape
(8, 4, 512, 512, 6)
```

Or allocate the whole array before the loops:

```python
import numpy as np

M = 2
N = 4
images = np.zeros([M*N, N, 512, 512, 6])
for i in range(M):
    for j in range(N):
        batch_np = np.zeros([N, 512, 512, 6])
        images[i*N + j] = batch_np  # flat index, so each (i, j) pair gets its own slot
```

Output:

```python
>>> images.shape
(8, 4, 512, 512, 6)
```
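If the batches should instead be joined along their existing first axis (so no new leading axis is created), the same preallocation idea works with slice assignment. This is only a sketch, assuming the number of batches and the batch shape are known up front; `make_batch` is a hypothetical stand-in for the per-batch work, and the image size is shrunk so it runs quickly.

```python
import numpy as np

M, N = 2, 4
batch_shape = (4, 8, 8, 6)   # small placeholder; the question uses (4, 512, 512, 6)

def make_batch(k):
    # Hypothetical stand-in for "doing stuff with the batch".
    return np.full(batch_shape, k, dtype=np.float64)

rows = batch_shape[0]
images = np.empty((M * N * rows,) + batch_shape[1:])
for k in range(M * N):
    # Write batch k into its own block of rows; nothing is reallocated here.
    images[k * rows:(k + 1) * rows] = make_batch(k)

print(images.shape)  # (32, 8, 8, 6)
```

`np.empty` skips the zero fill, which is fine here because every row is overwritten inside the loop.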

Answer 2

Score: 1

Why do you need to specify an axis with the list append? These two loops produce the same shape:

```python
In [62]: arr = np.zeros([0,3,4])
    ...: for i in range(5):
    ...:     arr = np.append(arr, np.ones((1,3,4)), axis=0)
    ...: arr.shape
Out[62]: (5, 3, 4)

In [63]: alist = []
    ...: for i in range(5):
    ...:     alist.append(np.ones((3,4)))
    ...: arr = np.array(alist)
In [64]: arr.shape
Out[64]: (5, 3, 4)
```

`np.stack` with the default `axis=0` does the same thing, and a different `axis` value puts the new dimension somewhere else:

```python
In [65]: np.stack(alist, axis=0).shape
Out[65]: (5, 3, 4)
In [66]: np.stack(alist, axis=1).shape
Out[66]: (3, 5, 4)
```
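As a small illustration of where the axis comes in when working from a list: `np.stack` inserts a new axis, while `np.concatenate` joins along an axis the arrays already have. The shapes below are arbitrary placeholders, not the ones from the question.

```python
import numpy as np

alist = [np.ones((2, 3, 4)) for _ in range(5)]

print(np.stack(alist, axis=0).shape)        # (5, 2, 3, 4)
print(np.concatenate(alist, axis=0).shape)  # (10, 3, 4)
print(np.concatenate(alist, axis=1).shape)  # (2, 15, 4)
```

So the list itself never needs an axis; the axis is chosen once, when the list is turned into an array.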

huangapple
  • Posted on April 11, 2023 at 05:21:32
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75980839.html