Merge a Python Array Along an Axis
Question
I've been trying to populate a training data set for use in a Keras model. Using numpy's `append` function everything *works* fine, but it is **incredibly slow**. Here's what I'm doing right now:
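```python
import numpy as np

def populateData():
    images = np.zeros([1, 4, 512, 512, 6])
    for m in range(2):
        for n in range(4):
            batch_np = np.zeros([4, 512, 512, 6])
            # Doing stuff with the batch...
            # ...
            # np.append requires both inputs to have the same number of dimensions,
            # so the 4-D batch gets a new leading axis before being appended.
            images = np.append(images, batch_np[np.newaxis], axis=0)
    return images
```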
As the size of the array grows with each pass, the amount of time NumPy takes to append new data increases pretty much exponentially. For instance, the first pass takes around 1 second and the third takes just over 3 seconds. By the time I've done a dozen or more, each `append` operation takes many minutes(!). At the current pace, it could take days to complete.
I'd like to get my training data set populated sometime before the next ice age. Beyond "getting better hardware", what can I do to speed up `np.append(...)`? My understanding is that NumPy's `append` function copies the entire array each time it is called. Is there an equivalent function that does not perform a copy each time, or one that works on a reference and just modifies the existing array in place?
I've attempted to rewrite some of this using Python's built-in list `append` method, but it doesn't provide `axis` support like NumPy's `append` function. So, while it appears to be much faster, it doesn't quite work for this multi-dimensional setup.
TL;DR: Is there a way to specify an axis when appending to a Python list? If not, is there a more efficient way to append to an N-D array along a specified axis, or to speed up `numpy.append`?
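A minimal sketch, using small stand-in shapes rather than the question's data, that confirms the copying behaviour described above. NumPy's documentation notes that `append` does not occur in place: a new array is allocated and filled. Because every call copies the whole array built so far, one call costs time proportional to the current array size, and the total work over many calls grows quadratically. The helper `rows_copied_by_append_loop` is hypothetical, written only to count that work.

```python
import numpy as np

a = np.zeros((2, 3))
b = np.append(a, np.ones((1, 3)), axis=0)

# `a` is untouched; `b` is a newly allocated array holding copies of both inputs.
print(a.shape, b.shape)        # (2, 3) (3, 3)
print(np.shares_memory(a, b))  # False


def rows_copied_by_append_loop(num_appends, rows_per_append):
    """Hypothetical helper: total rows copied when an array is grown from
    empty by calling np.append num_appends times, rows_per_append rows each."""
    total = 0
    size = 0
    for _ in range(num_appends):
        size += rows_per_append
        total += size  # each call copies the old rows plus the new ones
    return total


print(rows_copied_by_append_loop(8, 4))   # 144
print(rows_copied_by_append_loop(16, 4))  # 544
```

Doubling the number of appends roughly quadruples the copying, which is why each successive `np.append` call in the loop gets slower.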
Answer 1
Score: 1
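You can collect the batches in a Python list and combine them once at the end with `np.stack`:
```python
import numpy as np

images = []
for m in range(2):
    for n in range(4):
        batch_np = np.zeros([4, 512, 512, 6])
        # ... do stuff with the batch
        images.append(batch_np)  # cheap: the list stores a reference, no array copy
images = np.stack(images, axis=0)  # one allocation and copy, done once at the end
```
Output:
```
>>> images.shape
(8, 4, 512, 512, 6)
```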
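If the batches should instead be merged along an existing axis (one long run of samples rather than a separate axis for batches), `np.concatenate` takes the same list. A small sketch, assuming 8×8 stand-in images instead of 512×512 and filling each batch with its index so the layout is visible:

```python
import numpy as np

# Eight batches of 4 samples each; batch k is filled with the value k.
batches = [np.full((4, 8, 8, 6), k) for k in range(8)]

stacked = np.stack(batches, axis=0)       # adds a new leading axis
merged = np.concatenate(batches, axis=0)  # joins along the existing sample axis

print(stacked.shape)  # (8, 4, 8, 8, 6)
print(merged.shape)   # (32, 8, 8, 6)
```

Both calls allocate the result once; which one fits depends on whether the model expects the batch grouping as its own axis.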
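Or preallocate the whole array before the loops and fill it in place:
```python
import numpy as np

M = 2  # outer loop count
N = 4  # inner loop count
images = np.zeros([M * N, 4, 512, 512, 6])  # 4 = batch size, as in the question
for m in range(M):
    for n in range(N):
        batch_np = np.zeros([4, 512, 512, 6])
        # m * N + n gives each (m, n) pair its own slot in 0 .. M*N - 1
        images[m * N + n] = batch_np
```
Output:
```
>>> images.shape
(8, 4, 512, 512, 6)
```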
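Preallocation works because every slot is written exactly once, so the slot index has to be unique for each `(m, n)` pair. A sketch under stated assumptions: a small stand-in spatial size, `dtype=np.float32` (a choice not made in the answer, which halves memory compared with the `float64` default of `np.zeros`), and a hypothetical helper `fill_preallocated` that fills each batch with its slot number so the layout can be checked.

```python
import numpy as np


def fill_preallocated(M, N, batch=4, size=8, channels=6):
    """Hypothetical helper: preallocate once, then write each batch into its slot."""
    # np.empty skips zero-filling; safe here because every slot is overwritten.
    images = np.empty((M * N, batch, size, size, channels), dtype=np.float32)
    for m in range(M):
        for n in range(N):
            slot = m * N + n  # unique per (m, n); m + n would make slots collide
            batch_np = np.full((batch, size, size, channels), slot, dtype=np.float32)
            images[slot] = batch_np
    return images


images = fill_preallocated(2, 4)
print(images.shape)           # (8, 4, 8, 8, 6)
print(images[:, 0, 0, 0, 0])  # [0. 1. 2. 3. 4. 5. 6. 7.]

# Memory for the full-size array from the question (8 x 4 x 512 x 512 x 6 values):
n_values = 8 * 4 * 512 * 512 * 6
print(n_values * np.dtype(np.float64).itemsize)  # 402653184 bytes
print(n_values * np.dtype(np.float32).itemsize)  # 201326592 bytes
```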
Answer 2
Score: 1
Why do you need to specify an axis when appending to a list? These two loops produce the same shape:
```python
import numpy as np

# Growing an array with np.append (a new copy on every call)
arr = np.zeros([0, 3, 4])
for i in range(5):
    arr = np.append(arr, np.ones((1, 3, 4)), axis=0)
print(arr.shape)  # (5, 3, 4)

# Appending to a list, then converting once at the end
alist = []
for i in range(5):
    alist.append(np.ones((3, 4)))
arr = np.array(alist)
print(arr.shape)  # (5, 3, 4)
```
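Both loops build the same array, but they scale differently: the list version only stores references until the single conversion at the end. A rough timing sketch; the helper names and sizes are illustrative, and absolute timings depend on the machine, so no numbers are claimed here.

```python
import time

import numpy as np


def grow_with_append(k, shape=(3, 4)):
    """Grow an array with repeated np.append (copies everything on each call)."""
    arr = np.zeros((0, *shape))
    for _ in range(k):
        arr = np.append(arr, np.ones((1, *shape)), axis=0)
    return arr


def grow_with_list(k, shape=(3, 4)):
    """Collect arrays in a list, then build the result once."""
    alist = []
    for _ in range(k):
        alist.append(np.ones(shape))
    return np.array(alist)


for func in (grow_with_append, grow_with_list):
    start = time.perf_counter()
    result = func(1000, shape=(16, 16))
    elapsed = time.perf_counter() - start
    print(f"{func.__name__}: shape {result.shape}, {elapsed:.3f} s")
```

Increasing `k` should widen the gap, since the append version's total copying grows quadratically while the list version copies each array once.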
`np.stack` with the default `axis=0` does the same thing as `np.array(alist)`, and the `axis` argument controls where the new dimension goes:
```python
print(np.stack(alist, axis=0).shape)  # (5, 3, 4)
print(np.stack(alist, axis=1).shape)  # (3, 5, 4)
```
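To see exactly where `axis=1` puts the new dimension: element `[i, k, j]` of the stacked result is element `[i, j]` of `alist[k]`, so it matches `np.array(alist)` with its first two axes swapped. A small check, using an illustrative list of distinct arrays rather than the all-ones list above:

```python
import numpy as np

# Five distinct 3x4 arrays; array k is filled with the value k.
alist = [np.full((3, 4), k) for k in range(5)]

s1 = np.stack(alist, axis=1)                  # shape (3, 5, 4)
swapped = np.array(alist).transpose(1, 0, 2)  # (5, 3, 4) -> (3, 5, 4)

print(s1.shape)                     # (3, 5, 4)
print(np.array_equal(s1, swapped))  # True
print(s1[0, :, 0])                  # [0 1 2 3 4]
```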