April 11, 2023, 05:21:32

Merge a Python Array Along an Axis

Question

I've been trying to populate a training data set for use in a Keras model. Using numpy's `append` function everything *works* fine, but it is **incredibly slow**. Here's what I'm doing right now:
```python
import numpy as np

def populateData():
    images = np.zeros([1, 4, 512, 512, 6])
    for m in range(2):
        for n in range(4):
            batch_np = np.zeros([4, 512, 512, 6])
            # Doing stuff with the batch...
            # ...
            # np.append with an axis needs matching ndim, so add a leading axis
            images = np.append(images, batch_np[np.newaxis], axis=0)
    return images
```

As the size of the array grows with each pass, the amount of time numpy takes to append new data increases pretty much exponentially. So, for instance, the first pass takes around 1 second, and the third takes just over 3 seconds. By the time I've done a dozen or more, each append operation takes many minutes (!). Based on the current pace of things, it could take days to complete.

I'd like to be able to get my training data set populated sometime before the next ice age. Beyond "getting better hardware", what can I do to speed up np.append(...)? My understanding is that Numpy's append function copies the entire array each time this function gets called. Is there an equivalent function that does not perform a copy each time? Or that uses a reference value instead, and just modifies that?

I've attempted to rewrite some of this using Python's built-in list append function, but it doesn't provide axis support like numpy's append function. So, while that appears to be much much faster, it doesn't quite work for this multi-dimensional setup.

TL;DR: Is there a way to specify an axis when appending to a Python list? If not, is there a more optimal way to append to an N-D array along a specified axis / speed up numpy.append?

Answer 1

Score: 1

You can collect the batches in a Python list and call `np.stack` once at the end:

```python
import numpy as np

images = []
for m in range(2):
    for n in range(4):
        batch_np = np.zeros([4, 512, 512, 6])
        ...
        images.append(batch_np)
images = np.stack(images, axis=0)
```

Output:

```python
>>> images.shape
(8, 4, 512, 512, 6)
```
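
If the batches should instead be merged along their existing first axis rather than gaining a new leading axis, `np.concatenate` accepts the same list. The sketch below is only an illustration: it uses small stand-in shapes (8 x 8 instead of 512 x 512) and fills each batch with its index, both of which are assumptions made here and not part of the original answer.

```python
import numpy as np

# Small stand-in shapes; the question uses (4, 512, 512, 6) batches.
batches = [np.full((4, 8, 8, 6), k, dtype=np.float32) for k in range(8)]

stacked = np.stack(batches, axis=0)       # adds a new leading axis
merged = np.concatenate(batches, axis=0)  # grows the existing first axis

print(stacked.shape)  # (8, 4, 8, 8, 6)
print(merged.shape)   # (32, 8, 8, 6)
```

Either call copies the data once, after the loop, instead of once per iteration.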

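Or allocate the whole array before the loops and write each batch into its own slot:

```python
import numpy as np

M = 2  # outer loop count
N = 4  # inner loop count (also the batch size in this example)
images = np.zeros([M * N, N, 512, 512, 6])
for m in range(M):
    for n in range(N):
        batch_np = np.zeros([N, 512, 512, 6])
        # m * N + n gives every batch its own row; m + n would overwrite earlier rows
        images[m * N + n] = batch_np
```

Output:

```python
>>> images.shape
(8, 4, 512, 512, 6)
```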
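
Both approaches avoid the per-call copy that the question suspects in `np.append`. A minimal sketch of that difference, assuming small stand-in shapes and illustrative variable names (`grown`, `batches`) that do not appear in the original post:

```python
import numpy as np

batch = np.ones((4, 8, 8, 6))  # small stand-in for a (4, 512, 512, 6) batch

# np.append allocates a new array and copies both inputs into it.
grown = np.append(np.zeros((0, 4, 8, 8, 6)), batch[np.newaxis], axis=0)
copied = not np.shares_memory(grown, batch)

# list.append only stores a reference to the existing array; nothing is
# copied until np.stack builds the final array once at the end.
batches = []
batches.append(batch)
same_object = batches[0] is batch

print(copied, same_object)  # True True
```

Because every `np.append` call copies everything accumulated so far, the total work grows roughly with the square of the number of batches, while the list version copies each batch only once.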

Answer 2

Score: 1

Why do you need to specify an axis with the list append? These two loops produce the same shape:

```python
In [62]: arr = np.zeros([0,3,4])
    ...: for i in range(5):
    ...:     arr = np.append(arr, np.ones((1,3,4)), axis=0)
    ...: arr.shape
Out[62]: (5, 3, 4)
In [63]: alist = []
    ...: for i in range(5):
    ...:     alist.append(np.ones((3,4)))
    ...: arr = np.array(alist)
In [64]: arr.shape
Out[64]: (5, 3, 4)
```

`np.stack` with the default axis 0 does the same thing, and a different `axis` places the new dimension elsewhere:

```python
In [65]: np.stack(alist, axis=0).shape
Out[65]: (5, 3, 4)
In [66]: np.stack(alist, axis=1).shape
Out[66]: (3, 5, 4)
```
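
The shapes above can also be checked at the level of values. The following sketch fills each list element with its index (an assumption made here so the arrays are distinguishable) and compares the results; the `np.moveaxis` comparison is added for illustration and is not part of the original answer.

```python
import numpy as np

alist = [np.full((3, 4), k) for k in range(5)]

via_array = np.array(alist)
via_stack = np.stack(alist)  # axis defaults to 0
same = np.array_equal(via_array, via_stack)

# Stacking along axis 1 matches stacking along axis 0 and then moving that axis.
stack_axis1 = np.stack(alist, axis=1)
moved = np.moveaxis(via_stack, 0, 1)

print(same, stack_axis1.shape, np.array_equal(stack_axis1, moved))
```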
