Image folding by dividing into grid for convolutional neural network

Question:
Suppose I have a 300 x 300 input image with 1 channel, contained in a NumPy array of shape (300, 300, 1). The channel is a single bit: either 0 or 1.
How can I divide it into a 4 x 4 grid, each cell being 75 x 75 pixels, and stack the cells together by summing up the bits?
In the end, I'd have a single NumPy array of shape (75, 75, 1). At that point the value of the last channel can range from 0 to 16.
How well would this work as an input to a convolutional neural network? Is this an effective way of shrinking my input?
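For concreteness, the grid-and-stack operation described above can be sketched as a plain C-order reshape plus a sum over the cell axes. This is one way of writing it, assuming the 300 x 300 x 1 bit array described; the variable names are illustrative:

```python
import numpy as np

SIZE = 300           # input is SIZE x SIZE x 1
GRID = 4             # 4 x 4 grid of cells
CELL = SIZE // GRID  # each cell is 75 x 75 pixels

# mock a single-bit image
X = np.random.randint(0, 2, size=(SIZE, SIZE, 1))

# axes 0 and 2 index which grid cell, axes 1 and 3 the pixel within a cell;
# summing over the cell axes stacks the 16 cells on top of each other
folded = X.reshape(GRID, CELL, GRID, CELL, 1).sum(axis=(0, 2))

print(folded.shape)  # (75, 75, 1); every value lies in [0, 16]
```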
Answer 1
Score: 1
You can do it using `numpy.lib.stride_tricks.as_strided` (https://numpy.org/devdocs/reference/generated/numpy.lib.stride_tricks.as_strided.html):
import numpy as np

# mock a single-channel binary image
SIZE = 300
X = np.random.randint(low=0, high=2, size=(SIZE, SIZE, 1))
BLOCK_SIZE = 4
assert SIZE % BLOCK_SIZE == 0

# define the moving window by its strides: the first two axes index the
# pixel within a block, the next two index the block itself
new_stride = [X.strides[0], X.strides[1],
              X.strides[0] * BLOCK_SIZE, X.strides[1] * BLOCK_SIZE,
              X.strides[2]]
# summing over the within-block axes yields the (75, 75, 1) coarsened array
coarsened_X = np.sum(
    np.lib.stride_tricks.as_strided(
        X,
        shape=[BLOCK_SIZE, BLOCK_SIZE, SIZE // BLOCK_SIZE, SIZE // BLOCK_SIZE, 1],
        strides=new_stride),
    axis=(0, 1))
About your final question: yes, I do think it's a relevant way of shrinking the input of your CNN. Depending on how much training data you have available, it can be more efficient than a first trainable convolutional layer on a very large image. Note that for optimization purposes it's preferable to average your inputs instead of summing them.
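For instance, the averaged variant just replaces the sum with a mean over each block. The sketch below uses a plain C-order reshape that is equivalent to the strided block sum above (variable names follow the earlier snippet):

```python
import numpy as np

SIZE, BLOCK_SIZE = 300, 4
X = np.random.randint(0, 2, size=(SIZE, SIZE, 1))

# mean over each non-overlapping BLOCK_SIZE x BLOCK_SIZE block,
# keeping the input scale in [0, 1] instead of [0, 16]
averaged = X.reshape(SIZE // BLOCK_SIZE, BLOCK_SIZE,
                     SIZE // BLOCK_SIZE, BLOCK_SIZE, 1).mean(axis=(1, 3))

print(averaged.shape)  # (75, 75, 1)
```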
A benchmark showed it to be roughly 50x faster for an array of your size.
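The benchmark code itself is not reproduced here, and what it was compared against is not stated. As an assumption, a minimal timing harness pitting the strided sum against a naive per-block Python loop could look like this:

```python
import timeit

import numpy as np

SIZE, BLOCK_SIZE = 300, 4
X = np.random.randint(0, 2, size=(SIZE, SIZE, 1))

def strided_sum():
    # the as_strided approach from the answer
    s = [X.strides[0], X.strides[1],
         X.strides[0] * BLOCK_SIZE, X.strides[1] * BLOCK_SIZE, X.strides[2]]
    view = np.lib.stride_tricks.as_strided(
        X,
        shape=[BLOCK_SIZE, BLOCK_SIZE, SIZE // BLOCK_SIZE, SIZE // BLOCK_SIZE, 1],
        strides=s)
    return view.sum(axis=(0, 1))

def loop_sum():
    # naive baseline: explicit Python loop over the blocks (an assumption;
    # the original benchmark's baseline is not shown)
    n = SIZE // BLOCK_SIZE
    out = np.zeros((n, n, 1), dtype=np.int64)
    for i in range(n):
        for j in range(n):
            out[i, j, 0] = X[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE,
                             j * BLOCK_SIZE:(j + 1) * BLOCK_SIZE].sum()
    return out

print("strided:", timeit.timeit(strided_sum, number=20))
print("loop:   ", timeit.timeit(loop_sum, number=20))
```

The exact speedup will vary by machine and by the baseline chosen.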
Edit:
I just tried an alternative that uses only a reshape with Fortran ordering (order='F'), which is possibly less readable and/or less generalizable:
X.reshape(BLOCK_SIZE, SIZE // BLOCK_SIZE, BLOCK_SIZE, SIZE // BLOCK_SIZE, 1, order='F').sum(axis=(0, 2))
I am suggesting it because it seems to be a few percent faster than the strided version. Just in case it's of interest.