Image folding by dividing into grid for convolutional neural network

Question

Suppose I have a 300 x 300 input image with 1 channel, contained in a numpy array with shape (300, 300, 1). And the channel is single bit - either 0 or 1.

How can I divide it into a 4 x 4 grid, each grid being 75 by 75 pixels wide and stack the grids together by summing up the bits?

In the end, I'd have a single numpy array that's (75, 75, 1). The value of the last channel can range from 0 to 16 at this point.

How well would this work as an input to a convolutional neural network? Is this an effective way of shrinking my input?
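The grid-summing described above can be sketched with a plain C-order reshape (a minimal illustration, not from the original post; the variable names are my own):

```python
import numpy as np

# Mock a 300 x 300 single-bit image with shape (300, 300, 1)
X = np.random.randint(low=0, high=2, size=(300, 300, 1))

# Reshape to (4, 75, 4, 75, 1): axes 0 and 2 index the 4 x 4 grid of tiles,
# axes 1 and 3 index the pixel position within each 75 x 75 tile.
tiles = X.reshape(4, 75, 4, 75, 1)

# Summing over the two grid axes stacks the 16 tiles on top of each other;
# the result has shape (75, 75, 1) with values between 0 and 16.
stacked = tiles.sum(axis=(0, 2))
```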

Answer 1

Score: 1

You can do it using numpy.lib.stride_tricks.as_strided (https://numpy.org/devdocs/reference/generated/numpy.lib.stride_tricks.as_strided.html):

import numpy as np

# Mock a single-channel binary image
SIZE = 300
X = np.random.randint(low=0, high=2, size=(SIZE, SIZE, 1))

BLOCK_SIZE = 4
assert SIZE % BLOCK_SIZE == 0
# Define the moving window by its strides
new_stride = [X.strides[0], X.strides[1],
              X.strides[0]*BLOCK_SIZE, X.strides[1]*BLOCK_SIZE, X.strides[2]]
coarsened_X = np.sum(
    np.lib.stride_tricks.as_strided(
        X,
        shape=[BLOCK_SIZE, BLOCK_SIZE, SIZE//BLOCK_SIZE, SIZE//BLOCK_SIZE, 1],
        strides=new_stride),
    axis=(0, 1))

About your final question: yes, I do think it's a relevant way of shrinking the input of your CNN. Depending on how much training data you have available, it can be more efficient than a first trainable convolutional layer on a very large image. Note that for optimization purposes it's preferable to average your inputs instead of summing them.
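The averaging the answer recommends is just the block sum divided by the number of pixels per block; a minimal sketch, using a plain C-order reshape instead of the stride trick (variable names are my own):

```python
import numpy as np

SIZE, BLOCK_SIZE = 300, 4
X = np.random.randint(low=0, high=2, size=(SIZE, SIZE, 1))

# Sum over non-overlapping BLOCK_SIZE x BLOCK_SIZE blocks, then divide by the
# block area so every value lands in [0, 1] rather than [0, 16].
block_sums = X.reshape(SIZE // BLOCK_SIZE, BLOCK_SIZE,
                       SIZE // BLOCK_SIZE, BLOCK_SIZE, 1).sum(axis=(1, 3))
block_means = block_sums / BLOCK_SIZE**2  # shape (75, 75, 1)
```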

A benchmark showing it's roughly 50x faster for an array of your size (benchmark screenshot not reproduced here).

Edit:

I just tried a method that uses only a reshape with Fortran ('F') order, which is possibly less readable and/or less generalizable:

X.reshape(BLOCK_SIZE, SIZE//BLOCK_SIZE, BLOCK_SIZE, SIZE//BLOCK_SIZE, 1, order='F').sum(axis=(0,2))

I'm suggesting it because it seems to be a few percent faster than the stride-based approach, in case that's of interest.
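As a sanity check (my addition, not part of the original answer), the stride-based version and the Fortran-order reshape can be verified to produce identical results on a random input:

```python
import numpy as np

SIZE, BLOCK_SIZE = 300, 4
X = np.random.randint(low=0, high=2, size=(SIZE, SIZE, 1))

# Stride-trick version from the answer
new_stride = [X.strides[0], X.strides[1],
              X.strides[0]*BLOCK_SIZE, X.strides[1]*BLOCK_SIZE, X.strides[2]]
windows = np.lib.stride_tricks.as_strided(
    X,
    shape=[BLOCK_SIZE, BLOCK_SIZE, SIZE//BLOCK_SIZE, SIZE//BLOCK_SIZE, 1],
    strides=new_stride)
by_strides = windows.sum(axis=(0, 1))

# Fortran-order reshape version from the edit
by_reshape = X.reshape(BLOCK_SIZE, SIZE//BLOCK_SIZE,
                       BLOCK_SIZE, SIZE//BLOCK_SIZE, 1,
                       order='F').sum(axis=(0, 2))

# Both coarsen the image to (75, 75, 1) and agree elementwise
assert (by_strides == by_reshape).all()
```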

huangapple
  • Published on June 30, 2023, 05:17:00
  • When reposting, please keep the original link: https://go.coder-hub.com/76584648.html