Image folding by dividing into grid for convolutional neural network

Question:
Suppose I have a 300 x 300 input image with 1 channel, contained in a NumPy array of shape (300, 300, 1). The channel is a single bit: either 0 or 1.
How can I divide it into a 4 x 4 grid, each cell being 75 x 75 pixels, and stack the cells together by summing up the bits?
In the end, I'd have a single NumPy array of shape (75, 75, 1). At that point the value of the last channel can range from 0 to 16.
How well would this work as an input to a convolutional neural network? Is this an effective way of shrinking my input?
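For concreteness, the grid-and-stack operation described above can be sketched as a plain C-order reshape plus a sum over the cell axes. This is one way of writing it, assuming the 300 x 300 x 1 bit array described; the variable names are illustrative:

```python
import numpy as np

SIZE = 300           # input is SIZE x SIZE x 1
GRID = 4             # 4 x 4 grid of cells
CELL = SIZE // GRID  # each cell is 75 x 75 pixels

# mock a single-bit image
X = np.random.randint(0, 2, size=(SIZE, SIZE, 1))

# axes 0 and 2 index which grid cell, axes 1 and 3 the pixel within a cell;
# summing over the cell axes stacks the 16 cells on top of each other
folded = X.reshape(GRID, CELL, GRID, CELL, 1).sum(axis=(0, 2))

print(folded.shape)  # (75, 75, 1); every value lies in [0, 16]
```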
Answer 1
Score: 1
You can do it using `numpy.lib.stride_tricks.as_strided` (https://numpy.org/devdocs/reference/generated/numpy.lib.stride_tricks.as_strided.html):
import numpy as np

# mock a single-channel binary image
SIZE = 300
X = np.random.randint(low=0, high=2, size=(SIZE, SIZE, 1))
BLOCK_SIZE = 4
assert SIZE % BLOCK_SIZE == 0

# define the moving window by its strides: the first two axes index the
# pixel within a block, the next two index the block itself
new_stride = [X.strides[0], X.strides[1],
              X.strides[0] * BLOCK_SIZE, X.strides[1] * BLOCK_SIZE,
              X.strides[2]]
# summing over the within-block axes yields the (75, 75, 1) coarsened array
coarsened_X = np.sum(
    np.lib.stride_tricks.as_strided(
        X,
        shape=[BLOCK_SIZE, BLOCK_SIZE, SIZE // BLOCK_SIZE, SIZE // BLOCK_SIZE, 1],
        strides=new_stride),
    axis=(0, 1))
About your final question: yes, I do think it's a relevant way of shrinking the input of your CNN. Depending on how much training data you have available, it can be more efficient than a first trainable convolutional layer on a very large image. Note that for optimization purposes it's preferable to average your inputs instead of summing them.
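For instance, the averaged variant just replaces the sum with a mean over each block. The sketch below uses a plain C-order reshape that is equivalent to the strided block sum above (variable names follow the earlier snippet):

```python
import numpy as np

SIZE, BLOCK_SIZE = 300, 4
X = np.random.randint(0, 2, size=(SIZE, SIZE, 1))

# mean over each non-overlapping BLOCK_SIZE x BLOCK_SIZE block,
# keeping the input scale in [0, 1] instead of [0, 16]
averaged = X.reshape(SIZE // BLOCK_SIZE, BLOCK_SIZE,
                     SIZE // BLOCK_SIZE, BLOCK_SIZE, 1).mean(axis=(1, 3))

print(averaged.shape)  # (75, 75, 1)
```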
A benchmark showed it to be roughly 50x faster for an array of your size.
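The benchmark code itself is not reproduced here, and what it was compared against is not stated. As an assumption, a minimal timing harness pitting the strided sum against a naive per-block Python loop could look like this:

```python
import timeit

import numpy as np

SIZE, BLOCK_SIZE = 300, 4
X = np.random.randint(0, 2, size=(SIZE, SIZE, 1))

def strided_sum():
    # the as_strided approach from the answer
    s = [X.strides[0], X.strides[1],
         X.strides[0] * BLOCK_SIZE, X.strides[1] * BLOCK_SIZE, X.strides[2]]
    view = np.lib.stride_tricks.as_strided(
        X,
        shape=[BLOCK_SIZE, BLOCK_SIZE, SIZE // BLOCK_SIZE, SIZE // BLOCK_SIZE, 1],
        strides=s)
    return view.sum(axis=(0, 1))

def loop_sum():
    # naive baseline: explicit Python loop over the blocks (an assumption;
    # the original benchmark's baseline is not shown)
    n = SIZE // BLOCK_SIZE
    out = np.zeros((n, n, 1), dtype=np.int64)
    for i in range(n):
        for j in range(n):
            out[i, j, 0] = X[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE,
                             j * BLOCK_SIZE:(j + 1) * BLOCK_SIZE].sum()
    return out

print("strided:", timeit.timeit(strided_sum, number=20))
print("loop:   ", timeit.timeit(loop_sum, number=20))
```

The exact speedup will vary by machine and by the baseline chosen.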
Edit:
I just tried an alternative that uses only a reshape with Fortran ordering (order='F'), which is possibly less readable and/or less generalizable:
X.reshape(BLOCK_SIZE, SIZE // BLOCK_SIZE, BLOCK_SIZE, SIZE // BLOCK_SIZE, 1, order='F').sum(axis=(0, 2))
I am suggesting it because it seems to be a few percent faster than the strided version. Just in case it's of interest.