Confusion in the calculation of hidden layer size in CNN

Question

I am trying to understand convolutional neural networks. I am reading the book Grokking Deep Learning. Here is the code from the book:

import numpy as np, sys
np.random.seed(1)

from keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Use the first 1,000 training images, flattened and scaled to [0, 1].
images, labels = (x_train[0:1000].reshape(1000, 28*28)/255, y_train[0:1000])

# One-hot encode the labels.
one_hot_labels = np.zeros((len(labels), 10))
for i, l in enumerate(labels):
    one_hot_labels[i][l] = 1
labels = one_hot_labels

test_images = x_test.reshape(len(x_test), 28*28) / 255
test_labels = np.zeros((len(y_test), 10))
for i, l in enumerate(y_test):
    test_labels[i][l] = 1

def tanh(x):
    return np.tanh(x)

def tanh2deriv(output):
    return 1 - (output ** 2)

def softmax(x):
    temp = np.exp(x)
    return temp/np.sum(temp, axis=1, keepdims=True)

alpha, iterations = (2, 30)
pixels_per_image, num_labels = (784, 10)
batch_size = 128

input_rows = 28
input_cols = 28

kernel_rows = 3
kernel_cols = 3
num_kernels = 16

# The line in question: number of hidden units fed into weights_1_2.
hidden_size = ((input_rows - kernel_rows)*(input_cols - kernel_cols))*num_kernels

kernels = 0.02*np.random.random((kernel_rows*kernel_cols, num_kernels))-0.01
weights_1_2 = 0.02*np.random.random((hidden_size, num_labels))-0.1

def get_image_section(layer, row_from, row_to, col_from, col_to):
    # Slice one kernel-sized patch out of every image in the batch.
    section = layer[:, row_from:row_to, col_from:col_to]
    return section.reshape(-1, 1, row_to-row_from, col_to-col_from)

for j in range(iterations):
    correct_cnt = 0
    for i in range(int(len(images)/batch_size)):
        batch_start, batch_end = ((i*batch_size), ((i+1)*batch_size))
        layer_0 = images[batch_start:batch_end]
        layer_0 = layer_0.reshape(layer_0.shape[0], 28, 28)
        # print(layer_0.shape)

        # Collect every kernel_rows x kernel_cols patch of the batch
        # (im2col-style expansion).
        sects = list()
        for row_start in range(layer_0.shape[1]-kernel_rows):
            for col_start in range(layer_0.shape[2]-kernel_cols):
                sect = get_image_section(layer_0,
                                         row_start, row_start+kernel_rows,
                                         col_start, col_start+kernel_cols)
                sects.append(sect)

        expanded_input = np.concatenate(sects, axis=1)
        es = expanded_input.shape
        flattened_input = expanded_input.reshape(es[0]*es[1], -1)

        # Convolution as one big matrix multiply, then tanh and dropout.
        kernel_output = flattened_input.dot(kernels)
        layer_1 = tanh(kernel_output.reshape(es[0], -1))
        dropout_mask = np.random.randint(2, size=layer_1.shape)
        layer_1 *= dropout_mask*2
        layer_2 = softmax(np.dot(layer_1, weights_1_2))

        for k in range(batch_size):
            labelset = labels[batch_start+k:batch_start+k+1]
            _inc = int(np.argmax(layer_2[k:k+1]) == np.argmax(labelset))
            correct_cnt += _inc

        # Backpropagation.
        layer_2_delta = (labels[batch_start:batch_end]-layer_2)/(batch_size*layer_2.shape[0])
        layer_1_delta = layer_2_delta.dot(weights_1_2.T)*tanh2deriv(layer_1)
        layer_1_delta *= dropout_mask
        weights_1_2 += alpha*layer_1.T.dot(layer_2_delta)
        l1d_reshape = layer_1_delta.reshape(kernel_output.shape)
        k_update = flattened_input.T.dot(l1d_reshape)
        kernels -= alpha*k_update

    # Evaluate on the test set.
    test_correct_cnt = 0
    for i in range(len(test_images)):
        layer_0 = test_images[i:i+1]
        layer_0 = layer_0.reshape(layer_0.shape[0], 28, 28)

        sects = list()
        for row_start in range(layer_0.shape[1]-kernel_rows):
            for col_start in range(layer_0.shape[2]-kernel_cols):
                sect = get_image_section(layer_0,
                                         row_start, row_start+kernel_rows,
                                         col_start, col_start+kernel_cols)
                sects.append(sect)

        expanded_input = np.concatenate(sects, axis=1)
        es = expanded_input.shape
        flattened_input = expanded_input.reshape(es[0]*es[1], -1)
        kernel_output = flattened_input.dot(kernels)
        layer_1 = tanh(kernel_output.reshape(es[0], -1))
        layer_2 = np.dot(layer_1, weights_1_2)
        test_correct_cnt += int(np.argmax(layer_2) == np.argmax(test_labels[i:i+1]))

    if(j % 10 == 0):
        print(f"I:{j} Test-Acc:{test_correct_cnt/float(len(test_images))} Train-Acc:{correct_cnt/float(len(images))}")

I am confused about the following line:

hidden_size = ((input_rows - kernel_rows)*(input_cols - kernel_cols))*num_kernels

So, if I have a 5x5 image, a 3x3 filter, 1 kernel, stride 1, and no padding, then according to this equation hidden_size should be 4. But if I do the convolution on paper, I perform 9 convolution operations. Can anyone please explain what I am doing wrong?
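
To make the mismatch concrete, here is a minimal standalone sketch (not code from the book) that runs the same loop bounds as the posted code over a single 5x5 input:

import numpy as np

image = np.arange(25).reshape(1, 5, 5)   # one dummy 5x5 "image" with a batch axis
kernel_rows = kernel_cols = 3

patches = []
for row_start in range(image.shape[1] - kernel_rows):      # range(2): rows 0 and 1 only
    for col_start in range(image.shape[2] - kernel_cols):  # range(2): cols 0 and 1 only
        patches.append(image[:, row_start:row_start + kernel_rows,
                                col_start:col_start + kernel_cols])

print(len(patches))  # prints 4 -- matches (5-3)*(5-3)*1, not the 9 positions counted by hand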

Answer 1

Score: 2

I am not familiar with that book, but the code you presented seems to ignore a column/row at the end. If you add +1 as shown below, you will get 9 convolution operations for a 5x5 image.

# from
hidden_size = ((input_rows - kernel_rows)*(input_cols - kernel_cols))*num_kernels

# to
hidden_size = ((input_rows - kernel_rows + 1)*(input_cols - kernel_cols + 1))*num_kernels
# These changes are required in both the training and the testing loop.
# from
        for row_start in range(layer_0.shape[1]-kernel_rows):
            for col_start in range(layer_0.shape[2]-kernel_cols):

# to
        for row_start in range(layer_0.shape[1]-kernel_rows + 1):
            for col_start in range(layer_0.shape[2]-kernel_cols + 1):

In case you didn't get what these changes were about, here are some animations to help you understand. Each cell represents a pixel in the image, and the red box represents the convolution kernel.

before: [animation: the convolution kernel sliding over the image with the original loop bounds]

after: [animation: the convolution kernel sliding over the image with the +1 fix]
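
For reference, the count the answer arrives at also follows from the standard output-size formula for a "valid" convolution: output = (input - kernel) / stride + 1 per axis. Below is a minimal standalone sketch of that formula (assuming stride 1 and no padding, as in the posted code; this is not code from the book or the answer):

# Standard output size of a "valid" convolution along one axis.
def conv_output_size(in_size, kernel_size, stride=1, padding=0):
    return (in_size + 2*padding - kernel_size) // stride + 1

# 5x5 image, 3x3 kernel, stride 1, no padding: 3 positions per axis, 9 in total.
print(conv_output_size(5, 3) ** 2)      # 9

# MNIST case from the post: 28x28 input, 3x3 kernel, 16 kernels.
out_rows = conv_output_size(28, 3)      # 26
out_cols = conv_output_size(28, 3)      # 26
print(out_rows * out_cols * 16)         # 10816 hidden units with the +1 fix
print((28 - 3) * (28 - 3) * 16)         # 10000 hidden units, what the book's line computes

With the +1 fix, hidden_size grows from 10000 to 10816, so weights_1_2 (which has shape (hidden_size, num_labels) in the posted code) changes size accordingly.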
