Terrible accuracy in keras CNN
Question
I am a beginner in deep learning and I am trying to build a handwritten-word classification model. I created the dataset myself; it contains 71 different classes with 1,000 images per class.
The problem is that I have tried building a CNN model with different combinations of convolutional, max pooling and dense layers, while also changing the optimizer, but the accuracy remains TERRIBLE. Here are my results.
Is this a problem with the model, the dataset, or my parameters? What do you suggest?
Here is the last model I tried:
model = Sequential([
    Conv2D(32, kernel_size=(2, 2), activation="relu", input_shape=(143, 75, 1)),
    MaxPooling2D(pool_size=(3, 3)),
    Conv2D(64, kernel_size=(4, 4), activation="relu"),
    MaxPooling2D(pool_size=(9, 9)),
    Flatten(),
    Dense(512, activation="relu"),
    Dense(128, activation="sigmoid"),
    Dense(71, activation="softmax")
])
model.compile(optimizer=Nadam(learning_rate=0.01), loss="categorical_crossentropy", metrics=["accuracy"])
Answer 1
Score: 1
The problem with your model is your pool size. The official Keras documentation says this about the pooling layer:
> Downsamples the input along its spatial dimensions (height and width) by taking the maximum value over an input window (of size defined by pool_size) for each channel of the input. The window is shifted by strides along each dimension.
By default, the pooling layer has a pool size of (2,2), which means that in each window of 4 elements, only the maximum value is taken into consideration.
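To make that concrete, here is a minimal sketch (assuming TensorFlow 2.x) of what the default (2,2) max pooling does to a tiny 4x4 feature map:

import numpy as np
import tensorflow as tf

# A single-channel 4x4 "feature map" with a leading batch dimension: shape (1, 4, 4, 1).
x = np.array([[ 1,  2,  5,  6],
              [ 3,  4,  7,  8],
              [ 9, 10, 13, 14],
              [11, 12, 15, 16]], dtype="float32").reshape(1, 4, 4, 1)

# Default pool_size=(2, 2): each non-overlapping 2x2 window keeps only its maximum.
pooled = tf.keras.layers.MaxPooling2D()(x)
print(pooled.numpy().reshape(2, 2))
# [[ 4.  8.]
#  [12. 16.]]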
If we print the summary of your model, we get:
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d (Conv2D) (None, 142, 74, 32) 160
max_pooling2d (MaxPooling2D (None, 47, 24, 32) 0
)
conv2d_1 (Conv2D) (None, 44, 21, 64) 32832
max_pooling2d_1 (MaxPooling (None, 4, 2, 64) 0
2D)
flatten (Flatten) (None, 512) 0
dense (Dense) (None, 512) 262656
dense_1 (Dense) (None, 128) 65664
dense_2 (Dense) (None, 71) 9159
=================================================================
Total params: 370,471
Trainable params: 370,471
Non-trainable params: 0
_________________________________________________________________
So, looking at those layers and their shapes, there is a big change between the conv2d_1 (Conv2D) layer output and the max_pooling2d_1 (MaxPooling2D) layer output: the output shape goes from (44, 21, 64) to (4, 2, 64). This is because you are using a pool size of (9,9) for the pooling layer before the Dense layers.
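As a quick sanity check on those numbers: with the default strides (equal to the pool size) and "valid" padding, each spatial dimension is floor-divided by the pool size, which is where the (4, 2, 64) shape comes from. A tiny sketch:

# Output spatial size of MaxPooling2D with strides == pool_size and "valid" padding:
# out = floor(in / pool)
def pooled_dim(size: int, pool: int) -> int:
    return size // pool

# The (44, 21) feature map from conv2d_1 after a (9, 9) pool:
print(pooled_dim(44, 9), pooled_dim(21, 9))  # -> 4 2, i.e. the (4, 2, 64) output above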
To understand the effect of the pool size and pooling, consider the below input image, which has a size of (183,183,3).
Now, when we apply 2D max pooling to the above image with a pool size of (2,2), we get the following image, whose spatial dimensions are reduced to (91,91,3). Here the image dimensions are reduced, but the information within the image is preserved.
Now, for the same input image, the max pooling output for a pool size of (3,3) would be the following image, with dimensions of (61,61,3). Here you can barely see the word Awesome in the image.
With a pool size of (5,5), we get the max pool output shown below, with a spatial dimension of (36,36,3). Here you don't see any information at all.
Why is that? Because white pixels are 255 and black pixels are 0, and a max pool always picks 255. Since you are using a pool size of (9,9), each window contains mostly white pixels along with the black ones, so you lose the useful information while your spatial dimensions shrink to (20,20,3), just like the case with the pool size of (5,5). (Here only the effect of pooling on the image is shown. When you add the Conv2D layers, the output will change based on the filter values.)
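If you want to reproduce that pooling-only experiment yourself, a rough sketch is below (assuming TensorFlow 2.x; "word.png" is only a placeholder name for a grayscale scan of a handwritten word):

import tensorflow as tf

# Load a grayscale image and add a batch dimension: shape (1, H, W, 1).
img = tf.io.decode_png(tf.io.read_file("word.png"), channels=1)  # "word.png" is a placeholder
img = tf.expand_dims(tf.cast(img, tf.float32), axis=0)

# Apply max pooling with increasingly large windows and compare the outputs.
for pool in [(2, 2), (3, 3), (5, 5), (9, 9)]:
    pooled = tf.keras.layers.MaxPooling2D(pool_size=pool)(img)
    print(pool, "->", pooled.shape)
    # With white = 255 and black = 0, larger windows keep mostly white pixels,
    # so the handwriting progressively disappears from the pooled image.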
With this architecture your model will not be able to learn anything, no matter which other components you change, such as the optimizer or the loss function.
So, change the model architecture to something like the one below and retrain your network:
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential([
    Conv2D(32, kernel_size=(3, 3), activation="relu", input_shape=(143, 75, 1)),
    MaxPooling2D(),  # default pool_size=(2, 2)
    Conv2D(64, kernel_size=(3, 3), activation="relu"),
    MaxPooling2D(),
    Flatten(),
    Dense(512, activation="relu"),
    Dense(128, activation="relu"),
    Dense(71, activation="softmax")
])
model.compile(optimizer=tf.keras.optimizers.Nadam(learning_rate=0.01),
              loss="categorical_crossentropy", metrics=["accuracy"])
P.S.: It is a common choice to use a kernel size of either (3,3) or (5,5) for a convolution layer. For example, when you have a convolution layer such as Conv2D(64, kernel_size=(3, 3)), you will have 64 filters, each with a size of (3,3). Also, don't forget to normalize your images before you feed them to the model.
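On the normalization point, one possible sketch (assuming a recent TensorFlow 2.x and raw pixel values in 0-255) is to either scale the arrays yourself or put a Rescaling layer in front of the model:

import numpy as np
import tensorflow as tf

# Hypothetical training batch with raw grayscale pixel values in [0, 255].
x_train = np.random.randint(0, 256, size=(8, 143, 75, 1)).astype("float32")

# Option 1: scale the arrays to [0, 1] before calling model.fit.
x_train_scaled = x_train / 255.0

# Option 2: make normalization part of the model with a Rescaling layer
# (tf.keras.layers.Rescaling in recent releases), placed before the first Conv2D.
rescale = tf.keras.layers.Rescaling(1.0 / 255)
print(float(tf.reduce_max(rescale(x_train))))  # ~1.0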
Cheers !!!