Replacing a pooling layer with a Conv2D layer in Keras
Question
I have a neural network in Keras with two Conv2D layers, an average pooling layer, and a dense output layer.
I want to deploy the trained model on an FPGA later on, and that architecture does not support MaxPooling or AveragePooling layers.
However, I read somewhere that you can essentially use a Conv2D layer for pooling by playing with its parameters; I am unsure how to do it exactly.
I naively thought that a Pooling layer (max or average or whatever) like this:
model.add(tf.keras.layers.AveragePooling2D(pool_size=(1, 3)))
would do roughly the same job as this:
model.add(tf.keras.layers.Conv2D(1, (1, 3), strides=(1, 3), use_bias=False, padding='same', name='Conv3'))
model.add(tf.keras.layers.BatchNormalization())
model.add(tf.keras.layers.Activation('relu'))
I thought that choosing a single filter would be equivalent to telling the network to perform one operation (e.g. averaging or taking the maximum, whichever works best), and that the kernel size and strides should correspond to those of the pooling layer.
However, the total parameter counts of the two models are vastly different, and I fail to understand why: the model with AveragePooling2D has 15,644 parameters, while the Conv2D variant has only 2,604.
The model also performs a lot worse when I do it this way.
Answer 1
Score: 1
You could create a conv layer, set its weights so that it performs average pooling, and then mark the layer as not trainable.
Example code:
import numpy as np
from tensorflow.keras.layers import AveragePooling2D, Conv2D
from tensorflow.keras.models import Sequential

# Kernel shape is (kernel_h, kernel_w, in_channels, out_channels);
# it should be computed from the output shape of the previous layer.
conv_pool_weights = np.zeros((2, 2, 4, 4))
for i in range(conv_pool_weights.shape[2]):
    # Filter i averages a 2x2 window of channel i and ignores all other channels.
    conv_pool_weights[:, :, i, i] = 1. / (conv_pool_weights.shape[0] * conv_pool_weights.shape[1])

conv_pool = Conv2D(4, kernel_size=(2, 2), strides=(2, 2), input_shape=(16, 16, 4), use_bias=False)

model_conv = Sequential([conv_pool])

conv_pool.set_weights([conv_pool_weights])
conv_pool.trainable = False  # freeze the layer so training cannot change the averaging weights

model_pool = Sequential([AveragePooling2D(input_shape=(16, 16, 4))])

model_conv.summary()
model_pool.summary()
Output:
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d (Conv2D) (None, 8, 8, 4) 64
=================================================================
Total params: 64
Trainable params: 0
Non-trainable params: 64
_________________________________________________________________
Model: "sequential_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
average_pooling2d (AverageP (None, 8, 8, 4) 0
ooling2D)
=================================================================
Total params: 0
Trainable params: 0
Non-trainable params: 0
_________________________________________________________________
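Note that the Conv2D layer's 64 parameters are simply the 2 × 2 × 4 × 4 kernel entries (there is no bias); they all show up as non-trainable because the layer is frozen.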
Test:
random_input = np.random.random((4, 16, 16, 4))
pred_1 = model_pool.predict(random_input)
pred_2 = model_conv.predict(random_input)
print(np.mean(np.abs(pred_1 - pred_2)))
Output:
1.1503289e-08
As we can see there is some difference, but it is negligible: just float32 rounding error.
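To connect this back to the question: the version in the question used a single filter, which mixes all input channels into one output channel. That shrinks every layer that follows (which is why the parameter counts differ so much) and discards information, which would explain the worse performance. A faithful replacement needs one frozen filter per channel. A minimal sketch for the question's pool_size=(1, 3) layer follows; the channel count of 8 and the input shape are assumptions, since the question does not show the preceding layers:

import numpy as np
import tensorflow as tf

n_channels = 8  # assumption: must match the previous layer's output channels

# One filter per input channel: filter i averages a 1x3 window of channel i
# and ignores every other channel.
weights = np.zeros((1, 3, n_channels, n_channels))
for i in range(n_channels):
    weights[:, :, i, i] = 1. / 3.

avg_pool_conv = tf.keras.layers.Conv2D(
    n_channels, kernel_size=(1, 3), strides=(1, 3), use_bias=False,
    padding='valid',  # 'valid' matches AveragePooling2D's default padding
    name='Conv3')

avg_pool_conv.build((None, 4, 12, n_channels))  # assumed input shape; build before set_weights
avg_pool_conv.set_weights([weights])
avg_pool_conv.trainable = False

To reproduce the pooling layer exactly, this layer should not be followed by the BatchNormalization and ReLU from the question's snippet.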