When fine-tuning a pre-trained Model, how does tensorflow know that the base_model has been changed?


Question


Ng's Convolutional Neural Network class's Week 2 Lab on using Transfer Learning with MobileNetV2 (summary: https://github.com/EhabR98/Transfer-Learning-with-MobileNetV2) and an additional tutorial (https://blog.roboflow.com/how-to-train-mobilenetv2-on-a-custom-dataset/) both begin like this:

base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE, include_top=False, weights='imagenet')
base_model.trainable = False

They then proceed to add a pooling layer, a Dropout layer, and a 1-unit Dense layer to the end, apply a BinaryCrossentropy loss and some kind of optimizer, then train it on some custom data that has been inputted. Let's call this custom model "model2", as Ng's lab does.
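For reference, the head described above looks roughly like this in the simpler tutorial variant (a sketch only, not the tutorial's exact code; it assumes the base_model and IMG_SHAPE defined above):

inputs = tf.keras.Input(shape=IMG_SHAPE)
x = base_model(inputs, training=False)            # frozen MobileNetV2 feature extractor
x = tf.keras.layers.GlobalAveragePooling2D()(x)   # pool spatial feature maps to a vector
x = tf.keras.layers.Dropout(0.2)(x)
outputs = tf.keras.layers.Dense(1)(x)             # one logit for binary classification
model2 = tf.keras.Model(inputs, outputs)
model2.compile(optimizer=tf.keras.optimizers.Adam(),
               loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
               metrics=['accuracy'])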

Here's what the Coursera class model looks like. It's important to include it here because the variable base_model is assigned in two different scopes throughout the Coursera lab (previous to this it was defined outside of any function, as base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE, include_top=True, weights='imagenet'); base_model.trainable = False):

def alpaca_model(image_shape=IMG_SIZE, data_augmentation=data_augmenter()):
    input_shape = image_shape + (3,)
    base_model = tf.keras.applications.MobileNetV2(input_shape=input_shape, include_top=False, weights='imagenet')
    base_model.trainable = False
    inputs = tf.keras.Input(shape=input_shape)
    x = data_augmentation(inputs)
    x = preprocess_input(x)
    x = base_model(x, training=False)
    x = tfl.GlobalAveragePooling2D()(x)
    x = tfl.Dropout(0.2)(x)
    prediction_layer = tfl.Dense(1)
    outputs = prediction_layer(x)
    model = tf.keras.Model(inputs, outputs)
    return model

model2 = alpaca_model()
base_learning_rate = 0.001
initial_epochs = 5
model2.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=base_learning_rate), loss=tf.keras.losses.BinaryCrossentropy(from_logits=True), metrics=["accuracy"])
history = model2.fit(train_dataset, validation_data=validation_dataset, epochs=initial_epochs)

This performs OK, getting as much as 80% accuracy.

Fine-tuning -- Now in both the course lab and the tutorial, they proceed to "unfreeze" some of the last layers of the internal network so that those layers can be trained, like so:

fine_tune_at = 120

base_model = model2.layers[4] #totally separate question, but I would love to hear in comments, what this does exactly. It is difficult to Google this.
base_model.trainable = True

print("#/layers in base model: ", len(base_model.layers))

for layer in base_model.layers[:fine_tune_at]:
    layer.trainable = False

loss_function = tf.keras.losses.BinaryCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.Adam(learning_rate=base_learning_rate*0.1)
metrics = ['accuracy']

fine_tune_epochs = 5
total_epochs = initial_epochs + fine_tune_epochs

Up until this point, I'm satisfied, I can clearly see what is going on, but then:

model2.compile(loss=loss_function,optimizer=optimizer,metrics=metrics)
history_fine = model2.fit(train_dataset, epochs=total_epochs, initial_epoch=history.epoch[-1], validation_data = validation_dataset)

This leads to a marked improvement in results, which confused me: I was very much expecting base_model to get passed in somehow. I didn't imagine that altering some other variable, one that is never passed in or referenced again, would come into play.

So given all of that context, the question is: How is altering the base_model affecting model2?

If the above example from the Coursera lab is as confusing to you as it is to me, the example shown at https://blog.roboflow.com/how-to-train-mobilenetv2-on-a-custom-dataset/ (mentioned above) is much simpler and contains much less ambiguity, since base_model is defined only once. Regardless, the same dynamic applies and I'm equally confused by both. Thanks again for your time.

Answer 1

Score: 1


Your

> totally separate question, but I would love to hear in comments, what this does exactly.

The following line gets the MobileNetV2 model:

base_model = model2.layers[4]

Why 4? Because the first layer is the input, the second layer is the data augmentation (a Sequential model), the third and fourth layers are the image preprocessing (divide by 127.5 and subtract 1 to get values between -1 and 1), and the fifth layer (index 4) is MobileNetV2. The remaining layers are your top-net.
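If you want to verify that layout yourself, a quick inspection loop does it (a sketch; the exact layer names depend on your TensorFlow version and image size):

# Print model2's top-level layers; the names in the comments are illustrative.
for i, layer in enumerate(model2.layers):
    print(i, layer.name, type(layer).__name__)
# 0  input layer            InputLayer
# 1  data augmentation      Sequential
# 2  x / 127.5              TFOpLambda
# 3  x - 1                  TFOpLambda
# 4  mobilenetv2_1.00_160   Functional   <- this is what model2.layers[4] returns
# 5+ pooling, dropout, dense (your top-net)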

> How is altering the base_model affecting model2?

During the first pass (transfer learning), all layers of MobileNetV2 are frozen, so its weights and biases remain intact. For the second pass (fine-tuning), the last convolution layers (blocks 13 to 16 and the final Conv2D) are unfrozen so that training can modify the weights and biases of the base model. It is these layers that change during the second round of training.
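The reason this reaches model2 without anything being passed back in is that base_model = model2.layers[4] is not a copy: it is a reference to the very same layer object that model2 calls internally, so flipping its trainable flags changes which variables model2 will train. A minimal sketch to convince yourself (assumes the lab's model2 and fine_tune_at = 120 from the question):

base_model = model2.layers[4]
print(base_model is model2.layers[4])      # True: same Python object, shared by reference
print(len(model2.trainable_variables))     # just the top Dense layer's kernel and bias

base_model.trainable = True
for layer in base_model.layers[:120]:      # keep everything before block 13 frozen
    layer.trainable = False

print(len(model2.trainable_variables))     # now also includes the unfrozen MobileNetV2 weights

As in the lab, model2 still has to be compiled again after changing the trainable flags; Keras only picks up the new set of trainable variables when the training step is rebuilt by compile().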

To view the full model summary with nested models, use:

>>> model2.summary(line_length=125, expand_nested=True, show_trainable=True)

Answer 2

Score: 0


I'll go ahead and post an answer that speaks to (the problem with) Professor Ng's Convolutional Neural Network course, Week 2 Assignment, "Transfer Learning with MobileNet", with the hope that other students might find this answer and realize that they are not crazy and that the lab was poorly coded.

I'm not sure how it (appears to) work on Jupyter, but the main reason I was having problems with this lab was that base_model was defined several times within the lab. It should have been defined only once. Even worse, base_model was redefined inside the alpaca_model() function, and that variable is not accessible outside the function's scope. I'm not in the industry, but redefining an already-defined variable inside a method and then referring to it again outside of the method is just plain terrible coding.

Once I took base_model out of the function, defining it beforehand, everything worked perfectly, not just on the computer but in my head.
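A minimal sketch of the restructuring described above (hypothetical code, reusing the lab's IMG_SIZE, data_augmenter, preprocess_input and tfl helpers): define base_model exactly once at the top level and let alpaca_model close over it, so the fine-tuning cell can refer to the same variable directly instead of fishing it out via model2.layers[4].

IMG_SHAPE = IMG_SIZE + (3,)

# Single, module-level definition of the base model.
base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE,
                                               include_top=False,
                                               weights='imagenet')
base_model.trainable = False

def alpaca_model(image_shape=IMG_SIZE, data_augmentation=data_augmenter()):
    inputs = tf.keras.Input(shape=image_shape + (3,))
    x = data_augmentation(inputs)
    x = preprocess_input(x)
    x = base_model(x, training=False)   # reuses the one shared base_model
    x = tfl.GlobalAveragePooling2D()(x)
    x = tfl.Dropout(0.2)(x)
    outputs = tfl.Dense(1)(x)
    return tf.keras.Model(inputs, outputs)

model2 = alpaca_model()

# Later, fine-tuning refers to the same base_model variable; no model2.layers[4] needed:
base_model.trainable = True
for layer in base_model.layers[:120]:
    layer.trainable = False

Whether you prefer this or the lab's model2.layers[4] approach, the underlying mechanism is the same: there is only one MobileNetV2 object, and every name that points to it sees the same trainable flags.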
