February 7, 2023 03:28:40 · go · 104 reads


When fine-tuning a pre-trained Model, how does tensorflow know that the base_model has been changed?

Question


Ng's Convolutional Neural Network class's Week 2 Lab on using Transfer Learning with MobileNetV2 (summary: https://github.com/EhabR98/Transfer-Learning-with-MobileNetV2) and an additional tutorial (https://blog.roboflow.com/how-to-train-mobilenetv2-on-a-custom-dataset/) both begin like this:

base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE, include_top=False, weights='imagenet')
base_model.trainable = False

They then add one or more pooling layers, a Dropout layer and a 1-unit Dense layer at the end, apply a BinaryCrossentropy loss and an optimizer, and train the model on some custom input data. Let's call this custom model "model2", as Ng's lab does.

Here's what the Coursera class model looks like. It's important to include it here because the variable base_model is defined in two different scopes throughout the Coursera lab (before this, it was defined outside of any function, as base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE, include_top=True, weights='imagenet'); base_model.trainable = False):

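import tensorflow as tf
import tensorflow.keras.layers as tfl
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input

# IMG_SIZE and data_augmenter() are defined earlier in the lab
def alpaca_model(image_shape=IMG_SIZE, data_augmentation=data_augmenter()):
    input_shape = image_shape + (3,)  # append the RGB channel dimension
    base_model = tf.keras.applications.MobileNetV2(input_shape=input_shape, include_top=False, weights='imagenet')
    base_model.trainable = False  # freeze the pretrained backbone
    inputs = tf.keras.Input(shape=input_shape)
    x = data_augmentation(inputs)
    x = preprocess_input(x)
    x = base_model(x, training=False)  # keep BatchNorm layers in inference mode
    x = tfl.GlobalAveragePooling2D()(x)
    x = tfl.Dropout(0.2)(x)
    prediction_layer = tfl.Dense(1)  # single logit for binary classification
    outputs = prediction_layer(x)
    model = tf.keras.Model(inputs, outputs)
    return model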

model2 = alpaca_model()
base_learning_rate = 0.001
initial_epochs = 5
model2.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=base_learning_rate), loss=tf.keras.losses.BinaryCrossentropy(from_logits=True), metrics=["accuracy"])
history = model2.fit(train_dataset, validation_data=validation_dataset, epochs=initial_epochs)

This performs OK, reaching up to 80% accuracy.

Fine tuning -- Now in both the course lab and the tutorial, they then proceed to "unfreeze" some of the last layers of the internal network so that they can be trained, like so:

fine_tune_at = 120

base_model = model2.layers[4] #totally separate question, but I would love to hear in comments, what this does exactly. It is difficult to Google this.
base_model.trainable = True

print("#/layers in base model: ", len(base_model.layers))

for layer in base_model.layers[:fine_tune_at]:
    layer.trainable = False

loss_function = tf.keras.losses.BinaryCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.Adam(learning_rate=base_learning_rate*0.1)  # smaller learning rate for fine-tuning
metrics = ['accuracy']

fine_tune_epochs = 5
total_epochs = initial_epochs + fine_tune_epochs

Up until this point I'm satisfied; I can clearly see what is going on. But then:

model2.compile(loss=loss_function,optimizer=optimizer,metrics=metrics)
history_fine = model2.fit(train_dataset, epochs=total_epochs, initial_epoch=history.epoch[-1], validation_data = validation_dataset)

This leads to a marked improvement in results, which confused me: I was very much expecting base_model to be passed in somehow. I didn't imagine that altering some other variable, one that was never passed in or used to build model2, would come into play.

So given all of that context, the question is: How is altering the base_model affecting model2?

If the Coursera example above is as confusing to you as it is to me, the example at https://blog.roboflow.com/how-to-train-mobilenetv2-on-a-custom-dataset/ (mentioned above) is much simpler and less ambiguous, since base_model is defined only once. Regardless, the same dynamic applies, and I'm equally confused by both. Thanks again for your time.

Answer 1

Score: 1


Regarding your comment:

> totally separate question, but I would love to hear in comments, what this does exactly.

The following line gets the MobileNetV2 model:

base_model = model2.layers[4]

Why 4? Because the first layer is the input, the second is the data augmentation (a Sequential model), the third and fourth are the image preprocessing (divide by 127.5, then subtract 1, to get values between -1 and 1), and the fifth layer, at index 4, is MobileNetV2. The remaining layers are your top network.
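The scaling described above can be checked with a few lines of plain Python. `scale_like_mobilenet_v2` is a hypothetical helper, not a Keras function; it assumes pixel values in [0, 255] and applies the same divide-by-127.5-then-subtract-1 rule that `mobilenet_v2.preprocess_input` uses.

```python
def scale_like_mobilenet_v2(pixel: float) -> float:
    """Map a pixel value in [0, 255] to [-1, 1] (hypothetical helper).

    Mirrors the two preprocessing layers: divide by 127.5, then subtract 1.
    """
    return pixel / 127.5 - 1.0


# The darkest, middle and brightest pixel values land on -1, 0 and 1.
for value in (0.0, 127.5, 255.0):
    print(value, "->", scale_like_mobilenet_v2(value))
```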

> How is altering the base_model affecting model2?

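During the first pass (transfer learning), all layers of MobileNetV2 are frozen, so their weights and biases stay intact. For the second pass (fine-tuning), the last convolutional layers (blocks 13 to 16 and the last Conv2D) are unfrozen, so training can modify the weights and biases of the base model; these are the layers that change during training. This works because model2.layers[4] returns the very layer object that model2 contains, not a copy, so setting trainable on base_model changes model2 itself. Calling model2.compile() again afterwards is what makes the new trainable settings take effect.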
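A plain-Python analogy of that fine-tuning step (no TensorFlow needed) may make the mechanism concrete. `ToyLayer` is a hypothetical class, not part of Keras; it only mimics two behaviors: indexing `.layers` returns the stored object itself, and setting `trainable` on a container propagates to its children. The 10-layer backbone and `fine_tune_at = 7` are made-up sizes for illustration.

```python
class ToyLayer:
    """Hypothetical stand-in for a Keras layer or nested model (not a Keras class)."""

    def __init__(self, name, layers=None, has_weights=True):
        self.name = name
        self.layers = list(layers or [])  # nested layers; empty for a leaf
        self.has_weights = has_weights
        self._trainable = True

    @property
    def trainable(self):
        return self._trainable

    @trainable.setter
    def trainable(self, value):
        # Mimics Keras: setting the flag on a container propagates to its children.
        self._trainable = value
        for child in self.layers:
            child.trainable = value

    def trainable_leaves(self):
        """Names of leaf layers whose weights would be updated in training."""
        if not self._trainable:
            return []
        if not self.layers:
            return [self.name] if self.has_weights else []
        names = []
        for child in self.layers:
            names.extend(child.trainable_leaves())
        return names


# A hypothetical 10-layer backbone standing in for MobileNetV2.
backbone = ToyLayer("backbone", [ToyLayer(f"conv_{i}") for i in range(10)])
backbone.trainable = False  # transfer learning: freeze the whole backbone

# Same layer order as the answer describes; index 4 is the backbone.
model2 = ToyLayer("model2", [
    ToyLayer("input", has_weights=False),
    ToyLayer("augmentation", has_weights=False),
    ToyLayer("truediv", has_weights=False),
    ToyLayer("subtract", has_weights=False),
    backbone,
    ToyLayer("pooling", has_weights=False),
    ToyLayer("dropout", has_weights=False),
    ToyLayer("dense"),
])
before = model2.trainable_leaves()  # only the new head trains

# Fine-tuning: fetch the backbone by index, like model2.layers[4].
base_model = model2.layers[4]
print(base_model is backbone)  # True: the same object, not a copy
base_model.trainable = True  # unfreeze every backbone layer
fine_tune_at = 7
for layer in base_model.layers[:fine_tune_at]:
    layer.trainable = False  # then refreeze the first 7 layers
after = model2.trainable_leaves()

print(before)
print(after)
```

Because `base_model` and `backbone` are two names for one object, unfreezing through either name changes what `model2` reports; nothing has to be assigned back into `model2`.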

To view the full model summary with nested models, use:

>>> model2.summary(line_length=125, expand_nested=True, show_trainable=True)


Answer 2

Score: 0


I'll go ahead and post an answer that speaks to (the problem with) Professor Ng's Convolution Neural Network, Week 2 Assignment, "Transfer Learning with MobileNet", with the hope that other students might find this answer and realize that they are not crazy and that the Lab was poorly coded.

I'm not sure how it (appears to work) on Jupyter, but the main reason I was having problems with this lab was that base_model was defined several times within the lab. It should have only been defined once. Even worse, base_model was redefined inside the alpaca_model() function, and that local definition is not accessible outside the function's scope. I'm not in the industry, but redefining, inside a function, a variable that has already been defined, and then using the name again outside the function, is just plain terrible coding.

Once I took base_model out of the function, defining it beforehand, everything works perfectly not just on the computer, but in my head.
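The scoping problem this answer describes can be reproduced without TensorFlow. In the plain-Python sketch below, `Backbone`, `build_model_shadowing` and `build_model` are hypothetical stand-ins for `MobileNetV2` and `alpaca_model()`: assigning `base_model` inside a function creates a new local object, so the module-level `base_model` is not the object inside the returned model. Passing the object in as a parameter, one possible way to "define it beforehand", makes both names refer to the same object.

```python
class Backbone:
    """Hypothetical stand-in for a pretrained base model."""

    def __init__(self):
        self.trainable = False


# Module-level definition, like the one earlier in the lab.
base_model = Backbone()


def build_model_shadowing():
    # This assignment creates a *local* base_model; the global one is untouched.
    base_model = Backbone()
    return {"backbone": base_model}


def build_model(backbone):
    # The caller's object is used directly, so no second copy exists.
    return {"backbone": backbone}


model_a = build_model_shadowing()
print(model_a["backbone"] is base_model)  # False: two different objects

model_b = build_model(base_model)
print(model_b["backbone"] is base_model)  # True: one shared object

# Unfreezing through the global name now reaches model_b's backbone only.
base_model.trainable = True
print(model_b["backbone"].trainable)  # True
print(model_a["backbone"].trainable)  # False
```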
