Torch loaded model accuracy drops


I have trained a ResNet model and saved its weights to a .pt file as shown below.

## This is file 1 ##
model = resnet50()
model.to(device)
optimizer = Adam(model.parameters(), eps=1e-08, lr = 0.001, weight_decay=1e-4, betas=(0.9, 0.999))
criterion = nn.CrossEntropyLoss()
scheduler = lr_scheduler.MultiplicativeLR(optimizer, lr_lambda=lmbda)

model.train()
train_model(model, criterion, optimizer, scheduler, num_epochs=num_epochs)
torch.save(model.state_dict(), 'myresnet.pt')

model.eval()
loss, acc, y_pred, y_true = test_model(model, criterion)

I trained a model and achieved a validation accuracy of 95%.

Then, I tested the model on a separate test set, where it achieved an accuracy of 93%.

After these steps, I closed my code files.

Later, I created a new, empty script and loaded the model's saved weights (the .pt file) for further use.

## This is file 2 ##
model = models.resnet50()
state_dict = torch.load('myresnet.pt')
model.load_state_dict(state_dict)
model.eval()
model.to(device)
loss, acc, y_pred, y_true = test_model(model, criterion)

Problem

After loading the .pt file and testing on the test set only, the test accuracy dropped sharply to 20.6%.

What I tried

Initially, I suspected that the .pt file was corrupted, so I reran my code multiple times, but the situation remained unchanged.
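For reference, here is a minimal sketch (not something I have actually run) of how the saved file could be compared against the in-memory weights to rule out corruption; the state_dict_checksum helper is hypothetical:

import torch

def state_dict_checksum(state_dict):
    # Sum of all tensor values; identical state dicts give identical sums.
    return sum(v.float().sum().item() for v in state_dict.values())

# In file 1, right after training:
# print(state_dict_checksum(model.state_dict()))

# In file 2, after loading:
print(state_dict_checksum(torch.load('myresnet.pt')))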

I copied all the code from file 2 and appended it to file 1, which resulted in the desired accuracy.

Why does this happen? Does it have something to do with the dataloader?

Below is my dataloader:

batch_size = 4
image_size = [32, 32]
random_seed = int(time.time() // 1000)
random.seed(random_seed)

# The normalization mean/std are drawn at random, seeded by the current time,
# so they can differ between separate runs/scripts.
def random_ratio_3d(): return [randrange(0, 100) / 100, randrange(0, 100) / 100, randrange(0, 100) / 100]
tmp_mean, tmp_std = random_ratio_3d(), random_ratio_3d()

#data_train_path = 'data/train/'
data_test_path = 'data/test/'
#train_dataset = ImageFolder(data_train_path, Compose([Resize(image_size), ToTensor(), Normalize(mean=tmp_mean, std=tmp_std)]))
test_dataset = ImageFolder(data_test_path, Compose([Resize(image_size), ToTensor(), Normalize(mean=tmp_mean, std=tmp_std)]))

#train_idx, valid_idx = train_test_split(list(range(len(train_dataset))), test_size=0.2, random_state=random_seed)
datasets = {}
#datasets['train'] = Subset(train_dataset, train_idx)
#datasets['valid'] = Subset(train_dataset, valid_idx)
datasets['test']  = test_dataset

dataloaders, batch_num = {}, {}
num_workers = 6 # half of cpu core number
#dataloaders['train'] = DataLoader(datasets['train'], batch_size=batch_size, shuffle=True, num_workers=num_workers)
#dataloaders['valid'] = DataLoader(datasets['valid'],batch_size=batch_size, shuffle=True, num_workers=num_workers)
dataloaders['test']  = DataLoader(datasets['test'], batch_size=batch_size, shuffle=True, num_workers=num_workers)
#batch_num['train'], batch_num['valid'], batch_num['test'] = len(dataloaders['train']), len(dataloaders['valid']), len(dataloaders['test'])
batch_num['test'] = len(dataloaders['test'])
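
For comparison, a version of the test transform with fixed normalization constants would look like the sketch below (the ImageNet statistics are only placeholders; I have not checked whether this changes anything for my data):

from torchvision.datasets import ImageFolder
from torchvision.transforms import Compose, Resize, ToTensor, Normalize

# Placeholder constants (ImageNet statistics); any fixed values work,
# as long as every script uses the same ones.
fixed_mean = [0.485, 0.456, 0.406]
fixed_std = [0.229, 0.224, 0.225]

test_dataset = ImageFolder(data_test_path,
    Compose([Resize(image_size), ToTensor(), Normalize(mean=fixed_mean, std=fixed_std)]))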

Answer 1 (score: 1)


> We should save the optimizer state along with the model state when using Adam. Adam is an adaptive learning-rate method, which means it computes individual learning rates for various parameters.

Can you try the following code?

## This is file 1 ##
model = resnet50()
model.to(device)
optimizer = Adam(model.parameters(), eps=1e-08, lr = 0.001, weight_decay=1e-4, betas=(0.9, 0.999))
criterion = nn.CrossEntropyLoss()
scheduler = lr_scheduler.MultiplicativeLR(optimizer, lr_lambda=lmbda)

model.train()
train_model(model, criterion, optimizer, scheduler, num_epochs=num_epochs)

# torch.save(model.state_dict(), 'myresnet.pt')

torch.save({
  'epoch': num_epochs,
  'model_state_dict': model.state_dict(),
  'optimizer_state_dict': optimizer.state_dict(),
  'loss': loss,  # assumes the final training loss is available here
}, 'myresnet.pt')

# Load using this
checkpoint = torch.load('myresnet.pt')
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
epoch = checkpoint['epoch']
loss = checkpoint['loss']

model.eval()
# model.train() # to continue training
loss, acc, y_pred, y_true = test_model(model, criterion)
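
If file 2 runs on a different device than the one the checkpoint was saved from (for example, saved on GPU and loaded on a CPU-only machine), you can pass map_location to torch.load so the tensors are remapped while loading. A minimal sketch, assuming device is defined as in the question:

# Remap every tensor in the checkpoint onto the current device while loading.
checkpoint = torch.load('myresnet.pt', map_location=device)
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])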

Here is the tutorial this is based on, and here is an answer explaining why we should save the optimizer state, especially when using the Adam optimizer:

https://pytorch.org/tutorials/beginner/saving_loading_models.html#saving-loading-a-general-checkpoint-for-inference-and-or-resuming-training

https://stackoverflow.com/a/70770694/15327033
