Loaded Torch model accuracy drops

I have trained a ResNet model and saved its weights to a .pt file as shown below.

## This is file 1 ##
import torch
from torch import nn
from torch.optim import Adam, lr_scheduler
from torchvision.models import resnet50

# device, lmbda, num_epochs, train_model and test_model are defined elsewhere in the script
model = resnet50()
model.to(device)
optimizer = Adam(model.parameters(), eps=1e-08, lr=0.001, weight_decay=1e-4, betas=(0.9, 0.999))
criterion = nn.CrossEntropyLoss()
scheduler = lr_scheduler.MultiplicativeLR(optimizer, lr_lambda=lmbda)

model.train()
train_model(model, criterion, optimizer, scheduler, num_epochs=num_epochs)
torch.save(model.state_dict(), 'myresnet.pt')

model.eval()
loss, acc, y_pred, y_true = test_model(model, criterion)

I trained a model and achieved a validation accuracy of 95%.

Then, I tested the model on a separate test set, where it achieved an accuracy of 93%.

After these steps, I closed my code files.

Later, I created a new, empty script and loaded the saved weights of the model (the .pt file) for further use.

## This is file 2 ##
import torch
from torchvision import models

# device, criterion and test_model are defined the same way as in file 1
model = models.resnet50()
state_dict = torch.load('myresnet.pt')
model.load_state_dict(state_dict)
model.eval()
model.to(device)
loss, acc, y_pred, y_true = test_model(model, criterion)

Problem

After loading the .pt file and testing with the test set data only, the test accuracy dropped sharply, to 20.6%.

What I tried

Initially, I suspected that the .pt file was corrupted, so I reran my code multiple times, but the situation remained unchanged.
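
A quick way to rule out a corrupted checkpoint (a minimal sketch, not from the original post, assuming the trained `model` from file 1 is still in memory) is to compare the reloaded tensors against the in-memory ones:

import torch

# Hypothetical check: confirm the saved .pt round-trips without corruption
# by comparing every reloaded tensor to the corresponding in-memory weight.
saved = torch.load('myresnet.pt', map_location='cpu')
for name, param in model.state_dict().items():
    if not torch.equal(param.cpu(), saved[name]):
        print('mismatch in', name)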

I copied all the code from file 2 and appended it to file 1, which produced the expected accuracy.

Why does this happen? Does it have something to do with the dataloader?

Below is my dataloader:

import time
import random
from random import randrange

from torch.utils.data import DataLoader
from torchvision.datasets import ImageFolder
from torchvision.transforms import Compose, Normalize, Resize, ToTensor

batch_size = 4
image_size = [32, 32]
random_seed = int(time.time() // 1000)  # seed derived from the current time, so it changes between runs
random.seed(random_seed)

# returns three random per-channel values in [0.00, 0.99]
def random_ratio_3d():
    return [randrange(0, 100) / 100, randrange(0, 100) / 100, randrange(0, 100) / 100]

tmp_mean, tmp_std = random_ratio_3d(), random_ratio_3d()

#data_train_path = 'data/train/'
data_test_path = 'data/test/'
#train_dataset = ImageFolder(data_train_path, Compose([Resize(image_size), ToTensor(), Normalize(mean=tmp_mean, std=tmp_std)]))
test_dataset = ImageFolder(data_test_path, Compose([Resize(image_size), ToTensor(), Normalize(mean=tmp_mean, std=tmp_std)]))

#train_idx, valid_idx = train_test_split(list(range(len(train_dataset))), test_size=0.2, random_state=random_seed)
datasets = {}
#datasets['train'] = Subset(train_dataset, train_idx)
#datasets['valid'] = Subset(train_dataset, valid_idx)
datasets['test'] = test_dataset

dataloaders, batch_num = {}, {}
num_workers = 6  # half of the CPU core count
#dataloaders['train'] = DataLoader(datasets['train'], batch_size=batch_size, shuffle=True, num_workers=num_workers)
#dataloaders['valid'] = DataLoader(datasets['valid'], batch_size=batch_size, shuffle=True, num_workers=num_workers)
dataloaders['test'] = DataLoader(datasets['test'], batch_size=batch_size, shuffle=True, num_workers=num_workers)
#batch_num['train'], batch_num['valid'], batch_num['test'] = len(dataloaders['train']), len(dataloaders['valid']), len(dataloaders['test'])
batch_num['test'] = len(dataloaders['test'])

Answer 1

Score: 1

> We should save the optimizer state along with the model state when using Adam. Adam is an adaptive learning rate method, which means it computes individual learning rates for the various parameters.
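
As a small illustration of that claim (a hypothetical toy layer, not the poster's model), those per-parameter estimates live only in the optimizer's own state_dict, not in the model's:

import torch
from torch import nn
from torch.optim import Adam

# Tiny sketch: after one update, Adam has populated per-parameter buffers
# (running first/second moment estimates) held only by optimizer.state_dict().
layer = nn.Linear(4, 2)
optimizer = Adam(layer.parameters(), lr=0.001)

layer(torch.randn(8, 4)).sum().backward()
optimizer.step()

state = optimizer.state_dict()
print(state['param_groups'][0]['lr'])   # shared hyperparameters
print(list(state['state'][0].keys()))   # per-parameter buffers: ['step', 'exp_avg', 'exp_avg_sq']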

Can you try the code using this?

## This is file 1 ##
model = resnet50()
model.to(device)
optimizer = Adam(model.parameters(), eps=1e-08, lr=0.001, weight_decay=1e-4, betas=(0.9, 0.999))
criterion = nn.CrossEntropyLoss()
scheduler = lr_scheduler.MultiplicativeLR(optimizer, lr_lambda=lmbda)

model.train()
train_model(model, criterion, optimizer, scheduler, num_epochs=num_epochs)

# torch.save(model.state_dict(), 'myresnet.pt')

torch.save({
    'epoch': num_epochs,
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
    'loss': loss,  # final training loss, assuming train_model makes it available
}, 'myresnet.pt')

# Load using this
checkpoint = torch.load('myresnet.pt')
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
epoch = checkpoint['epoch']
loss = checkpoint['loss']

model.eval()
# model.train()  # to continue training
loss, acc, y_pred, y_true = test_model(model, criterion)

Here is the tutorial this is based on, and an answer explaining why the optimizer state should be saved, especially when using the Adam optimizer:

https://pytorch.org/tutorials/beginner/saving_loading_models.html#saving-loading-a-general-checkpoint-for-inference-and-or-resuming-training

https://stackoverflow.com/a/70770694/15327033
