PyTorch: Caveats for Resuming Training with lr_scheduler

During training we usually rely on one of PyTorch's built-in learning-rate scheduling policies, for example:

import torch
import torch.optim as optim
from torchvision.models.resnet import resnet50

net = resnet50(num_classes=1000)
optimizer = optim.Adam(net.parameters(), lr=1e-3)
# decay the lr by a factor of 0.1 at epochs 20, 30, 40 and 50
scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[20, 30, 40, 50], gamma=0.1)

num_epoch = 60
for epoch in range(num_epoch):
    train()   # user-defined training loop for one epoch
    valid()   # user-defined validation loop
    # since PyTorch 1.1.0, scheduler.step() should be called after the
    # optimizer updates, i.e. at the end of each epoch
    scheduler.step()
    ...
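
With these milestones, the learning rate starts at 1e-3 and is multiplied by 0.1 each time a milestone epoch is reached. A quick sanity check of that behavior (a self-contained sketch using a dummy parameter, reading the lr straight from optimizer.param_groups):

import torch
import torch.optim as optim

# dummy parameter so the sketch runs without a real model
params = [torch.nn.Parameter(torch.zeros(1))]
optimizer = optim.Adam(params, lr=1e-3)
scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[20, 30, 40, 50], gamma=0.1)

for epoch in range(60):
    lr = optimizer.param_groups[0]['lr']
    if epoch in (0, 19, 20, 30, 50):
        print(f'epoch {epoch:2d}: lr = {lr:g}')
    scheduler.step()
# prints (roughly): 1e-3 up to epoch 19, 1e-4 from epoch 20,
# 1e-5 from epoch 30, 1e-6 from epoch 40, 1e-7 from epoch 50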

Sometimes training is interrupted by factors beyond our control. To resume training in PyTorch, you simply reload the most recently saved model and continue training from there. Suppose we resume from epoch 10; we can use the last_epoch parameter of the lr_scheduler:

scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[20, 30, 40, 50], gamma=0.1, last_epoch=10)

This way there is no need to manually shift the milestones from [20, 30, 40, 50] to [10, 20, 30, 40]. Note that when you define the optimizer's parameter groups, you must include an 'initial_lr' key, otherwise you will get an error:

"param 'initial_lr' is not specified in param_groups[*] when resuming an optimizer"

Here is an example:

import torch
import torch.optim as optim

# load the full model object saved at the end of epoch 10
net = torch.load('resnet50_epoch10.pth')
# supply 'initial_lr' in the param group so the scheduler accepts last_epoch=10
optimizer = optim.Adam([{'params': net.parameters(), 'initial_lr': 1e-3}], lr=1e-3)
scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[20, 30, 40, 50],
                                           gamma=0.1, last_epoch=10)

num_epoch = 60
for epoch in range(11, num_epoch):  # epochs 0-10 have already been trained
    train()
    valid()
    scheduler.step()
    ...
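
A more robust alternative in recent PyTorch versions is to checkpoint the optimizer and scheduler together with the model via their state_dict() methods; load_state_dict() then restores last_epoch (and any already-decayed lr) without any 'initial_lr' bookkeeping. A minimal sketch, with 'checkpoint.pth' as a placeholder path:

import torch
import torch.optim as optim
from torchvision.models.resnet import resnet50

# rebuild the objects exactly as in a fresh run
net = resnet50(num_classes=1000)
optimizer = optim.Adam(net.parameters(), lr=1e-3)
scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[20, 30, 40, 50], gamma=0.1)

# at the end of each epoch, save everything needed to resume
def save_checkpoint(epoch, path='checkpoint.pth'):
    torch.save({
        'epoch': epoch,
        'model': net.state_dict(),
        'optimizer': optimizer.state_dict(),
        'scheduler': scheduler.state_dict(),
    }, path)

# when resuming from a checkpoint file:
ckpt = torch.load('checkpoint.pth')
net.load_state_dict(ckpt['model'])
optimizer.load_state_dict(ckpt['optimizer'])
scheduler.load_state_dict(ckpt['scheduler'])  # restores last_epoch; no 'initial_lr' needed
start_epoch = ckpt['epoch'] + 1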

Reposted from blog.csdn.net/guls999/article/details/85695409