Link to the mixup paper
The paper compares how different models perform on CIFAR-10 and CIFAR-100 under different values of $\alpha$ and $\lambda$; interested readers can study the details there. To embed mixup into a model, the data augmentation can be implemented by simply adding the following code.
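For reference, mixup constructs each virtual training example as a convex combination of two random examples:

$$\tilde{x} = \lambda x_i + (1 - \lambda)\, x_j, \qquad \tilde{y} = \lambda y_i + (1 - \lambda)\, y_j, \qquad \lambda \sim \mathrm{Beta}(\alpha, \alpha)$$

where $\alpha$ controls how strongly pairs are mixed; the code below draws $\lambda$ in exactly this way.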
import time

import numpy as np
import torch


def mixup_data(x, y, alpha=1.0, use_cuda=True):
    '''Compute the mixup data. Return mixed inputs, pairs of targets, and lambda.'''
    if alpha > 0.:
        # draw the mixing coefficient from a symmetric Beta distribution
        lam = np.random.beta(alpha, alpha)
    else:
        lam = 1.
    batch_size = x.size()[0]
    # a random permutation of the batch provides the partner samples
    if use_cuda:
        index = torch.randperm(batch_size).cuda()
    else:
        index = torch.randperm(batch_size)
    mixed_x = lam * x + (1 - lam) * x[index, :]
    y_a, y_b = y, y[index]
    return mixed_x, y_a, y_b, lam
def mixup_criterion(y_a, y_b, lam):
    # weighted sum of the loss against both label sets
    return lambda criterion, pred: lam * criterion(pred, y_a) + (1 - lam) * criterion(pred, y_b)
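A side note, my addition rather than the original author's: for cross-entropy, the weighted two-term loss returned by mixup_criterion equals the cross-entropy against the mixed soft label, which is why no explicit one-hot mixing is needed. A quick numeric check:

import torch
import torch.nn.functional as F

torch.manual_seed(0)
pred = torch.randn(4, 10)                    # logits for a batch of 4
y_a = torch.tensor([1, 3, 5, 7])             # labels of the original batch
y_b = torch.tensor([2, 4, 6, 8])             # labels of the shuffled batch
lam = 0.3

two_term = lam * F.cross_entropy(pred, y_a) + (1 - lam) * F.cross_entropy(pred, y_b)

soft = lam * F.one_hot(y_a, 10).float() + (1 - lam) * F.one_hot(y_b, 10).float()
mixed_label = -(soft * F.log_softmax(pred, dim=1)).sum(dim=1).mean()

print(torch.allclose(two_term, mixed_label))  # True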
def train(trainloader, model, criterion, optimizer, epoch, args):
    # switch to train mode
    model.train()

    # AverageMeter is the usual running-average helper (assumed to be defined
    # elsewhere, e.g. as in the pytorch-classification utilities)
    batch_time = AverageMeter()
    data_time = AverageMeter()
    losses = AverageMeter()
    top1 = AverageMeter()
    top5 = AverageMeter()
    end = time.time()

    for batch_idx, (inputs, targets) in enumerate(trainloader):
        # measure data loading time
        data_time.update(time.time() - end)

        if args.use_cuda:
            inputs, targets = inputs.cuda(), targets.cuda()

        if args.is_mixup:
            # generate mixed inputs, two label vectors and the mixing coefficient;
            # targets_a are the original labels, targets_b the labels of the
            # shuffled partner samples
            inputs, targets_a, targets_b, lam = mixup_data(
                inputs, targets, args.alpha, args.use_cuda)
            outputs = model(inputs)
            loss_func = mixup_criterion(targets_a, targets_b, lam)
            loss = loss_func(criterion, outputs)
        else:
            # plain path: no mixing, standard loss
            outputs = model(inputs)
            loss = criterion(outputs, targets)
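The snippet above stops at the loss. A typical continuation of the loop body, following the reference mixup-cifar10 implementation (this part was not in the original snippet), performs the backward pass and weights the accuracy across both label sets by lam:

        # continuation of the loop body (assumed, per the reference implementation)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # track the loss and a lam-weighted accuracy over both label sets
        losses.update(loss.item(), inputs.size(0))
        _, predicted = outputs.max(1)
        if args.is_mixup:
            correct = (lam * predicted.eq(targets_a).float().sum().item()
                       + (1 - lam) * predicted.eq(targets_b).float().sum().item())
        else:
            correct = predicted.eq(targets).float().sum().item()
        top1.update(100. * correct / inputs.size(0), inputs.size(0))

        # measure elapsed time per batch
        batch_time.update(time.time() - end)
        end = time.time()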
Only at training time are the inputs mixtures of two random samples, with the loss computed over both samples together, each with its own weight.
My understanding is that randomly mixing pairs of samples yields training points that cover more of the data distribution. Overfitting comes, in part, from test samples whose patterns were never learned during training; if training can generate samples covering the distribution more densely, overfitting should be reduced to some degree.
At test time the model receives normal, unmixed samples.
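To make the train/test asymmetry concrete, here is a minimal self-contained sketch that reuses mixup_data and mixup_criterion from above; the toy model and batch are illustrative, not from the original post:

import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 3)                      # toy 3-class classifier
criterion = nn.CrossEntropyLoss()
x = torch.randn(8, 4)                        # toy batch of 8 samples
y = torch.randint(0, 3, (8,))

# training step: mixed inputs, weighted two-term loss
model.train()
mixed_x, y_a, y_b, lam = mixup_data(x, y, alpha=1.0, use_cuda=False)
loss = mixup_criterion(y_a, y_b, lam)(criterion, model(mixed_x))
loss.backward()

# test step: plain, unmixed inputs
model.eval()
with torch.no_grad():
    preds = model(x).argmax(dim=1)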