Identical parameters after fine-tuning the pretrained model
After today's training run I found the problem: fine-tuning after pretraining produces identical parameters, i.e. the model converges to predicting the same output no matter the input.
Hypothesis 1: the trailing padding is wrong
inputs =
{'input_ids': tensor([[ 2, 136, 4, 149, 149, 38, 171, 4, 2062, 3, 16, 23,
148, 4, 8249, 3],
[ 2, 33, 3044, 130, 276, 33, 23, 68, 3, 130, 276, 33,
23, 215, 216, 3],
[ 2, 16, 624, 33, 1023, 129, 14, 129, 3, 33, 1753, 33,
265, 1940, 4, 3],
[ 2, 109, 104, 4, 4, 65, 47, 68, 20, 3, 641, 33,
65, 47, 68, 3],
[ 2, 441, 449, 14, 4, 973, 33, 4, 16, 3, 33, 443,
16, 10, 1100, 3],
[ 2, 1620, 133, 584, 355, 335, 4, 771, 3, 136, 137, 4,
335, 469, 771, 3],
[ 2, 1, 6652, 726, 2813, 811, 1903, 4, 3, 1709, 4, 350,
249, 1180, 6652, 3],
[ 2, 16, 14, 27, 129, 4, 3, 16, 4, 220, 4, 4,
9, 10, 591, 3],
[ 2, 1, 27, 43, 13, 772, 543, 3, 130, 27, 43, 13,
772, 543, 79, 3],
[ 2, 908, 33, 4, 443, 16, 15, 3, 33, 14, 443, 16,
15, 7, 495, 3]], device='cuda:0'), 'token_type_ids': tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1],
[0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1],
[0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1],
[0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1]], device='cuda:0'), 'attention_mask': tensor([[True, True, True, True, True, True, True, True, True, True, True, True,
True, True, True, True],
[True, True, True, True, True, True, True, True, True, True, True, True,
True, True, True, True],
[True, True, True, True, True, True, True, True, True, True, True, True,
True, True, True, True],
[True, True, True, True, True, True, True, True, True, True, True, True,
True, True, True, True],
[True, True, True, True, True, True, True, True, True, True, True, True,
True, True, True, True],
[True, True, True, True, True, True, True, True, True, True, True, True,
True, True, True, True],
[True, True, True, True, True, True, True, True, True, True, True, True,
True, True, True, True],
[True, True, True, True, True, True, True, True, True, True, True, True,
True, True, True, True],
[True, True, True, True, True, True, True, True, True, True, True, True,
True, True, True, True],
[True, True, True, True, True, True, True, True, True, True, True, True,
True, True, True, True]], device='cuda:0'), 'labels': tensor([[-100, -100, 137, -100, -100, -100, -100, 33, -100, -100, -100, -100,
-100, 148, 123, -100],
[-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100,
-100, -100, -100, -100],
[-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100,
-100, -100, 278, -100],
[-100, -100, -100, 105, 33, -100, -100, -100, -100, -100, -100, -100,
-100, -100, -100, -100],
[-100, -100, -100, -100, 822, -100, -100, 2742, -100, -100, -100, -100,
-100, -100, -100, -100],
[-100, -100, -100, -100, 355, -100, 469, -100, -100, -100, -100, 355,
-100, -100, -100, -100],
[-100, -100, -100, -100, -100, -100, 1903, 297, -100, -100, 606, -100,
-100, -100, -100, -100],
[-100, -100, -100, -100, -100, 62, -100, -100, 5357, -100, 14, 27,
-100, -100, -100, -100],
[-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 43, -100,
-100, -100, -100, -100],
[-100, -100, -100, 14, -100, -100, -100, -100, -100, -100, -100, -100,
-100, -100, -100, -100]], device='cuda:0')}
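In the dump above, `labels` carries a real token id only at the masked positions (where `input_ids` holds the mask token), and -100 everywhere else, including the trailing padding. A minimal sketch of that construction, assuming a mask id of 4 as it appears in this vocabulary (the helper name `make_mlm_labels` is my own, not from the training code):

```python
import torch

MASK_ID = 4     # assumed [MASK] id in this vocabulary
IGNORE = -100   # positions that cross-entropy should skip

def make_mlm_labels(orig_ids, masked_ids):
    """Labels carry the original token only where the input was masked;
    every other position (including padding) is set to -100."""
    labels = torch.full_like(orig_ids, IGNORE)
    mask = masked_ids == MASK_ID
    labels[mask] = orig_ids[mask]  # recover the token hidden by [MASK]
    return labels

# Toy version of the first row of the dump: token 137 was masked out.
orig = torch.tensor([[2, 136, 137, 3]])
masked = torch.tensor([[2, 136, MASK_ID, 3]])
print(make_mlm_labels(orig, masked))  # tensor([[-100, -100,  137, -100]])
```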
Previously I filled the trailing padding of the labels with 0, whereas now it is filled with -100, and this difference can change the cross-entropy loss.
Note that padding the labels with 0 and padding them with -100 behave very differently. With -100, those positions are simply excluded from the loss computation (PyTorch's cross-entropy ignores index -100 by default). With 0, each padding position is treated as a real target whose correct class is token 0: a cross-entropy term is computed there and backpropagated. Since padding positions vastly outnumber the genuinely masked positions, the gradient is dominated by the "predict 0" signal, the model drifts toward predicting token 0 everywhere, and it eventually predicts the same result regardless of the input, which matches the symptom above.
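The difference can be seen directly with `F.cross_entropy`, whose default `ignore_index` is -100. A minimal sketch with toy logits (the numbers are illustrative, not from the model):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 5)  # 4 label positions, vocabulary of 5 tokens

# Only position 1 carries a real target; the other 3 are padding.
labels_ignore = torch.tensor([-100, 3, -100, -100])  # -100: excluded from the loss
labels_zero = torch.tensor([0, 3, 0, 0])             # 0: treated as a real class

# With -100, only the single real target contributes (ignore_index=-100
# is the default), so the loss equals the loss on that one position.
loss_ignore = F.cross_entropy(logits, labels_ignore)

# With 0, all 4 positions contribute, and 3 of them push toward class 0.
loss_zero = F.cross_entropy(logits, labels_zero)

print(loss_ignore.item(), loss_zero.item())
```

Scaled up to a full batch, the spurious "class 0" terms in the second variant swamp the real masked-token terms, which is exactly the failure mode described above.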