Differences between BertModel and BertForPreTraining in Hugging Face

The final-layer output of BertModel is the raw hidden_state, with no task-specific head on top.

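Compared with BertModel, BertForPreTraining adds the two heads used during BERT pre-training: a masked language modeling head (cls.predictions), which predicts the masked tokens, and a next sentence prediction head (cls.seq_relationship), which classifies the relationship between two sentences from the pooled [CLS] output.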
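The difference also shows up in what each model returns. The following is a minimal sketch, assuming `torch` and `transformers` are installed. To avoid downloading anything it builds both models from a deliberately tiny, randomly initialised `BertConfig`; the sizes below are illustrative values and are not those of bert-base-uncased.

```python
import torch
from transformers import BertConfig, BertForPreTraining, BertModel

# Tiny illustrative config (assumed values, not bert-base-uncased)
config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
)

base = BertModel(config).eval()
pretraining = BertForPreTraining(config).eval()

# One dummy sequence of 6 non-padding token ids
input_ids = torch.randint(1, config.vocab_size, (1, 6))

with torch.no_grad():
    base_out = base(input_ids=input_ids)
    pre_out = pretraining(input_ids=input_ids)

# BertModel: hidden states plus the pooled [CLS] vector, no head on top
print(base_out.last_hidden_state.shape)        # (batch, seq_len, hidden_size)
print(base_out.pooler_output.shape)            # (batch, hidden_size)

# BertForPreTraining: logits produced by the two pre-training heads
print(pre_out.prediction_logits.shape)         # (batch, seq_len, vocab_size)
print(pre_out.seq_relationship_logits.shape)   # (batch, 2)
```

With a real checkpoint the same attribute names apply; only the sizes change.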

Printing the parameter names of BertModel and BertForPreTraining makes the difference visible, as follows:

1. Download the bert-base-uncased model files from the Hugging Face website to a local directory

2. Initialize BertModel and BertForPreTraining, then print the parameter names of each model

from transformers import BertConfig, BertForPreTraining, BertModel

# Local copy of bert-base-uncased downloaded in step 1
model_path = 'E:/研究生学习/我的项目/CLMLF-main/bert-base-uncased'
config = BertConfig.from_pretrained(model_path)
model1 = BertForPreTraining.from_pretrained(model_path, config=config)
model2 = BertModel.from_pretrained(model_path)

# Print the parameter names of BertForPreTraining, then of BertModel
for name, param in model1.named_parameters():
    print(name)
for name, param in model2.named_parameters():
    print(name)

3. The output is as follows

(1) BertForPreTraining output

bert.embeddings.word_embeddings.weight
bert.embeddings.position_embeddings.weight
bert.embeddings.token_type_embeddings.weight
bert.embeddings.LayerNorm.weight
bert.embeddings.LayerNorm.bias
bert.encoder.layer.0.attention.self.query.weight
bert.encoder.layer.0.attention.self.query.bias
bert.encoder.layer.0.attention.self.key.weight
bert.encoder.layer.0.attention.self.key.bias
bert.encoder.layer.0.attention.self.value.weight
bert.encoder.layer.0.attention.self.value.bias
bert.encoder.layer.0.attention.output.dense.weight
bert.encoder.layer.0.attention.output.dense.bias
bert.encoder.layer.0.attention.output.LayerNorm.weight
bert.encoder.layer.0.attention.output.LayerNorm.bias
bert.encoder.layer.0.intermediate.dense.weight
bert.encoder.layer.0.intermediate.dense.bias
bert.encoder.layer.0.output.dense.weight
bert.encoder.layer.0.output.dense.bias
bert.encoder.layer.0.output.LayerNorm.weight
bert.encoder.layer.0.output.LayerNorm.bias
bert.encoder.layer.1.attention.self.query.weight
bert.encoder.layer.1.attention.self.query.bias
bert.encoder.layer.1.attention.self.key.weight
bert.encoder.layer.1.attention.self.key.bias
bert.encoder.layer.1.attention.self.value.weight
bert.encoder.layer.1.attention.self.value.bias
bert.encoder.layer.1.attention.output.dense.weight
bert.encoder.layer.1.attention.output.dense.bias
bert.encoder.layer.1.attention.output.LayerNorm.weight
bert.encoder.layer.1.attention.output.LayerNorm.bias
bert.encoder.layer.1.intermediate.dense.weight
bert.encoder.layer.1.intermediate.dense.bias
bert.encoder.layer.1.output.dense.weight
bert.encoder.layer.1.output.dense.bias
bert.encoder.layer.1.output.LayerNorm.weight
bert.encoder.layer.1.output.LayerNorm.bias
bert.encoder.layer.2.attention.self.query.weight
bert.encoder.layer.2.attention.self.query.bias
bert.encoder.layer.2.attention.self.key.weight
bert.encoder.layer.2.attention.self.key.bias
bert.encoder.layer.2.attention.self.value.weight
bert.encoder.layer.2.attention.self.value.bias
bert.encoder.layer.2.attention.output.dense.weight
bert.encoder.layer.2.attention.output.dense.bias
bert.encoder.layer.2.attention.output.LayerNorm.weight
bert.encoder.layer.2.attention.output.LayerNorm.bias
bert.encoder.layer.2.intermediate.dense.weight
bert.encoder.layer.2.intermediate.dense.bias
bert.encoder.layer.2.output.dense.weight
bert.encoder.layer.2.output.dense.bias
bert.encoder.layer.2.output.LayerNorm.weight
bert.encoder.layer.2.output.LayerNorm.bias
bert.encoder.layer.3.attention.self.query.weight
bert.encoder.layer.3.attention.self.query.bias
bert.encoder.layer.3.attention.self.key.weight
bert.encoder.layer.3.attention.self.key.bias
bert.encoder.layer.3.attention.self.value.weight
bert.encoder.layer.3.attention.self.value.bias
bert.encoder.layer.3.attention.output.dense.weight
bert.encoder.layer.3.attention.output.dense.bias
bert.encoder.layer.3.attention.output.LayerNorm.weight
bert.encoder.layer.3.attention.output.LayerNorm.bias
bert.encoder.layer.3.intermediate.dense.weight
bert.encoder.layer.3.intermediate.dense.bias
bert.encoder.layer.3.output.dense.weight
bert.encoder.layer.3.output.dense.bias
bert.encoder.layer.3.output.LayerNorm.weight
bert.encoder.layer.3.output.LayerNorm.bias
bert.encoder.layer.4.attention.self.query.weight
bert.encoder.layer.4.attention.self.query.bias
bert.encoder.layer.4.attention.self.key.weight
bert.encoder.layer.4.attention.self.key.bias
bert.encoder.layer.4.attention.self.value.weight
bert.encoder.layer.4.attention.self.value.bias
bert.encoder.layer.4.attention.output.dense.weight
bert.encoder.layer.4.attention.output.dense.bias
bert.encoder.layer.4.attention.output.LayerNorm.weight
bert.encoder.layer.4.attention.output.LayerNorm.bias
bert.encoder.layer.4.intermediate.dense.weight
bert.encoder.layer.4.intermediate.dense.bias
bert.encoder.layer.4.output.dense.weight
bert.encoder.layer.4.output.dense.bias
bert.encoder.layer.4.output.LayerNorm.weight
bert.encoder.layer.4.output.LayerNorm.bias
bert.encoder.layer.5.attention.self.query.weight
bert.encoder.layer.5.attention.self.query.bias
bert.encoder.layer.5.attention.self.key.weight
bert.encoder.layer.5.attention.self.key.bias
bert.encoder.layer.5.attention.self.value.weight
bert.encoder.layer.5.attention.self.value.bias
bert.encoder.layer.5.attention.output.dense.weight
bert.encoder.layer.5.attention.output.dense.bias
bert.encoder.layer.5.attention.output.LayerNorm.weight
bert.encoder.layer.5.attention.output.LayerNorm.bias
bert.encoder.layer.5.intermediate.dense.weight
bert.encoder.layer.5.intermediate.dense.bias
bert.encoder.layer.5.output.dense.weight
bert.encoder.layer.5.output.dense.bias
bert.encoder.layer.5.output.LayerNorm.weight
bert.encoder.layer.5.output.LayerNorm.bias
bert.encoder.layer.6.attention.self.query.weight
bert.encoder.layer.6.attention.self.query.bias
bert.encoder.layer.6.attention.self.key.weight
bert.encoder.layer.6.attention.self.key.bias
bert.encoder.layer.6.attention.self.value.weight
bert.encoder.layer.6.attention.self.value.bias
bert.encoder.layer.6.attention.output.dense.weight
bert.encoder.layer.6.attention.output.dense.bias
bert.encoder.layer.6.attention.output.LayerNorm.weight
bert.encoder.layer.6.attention.output.LayerNorm.bias
bert.encoder.layer.6.intermediate.dense.weight
bert.encoder.layer.6.intermediate.dense.bias
bert.encoder.layer.6.output.dense.weight
bert.encoder.layer.6.output.dense.bias
bert.encoder.layer.6.output.LayerNorm.weight
bert.encoder.layer.6.output.LayerNorm.bias
bert.encoder.layer.7.attention.self.query.weight
bert.encoder.layer.7.attention.self.query.bias
bert.encoder.layer.7.attention.self.key.weight
bert.encoder.layer.7.attention.self.key.bias
bert.encoder.layer.7.attention.self.value.weight
bert.encoder.layer.7.attention.self.value.bias
bert.encoder.layer.7.attention.output.dense.weight
bert.encoder.layer.7.attention.output.dense.bias
bert.encoder.layer.7.attention.output.LayerNorm.weight
bert.encoder.layer.7.attention.output.LayerNorm.bias
bert.encoder.layer.7.intermediate.dense.weight
bert.encoder.layer.7.intermediate.dense.bias
bert.encoder.layer.7.output.dense.weight
bert.encoder.layer.7.output.dense.bias
bert.encoder.layer.7.output.LayerNorm.weight
bert.encoder.layer.7.output.LayerNorm.bias
bert.encoder.layer.8.attention.self.query.weight
bert.encoder.layer.8.attention.self.query.bias
bert.encoder.layer.8.attention.self.key.weight
bert.encoder.layer.8.attention.self.key.bias
bert.encoder.layer.8.attention.self.value.weight
bert.encoder.layer.8.attention.self.value.bias
bert.encoder.layer.8.attention.output.dense.weight
bert.encoder.layer.8.attention.output.dense.bias
bert.encoder.layer.8.attention.output.LayerNorm.weight
bert.encoder.layer.8.attention.output.LayerNorm.bias
bert.encoder.layer.8.intermediate.dense.weight
bert.encoder.layer.8.intermediate.dense.bias
bert.encoder.layer.8.output.dense.weight
bert.encoder.layer.8.output.dense.bias
bert.encoder.layer.8.output.LayerNorm.weight
bert.encoder.layer.8.output.LayerNorm.bias
bert.encoder.layer.9.attention.self.query.weight
bert.encoder.layer.9.attention.self.query.bias
bert.encoder.layer.9.attention.self.key.weight
bert.encoder.layer.9.attention.self.key.bias
bert.encoder.layer.9.attention.self.value.weight
bert.encoder.layer.9.attention.self.value.bias
bert.encoder.layer.9.attention.output.dense.weight
bert.encoder.layer.9.attention.output.dense.bias
bert.encoder.layer.9.attention.output.LayerNorm.weight
bert.encoder.layer.9.attention.output.LayerNorm.bias
bert.encoder.layer.9.intermediate.dense.weight
bert.encoder.layer.9.intermediate.dense.bias
bert.encoder.layer.9.output.dense.weight
bert.encoder.layer.9.output.dense.bias
bert.encoder.layer.9.output.LayerNorm.weight
bert.encoder.layer.9.output.LayerNorm.bias
bert.encoder.layer.10.attention.self.query.weight
bert.encoder.layer.10.attention.self.query.bias
bert.encoder.layer.10.attention.self.key.weight
bert.encoder.layer.10.attention.self.key.bias
bert.encoder.layer.10.attention.self.value.weight
bert.encoder.layer.10.attention.self.value.bias
bert.encoder.layer.10.attention.output.dense.weight
bert.encoder.layer.10.attention.output.dense.bias
bert.encoder.layer.10.attention.output.LayerNorm.weight
bert.encoder.layer.10.attention.output.LayerNorm.bias
bert.encoder.layer.10.intermediate.dense.weight
bert.encoder.layer.10.intermediate.dense.bias
bert.encoder.layer.10.output.dense.weight
bert.encoder.layer.10.output.dense.bias
bert.encoder.layer.10.output.LayerNorm.weight
bert.encoder.layer.10.output.LayerNorm.bias
bert.encoder.layer.11.attention.self.query.weight
bert.encoder.layer.11.attention.self.query.bias
bert.encoder.layer.11.attention.self.key.weight
bert.encoder.layer.11.attention.self.key.bias
bert.encoder.layer.11.attention.self.value.weight
bert.encoder.layer.11.attention.self.value.bias
bert.encoder.layer.11.attention.output.dense.weight
bert.encoder.layer.11.attention.output.dense.bias
bert.encoder.layer.11.attention.output.LayerNorm.weight
bert.encoder.layer.11.attention.output.LayerNorm.bias
bert.encoder.layer.11.intermediate.dense.weight
bert.encoder.layer.11.intermediate.dense.bias
bert.encoder.layer.11.output.dense.weight
bert.encoder.layer.11.output.dense.bias
bert.encoder.layer.11.output.LayerNorm.weight
bert.encoder.layer.11.output.LayerNorm.bias
bert.pooler.dense.weight
bert.pooler.dense.bias
cls.predictions.bias
cls.predictions.transform.dense.weight
cls.predictions.transform.dense.bias
cls.predictions.transform.LayerNorm.weight
cls.predictions.transform.LayerNorm.bias
cls.seq_relationship.weight
cls.seq_relationship.bias

(2) BertModel output

embeddings.word_embeddings.weight
embeddings.position_embeddings.weight
embeddings.token_type_embeddings.weight
embeddings.LayerNorm.weight
embeddings.LayerNorm.bias
encoder.layer.0.attention.self.query.weight
encoder.layer.0.attention.self.query.bias
encoder.layer.0.attention.self.key.weight
encoder.layer.0.attention.self.key.bias
encoder.layer.0.attention.self.value.weight
encoder.layer.0.attention.self.value.bias
encoder.layer.0.attention.output.dense.weight
encoder.layer.0.attention.output.dense.bias
encoder.layer.0.attention.output.LayerNorm.weight
encoder.layer.0.attention.output.LayerNorm.bias
encoder.layer.0.intermediate.dense.weight
encoder.layer.0.intermediate.dense.bias
encoder.layer.0.output.dense.weight
encoder.layer.0.output.dense.bias
encoder.layer.0.output.LayerNorm.weight
encoder.layer.0.output.LayerNorm.bias
encoder.layer.1.attention.self.query.weight
encoder.layer.1.attention.self.query.bias
encoder.layer.1.attention.self.key.weight
encoder.layer.1.attention.self.key.bias
encoder.layer.1.attention.self.value.weight
encoder.layer.1.attention.self.value.bias
encoder.layer.1.attention.output.dense.weight
encoder.layer.1.attention.output.dense.bias
encoder.layer.1.attention.output.LayerNorm.weight
encoder.layer.1.attention.output.LayerNorm.bias
encoder.layer.1.intermediate.dense.weight
encoder.layer.1.intermediate.dense.bias
encoder.layer.1.output.dense.weight
encoder.layer.1.output.dense.bias
encoder.layer.1.output.LayerNorm.weight
encoder.layer.1.output.LayerNorm.bias
encoder.layer.2.attention.self.query.weight
encoder.layer.2.attention.self.query.bias
encoder.layer.2.attention.self.key.weight
encoder.layer.2.attention.self.key.bias
encoder.layer.2.attention.self.value.weight
encoder.layer.2.attention.self.value.bias
encoder.layer.2.attention.output.dense.weight
encoder.layer.2.attention.output.dense.bias
encoder.layer.2.attention.output.LayerNorm.weight
encoder.layer.2.attention.output.LayerNorm.bias
encoder.layer.2.intermediate.dense.weight
encoder.layer.2.intermediate.dense.bias
encoder.layer.2.output.dense.weight
encoder.layer.2.output.dense.bias
encoder.layer.2.output.LayerNorm.weight
encoder.layer.2.output.LayerNorm.bias
encoder.layer.3.attention.self.query.weight
encoder.layer.3.attention.self.query.bias
encoder.layer.3.attention.self.key.weight
encoder.layer.3.attention.self.key.bias
encoder.layer.3.attention.self.value.weight
encoder.layer.3.attention.self.value.bias
encoder.layer.3.attention.output.dense.weight
encoder.layer.3.attention.output.dense.bias
encoder.layer.3.attention.output.LayerNorm.weight
encoder.layer.3.attention.output.LayerNorm.bias
encoder.layer.3.intermediate.dense.weight
encoder.layer.3.intermediate.dense.bias
encoder.layer.3.output.dense.weight
encoder.layer.3.output.dense.bias
encoder.layer.3.output.LayerNorm.weight
encoder.layer.3.output.LayerNorm.bias
encoder.layer.4.attention.self.query.weight
encoder.layer.4.attention.self.query.bias
encoder.layer.4.attention.self.key.weight
encoder.layer.4.attention.self.key.bias
encoder.layer.4.attention.self.value.weight
encoder.layer.4.attention.self.value.bias
encoder.layer.4.attention.output.dense.weight
encoder.layer.4.attention.output.dense.bias
encoder.layer.4.attention.output.LayerNorm.weight
encoder.layer.4.attention.output.LayerNorm.bias
encoder.layer.4.intermediate.dense.weight
encoder.layer.4.intermediate.dense.bias
encoder.layer.4.output.dense.weight
encoder.layer.4.output.dense.bias
encoder.layer.4.output.LayerNorm.weight
encoder.layer.4.output.LayerNorm.bias
encoder.layer.5.attention.self.query.weight
encoder.layer.5.attention.self.query.bias
encoder.layer.5.attention.self.key.weight
encoder.layer.5.attention.self.key.bias
encoder.layer.5.attention.self.value.weight
encoder.layer.5.attention.self.value.bias
encoder.layer.5.attention.output.dense.weight
encoder.layer.5.attention.output.dense.bias
encoder.layer.5.attention.output.LayerNorm.weight
encoder.layer.5.attention.output.LayerNorm.bias
encoder.layer.5.intermediate.dense.weight
encoder.layer.5.intermediate.dense.bias
encoder.layer.5.output.dense.weight
encoder.layer.5.output.dense.bias
encoder.layer.5.output.LayerNorm.weight
encoder.layer.5.output.LayerNorm.bias
encoder.layer.6.attention.self.query.weight
encoder.layer.6.attention.self.query.bias
encoder.layer.6.attention.self.key.weight
encoder.layer.6.attention.self.key.bias
encoder.layer.6.attention.self.value.weight
encoder.layer.6.attention.self.value.bias
encoder.layer.6.attention.output.dense.weight
encoder.layer.6.attention.output.dense.bias
encoder.layer.6.attention.output.LayerNorm.weight
encoder.layer.6.attention.output.LayerNorm.bias
encoder.layer.6.intermediate.dense.weight
encoder.layer.6.intermediate.dense.bias
encoder.layer.6.output.dense.weight
encoder.layer.6.output.dense.bias
encoder.layer.6.output.LayerNorm.weight
encoder.layer.6.output.LayerNorm.bias
encoder.layer.7.attention.self.query.weight
encoder.layer.7.attention.self.query.bias
encoder.layer.7.attention.self.key.weight
encoder.layer.7.attention.self.key.bias
encoder.layer.7.attention.self.value.weight
encoder.layer.7.attention.self.value.bias
encoder.layer.7.attention.output.dense.weight
encoder.layer.7.attention.output.dense.bias
encoder.layer.7.attention.output.LayerNorm.weight
encoder.layer.7.attention.output.LayerNorm.bias
encoder.layer.7.intermediate.dense.weight
encoder.layer.7.intermediate.dense.bias
encoder.layer.7.output.dense.weight
encoder.layer.7.output.dense.bias
encoder.layer.7.output.LayerNorm.weight
encoder.layer.7.output.LayerNorm.bias
encoder.layer.8.attention.self.query.weight
encoder.layer.8.attention.self.query.bias
encoder.layer.8.attention.self.key.weight
encoder.layer.8.attention.self.key.bias
encoder.layer.8.attention.self.value.weight
encoder.layer.8.attention.self.value.bias
encoder.layer.8.attention.output.dense.weight
encoder.layer.8.attention.output.dense.bias
encoder.layer.8.attention.output.LayerNorm.weight
encoder.layer.8.attention.output.LayerNorm.bias
encoder.layer.8.intermediate.dense.weight
encoder.layer.8.intermediate.dense.bias
encoder.layer.8.output.dense.weight
encoder.layer.8.output.dense.bias
encoder.layer.8.output.LayerNorm.weight
encoder.layer.8.output.LayerNorm.bias
encoder.layer.9.attention.self.query.weight
encoder.layer.9.attention.self.query.bias
encoder.layer.9.attention.self.key.weight
encoder.layer.9.attention.self.key.bias
encoder.layer.9.attention.self.value.weight
encoder.layer.9.attention.self.value.bias
encoder.layer.9.attention.output.dense.weight
encoder.layer.9.attention.output.dense.bias
encoder.layer.9.attention.output.LayerNorm.weight
encoder.layer.9.attention.output.LayerNorm.bias
encoder.layer.9.intermediate.dense.weight
encoder.layer.9.intermediate.dense.bias
encoder.layer.9.output.dense.weight
encoder.layer.9.output.dense.bias
encoder.layer.9.output.LayerNorm.weight
encoder.layer.9.output.LayerNorm.bias
encoder.layer.10.attention.self.query.weight
encoder.layer.10.attention.self.query.bias
encoder.layer.10.attention.self.key.weight
encoder.layer.10.attention.self.key.bias
encoder.layer.10.attention.self.value.weight
encoder.layer.10.attention.self.value.bias
encoder.layer.10.attention.output.dense.weight
encoder.layer.10.attention.output.dense.bias
encoder.layer.10.attention.output.LayerNorm.weight
encoder.layer.10.attention.output.LayerNorm.bias
encoder.layer.10.intermediate.dense.weight
encoder.layer.10.intermediate.dense.bias
encoder.layer.10.output.dense.weight
encoder.layer.10.output.dense.bias
encoder.layer.10.output.LayerNorm.weight
encoder.layer.10.output.LayerNorm.bias
encoder.layer.11.attention.self.query.weight
encoder.layer.11.attention.self.query.bias
encoder.layer.11.attention.self.key.weight
encoder.layer.11.attention.self.key.bias
encoder.layer.11.attention.self.value.weight
encoder.layer.11.attention.self.value.bias
encoder.layer.11.attention.output.dense.weight
encoder.layer.11.attention.output.dense.bias
encoder.layer.11.attention.output.LayerNorm.weight
encoder.layer.11.attention.output.LayerNorm.bias
encoder.layer.11.intermediate.dense.weight
encoder.layer.11.intermediate.dense.bias
encoder.layer.11.output.dense.weight
encoder.layer.11.output.dense.bias
encoder.layer.11.output.LayerNorm.weight
encoder.layer.11.output.LayerNorm.bias
pooler.dense.weight
pooler.dense.bias

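Comparing the two outputs shows that, apart from the bert. prefix on the shared parameters, BertForPreTraining has a few extra parameters at the end that BertModel lacks, namely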

cls.predictions.bias
cls.predictions.transform.dense.weight
cls.predictions.transform.dense.bias
cls.predictions.transform.LayerNorm.weight
cls.predictions.transform.LayerNorm.bias
cls.seq_relationship.weight
cls.seq_relationship.bias
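Rather than comparing two long printouts by eye, the extra parameters can be computed. This is a small sketch in plain Python; `extra_parameters` is a hypothetical helper written for this post, not part of transformers, and the two short name lists are a hand-picked subset of the output above.

```python
def extra_parameters(pretraining_names, base_names, prefix="bert."):
    """Return the names that exist only in the pre-training model."""
    base = set(base_names)
    extras = []
    for name in pretraining_names:
        # Shared weights carry a `bert.` prefix in BertForPreTraining
        stripped = name[len(prefix):] if name.startswith(prefix) else name
        if stripped not in base:
            extras.append(name)
    return extras


# Hand-picked subset of the printed names, for illustration only
pretraining_names = [
    "bert.embeddings.word_embeddings.weight",
    "bert.pooler.dense.weight",
    "bert.pooler.dense.bias",
    "cls.predictions.bias",
    "cls.seq_relationship.weight",
    "cls.seq_relationship.bias",
]
base_names = [
    "embeddings.word_embeddings.weight",
    "pooler.dense.weight",
    "pooler.dense.bias",
]

extras = extra_parameters(pretraining_names, base_names)
print(extras)
```

With the real models, the two lists can be built with `[n for n, _ in model.named_parameters()]`.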

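(3) Because BertForPreTraining has these extra layers, the following code makes its layers match those of BertModel: taking the bert attribute of the BertForPreTraining object keeps only the BERT-related layers. The remaining layers, whose names start with cls, belong to the pre-training heads and are not used here.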

from transformers import BertConfig, BertForPreTraining, BertModel

config = BertConfig.from_pretrained('E:/研究生学习/我的项目/CLMLF-main/bert-base-uncased')
model1 = BertForPreTraining.from_pretrained('E:/研究生学习/我的项目/CLMLF-main/bert-base-uncased', config=config)
model1 = model1.bert  # keep only the BERT-related layers of BertForPreTraining
model2 = BertModel.from_pretrained('E:/研究生学习/我的项目/CLMLF-main/bert-base-uncased')
# output = model1(inputs['input_ids'], inputs['attention_mask'])
for name, param in model1.named_parameters():
    print(name)
for name, param in model2.named_parameters():
    print(name)

Running this shows that the two outputs are identical.
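The same check can be made without reading the printouts. The sketch below assumes `torch` and `transformers` are installed and, to stay independent of any local checkpoint, uses a tiny randomly initialised config with illustrative sizes. It compares parameter names and shapes only; the weight values differ here because each model is initialised separately.

```python
from transformers import BertConfig, BertForPreTraining, BertModel

# Tiny illustrative config (assumed values, not bert-base-uncased)
config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
)

# The `bert` attribute of BertForPreTraining versus a standalone BertModel
encoder_from_pretraining = BertForPreTraining(config).bert
standalone = BertModel(config)

shapes_a = {n: tuple(p.shape) for n, p in encoder_from_pretraining.named_parameters()}
shapes_b = {n: tuple(p.shape) for n, p in standalone.named_parameters()}

# True when both models expose the same parameter names with the same shapes
same_layout = shapes_a == shapes_b
print(same_layout)

# None of the pre-training head parameters remain after taking `.bert`
has_cls = any(name.startswith("cls.") for name in shapes_a)
print(has_cls)
```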


Reposted from blog.csdn.net/qq_43775680/article/details/127819926