[Paper Walkthrough] One Teacher is Enough? Pre-trained Language Model Distillation from Multiple Teachers

Other 2023-04-08 08:31:50

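This is a paper on multi-model knowledge distillation in NLP. The overall idea is fairly clear: it presents a method for distilling a student from multiple teacher models.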

1. Introduction

Paper title: One Teacher is Enough? Pre-trained Language Model Distillation from Multiple Teachers
Link: https://arxiv.org/pdf/2106.01023.pdf

2. Motivation & Abstract

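When a single teacher model guides the training of the student model, any bias in the teacher's predictions tends to carry over to the student, which lowers the student's final accuracy.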

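The paper therefore proposes a multi-teacher knowledge distillation method (co-finetune). It introduces a shared pooling and prediction layer to align the teachers' output spaces, which makes the distillation more effective. It also revises the distillation objective, proposing a multi-teacher hidden loss and a multi-teacher distillation loss so that the student uses both the intermediate-layer and the output-layer information of the teachers. The method achieves the best performance on 3 benchmark datasets.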
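To make the two losses more concrete, here is a minimal sketch in plain Python. It is not the paper's implementation. The function names are hypothetical, the teachers are averaged with uniform weights, and the teacher and student hidden sizes are assumed to match already. None of these choices is taken from the paper, whose exact formulas are not reproduced in this post.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_cross_entropy(teacher_probs, student_probs):
    """Cross-entropy of the student distribution against soft teacher labels."""
    return -sum(t * math.log(s) for t, s in zip(teacher_probs, student_probs))


def multi_teacher_distill_loss(teacher_logits, student_logits, temperature=1.0):
    """Output-layer loss averaged over teachers (uniform weights: an assumption)."""
    student_probs = softmax(student_logits, temperature)
    losses = [
        soft_cross_entropy(softmax(t, temperature), student_probs)
        for t in teacher_logits
    ]
    return sum(losses) / len(losses)


def multi_teacher_hidden_loss(teacher_hiddens, student_hidden):
    """Intermediate-layer loss: mean MSE between student and teacher vectors.

    Assumes the hidden sizes already match; a real setup would likely
    need a projection when they differ.
    """
    def mse(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

    per_teacher = [mse(h, student_hidden) for h in teacher_hiddens]
    return sum(per_teacher) / len(per_teacher)


if __name__ == "__main__":
    # Toy logits for two teachers and one student on a 3-class task.
    teachers = [[2.0, 0.5, -1.0], [1.5, 1.0, -0.5]]
    student = [1.0, 0.2, -0.3]
    print(multi_teacher_distill_loss(teachers, student, temperature=2.0))
    print(multi_teacher_hidden_loss([[0.1, 0.2], [0.3, 0.0]], [0.2, 0.1]))
```

In a training loop these two terms would usually be added to the ordinary supervised loss on the gold labels. How the terms are weighted is a design choice that this sketch leaves open.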

Note: this paper falls under task-specific knowledge distillation.

3. MT-BERT

3.1 Multi-Teacher Co-Finetuning

The teacher models are each trained with different hyperparameters, so on their own they…

Reposted from blog.csdn.net/u012526003/article/details/125258727
