Preface
Large models are all the rage, but real-world deployment has to account for hardware constraints and power budgets, so companies usually prefer to ship "small" models. Learning a few distillation techniques has therefore become an essential skill for algorithm engineers.
MGD
Paper: Masked Generative Distillation
Code: https://github.com/yzd-v/MGD
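The idea: randomly mask pixels of the student's feature map and force a small generative block to reconstruct the teacher's full feature from what remains, so the student learns a representation rather than copying activations point by point. Below is a minimal PyTorch sketch of that loss, assuming a single feature map; the class name, the two-conv generation block, and the lambda_mgd default are illustrative, not the repo's tuned settings.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MGDLoss(nn.Module):  # hypothetical name, for illustration only
    def __init__(self, s_channels, t_channels, lambda_mgd=0.5):
        super().__init__()
        # 1x1 conv aligns student channels to the teacher's if they differ
        self.align = (nn.Conv2d(s_channels, t_channels, 1)
                      if s_channels != t_channels else nn.Identity())
        # simple generation block: two 3x3 convs with a ReLU in between
        self.generation = nn.Sequential(
            nn.Conv2d(t_channels, t_channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(t_channels, t_channels, 3, padding=1),
        )
        self.lambda_mgd = lambda_mgd  # fraction of pixels to mask (illustrative default)

    def forward(self, feat_s, feat_t):
        feat_s = self.align(feat_s)
        b, _, h, w = feat_s.shape
        # random per-pixel binary mask: 1 keeps a pixel, 0 drops it
        mask = (torch.rand(b, 1, h, w, device=feat_s.device) > self.lambda_mgd).float()
        rec = self.generation(feat_s * mask)  # reconstruct the full teacher feature
        return F.mse_loss(rec, feat_t)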
DIST
Paper: Knowledge Distillation from A Stronger Teacher
Code: https://github.com/hunto/DIST_KD
Pseudocode
import torch.nn as nn

def cosine_similarity(a, b, eps=1e-8):
    # row-wise cosine similarity between two (N, C) matrices
    return (a * b).sum(1) / (a.norm(dim=1) * b.norm(dim=1) + eps)

def pearson_correlation(a, b, eps=1e-8):
    # Pearson correlation = cosine similarity of the mean-centered rows
    return cosine_similarity(a - a.mean(1, keepdim=True), b - b.mean(1, keepdim=True), eps)

def inter_class_relation(y_s, y_t):
    # per sample: correlate student and teacher predictions across classes
    return 1 - pearson_correlation(y_s, y_t).mean()

def intra_class_relation(y_s, y_t):
    # per class: transpose so each row spans the batch, then reuse the inter-class loss
    return inter_class_relation(y_s.transpose(0, 1), y_t.transpose(0, 1))

class DIST(nn.Module):
    def __init__(self, beta, gamma):
        super().__init__()
        self.beta = beta    # weight of the inter-class (per-sample) term
        self.gamma = gamma  # weight of the intra-class (per-class) term

    def forward(self, z_s, z_t):
        # z_s, z_t: raw (N, C) logits from the student and the teacher
        y_s = z_s.softmax(dim=1)
        y_t = z_t.softmax(dim=1)
        inter_loss = inter_class_relation(y_s, y_t)
        intra_loss = intra_class_relation(y_s, y_t)
        return self.beta * inter_loss + self.gamma * intra_loss
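A minimal usage sketch; the random logits, class count, and the beta = gamma = 1 weights are placeholders for illustration, not values taken from the repo.

import torch
import torch.nn.functional as F

z_s = torch.randn(8, 100, requires_grad=True)  # student logits (stand-in)
z_t = torch.randn(8, 100)                      # teacher logits (stand-in, no grad)
labels = torch.randint(0, 100, (8,))

criterion_kd = DIST(beta=1.0, gamma=1.0)
loss = F.cross_entropy(z_s, labels) + criterion_kd(z_s, z_t)  # task loss + KD loss
loss.backward()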
Teacher-student
Paper: Perturbed and Strict Mean Teachers for Semi-supervised Semantic Segmentation
Code: https://github.com/yyliu01/PS-MT
Blog post: CVPR 2022 | PS-MT: semi-supervised semantic segmentation needs more stable consistency training!
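PS-MT builds on the mean-teacher recipe, where the teacher is never trained directly: its weights are an exponential moving average (EMA) of the student's. A minimal sketch of that update, assuming plain PyTorch modules; the momentum value is illustrative.

import torch

@torch.no_grad()
def ema_update(teacher, student, momentum=0.99):
    # move each teacher parameter a small step toward the student's
    for p_t, p_s in zip(teacher.parameters(), student.parameters()):
        p_t.mul_(momentum).add_(p_s, alpha=1.0 - momentum)

This is called once per training step, after the student's optimizer update.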
Backbone distillation
TinyViT
Paper: TinyViT: Fast Pretraining Distillation for Small Vision Transformers
Code: https://github.com/microsoft/Cream/tree/main/TinyViT
Blog post: ECCV 2022 | Beating Swin with only 11% of the parameters: Microsoft's fast pretraining distillation method TinyViT
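TinyViT's pretraining distillation keeps only the teacher's top-K logits per image, computed once offline, so the large teacher never has to run during student training. A rough sketch of that idea, assuming an image-classification teacher; K, the renormalization over the kept entries, and the function names are my own illustration, not the repo's storage format.

import torch
import torch.nn.functional as F

K = 10  # number of teacher logits kept per image (illustrative)

@torch.no_grad()
def save_sparse_logits(teacher, images):
    # offline pass: keep only the top-K teacher logits and their class indices
    values, indices = teacher(images).topk(K, dim=1)
    return values, indices  # in practice written to disk once, reused every epoch

def sparse_kd_loss(student_logits, values, indices, T=1.0):
    # treat probability mass outside the stored top-K entries as zero
    teacher_prob = (values / T).softmax(dim=1)      # renormalize over the K classes
    log_p = F.log_softmax(student_logits / T, dim=1)
    log_p_topk = log_p.gather(1, indices)           # student log-probs at stored classes
    return -(teacher_prob * log_p_topk).sum(1).mean()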
Semi-supervised
DTG-SSOD
2022.07
Paper: DTG-SSOD: Dense Teacher Guidance for Semi-Supervised Object Detection
Blog post: DTG-SSOD: a new semi-supervised detection framework, Dense Teacher
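The guiding idea is to replace sparse, thresholded pseudo boxes with the teacher's dense, per-anchor predictions as soft supervision. The sketch below only captures that flavor for the classification branch; the tensor shapes and the BCE formulation are assumptions of mine, not the paper's actual losses.

import torch
import torch.nn.functional as F

def dense_guidance_loss(student_cls, teacher_cls):
    # student_cls, teacher_cls: (num_anchors, num_classes) raw logits predicted
    # on two augmentations of the same unlabeled image
    with torch.no_grad():
        soft_targets = teacher_cls.sigmoid()  # teacher scores as dense soft labels
    return F.binary_cross_entropy_with_logits(student_cls, soft_targets)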
Data distillation
R2L
ECCV 2022
Paper: R2L: Distilling Neural Radiance Field to Neural Light Field for Efficient Novel View Synthesis
Blog post: ECCV 2022 | Snap & Northeastern University propose R2L: accelerating NeRF with data distillation
Code: https://github.com/snap-research/R2L
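The "data distillation" here means using a pretrained NeRF as a teacher to render as many pseudo (ray, color) pairs as needed, then training the fast light-field student on that synthetic set. A minimal sketch under assumed interfaces: nerf_teacher and sample_rays are hypothetical callables, and the training loop is generic, not the repo's.

import torch
import torch.nn.functional as F

@torch.no_grad()
def make_pseudo_dataset(nerf_teacher, sample_rays, num_batches, batch_size=4096):
    # render pseudo training pairs so the student never needs the scarce real views
    pairs = []
    for _ in range(num_batches):
        origins, dirs = sample_rays(batch_size)  # random rays from the camera distribution
        colors = nerf_teacher(origins, dirs)     # teacher renders the target RGB
        pairs.append((origins, dirs, colors))
    return pairs

def train_step(student, optimizer, origins, dirs, colors):
    # the light-field student maps a ray directly to color: one query, no volume rendering
    loss = F.mse_loss(student(origins, dirs), colors)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()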