实践分享|探索书生·浦语2.5（InternLM2.5）：PyTorch模型微调的最佳实践

10月26日，北京站源创会，聊聊高性能计算与大模型推理

引言

2024年6月30日，上海人工智能实验室发布InternLM2.5-7B系列模型，包括InternLM2.5-7B、InternLM2.5-7B-Chat和InternLM2.5-7B-Chat-1M。具体介绍如下：

InternLM2.5-7B：在通用领域和领域增强语料进一步预训练，在评估中拥有良好的语言能力。
InternLM2.5-7B-Chat：通过监督微调（SFT）和在线RLHF在InternLM2.5的基础上进一步对齐。InternLM2.5-7B-Chat表现出更好的指令遵循、聊天体验和函数调用能力。
InternLM2.5-7B-Chat-1M：InternLM2.5-7B-Chat的1M长上下文版本。InternLM2.5-7B-Chat-1M支持百万字超长上下文推理，同时保持与InternLM2.5-7B-Chat相同的性能。

相比上一代模型，InternLM2.5具有三个突出亮点：

出色的推理能力：数学推理能力达到世界先进水平，超越Llama3、Gemma2-9B等模型。
1M上下文窗口：采用业界流行的"大海捞针"来评估模型的长文本信息召回能力，InternLM2.5在1M token范围内实现了几乎完美的大海捞针召回。在LongBench等长上下文任务上具有领先性能。
工具使用能力更强：具有强大的工具调用能力，比如可以针对复杂问题，搜索上百个网页并进行整合分析。

通过开源工具OpenCompass对InternLM2.5在几个重要的评测集上进行了评测，部分评测结果如下表所示，欢迎访问OpenCompass 榜单获取更多的评测结果。

Base Model

Benchmark	InternLM2.5-7B	LLaMA-3-8B	Yi-1.5-9B
MMLU(5-shot)	71.6	66.4	71.6
CMMLU(5-shot)	79.1	51.0	74.1
BBH(3-shot)	70.1	59.7	71.1
MATH(4-shot)	34.0	16.4	31.9
GSM8K(4-shot)	74.8	54.3	74.5
GPQA(0-shot)	31.3	31.3	27.8

Chat Model*

Benchmark	InternLM2.5-7B-Chat	Llama3-8B-Instruct	Gemma2-9B-IT	Yi-1.5-9B-Chat	GLM-4-9B-Chat	Qwen2-7B-Instruct
MMLU (5-shot)	72.8	68.4	70.9	71.0	71.4	70.8
CMMLU (5-shot)	78.0	53.3	60.3	74.5	74.5	80.9
BBH (3-shot CoT)	71.6	54.4	68.2*	69.6	69.6	65.0
MATH (0-shot CoT)	60.1	27.9	46.9	51.1	51.1	48.6
GSM8K (0-shot CoT)	86.0	72.9	88.9	80.1	85.3	82.9
GPQA (0-shot)	38.4	26.1	33.8	37.9	36.9	38.4

以上评测结果基于OpenCompass获得（部分数据标注 * 代表数据来自原始论文），具体测试细节可参见OpenCompass中提供的配置文件。
评测数据会因OpenCompass的版本迭代而存在数值差异，请以OpenCompass最新版的评测结果为主。

环境准备

安装Ascend CANN Toolkit和Kernels

安装方法请参考安装教程或使用以下命令

# 请替换 URL 为 CANN 版本和设备型号对应的 URL
# 安装 CANN Toolkit
wget https://ascend-repo.obs.cn-east-2.myhuaweicloud.com/Milan-ASL/Milan-ASL%20V100R001C17SPC701/Ascend-cann-toolkit_8.0.RC1.alpha001_linux-"$(uname -i)".run
bash Ascend-cann-toolkit_8.0.RC1.alpha001_linux-"$(uname -i)".run --install

# 安装 CANN Kernels
wget https://ascend-repo.obs.cn-east-2.myhuaweicloud.com/Milan-ASL/Milan-ASL%20V100R001C17SPC701/Ascend-cann-kernels-910b_8.0.RC1.alpha001_linux.run
bash Ascend-cann-kernels-910b_8.0.RC1.alpha001_linux.run --install

# 设置环境变量
source /usr/local/Ascend/ascend-toolkit/set_env.sh

安装openMind Hub Client以及openMind Library

安装openMind Hub Client

pip install openmind_hub

安装openMind Library，并安装PyTorch框架及其依赖。

pip install openmind[pt]

更详细的安装信息请参考openMind官方的环境安装章节。

安装LLaMa Factory

git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
pip install -e ".[torch-npu,metrics]"

模型链接和下载

InternLM2.5模型系列由社区开发者在魔乐社区贡献，包括：

InternLM2.5-7B：https://modelers.cn/models/AI-Research/internlm2_5-7b
InternLM2.5-7B-Chat：https://modelers.cn/models/AI-Research/internlm2_5-7b-chat
InternLM2.5-7B-Chat-1M：https://modelers.cn/models/AI-Research/internlm2_5-7b-chat-1m

通过Git从魔乐社区下载模型的repo，以InternLM2.5-7B-Chat为例：

# 首先保证已安装git-lfs（https://git-lfs.com）
git lfs install
git clone https://modelers.cn/AI-Research/internlm2_5-7b-chat.git

模型推理

用户可以使用openMind Library或者LLaMa Factory进行模型推理，以InternLM2.5-7B-Chat为例，具体如下：

使用openMind Library进行模型推理

新建推理脚本inference_internlm2_5_7b_chat.py，推理脚本内容为：

import torch
from openmind import AutoTokenizer, AutoModelForCausalLM

# 若模型已下载，可替换成模型本地路径
tokenizer = AutoTokenizer.from_pretrained("AI-Research/internlm2_5-7b-chat", trust_remote_code=True)
# `torch_dtype=torch.float16`可以令模型以float16精度加载，否则transformers会将模型加载为float32，导致显存不足
model = AutoModelForCausalLM.from_pretrained("AI-Research/internlm2_5-7b-chat", torch_dtype=torch.float16, trust_remote_code=True).npu()
model = model.eval()
response, history = model.chat(tokenizer, "你好，请提供三个管理时间的建议。", history=[])
print(response)

执行推理脚本：

python inference_internlm2_5_7b_chat.py

推理结果如下：

使用LLaMa Factory与模型交互

在LLaMa Factory路径下新建examples/inference/internlm2_5_7b_chat.yaml推理配置文件，文件内容为：

model_name_or_path: xxx # 当前仅支持本地加载，填写InternLM2.5-7B-Chat本地权重路径
template: intern2

使用以下命令与模型进行交互：

llamafactory-cli chat examples/inference/internlm2_5_7b_chat.yaml

交互结果如下：

模型微调

我们使用单张昇腾NPU，基于LLaMa Factory框架，采用LLaMa Factory提供的自我认知数据集进行Lora微调，改变模型对自己和作者的认知。

数据集

自我认知数据集在LLaMa Factory仓的data/identity.json路径下，这里使用文本编辑器将数据集的{{name}}和{{author}}分别替换为"小昇"和"openMind"。替换后的示例如下：

identity_dataset

微调

在LLaMa Factory路径下新建examples/train_lora/internlm2_5_7b_chat_lora_sft.yaml微调配置文件，微调配置文件如下：

### model
model_name_or_path: xxx # 当前仅支持本地加载，填写InternLM2.5-7B-Chat本地权重路径

### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: all

### dataset
dataset: identity
template: intern2
cutoff_len: 128
preprocessing_num_workers: 16

### output
output_dir: saves/internlm2_5_7b_chat/lora/sft
logging_steps: 5
save_steps: 20 
plot_loss: true
overwrite_output_dir: true

### train
per_device_train_batch_size: 8
gradient_accumulation_steps: 1
learning_rate: 1.0e-4
num_train_epochs: 5.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
ddp_timeout: 180000000

通过下面的命令启动微调：

export ASCEND_RT_VISIBLE_DEVICES=0
llamafactory-cli train examples/train_lora/internlm2_5_7b_chat_lora_sft.yaml

微调可视化

微调结果

微调结束后，在LLaMa Factory路径下新建examples/inference/internlm2_5_7b_chat_lora_sft.yaml推理配置文件，配置文件内容为：

model_name_or_path: xxx # 当前仅支持本地加载，填写InternLM2.5-7B-Chat本地权重路径
adapter_name_or_path: saves/internlm2_5_7b_chat/lora/sft/checkpoint-60/ 
template: intern2
finetuning_type: lora

通过下面的命令启动推理：

llamafactory-cli chat examples/inference/internlm2_5_7b_chat_lora_sft.yaml

推理结果为：

训练前生成样例
- 问题：您好，请介绍一下你自己

训练后生成样例
- 问题：您好，请介绍一下你自己

微调后通用能力测试
- 问题1：请介绍一下故宫

- 问题2：请问怎么做糖醋排骨？

结语

本次实践采用的是一款应用使能开发套件openMind，在调用数据集训练、推理、微调等方面挺方便顺畅的。朋友们可以试试，也欢迎分享你们的经验，一起交流：

openMind，一款应用使能开发套件，为各大模型社区提供支持，提供海量模型/数据托管能力、在线推理体验服务，同时具备模型训练、微调、评估、推理等全流程开发能力。开发者通过简单的API接口即可实现微调、推理等任务，极大缩短开发周期，助力AI技术的创新发展。目前，openMind已支持欢迎在魔乐等AI生态社区，欢迎体验。