Server - Fixes for PyTorch Lightning Warnings: seed_everything, gpus, max_epochs, checkpoint, and More

Follow me on CSDN: https://spike.blog.csdn.net/
Article URL: https://spike.blog.csdn.net/article/details/132673146

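PyTorch Lightning is a high-level framework that simplifies PyTorch code and helps you build, train, and deploy deep learning models quickly. Its core idea is to separate model logic from engineering concerns, so you can focus on the core of the model without worrying about details such as data loading, distributed training, and optimizers. PyTorch Lightning also provides a set of tools and plugins that make it easy to use various accelerators, logging systems, and visualization tools. The goal is to reach high performance with minimal code while keeping PyTorch's flexibility and extensibility.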

1. seed_everything

The warning reads:

LightningDeprecationWarning: pytorch_lightning.utilities.seed.seed_everything has been deprecated in v1.8.0 and will be removed in v1.10.0. Please use lightning_lite.utilities.seed.seed_everything instead.

The cause: in pytorch_lightning v1.8.0, the seed_everything function moved to a different module. Update the import as follows:

# from pytorch_lightning.utilities.seed import seed_everything  # deprecated since v1.8.0
from lightning_lite.utilities.seed import seed_everything

if args.seed:  # set the random seed via PyTorch Lightning (a seed of 0 is skipped by this check)
    seed_everything(args.seed)

Reference: PyTorch Lightning - pytorch_lightning.utilities.seed

2. Trainer(gpus=1)

The warning reads:

LightningDeprecationWarning: Setting Trainer(gpus=1) is deprecated in v1.7 and will be removed in v2.0. Please use Trainer(accelerator='gpu', devices=1) instead.

The cause: the gpus parameter was not expressive enough, so it has been replaced by the combination of accelerator and devices:

trainer = pl.Trainer.from_argparse_args(
    args,
    # ...
    gpus=None,
    accelerator='gpu',
    devices=args.gpus
)
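
Passing `args.gpus` straight to `devices` assumes that a positive GPU count was requested. The sketch below shows one way to also cover the CPU case; `gpus_to_trainer_kwargs` is a hypothetical helper, not part of PyTorch Lightning, and it assumes `gpus` is either None or an int.

```python
def gpus_to_trainer_kwargs(gpus):
    """Map a legacy `gpus` value to `accelerator`/`devices` keyword arguments.

    Assumption: `gpus` is None or an int, as in `--gpus 1`. Other forms
    (lists, strings) are passed through to `devices` unchanged.
    """
    if gpus is None or gpus == 0:
        # No GPU requested: run on CPU and let Lightning pick the device count.
        return {"accelerator": "cpu", "devices": "auto"}
    return {"accelerator": "gpu", "devices": gpus}


print(gpus_to_trainer_kwargs(1))
print(gpus_to_trainer_kwargs(None))
```

The returned dictionary could then be unpacked into the Trainer call, for example `pl.Trainer(**gpus_to_trainer_kwargs(args.gpus))`, assuming a Lightning version that accepts both arguments.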

Reference: CSDN - The Trainer in Pytorch-Lightning

3. max_epochs

The warning reads:

PossibleUserWarning: max_epochs was not set. Setting it to 1000 epochs. To train without an epoch limit, set max_epochs=-1.

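The cause: max_epochs was not set explicitly, so Lightning falls back to 1000 epochs. Set it explicitly; max_epochs=-1 trains without an epoch limit: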

trainer = pl.Trainer.from_argparse_args(
    args,
    # ...
    max_epochs=-1,
)
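
Instead of hard-coding the value, the epoch limit can be exposed as a command-line option so that every run states it explicitly. This is a minimal sketch using only `argparse`; the `build_parser` function and its `--max_epochs` flag are illustrative, and the sketch assumes the parser does not already define that flag elsewhere.

```python
import argparse


def build_parser():
    """Build a parser with an explicit epoch limit (illustrative only)."""
    parser = argparse.ArgumentParser(description="Training options (illustrative)")
    parser.add_argument(
        "--max_epochs",
        type=int,
        default=-1,  # -1 means no epoch limit, per the warning text
        help="Maximum number of epochs; -1 trains without an epoch limit.",
    )
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(["--max_epochs", "20"])
print(args.max_epochs)
```

With a default of -1, training only stops through another mechanism such as early stopping or a manual interrupt, so a finite value is usually the safer choice for unattended runs.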

4. Checkpoint

Warning:

UserWarning: Checkpoint directory mydata/output_dir/checkpoints exists and is not empty.
rank_zero_warn(f"Checkpoint directory {dirpath} exists and is not empty.")

The cause: the checkpoint directory already exists and is not empty. One suggested fix is to include a timestamp in output_dir:

timestamp=$(date +%s)
--output_dir "mydata/output_dir_${timestamp}/"
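
The same idea can be applied inside the training script rather than in the shell. The sketch below is one possible Python equivalent; `timestamped_output_dir` and its `now` parameter are hypothetical names introduced for illustration, and the base path is the one from the warning.

```python
import time
from pathlib import Path


def timestamped_output_dir(base="mydata/output_dir", now=None):
    """Return `<base>_<unix timestamp>` so each run gets a fresh directory.

    `now` lets callers inject a fixed time (in seconds since the epoch);
    by default the current time is used, like `date +%s` in the shell.
    """
    timestamp = int(time.time() if now is None else now)
    return Path(f"{base}_{timestamp}")


output_dir = timestamped_output_dir()
print(output_dir.as_posix())
```

Two runs started within the same second would still collide, so a finer-grained suffix may be needed when launching jobs in bulk.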

Reference: Getting the current timestamp in a shell script

5. cpu_offload

Warning:

Config parameter cpu_offload is deprecated use offload_optimizer instead

In deepspeed_config.json, replace DeepSpeed's deprecated CPU offload parameter cpu_offload with offload_optimizer:

"zero_optimization": {
  # ...
  "offload_optimizer": {
    "device": "cpu",
    "pin_memory": true,
    "buffer_count": 4,
    "fast_init": false
  },
},
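
If several config files need the same change, the rename can be scripted. The following sketch assumes the legacy `cpu_offload` entry is a boolean inside `zero_optimization`; `migrate_cpu_offload` is a hypothetical helper, not part of DeepSpeed, and the replacement values mirror only the `device` and `pin_memory` settings shown in the config.

```python
import copy
import json


def migrate_cpu_offload(config):
    """Return a copy of `config` with legacy `cpu_offload` replaced.

    Assumption: `cpu_offload` is a boolean under `zero_optimization`.
    An existing `offload_optimizer` section is left untouched.
    """
    migrated = copy.deepcopy(config)
    zero = migrated.get("zero_optimization", {})
    legacy = zero.pop("cpu_offload", None)  # drop the deprecated key
    if legacy and "offload_optimizer" not in zero:
        zero["offload_optimizer"] = {"device": "cpu", "pin_memory": True}
    return migrated


legacy_config = {"zero_optimization": {"stage": 2, "cpu_offload": True}}
print(json.dumps(migrate_cpu_offload(legacy_config), indent=2))
```

The input dictionary is not modified, so the original file contents can be kept as a backup before writing the migrated version.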


Reposted from blog.csdn.net/u012515223/article/details/132673146