PyTorch raises an error when a specific GPU is selected:
Traceback (most recent call last):
File "test_bed/process_deepglint.py", line 102, in <module>
pred_dataset(outputFile)
File "test_bed/process_deepglint.py", line 36, in pred_dataset
pred_loader_deepg, model, criterion, attrWeights, useArcface = main()
File "/home/user1/main_cs_0708.py", line 114, in main
model = models.__dict__[arch]()
File "/home/user1/models/arc_face.py", line 35, in arcface
learner = arc_face.face_learner(conf, inference=True)
File "/home/user1/arc_face/Learner.py", line 24, in __init__
self.model = Backbone(conf.net_depth, conf.drop_ratio, conf.net_mode).to(conf.device)
File "/home/user1/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 386, in to
return self._apply(convert)
File "/home/user1/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 193, in _apply
module._apply(fn)
File "/home/user1/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 193, in _apply
module._apply(fn)
File "/home/user1/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 199, in _apply
param.data = fn(param.data)
File "/home/user1/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 384, in convert
return t.to(device, dtype if t.is_floating_point() else None, non_blocking)
RuntimeError: CUDA error: invalid device ordinal
Possible causes:
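- The code sets which GPUs to use in several places, and the settings conflict. Forms include, but are not limited to:
os.environ, torch.device, torch.cuda.set_device, args.gpu_id
Which one wins depends on the specific code. Each setting has a different scope, so a setting made later may have no effect while an earlier one is still in force.
- os.environ and torch.device are not coordinated with each other. See matt-gardner's comment at https://github.com/allenai/allennlp/issues/1090
- Official torch.device API documentation: https://pytorch.org/docs/stable/tensor_attributes.html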
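When several settings may be in conflict, it can help to print what the process actually sees before the model is moved to a device. The following is only a minimal sketch: `gpu_config_report` is a hypothetical helper name, not part of PyTorch, and it uses only `torch.cuda.is_available()` and `torch.cuda.device_count()`.

```python
import os


def gpu_config_report():
    """Collect the GPU settings this process sees (hypothetical helper)."""
    report = {"CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES")}
    try:
        import torch  # imported lazily so the helper also runs without PyTorch
    except ImportError:
        report["torch_installed"] = False
        return report
    report["torch_installed"] = True
    report["cuda_available"] = torch.cuda.is_available()
    # Number of devices visible to this process; valid ordinals are 0..count-1.
    report["device_count"] = torch.cuda.device_count()
    return report


if __name__ == "__main__":
    for key, value in gpu_config_report().items():
        print(key, "=", value)
```

If the reported device count is smaller than the ordinal used in torch.device or torch.cuda.set_device plus one, an "invalid device ordinal" error is the likely result.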
In my code, the final settings were:
os.environ['CUDA_VISIBLE_DEVICES'] = '1,'
conf.device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
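These settings use the second physical GPU (index 1). Because CUDA_VISIBLE_DEVICES exposes only that GPU to the process, the visible devices are renumbered from 0, so PyTorch addresses it as cuda:0.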
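The renumbering can be illustrated with a small pure-Python model. This is a simplified sketch, not how CUDA is implemented: the helper names `visible_physical_ids` and `logical_to_physical` are hypothetical, only integer indices are handled (no GPU UUIDs), and parsing is assumed to stop at the first entry that is not a non-negative integer.

```python
def visible_physical_ids(env_value):
    """Parse a CUDA_VISIBLE_DEVICES-style string into physical GPU indices.

    Simplified model (assumption): integer indices only, and parsing stops
    at the first entry that is not a non-negative integer.
    """
    ids = []
    for entry in env_value.split(","):
        entry = entry.strip()
        if not entry.isdecimal():
            break
        ids.append(int(entry))
    return ids


def logical_to_physical(logical_index, env_value):
    """Map a logical ordinal (the N in "cuda:N") to a physical GPU index."""
    ids = visible_physical_ids(env_value)
    if not 0 <= logical_index < len(ids):
        raise ValueError(
            "invalid device ordinal: cuda:%d with CUDA_VISIBLE_DEVICES=%r"
            % (logical_index, env_value)
        )
    return ids[logical_index]


if __name__ == "__main__":
    # With '1,' only physical GPU 1 is visible, and it becomes cuda:0.
    print(logical_to_physical(0, "1,"))
    # Asking for cuda:1 under the same setting is out of range.
    try:
        logical_to_physical(1, "1,")
    except ValueError as err:
        print(err)
```

Under this model, requesting cuda:1 while only one GPU is visible is out of range, which corresponds to the situation that produces "invalid device ordinal" in the traceback above.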