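Background:
Computing FPS with mmdetection's tools/benchmark.py (tools/analysis_tools/benchmark.py in the traceback below) failed with an error.
The error output was: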
Traceback (most recent call last):
  File "tools/analysis_tools/benchmark.py", line 191, in <module>
    main()
  File "tools/analysis_tools/benchmark.py", line 183, in main
    init_dist(args.launcher, **cfg.dist_params)
  File "D:\Anaconda\envs\eagermot\lib\site-packages\mmcv\runner\dist_utils.py", line 18, in init_dist
    _init_dist_pytorch(backend, **kwargs)
  File "D:\Anaconda\envs\eagermot\lib\site-packages\mmcv\runner\dist_utils.py", line 32, in _init_dist_pytorch
    dist.init_process_group(backend=backend, **kwargs)
  File "D:\Anaconda\envs\eagermot\lib\site-packages\torch\distributed\distributed_c10d.py", line 510, in init_process_group
    timeout=timeout))
  File "D:\Anaconda\envs\eagermot\lib\site-packages\torch\distributed\distributed_c10d.py", line 597, in _new_process_group_helper
    raise RuntimeError("Distributed package doesn't have NCCL "
RuntimeError: Distributed package doesn't have NCCL built in
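Cause:
Windows does not support the NCCL backend, so PyTorch builds for Windows do not include it, and init_process_group fails when the backend is 'nccl'.
Solution:
1. Locate the following position in the code: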
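To confirm this diagnosis on a given machine, PyTorch can be asked directly whether NCCL was compiled in. The snippet below is a minimal sketch; the helper name nccl_available is made up for illustration, and it simply reports False when torch is not installed.

```python
def nccl_available() -> bool:
    """Return True only if the installed PyTorch build includes NCCL.

    Hypothetical helper for illustration; not part of mmcv or mmdetection.
    """
    try:
        import torch.distributed as dist
    except ImportError:
        # torch is not installed, so no distributed backend is usable
        return False
    # is_available() guards builds compiled without distributed support
    return bool(dist.is_available() and dist.is_nccl_available())


if __name__ == "__main__":
    print("NCCL available:", nccl_available())
```

On a Windows install like the one in the traceback, this would be expected to print False.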
File "D:\Anaconda\envs\eagermot\lib\site-packages\mmcv\runner\dist_utils.py", line 32, in _init_dist_pytorch
2. Immediately before that line (line 32 from step 1), add:
backend='gloo'
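Hard-coding backend='gloo' also forces gloo on machines where NCCL would work. A slightly more defensive variant is sketched below: it falls back to gloo only when NCCL was requested and is missing. The function choose_backend is a hypothetical helper, not part of mmcv; the availability flag is assumed to come from torch.distributed.is_nccl_available().

```python
def choose_backend(requested: str, nccl_is_available: bool) -> str:
    """Pick a usable torch.distributed backend name.

    Hypothetical helper for illustration. Falls back to 'gloo' only when
    NCCL was requested but is not built into the local PyTorch.
    """
    if requested.lower() == "nccl" and not nccl_is_available:
        return "gloo"
    return requested


# Assumed usage inside _init_dist_pytorch, just before init_process_group:
#   backend = choose_backend(backend, dist.is_nccl_available())
#   dist.init_process_group(backend=backend, **kwargs)
```

Since the traceback shows the backend arriving through cfg.dist_params, setting dist_params = dict(backend='gloo') in the config file may achieve the same result without editing site-packages; that alternative is an assumption and was not tested here.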