Error message
RuntimeError: CUDA error: device-side assert triggered
/pytorch/aten/src/THC/THCTensorScatterGather.cu:188: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [31,0,0], thread: [100,0,0] Assertion `indexValue >= 0 && indexValue < tensor.sizes[dim]` failed.
/pytorch/aten/src/THC/THCTensorScatterGather.cu:188: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [30,0,0], thread: [162,0,0] Assertion `indexValue >= 0 && indexValue < tensor.sizes[dim]` failed.
/pytorch/aten/src/THC/THCTensorScatterGather.cu:188: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [32,0,0], thread: [290,0,0] Assertion `indexValue >= 0 && indexValue < tensor.sizes[dim]` failed.
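Solution
- Check whether the labels contain -1 or any value greater than or equal to num_classes. The failed assertion `indexValue >= 0 && indexValue < tensor.sizes[dim]` means every label must lie in [0, num_classes - 1]. Once the labels are corrected, the error goes away.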
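- If the cause is still unclear, try adding CUDA_LAUNCH_BLOCKING=1 when running. This does not fix the error by itself, but it makes CUDA kernel launches synchronous, so the Python stack trace points at the operation that actually triggered the assertion:
CUDA_LAUNCH_BLOCKING=1 python train.py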
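The label check from the first item can be sketched as below. This is only a minimal illustration: the helper name `find_invalid_labels` and the example data are hypothetical, and the function works on plain Python integers so that it can run without a GPU. For a PyTorch tensor, one option is to pass `labels.flatten().tolist()`.

```python
from collections import Counter


def find_invalid_labels(labels, num_classes):
    """Count label values that fall outside [0, num_classes - 1].

    `labels` is any iterable of integers (hypothetical helper, not part
    of PyTorch). Returns a Counter mapping each bad value to its count.
    """
    bad = Counter()
    for value in labels:
        value = int(value)
        # Same condition as the CUDA assertion, negated:
        # valid indices satisfy 0 <= value < num_classes.
        if value < 0 or value >= num_classes:
            bad[value] += 1
    return bad


# Hypothetical example: 3 classes, so the only valid labels are 0, 1, 2.
example_labels = [0, 2, -1, 1, 3, 3]
invalid = find_invalid_labels(example_labels, num_classes=3)

if invalid:
    print("Out-of-range labels found:", dict(invalid))
else:
    print("All labels are within range.")
```

Running such a check over the whole dataset before training makes it easier to spot common causes, such as an ignore label of -1 or labels that start at 1 instead of 0.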
Contact
Search for the WeChat official account: YueTan