CUDA: when the array has more elements than there are threads

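The key is to keep updating tid inside a loop, so that each thread handles more than one element.

tid += blockDim.x * gridDim.x;  (add the total number of threads in the grid, so the thread moves on to the next array element it is responsible for)

For example, with 256 threads in total, thread 0 handles elements 0, 256, 512, and so on, until tid reaches the array length.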

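__global__ void add(int *d_arr, int *d_brr, int *d_crr, int arrLength) {
    // Global index of this thread across the whole grid.
    int tid = blockDim.x * blockIdx.x + threadIdx.x;
    // Use a loop rather than a single if: with if, each thread would
    // process only one element and the tid update would have no effect.
    while (tid < arrLength) {
        d_crr[tid] = d_arr[tid] + d_brr[tid];
        // Jump ahead by the total number of threads in the grid.
        tid += blockDim.x * gridDim.x;
    }
}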
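
To see the tid update at work in a complete program, here is a minimal sketch that launches fewer threads than there are elements on purpose. The kernel name addStride, the array length (1000), the launch configuration (4 blocks of 64 threads, 256 threads in total) and the fill values are all assumptions chosen for illustration; they do not come from the original post.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Illustrative kernel (hypothetical name): each thread strides through
// the array by the total number of threads in the grid.
__global__ void addStride(const int *a, const int *b, int *c, int n) {
    int tid = blockDim.x * blockIdx.x + threadIdx.x;
    while (tid < n) {
        c[tid] = a[tid] + b[tid];
        tid += blockDim.x * gridDim.x;
    }
}

int main() {
    const int n = 1000;        // assumed array length
    const int blocks = 4;      // assumed launch configuration:
    const int threads = 64;    // 4 * 64 = 256 threads, fewer than n
    const size_t bytes = n * sizeof(int);

    // Host data: a[i] = i, b[i] = 2 * i, so c[i] should be 3 * i.
    std::vector<int> h_a(n), h_b(n), h_c(n, 0);
    for (int i = 0; i < n; ++i) {
        h_a[i] = i;
        h_b[i] = 2 * i;
    }

    int *d_a = nullptr, *d_b = nullptr, *d_c = nullptr;
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMalloc(&d_c, bytes);
    cudaMemcpy(d_a, h_a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b.data(), bytes, cudaMemcpyHostToDevice);

    addStride<<<blocks, threads>>>(d_a, d_b, d_c, n);
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        printf("kernel launch failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(h_c.data(), d_c, bytes, cudaMemcpyDeviceToHost);

    // Check every element, including those beyond the first 256.
    int mismatches = 0;
    for (int i = 0; i < n; ++i) {
        if (h_c[i] != 3 * i) ++mismatches;
    }
    printf("mismatches: %d of %d elements\n", mismatches, n);

    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_c);
    return mismatches == 0 ? 0 : 1;
}
```

With this configuration each thread should end up handling three or four elements. If the while in the kernel were replaced by a single if, only the first 256 elements would be written and the check would report the rest as mismatches.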

Reposted from blog.csdn.net/qq_30263737/article/details/81218627