Matrix Multiplication | Multithreaded Speedup

This article covers parallel implementations of the conventional $O(N^3)$ matrix multiplication algorithm, using pthread, OpenMP, and MPICH.

Single-threaded

/* matrix1 is M x N, matrix2 is N x P, res1 is M x P. */
void singleThread(int **matrix1, int **matrix2, int **res1) {
	int tmp=0, i, j, k;
	for(i=0;i<M;i++) {
		for(k=0;k<P;k++) {
			tmp=0;
			for(j=0;j<N;j++) {
				tmp += matrix1[i][j]*matrix2[j][k];
			}
			res1[i][k]=tmp;
		}
	}
}
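
The listing above relies on the dimensions M, N, and P and on matrices that are allocated elsewhere in the program. A minimal sketch of what that surrounding setup might look like follows, with the dimensions passed as parameters instead. The names alloc_matrix, free_matrix, and multiply are hypothetical and do not come from the original program.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: allocate a rows x cols matrix of zeros
   as an array of row pointers. */
int **alloc_matrix(int rows, int cols) {
	int **m = malloc(rows * sizeof *m);
	assert(m != NULL);
	for (int i = 0; i < rows; i++) {
		m[i] = calloc(cols, sizeof **m);
		assert(m[i] != NULL);
	}
	return m;
}

/* Hypothetical helper: release a matrix created by alloc_matrix. */
void free_matrix(int **m, int rows) {
	for (int i = 0; i < rows; i++) {
		free(m[i]);
	}
	free(m);
}

/* Same triple loop as singleThread, with explicit dimensions:
   a is m x n, b is n x p, res is m x p. */
void multiply(int **a, int **b, int **res, int m, int n, int p) {
	for (int i = 0; i < m; i++) {
		for (int k = 0; k < p; k++) {
			int tmp = 0;
			for (int j = 0; j < n; j++) {
				tmp += a[i][j] * b[j][k];
			}
			res[i][k] = tmp;
		}
	}
}
```

Passing the dimensions explicitly makes the function easy to check on a tiny input before timing it on large matrices.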

pthread multithreading

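/* Worker thread: computes rows [row1, row2) of res2 from the global matrix1 and matrix2. */
void *calc(void *rs) {
	Rows *rows = (Rows *)rs;
	int tmp, i, k, j;
	for(i=rows->row1; i<rows->row2; i++) {
		for(k=0;k<P;k++) {
			tmp=0;
			for(j=0;j<N;j++) {
				tmp += matrix1[i][j] * matrix2[j][k];
			}
			res2[i][k]=tmp;
		}
	}
	return NULL;	/* a void * thread function must return a value */
}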

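void mutipleThread() {
	pthread_t threads[T];
	Rows rows[T];
	for(int i=0;i<T;i++) {
		rows[i].row1=i*(M/T);
		/* The last thread also takes the leftover rows when M is not a multiple of T. */
		rows[i].row2=(i==T-1) ? M : (i+1)*(M/T);
		pthread_create(&threads[i], NULL, calc, &rows[i]);
	}
	/* Wait for every worker to finish before res2 is read. */
	for(int i=0;i<T;i++) {
		pthread_join(threads[i], NULL);
	}
}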
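
When M is not a multiple of T, the rows have to be split unevenly between the threads. One way to keep that split balanced is sketched below, assuming that Rows simply holds a half-open row range [row1, row2), since its definition is not shown in the listing. The helper split_rows, the demonstration worker mark_rows, and the fixed limits of 64 rows and 16 threads are all hypothetical.

```c
#include <assert.h>
#include <pthread.h>

/* Assumed layout of the Rows argument: a half-open row range [row1, row2). */
typedef struct {
	int row1;
	int row2;
} Rows;

/* Hypothetical helper: give each of nthreads workers a contiguous block of
   rows. The first (total % nthreads) workers get one extra row. */
void split_rows(int total, int nthreads, Rows *out) {
	assert(nthreads > 0);
	int base = total / nthreads;
	int extra = total % nthreads;
	int start = 0;
	for (int t = 0; t < nthreads; t++) {
		int len = base + (t < extra ? 1 : 0);
		out[t].row1 = start;
		out[t].row2 = start + len;
		start += len;
	}
}

/* Demonstration only: counts how often each row is handled. */
static int visited[64];

static void *mark_rows(void *arg) {
	Rows *r = (Rows *)arg;
	for (int i = r->row1; i < r->row2; i++) {
		visited[i]++;	/* each row belongs to exactly one thread, so no lock is needed */
	}
	return NULL;
}

/* Start nthreads workers over `total` rows and wait for all of them. */
void run_workers(int total, int nthreads) {
	pthread_t threads[16];
	Rows rows[16];
	assert(total <= 64 && nthreads <= 16);
	split_rows(total, nthreads, rows);
	for (int t = 0; t < nthreads; t++) {
		int rc = pthread_create(&threads[t], NULL, mark_rows, &rows[t]);
		assert(rc == 0);
	}
	for (int t = 0; t < nthreads; t++) {
		pthread_join(threads[t], NULL);
	}
}
```

With 10 rows and 4 threads this yields blocks of 3, 3, 2, and 2 rows, so no thread receives more than one row beyond any other.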

OpenMP multithreading

void openMP() {
	int tmp, i, k, j;
	#pragma omp parallel for private(tmp, i, j, k)
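	for(int thread=0;thread<T;thread++) {
		/* The last chunk also takes the leftover rows when M is not a multiple of T. */
		int last = (thread==T-1) ? M : (thread+1)*(M/T);
		for(i=thread*(M/T); i<last; i++) {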
			for(k=0;k<P;k++) {
				tmp=0;
				for(j=0;j<N;j++) {
					tmp += matrix1[i][j] * matrix2[j][k];
				}
				res3[i][k]=tmp;
			}
		}
	}
}

MPICH multi-process

void mpich(int myid, int numprocs) {
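	int tmp, i, k, j;
	/* Each rank computes its own block of rows; the last rank also takes the leftover rows. */
	int last = (myid==numprocs-1) ? M : (myid+1)*(M/numprocs);
	for(i=myid*(M/numprocs); i<last; i++) {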
		for(k=0;k<P;k++) {
			tmp=0;
			for(j=0;j<N;j++) {
				tmp += matrix1[i][j] * matrix2[j][k];
			}
			res4[i][k]=tmp;
		}
	}
}
int main(int argc, char **argv) {
	MPI_Init(&argc, &argv);
	init(matrix1, matrix2, res4);
	int myid, numprocs;
	MPI_Comm_rank(MPI_COMM_WORLD, &myid);
	MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
	double start = MPI_Wtime();
	mpich(myid, numprocs);
	double finish = MPI_Wtime();

	printf("I'm rank %d of %d, running %f seconds.\n", myid, numprocs, finish-start);
	MPI_Finalize();
}
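
In the listing, each rank fills in only its own rows of res4 and nothing is sent between processes, so no single process ends up holding the complete product. Collecting the rows on one rank would typically be done with MPI_Gatherv, which needs an element count and a displacement for every rank. A sketch of computing those two arrays follows, assuming the result is stored as one contiguous row-major buffer (the int ** layout in the listing may not be contiguous). gather_layout is a hypothetical helper and makes no MPI calls itself.

```c
#include <assert.h>

/* Hypothetical helper: for `rows` rows of `cols` ints split over `nprocs`
   ranks, set counts[r] to the number of ints owned by rank r and displs[r]
   to the offset of its first int in the gathered buffer. */
void gather_layout(int rows, int cols, int nprocs, int *counts, int *displs) {
	assert(nprocs > 0);
	int base = rows / nprocs;
	int extra = rows % nprocs;	/* the first `extra` ranks take one more row */
	int offset = 0;
	for (int r = 0; r < nprocs; r++) {
		int nrows = base + (r < extra ? 1 : 0);
		counts[r] = nrows * cols;
		displs[r] = offset;
		offset += counts[r];
	}
}
```

These arrays would be passed as the recvcounts and displs arguments of MPI_Gatherv on the root rank, provided every rank computes exactly the row range that this layout assigns to it.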

Results and analysis

In the results below, the pthread and OpenMP versions run with four threads, and the MPI version runs with four processes.

Method	Time (s)
Single-threaded	12.660000
Multithreaded (pthread)	4.070000
OpenMP	3.985000
MPI	3.581340

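As the table shows, the pthread, OpenMP, and MPI versions run about 3.1, 3.2, and 3.5 times faster than the single-threaded version, somewhat short of the ideal fourfold speedup for four workers. The pthread and OpenMP versions perform about the same, and MPI is the fastest.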
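
The speedup of each method is the single-threaded time divided by the parallel time, and the parallel efficiency is that speedup divided by the number of threads or processes. A small sketch of the arithmetic follows; speedup and efficiency are hypothetical helper names.

```c
#include <assert.h>

/* Speedup: how many times faster the parallel run is than the serial run. */
double speedup(double serial_time, double parallel_time) {
	assert(parallel_time > 0.0);
	return serial_time / parallel_time;
}

/* Efficiency: speedup divided by the number of threads or processes
   (1.0 would be ideal scaling). */
double efficiency(double serial_time, double parallel_time, int workers) {
	assert(workers > 0);
	return speedup(serial_time, parallel_time) / workers;
}
```

With the times from the table, speedup(12.66, 4.07) is about 3.11 for pthread, and efficiency(12.66, 3.58134, 4) is about 0.88 for MPI.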
