Matrix Multiplication: A Parallel Algorithm

Shared by 天下 on 2025/3/22 7:15:54

            temp += buffer[i*width+k+3]*n[j*width+k+3];
            temp += buffer[i*width+k+4]*n[j*width+k+4];
            temp += buffer[i*width+k+5]*n[j*width+k+5];
            temp += buffer[i*width+k+6]*n[j*width+k+6];
            temp += buffer[i*width+k+7]*n[j*width+k+7];
            temp += buffer[i*width+k+8]*n[j*width+k+8];
            temp += buffer[i*width+k+9]*n[j*width+k+9];
        }
        ans[i*width+j] = temp;
    }
}
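An unrolled loop of this kind needs separate handling when the row length is not a multiple of the unrolling factor. The following is a minimal sketch of the same idea with a cleanup loop for the leftover elements. The function names `dot_unrolled` and `multiply_rows`, the unrolling factor of 4 (chosen only to keep the listing short), the `DATA` typedef, and the assumption that `n` holds the second matrix transposed (as the `n[j*width+k]` indexing suggests) are illustrative and not taken from the original program.

```c
#include <stddef.h>

typedef int DATA; /* assumption: element type matching the MPI_INT transfers */

/* Dot product of two rows of length `width`, unrolled by 4 with a cleanup loop. */
DATA dot_unrolled(const DATA *a, const DATA *b, size_t width)
{
    DATA temp = 0;
    size_t k = 0;

    /* Main loop: four multiply-adds per iteration. */
    for (; k + 4 <= width; k += 4) {
        temp += a[k]     * b[k];
        temp += a[k + 1] * b[k + 1];
        temp += a[k + 2] * b[k + 2];
        temp += a[k + 3] * b[k + 3];
    }

    /* Cleanup loop: at most three leftover elements. */
    for (; k < width; k++)
        temp += a[k] * b[k];

    return temp;
}

/* Multiply the `line` rows held in `buffer` by B.
 * Assumption: `n` stores B transposed, so row j of `n` is column j of B. */
void multiply_rows(const DATA *buffer, const DATA *n, DATA *ans,
                   size_t line, size_t width)
{
    for (size_t i = 0; i < line; i++)
        for (size_t j = 0; j < width; j++)
            ans[i * width + j] =
                dot_unrolled(&buffer[i * width], &n[j * width], width);
}
```

Storing B transposed lets both operands of the dot product be read with stride 1, which is what makes the unrolled body a plain sequence of adjacent loads.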

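Besides cutting the number of loop iterations, we further reduce the work done per iteration by precomputing, before each stride-10 loop, the values that do not change inside it, so the loop body never recomputes them. For example, this is how the master process merges the result matrix as it receives the partial results from the other processes: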

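// assumes numprocs processes, line rows per process, and width a multiple of 10
for(k=1;k<numprocs;k++){
    MPI_Recv(ans,line*width,MPI_INT,k,2,MPI_COMM_WORLD,&status);
    for(i=0;i<line;i++){
        // Row offsets are computed once per row so the inner loop does not repeat them.
        // Local row i of process k is global row i*numprocs+k (cyclic row split).
        count=(i*numprocs+k)*width;
        count1=i*width;
        for(j=0;j<width;j+=10){
            p[count+j]   = ans[count1+j];
            p[count+j+1] = ans[count1+j+1];
            p[count+j+2] = ans[count1+j+2];
            p[count+j+3] = ans[count1+j+3];
            p[count+j+4] = ans[count1+j+4];
            p[count+j+5] = ans[count1+j+5];
            p[count+j+6] = ans[count1+j+6];
            p[count+j+7] = ans[count1+j+7];
            p[count+j+8] = ans[count1+j+8];
            p[count+j+9] = ans[count1+j+9];
        }
    }
}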
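The merge step can also be tried without MPI. The sketch below copies one process's rows into the full result with both row offsets hoisted out of the inner loop. It assumes the rows were dealt out cyclically (global row r belongs to process r % numprocs), so local row i of process k is global row i*numprocs + k. The function name `merge_rows`, the separate `rows` parameter, and the early exit for padding rows are assumptions made for illustration.

```c
#include <stddef.h>

typedef int DATA; /* assumption: element type matching the MPI_INT transfers */

/* Copy the `line` rows computed by process `k` into the full result `p`.
 * Assumes a cyclic row split: local row i of process k is global row
 * i*numprocs + k. `rows` is the row count of the full result matrix. */
void merge_rows(DATA *p, const DATA *ans, size_t k, size_t numprocs,
                size_t line, size_t rows, size_t width)
{
    for (size_t i = 0; i < line; i++) {
        size_t global_row = i * numprocs + k;
        if (global_row >= rows) /* padding row past the end of the matrix */
            break;

        /* Hoisted: both offsets are computed once per row,
         * not once per copied element. */
        DATA *dst = p + global_row * width;
        const DATA *src = ans + i * width;

        for (size_t j = 0; j < width; j++)
            dst[j] = src[j];
    }
}
```

Optimizing compilers usually perform this kind of loop-invariant hoisting on their own at higher optimization levels, so writing it by hand mainly matters for unoptimized builds or when the compiler cannot prove the expression is invariant.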

2. Saving space

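When the matrix work is partitioned and sent out, each process allocates only as much space as it actually needs. With 9 processes, for example, each process's receive buffers hold all of matrix B plus roughly 1/9 of matrix A. Memory allocation:

    buffer = (DATA *)malloc(sizeof(DATA)*width*line);

Block-wise transfer of matrix A: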

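// assumes numprocs processes, a square width x width matrix A, and width a multiple of 10
for(k=1;k<numprocs;k++){
    for(i=k;i<width;i+=numprocs){   // rows k, k+numprocs, ... go to process k
        count=i/numprocs*width;     // offset of the row inside this process's buffer
        count1=i*width;             // offset of the row inside A
        for(j=0;j<width;j+=10){
            buffer[count+j]=m[count1+j];
            buffer[count+j+1]=m[count1+j+1];
            buffer[count+j+2]=m[count1+j+2];
            buffer[count+j+3]=m[count1+j+3];
            buffer[count+j+4]=m[count1+j+4];
            buffer[count+j+5]=m[count1+j+5];
            buffer[count+j+6]=m[count1+j+6];
            buffer[count+j+7]=m[count1+j+7];
            buffer[count+j+8]=m[count1+j+8];
            buffer[count+j+9]=m[count1+j+9];
        }
    }
    MPI_Send(buffer,line*width,MPI_INT,k,1,MPI_COMM_WORLD);
}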

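The same approach is applied when allocating the working space for the computation. This not only saves memory but also reduces the amount of data transferred between processes, which greatly cuts the time the processes spend coordinating with each other.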
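The buffer sizing can be sketched in plain C, independent of MPI. The example below computes how many rows one process owns under a cyclic row split and allocates a buffer of exactly that size before packing the rows into it. The names `rows_for_rank` and `pack_rows`, the separate `rows` parameter, and the use of `memcpy` for the row copy are assumptions for illustration, not the original program's code.

```c
#include <stdlib.h>
#include <string.h>

typedef int DATA; /* assumption: element type matching the MPI_INT transfers */

/* Number of rows of a `rows`-row matrix owned by process k when rows are
 * dealt out cyclically (row r goes to process r % numprocs). */
size_t rows_for_rank(size_t rows, size_t numprocs, size_t k)
{
    return k < rows ? (rows - k + numprocs - 1) / numprocs : 0;
}

/* Allocate exactly the space process k needs and copy its rows out of `m`.
 * Stores the row count in *line_out. Returns NULL if k owns no data or the
 * allocation fails; the caller frees the returned buffer. */
DATA *pack_rows(const DATA *m, size_t rows, size_t width,
                size_t numprocs, size_t k, size_t *line_out)
{
    size_t line = rows_for_rank(rows, numprocs, k);
    *line_out = line;
    if (line == 0 || width == 0)
        return NULL;

    DATA *buffer = malloc(sizeof(DATA) * width * line);
    if (buffer == NULL)
        return NULL;

    /* Global row i becomes local row `local` in the packed buffer. */
    size_t local = 0;
    for (size_t i = k; i < rows; i += numprocs, local++)
        memcpy(buffer + local * width, m + i * width, sizeof(DATA) * width);

    return buffer;
}
```

In an MPI program the packed buffer would then be handed to `MPI_Send` with a count of `line * width` elements, so each process receives only its own share of A rather than the whole matrix.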
