
Matrix calculation goes wrong when the dimensions get large

I am running a piece of code in which I simply create two matrices: one of dimensions arows x nsame and the other of dimensions nsame x bcols. The result is an array of dimensions arows x bcols. This is fairly straightforward to implement with BLAS, and the code below appears to run as expected when using the following master-slave model with OpenMPI:

#include <iostream> 
#include <stdio.h> 
#include <iostream> 
#include <cmath> 
#include <mpi.h> 
#include <gsl/gsl_blas.h> 
using namespace std; 

int main(int argc, char** argv){ 
    int noprocs, nid; 
    MPI_Status status; 
    MPI_Init(&argc, &argv); 
    MPI_Comm_rank(MPI_COMM_WORLD, &nid); 
    MPI_Comm_size(MPI_COMM_WORLD, &noprocs); 
    int master = 0; 

    const int nsame = 500; //must be same if matrices multiplied together = acols = brows 
    const int arows = 500; 
    const int bcols = 527; //works for 500 x 500 x 527 and 6000 x 100 x 36 
    int rowsent; 
    double buff[nsame]; 
    double b[nsame*bcols]; 
    double c[arows][bcols]; 
    double CC[1*bcols]; //here ncols corresponds to numbers of rows for matrix b 
    for (int i = 0; i < bcols; i++){ 
       CC[i] = 0.; 
    }; 
    // Master part 
    if (nid == master) { 

     double a [arows][nsame]; //creating identity matrix of dimensions arows x nsame (it is I if arows = nsame) 
     for (int i = 0; i < arows; i++){ 
      for (int j = 0; j < nsame; j++){ 
       if (i == j) 
        a[i][j] = 1.; 
       else 
        a[i][j] = 0.; 
      } 
     } 
     double b[nsame*bcols];//here ncols corresponds to numbers of rows for matrix b 
      for (int i = 0; i < (nsame*bcols); i++){ 
       b[i] = (10.*i + 3.)/(3.*i - 2.) ; 
      }; 
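     // broadcast the full b matrix to every process before handing out rows of a 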
     MPI_Bcast(b,nsame*bcols, MPI_DOUBLE_PRECISION, master, MPI_COMM_WORLD); 
     rowsent=0; 
     for (int i=1; i < (noprocs); i++) { 
      // Note A is a 2D array so A[rowsent]=&A[rowsent][0] 
      MPI_Send(a[rowsent], nsame, MPI_DOUBLE_PRECISION,i,rowsent+1,MPI_COMM_WORLD); 
      rowsent++; 
     } 

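     // collect one computed row of c per message and send that worker the next unassigned row of a 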
     for (int i=0; i<arows; i++) { 
      MPI_Recv(CC, bcols, MPI_DOUBLE_PRECISION, MPI_ANY_SOURCE, MPI_ANY_TAG, 
        MPI_COMM_WORLD, &status); 
      int sender = status.MPI_SOURCE; 
      int anstype = status.MPI_TAG;   //row number+1 
      int IND_I = 0; 
      while (IND_I < bcols){ 
       c[anstype - 1][IND_I] = CC[IND_I]; 
       IND_I++; 
      } 
      if (rowsent < arows) { 
       MPI_Send(a[rowsent], nsame,MPI_DOUBLE_PRECISION,sender,rowsent+1,MPI_COMM_WORLD); 
       rowsent++; 
      } 
      else {  // tell sender no more work to do via a 0 TAG 
       MPI_Send(MPI_BOTTOM,0,MPI_DOUBLE_PRECISION,sender,0,MPI_COMM_WORLD); 
      } 
     } 
    } 

    // Slave part 
    else { 
     MPI_Bcast(b,nsame*bcols, MPI_DOUBLE_PRECISION, master, MPI_COMM_WORLD); 
     MPI_Recv(buff,nsame,MPI_DOUBLE_PRECISION,master,MPI_ANY_TAG,MPI_COMM_WORLD,&status); 
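     // keep multiplying received rows of a against b until the master signals "no more work" with tag 0 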
     while(status.MPI_TAG != 0) { 
      int crow = status.MPI_TAG; 
      gsl_matrix_view AAAA = gsl_matrix_view_array(buff, 1, nsame); 
      gsl_matrix_view BBBB = gsl_matrix_view_array(b, nsame, bcols); 
      gsl_matrix_view CCCC = gsl_matrix_view_array(CC, 1, bcols); 

      /* Compute C = A B */ 
      gsl_blas_dgemm (CblasNoTrans, CblasNoTrans, 1.0, &AAAA.matrix, &BBBB.matrix, 
          0.0, &CCCC.matrix); 

      MPI_Send(CC,bcols,MPI_DOUBLE_PRECISION, master, crow, MPI_COMM_WORLD); 
      MPI_Recv(buff,nsame,MPI_DOUBLE_PRECISION,master,MPI_ANY_TAG,MPI_COMM_WORLD,&status); 
     } 
    } 

    // output c here on the master node (uncomment the lines below to see the output) 
    // if (nid == master){ 
//  if (rowsent == arows){ 
//   //   cout << rowsent; 
//   int IND_F = 0; 
//   while (IND_F < arows){ 
//    int IND_K = 0; 
//    while (IND_K < bcols){ 
//     cout << "[" << IND_F << "]" << "[" << IND_K << "] = " << c[IND_F][IND_K] << " "; 
//     IND_K++; 
//    } 
//    cout << "\n"; 
//    IND_F++; 
//   } 
//  } 
// } 
    MPI_Finalize(); 
    //free any allocated space here 
    return 0; 
}; 

Now, what seems odd is that when I increase the size of the matrices (for example from nsame = 500 to nsame = 501), the code no longer works. I get the following error:

mpirun noticed that process rank 0 with PID 0 on node Users-MacBook-Air exited on signal 11 (Segmentation fault: 11). 

I have tried this with other combinations of matrix dimensions, and there always seems to be an upper limit on the size of the matrices (a limit that appears to shift depending on how I vary the individual dimensions). I have also tried changing the values in the matrices, but that does not seem to make any difference. I know there are alternative ways to initialize the matrices in my example (for instance, using vectors), but I would simply like to know why my current multiplication of arbitrarily sized matrices only seems to work up to a certain point.

Answer


You are declaring too many large local variables, and that is causing stack-space problems. In particular, a is 500 x 500 doubles (250,000 eight-byte elements, or about 2 million bytes), and b is even bigger. Between a, the two copies of b (one near the top of main and another inside the master branch), c, and CC, the stack usage is close to the typical 8 MB default main-thread stack, so even a small increase in one dimension pushes it over the limit and produces the segmentation fault.

You need to allocate space for some or all of these arrays dynamically.

There may be compiler options to increase the initial stack size, but that is not a good long-term solution.
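
For illustration, here is a minimal sketch of that suggestion, assuming the same dimensions as in the question and using std::vector (plain new[]/delete[] or gsl_matrix_alloc would work just as well), so that the large buffers live on the heap rather than on the stack:

#include <vector> 
#include <mpi.h> 
#include <gsl/gsl_blas.h> 

int main(int argc, char** argv){ 
    MPI_Init(&argc, &argv); 

    const int nsame = 501; 
    const int arows = 500; 
    const int bcols = 527; 

    // heap-backed storage: std::vector keeps its elements on the heap, so the 
    // default stack limit no longer caps the matrix dimensions 
    std::vector<double> buff(nsame);        // one row of a 
    std::vector<double> b(nsame*bcols);     // flattened nsame x bcols matrix 
    std::vector<double> c(arows*bcols);     // flattened arows x bcols result 
    std::vector<double> CC(bcols, 0.0);     // one computed row of the result 

    // MPI and GSL take raw pointers via .data(), e.g. 
    //   MPI_Bcast(b.data(), nsame*bcols, MPI_DOUBLE_PRECISION, master, MPI_COMM_WORLD); 
    //   gsl_matrix_view BBBB = gsl_matrix_view_array(b.data(), nsame, bcols); 
    // element (i, j) of the flattened result is c[i*bcols + j], and on the master 
    // a becomes std::vector<double> a(arows*nsame) with row r starting at a.data() + r*nsame 

    MPI_Finalize(); 
    return 0; 
} 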