2011-03-09 53 views
13

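In my previous post I needed to distribute the data of a PGM file among 10 computers. With help from Jonathan Dursi and Shawn Chin, I have integrated the code. The program compiles, but it hits a segmentation fault when I run it in parallel with MPI: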

mpirun -np 10 ./exmpi_2 balloons.pgm output.pgm

The result is:

[ubuntu:04803] *** Process received signal *** 
[ubuntu:04803] Signal: Segmentation fault (11) 
[ubuntu:04803] Signal code: Address not mapped (1) 
[ubuntu:04803] Failing at address: 0x7548d0c 
[ubuntu:04803] [ 0] [0x86b410] 
[ubuntu:04803] [ 1] /lib/tls/i686/cmov/libc.so.6(fclose+0x1a0) [0x186b00] 
[ubuntu:04803] [ 2] ./exmpi_2(main+0x78e) [0x80492c2] 
[ubuntu:04803] [ 3] /lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe6) [0x141bd6] 
[ubuntu:04803] [ 4] ./exmpi_2() [0x8048aa1] 
[ubuntu:04803] *** End of error message *** 
-------------------------------------------------------------------------- 
mpirun noticed that process rank 1 with PID 4803 on node ubuntu exited on signal 11 (Segmentation fault). 
-------------------------------------------------------------------------- 

Then I tried running the program under Valgrind to debug it:

valgrind mpirun -np 10 ./exmpi_2 balloons.pgm output.pgm

The output is:

==4632== Memcheck, a memory error detector 
==4632== Copyright (C) 2002-2009, and GNU GPL'd, by Julian Seward et al. 
==4632== Using Valgrind-3.6.0.SVN-Debian and LibVEX; rerun with -h for copyright info 
==4632== Command: mpirun -np 10 ./exmpi_2 2.pgm 10.pgm 
==4632== 
==4632== Syscall param sched_setaffinity(mask) points to unaddressable byte(s) 
==4632== at 0x4215D37: syscall (syscall.S:31) 
==4632== by 0x402B335: opal_paffinity_linux_plpa_api_probe_init (plpa_api_probe.c:56) 
==4632== by 0x402B7CC: opal_paffinity_linux_plpa_init (plpa_runtime.c:37) 
==4632== by 0x402B93C: opal_paffinity_linux_plpa_have_topology_information (plpa_map.c:494) 
==4632== by 0x402B180: linux_module_init (paffinity_linux_module.c:119) 
==4632== by 0x40BE2C3: opal_paffinity_base_select (paffinity_base_select.c:64) 
==4632== by 0x40927AC: opal_init (opal_init.c:295) 
==4632== by 0x4046767: orte_init (orte_init.c:76) 
==4632== by 0x804A82E: orterun (orterun.c:540) 
==4632== by 0x804A3EE: main (main.c:13) 
==4632== Address 0x0 is not stack'd, malloc'd or (recently) free'd 
==4632== 
[ubuntu:04638] *** Process received signal *** 
[ubuntu:04639] *** Process received signal *** 
[ubuntu:04639] Signal: Segmentation fault (11) 
[ubuntu:04639] Signal code: Address not mapped (1) 
[ubuntu:04639] Failing at address: 0x7548d0c 
[ubuntu:04639] [ 0] [0xc50410] 
[ubuntu:04639] [ 1] /lib/tls/i686/cmov/libc.so.6(fclose+0x1a0) [0xde4b00] 
[ubuntu:04639] [ 2] ./exmpi_2(main+0x78e) [0x80492c2] 
[ubuntu:04639] [ 3] /lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe6) [0xd9fbd6] 
[ubuntu:04639] [ 4] ./exmpi_2() [0x8048aa1] 
[ubuntu:04639] *** End of error message *** 
[ubuntu:04640] *** Process received signal *** 
[ubuntu:04640] Signal: Segmentation fault (11) 
[ubuntu:04640] Signal code: Address not mapped (1) 
[ubuntu:04640] Failing at address: 0x7548d0c 
[ubuntu:04640] [ 0] [0xdad410] 
[ubuntu:04640] [ 1] /lib/tls/i686/cmov/libc.so.6(fclose+0x1a0) [0xe76b00] 
[ubuntu:04640] [ 2] ./exmpi_2(main+0x78e) [0x80492c2] 
[ubuntu:04640] [ 3] /lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe6) [0xe31bd6] 
[ubuntu:04640] [ 4] ./exmpi_2() [0x8048aa1] 
[ubuntu:04640] *** End of error message *** 
[ubuntu:04641] *** Process received signal *** 
[ubuntu:04641] Signal: Segmentation fault (11) 
[ubuntu:04641] Signal code: Address not mapped (1) 
[ubuntu:04641] Failing at address: 0x7548d0c 
[ubuntu:04641] [ 0] [0xe97410] 
[ubuntu:04641] [ 1] /lib/tls/i686/cmov/libc.so.6(fclose+0x1a0) [0x1e8b00] 
[ubuntu:04641] [ 2] ./exmpi_2(main+0x78e) [0x80492c2] 
[ubuntu:04641] [ 3] /lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe6) [0x1a3bd6] 
[ubuntu:04641] [ 4] ./exmpi_2() [0x8048aa1] 
[ubuntu:04641] *** End of error message *** 
[ubuntu:04642] *** Process received signal *** 
[ubuntu:04642] Signal: Segmentation fault (11) 
[ubuntu:04642] Signal code: Address not mapped (1) 
[ubuntu:04642] Failing at address: 0x7548d0c 
[ubuntu:04642] [ 0] [0x92d410] 
[ubuntu:04642] [ 1] /lib/tls/i686/cmov/libc.so.6(fclose+0x1a0) [0x216b00] 
[ubuntu:04642] [ 2] ./exmpi_2(main+0x78e) [0x80492c2] 
[ubuntu:04642] [ 3] /lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe6) [0x1d1bd6] 
[ubuntu:04642] [ 4] ./exmpi_2() [0x8048aa1] 
[ubuntu:04642] *** End of error message *** 
[ubuntu:04643] *** Process received signal *** 
[ubuntu:04643] Signal: Segmentation fault (11) 
[ubuntu:04643] Signal code: Address not mapped (1) 
[ubuntu:04643] Failing at address: 0x7548d0c 
[ubuntu:04643] [ 0] [0x8f4410] 
[ubuntu:04643] [ 1] /lib/tls/i686/cmov/libc.so.6(fclose+0x1a0) [0x16bb00] 
[ubuntu:04643] [ 2] ./exmpi_2(main+0x78e) [0x80492c2] 
[ubuntu:04643] [ 3] /lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe6) [0x126bd6] 
[ubuntu:04643] [ 4] ./exmpi_2() [0x8048aa1] 
[ubuntu:04643] *** End of error message *** 
[ubuntu:04638] Signal: Segmentation fault (11) 
[ubuntu:04638] Signal code: Address not mapped (1) 
[ubuntu:04638] Failing at address: 0x7548d0c 
[ubuntu:04638] [ 0] [0x4f6410] 
[ubuntu:04638] [ 1] /lib/tls/i686/cmov/libc.so.6(fclose+0x1a0) [0x222b00] 
[ubuntu:04638] [ 2] ./exmpi_2(main+0x78e) [0x80492c2] 
[ubuntu:04638] [ 3] /lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe6) [0x1ddbd6] 
[ubuntu:04638] [ 4] ./exmpi_2() [0x8048aa1] 
[ubuntu:04638] *** End of error message *** 
[ubuntu:04644] *** Process received signal *** 
[ubuntu:04644] Signal: Segmentation fault (11) 
[ubuntu:04644] Signal code: Address not mapped (1) 
[ubuntu:04644] Failing at address: 0x7548d0c 
[ubuntu:04644] [ 0] [0x61f410] 
[ubuntu:04644] [ 1] /lib/tls/i686/cmov/libc.so.6(fclose+0x1a0) [0x1a3b00] 
[ubuntu:04644] [ 2] ./exmpi_2(main+0x78e) [0x80492c2] 
[ubuntu:04644] [ 3] /lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe6) [0x15ebd6] 
[ubuntu:04644] [ 4] ./exmpi_2() [0x8048aa1] 
[ubuntu:04644] *** End of error message *** 
[ubuntu:04645] *** Process received signal *** 
[ubuntu:04645] Signal: Segmentation fault (11) 
[ubuntu:04645] Signal code: Address not mapped (1) 
[ubuntu:04645] Failing at address: 0x7548d0c 
[ubuntu:04645] [ 0] [0x7a3410] 
[ubuntu:04645] [ 1] /lib/tls/i686/cmov/libc.so.6(fclose+0x1a0) [0x1d5b00] 
[ubuntu:04645] [ 2] ./exmpi_2(main+0x78e) [0x80492c2] 
[ubuntu:04645] [ 3] /lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe6) [0x190bd6] 
[ubuntu:04645] [ 4] ./exmpi_2() [0x8048aa1] 
[ubuntu:04645] *** End of error message *** 
[ubuntu:04647] *** Process received signal *** 
[ubuntu:04647] Signal: Segmentation fault (11) 
[ubuntu:04647] Signal code: Address not mapped (1) 
[ubuntu:04647] Failing at address: 0x7548d0c 
[ubuntu:04647] [ 0] [0xf54410] 
[ubuntu:04647] [ 1] /lib/tls/i686/cmov/libc.so.6(fclose+0x1a0) [0x2bab00] 
[ubuntu:04647] [ 2] ./exmpi_2(main+0x78e) [0x80492c2] 
[ubuntu:04647] [ 3] /lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe6) [0x275bd6] 
[ubuntu:04647] [ 4] ./exmpi_2() [0x8048aa1] 
[ubuntu:04647] *** End of error message *** 
-------------------------------------------------------------------------- 
mpirun noticed that process rank 2 with PID 4639 on node ubuntu exited on signal 11 (Segmentation fault). 
-------------------------------------------------------------------------- 
6 total processes killed (some possibly by mpirun during cleanup) 
==4632== 
==4632== HEAP SUMMARY: 
==4632==  in use at exit: 158,751 bytes in 1,635 blocks 
==4632== total heap usage: 10,443 allocs, 8,808 frees, 15,854,537 bytes allocated 
==4632== 
==4632== LEAK SUMMARY: 
==4632== definitely lost: 81,655 bytes in 112 blocks 
==4632== indirectly lost: 5,108 bytes in 91 blocks 
==4632==  possibly lost: 1,043 bytes in 17 blocks 
==4632== still reachable: 70,945 bytes in 1,415 blocks 
==4632==   suppressed: 0 bytes in 0 blocks 
==4632== Rerun with --leak-check=full to see details of leaked memory 
==4632== 
==4632== For counts of detected and suppressed errors, rerun with: -v 
==4632== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 96 from 9) 

Can someone help me solve this problem? Here is my source code:

#include <stdio.h> 
#include <stdlib.h> 
#include <string.h> 
#include "mpi.h" 
#include <syscall.h> 

#define SIZE_X 640 
#define SIZE_Y 480 




int main(int argc, char **argv) 
{ 
FILE *FR,*FW; 
int ierr; 
int rank, size; 
int ncells; 
int greys[SIZE_X][SIZE_Y]; 
int rows,cols, maxval; 

int mystart, myend, myncells; 
const int IONODE=0; 
int *disps, *counts, *mydata; 
int *data; 
int i,j,temp1; 
char dummy[50]=""; 





ierr = MPI_Init(&argc, &argv); 
if (argc != 3) { 
    fprintf(stderr,"Usage: %s infile outfile\n",argv[0]); 
    fprintf(stderr,"outputs the negative of the input file.\n"); 
    return -1; 
}    

ierr = MPI_Comm_rank(MPI_COMM_WORLD, &rank); 
ierr = MPI_Comm_size(MPI_COMM_WORLD, &size); 
if (ierr) { 
    fprintf(stderr,"Catastrophic MPI problem; exiting\n"); 
    MPI_Abort(MPI_COMM_WORLD,1); 
} 

    if (rank == IONODE) { 
      //if (read_pgm(argv[1], &greys, &rows, &cols, &maxval)) { 
      // fprintf(stderr,"Could not read file; exiting\n"); 
       // MPI_Abort(MPI_COMM_WORLD,2); 

     rows=SIZE_X; 
     cols=SIZE_Y; 
     maxval=255; 
     FR=fopen(argv[1], "r+"); 

     fgets(dummy,50,FR); 
     do{ fgets(dummy,50,FR); } while(dummy[0]=='#'); 
     fgets(dummy,50,FR); 

    for (j = 0; j <cols; j++) 
    { 
     for (i = 0; i <rows; i++) 
     { 
      fscanf(FR,"%d",&temp1); 
     greys[i][j] = temp1; 
     } 
    } 
} 

    ncells = rows*cols; 
    disps = (int *)malloc(size * sizeof(int)); 
    counts= (int *)malloc(size * sizeof(int)); 
    data = &(greys[0][0]); /* we know all the data is contiguous */ 

/* everyone calculate their number of cells */ 
ierr = MPI_Bcast(&ncells, 1, MPI_INT, IONODE, MPI_COMM_WORLD); 
myncells = ncells/size; 
mystart = rank*myncells; 
myend = mystart + myncells - 1; 
if (rank == size-1) myend = ncells-1; 
myncells = (myend-mystart)+1; 
mydata = (int *)malloc(myncells * sizeof(int)); 

/* assemble the list of counts. Might not be equal if don't divide evenly. */ 
ierr = MPI_Gather(&myncells, 1, MPI_INT, counts, 1, MPI_INT, IONODE, MPI_COMM_WORLD); 
if (rank == IONODE) { 
    disps[0] = 0; 
    for (i=1; i<size; i++) { 
     disps[i] = disps[i-1] + counts[i-1]; 
    } 
} 

/* scatter the data */ 
ierr = MPI_Scatterv(data, counts, disps, MPI_INT, mydata, myncells, MPI_INT, IONODE, MPI_COMM_WORLD); 

/* everyone has to know maxval */ 
ierr = MPI_Bcast(&maxval, 1, MPI_INT, IONODE, MPI_COMM_WORLD); 

for (i=0; i<myncells; i++) 
    mydata[i] = maxval-mydata[i]; 

/* Gather the data */ 
ierr = MPI_Gatherv(mydata, myncells, MPI_INT, data, counts, disps, MPI_INT, IONODE, MPI_COMM_WORLD); 

if (rank == IONODE) 
{ 
//  write_pgm(argv[2], greys, rows, cols, maxval); 
    FW=fopen(argv[2], "w"); 
    fprintf(FW,"P2\n%d %d\n255\n",rows,cols);  
    for(j=0;j<cols;j++) 
    for(i=0;i<rows;i++) 
    fprintf(FW,"%d ", greys[i][j]); 
} 

free(mydata); 
if (rank == IONODE) { 
    free(counts); 
    free(disps); 
    //free(&(greys[0][0])); 
    //free(greys); 

} 
fclose(FR); 
fclose(FW); 
MPI_Finalize(); 
return 0; 
} 

This is the input image: http://orion.math.iastate.edu/burkardt/data/pgm/balloons.pgm

Which line gives the segmentation fault? – suszterpatt 2011-03-09 15:08:10

Answers

16

Congratulations; the code ran almost to completion and died practically on its last few lines.

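Valgrind would have made the problem a little clearer, but running Valgrind with MPI (or with anything that involves a program launcher) takes a bit more care. Instead of: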

valgrind mpirun -np 10 ./exmpi_2 balloons.pgm output.pgm

which runs Valgrind on mpirun itself, something you don't really care about, you want to do

mpirun -np 10 valgrind ./exmpi_2 balloons.pgm output.pgm

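That is, launch 10 Valgrinds, each running one process's worth of exmpi_2. If you do that (and you have compiled with -g), you will find output like the following near the end of the Valgrind log: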

==6303== Access not within mapped region at address 0x1 
==6303== at 0x387FA60C17: fclose@@GLIBC_2.2.5 (in /lib64/libc-2.5.so) 
==6303== by 0x401222: main (pgm.c:124) 

...and that is all there is to it: every process is calling fclose() on the files, when only one process fopen()ed them in the first place. Simply using

if (rank == IONODE) { 
    fclose(FR); 
    fclose(FW); 
} 

seemed to work for me, in place of

fclose(FR); 
fclose(FW); 
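
A slightly more defensive variant of the same fix, as a minimal sketch: initialise each FILE pointer to NULL and close it only if it was actually opened. The helper name close_if_open is hypothetical and not part of the original program. It assumes FR and FW are declared as `FILE *FR = NULL, *FW = NULL;`, so that only the rank that called fopen() ever holds a non-NULL pointer.

```c
#include <stdio.h>

/* Hypothetical helper (not in the original program): close *fp only if it
 * refers to an open stream, then reset it to NULL so that a second call
 * is a harmless no-op. Returns 0 if there was nothing to close, otherwise
 * the return value of fclose(). */
int close_if_open(FILE **fp)
{
    int rc = 0;
    if (fp != NULL && *fp != NULL) {
        rc = fclose(*fp);
        *fp = NULL;
    }
    return rc;
}
```

With that in place, every rank could call close_if_open(&FR) and close_if_open(&FW) just before MPI_Finalize() without an explicit rank check; ranks that never opened the files simply do nothing.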

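After the scatter operation the data is in a 1D array. I am trying to do edge detection with a Laplacian operator, but that needs the data in 2D. Can the data be 2D after scattering? I am having trouble doing image processing on the 1D data after the scatter. – arep 2011-03-23 08:31:43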
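
One possible way to treat the scattered 1D chunk as 2D is to index it in row-major order. The sketch below rests on assumptions: each rank's chunk must hold a whole number of contiguous runs (with greys[640][480] split over 10 ranks, each chunk is 30720 ints, i.e. 64 runs of 480), and the function names idx2d and laplacian_at are hypothetical. Points on the first or last run of a chunk would also need the neighbouring run from the adjacent rank, which is not shown here.

```c
#include <stddef.h>

/* Row-major offset of element (r, c) in a flat buffer whose contiguous
 * runs are ncols elements long. */
size_t idx2d(size_t r, size_t c, size_t ncols)
{
    return r * ncols + c;
}

/* 4-neighbour Laplacian at an interior point (r, c) of a flat, row-major
 * buffer. The caller must ensure that (r, c) is not on the border of the
 * local chunk, since no bounds checking is done here. */
int laplacian_at(const int *buf, size_t ncols, size_t r, size_t c)
{
    return buf[idx2d(r - 1, c, ncols)]
         + buf[idx2d(r + 1, c, ncols)]
         + buf[idx2d(r, c - 1, ncols)]
         + buf[idx2d(r, c + 1, ncols)]
         - 4 * buf[idx2d(r, c, ncols)];
}
```

In the program from the question this would be called as laplacian_at(mydata, SIZE_Y, r, c), writing the results into a separate output buffer so that later points still read the unmodified input values.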
