matrix multiplication with MPI
- by user3695701
I'm working on an assignment on matrix multiplication with MPI, A*B = C. The requirement is that B should be vertically partitioned. Here's what I intend to do: broadcast matrix A to all processes and scatter B into several slices, with each slice containing n/p columns. The following code only works when the number of processes (p) is 1. When p > 1 (say 2), I get:
[cluster2:21080] *** Process received signal ***
[cluster2:21080] Signal: Segmentation fault (11)
[cluster2:21080] Signal code: Address not mapped (1)
[cluster2:21080] Failing at address: (nil)
[cluster2:21080] [ 0] /lib/libpthread.so.0(+0xf8f0) [0x7f49f38108f0]
[cluster2:21080] [ 1] /lib/libc.so.6(memcpy+0xe1) [0x7f49f35024c1]
[cluster2:21080] [ 2] /usr/lib/libmpi.so.0(ompi_convertor_unpack+0x121)[0x7f49f47c88e1]
[cluster2:21080] [ 3] /usr/lib/openmpi/lib/openmpi/mca_pml_ob1.so(+0x8a26) [0x7f49f0dcea26]
[cluster2:21080] [ 4] /usr/lib/openmpi/lib/openmpi/mca_btl_tcp.so(+0x662c) [0x7f49efce462c]
[cluster2:21080] [ 5] /usr/lib/libopen-pal.so.0(+0x1ede8) [0x7f49f42e0de8]
[cluster2:21080] [ 6] /usr/lib/libopen-pal.so.0(opal_progress+0x99) [0x7f49f42d5369]
[cluster2:21080] [ 7] /usr/lib/openmpi/lib/openmpi/mca_pml_ob1.so(+0x5585) [0x7f49f0dcb585]
[cluster2:21080] [ 8] /usr/lib/openmpi/lib/openmpi/mca_coll_tuned.so(+0xcc01) [0x7f49eeeb1c01]
[cluster2:21080] [ 9] /usr/lib/openmpi/lib/openmpi/mca_coll_tuned.so(+0x266c) [0x7f49eeea766c]
[cluster2:21080] [10] /usr/lib/openmpi/lib/openmpi/mca_coll_sync.so(+0x1388) [0x7f49ef0c0388]
[cluster2:21080] [11] /usr/lib/libmpi.so.0(MPI_Bcast+0x10e) [0x7f49f47d025e]
[cluster2:21080] [12] ./out(main+0x259) [0x401571]
[cluster2:21080] [13] /lib/libc.so.6(__libc_start_main+0xfd) [0x7f49f3498c8d]
[cluster2:21080] [14] ./out() [0x400f29]
[cluster2:21080] *** End of error message ***
Can someone help me? Thanks.
// matrices A and B
// double* A = (double *)malloc(n*n*sizeof(double));
// double* B = (double *)malloc(n*n*sizeof(double));
// code initializing A, B ...
// n is the size of the matrix
// p is the number of processes
// myrank is the rank of the calling process
MPI_Init (&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
MPI_Comm_size(MPI_COMM_WORLD, &p);
// broadcast A to all processes
MPI_Bcast (A, n*n, MPI_DOUBLE, 0, MPI_COMM_WORLD);
MPI_Datatype tmp_type, col_type;
// extract a slice from B
MPI_Type_vector(n, num_of_col_per_slice, n, MPI_DOUBLE, &tmp_type);
// lower bound of the first slice (0) and extent between consecutive slices (n * sizeof(double))
MPI_Type_create_resized(tmp_type, 0, n * sizeof(double), &col_type);
MPI_Type_commit(&col_type);
// scatter a slice of B to each process
MPI_Scatter(B, 1, col_type, B+myrank*n/p, n * n/p, MPI_DOUBLE, 0, MPI_COMM_WORLD);
// use a BLAS routine (cblas_dgemm) to compute A*sliceOfB and store the resulting slice into C
cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, n, n/p, n, 1.0, A, n, B+myrank*n/p, n, 0.0, C+myrank*n/p, n);
// gather the resulting slices into C
MPI_Gather (C+myrank*n/p, n*n/p, MPI_DOUBLE, C, n*n/p, MPI_DOUBLE, 0, MPI_COMM_WORLD);
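For reference, below is a minimal, self-contained sketch (not the code above) of the broadcast-A / scatter-columns-of-B scheme described in the question, under a few stated assumptions: n is divisible by p, every rank allocates its buffers before the collectives (the "Failing at address: (nil)" line in the trace is what one would expect if A or B is unallocated on non-root ranks when MPI_Bcast runs), the resized extent is n/p doubles so that consecutive slices of the row-major B start side by side, and each rank receives its slice into a separate contiguous buffer with leading dimension n/p. The matrix size and initialization values are placeholders.

/* Sketch: broadcast A, scatter vertical slices of B, multiply, gather C.
 * Assumptions: n divisible by p, row-major storage, cblas.h available. */
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
#include <cblas.h>

int main(int argc, char **argv)
{
    int myrank, p, n = 8;                 /* n: matrix size (placeholder value) */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
    MPI_Comm_size(MPI_COMM_WORLD, &p);
    int w = n / p;                        /* columns per slice */

    /* Every rank needs A; only the root needs the full B and C. */
    double *A    = malloc((size_t)n * n * sizeof(double));
    double *B    = (myrank == 0) ? malloc((size_t)n * n * sizeof(double)) : NULL;
    double *C    = (myrank == 0) ? malloc((size_t)n * n * sizeof(double)) : NULL;
    double *Bloc = malloc((size_t)n * w * sizeof(double));  /* n x w, row-major */
    double *Cloc = malloc((size_t)n * w * sizeof(double));  /* n x w, row-major */

    if (myrank == 0) {                    /* root initializes A and B (placeholder data) */
        for (int i = 0; i < n * n; i++) { A[i] = i % 7; B[i] = i % 5; }
    }
    MPI_Bcast(A, n * n, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    /* One vertical slice of a row-major n x n matrix: n rows of w doubles,
       stride n; resized so consecutive slices start w doubles apart. */
    MPI_Datatype tmp_type, col_type;
    MPI_Type_vector(n, w, n, MPI_DOUBLE, &tmp_type);
    MPI_Type_create_resized(tmp_type, 0, w * sizeof(double), &col_type);
    MPI_Type_commit(&col_type);

    /* Scatter: strided slice on the root, packed contiguously on receipt. */
    MPI_Scatter(B, 1, col_type, Bloc, n * w, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    /* Cloc = A * Bloc; the local slice has leading dimension w, not n. */
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                n, w, n, 1.0, A, n, Bloc, w, 0.0, Cloc, w);

    /* Gather the contiguous local slices back into the strided columns of C. */
    MPI_Gather(Cloc, n * w, MPI_DOUBLE, C, 1, col_type, 0, MPI_COMM_WORLD);

    if (myrank == 0)
        printf("C[0][0] = %f\n", C[0]);

    MPI_Type_free(&col_type);
    MPI_Type_free(&tmp_type);
    free(A); free(Bloc); free(Cloc);
    if (myrank == 0) { free(B); free(C); }
    MPI_Finalize();
    return 0;
}

Receiving into a separate contiguous Bloc (and computing into Cloc) keeps the dgemm leading dimensions simple (w instead of n) and avoids scattering into the middle of a buffer that may only exist on the root; whether that matches the assignment's intent is an assumption.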