I have a question.
I tried to improve a well-known C program that implements Fox's algorithm for matrix multiplication.
The original version, without OpenMP, is here: (http://web.mst.edu/~ercal/387/MPI/ppmpi_c/chap07/fox.c).
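For reference, the block schedule of Fox's algorithm can be checked without MPI. The sketch below is a serial simulation, not the MPI program from the link: it treats the n x n matrices as a q x q grid of blocks (`q`, `fox_serial`, `block_mult_add` and `matmul_naive` are illustrative names). At stage s, "process" (r, c) multiplies the A block (r, (r+s) mod q), which is what row r broadcasts at that stage, by the B block ((r+s) mod q, c), which is what it holds after s upward shifts of B.

```c
#include <assert.h>
#include <string.h>

/* C_blk += A_blk * B_blk for bs x bs blocks located at the given
 * row/column offsets inside n x n row-major arrays. */
static void block_mult_add(int n, int bs,
                           const double *A, int ar, int ac,
                           const double *B, int br, int bc,
                           double *C, int cr, int cc) {
    for (int i = 0; i < bs; i++)
        for (int k = 0; k < bs; k++) {
            double a = A[(ar + i) * n + (ac + k)];
            for (int j = 0; j < bs; j++)
                C[(cr + i) * n + (cc + j)] += a * B[(br + k) * n + (bc + j)];
        }
}

/* Serial simulation of Fox's algorithm on a q x q "process grid". */
void fox_serial(int n, int q, const double *A, const double *B, double *C) {
    assert(q > 0 && n % q == 0);
    int bs = n / q;                       /* order of each local block */
    memset(C, 0, sizeof(double) * (size_t)n * (size_t)n);
    for (int stage = 0; stage < q; stage++)
        for (int r = 0; r < q; r++) {
            /* Process (r, kb) "broadcasts" its A block along grid row r. */
            int kb = (r + stage) % q;
            for (int c = 0; c < q; c++)
                /* After `stage` upward shifts, process (r, c) holds B block (kb, c). */
                block_mult_add(n, bs, A, r * bs, kb * bs,
                               B, kb * bs, c * bs, C, r * bs, c * bs);
        }
}

/* Plain triple loop, used only to check fox_serial. */
void matmul_naive(int n, const double *A, const double *B, double *C) {
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            double s = 0.0;
            for (int k = 0; k < n; k++)
                s += A[i * n + k] * B[k * n + j];
            C[i * n + j] = s;
        }
}
```

After q stages every block C(r, c) has accumulated A(r, kb) * B(kb, c) for all kb, which is exactly the block form of C = A * B.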
The original program used only MPI, so I tried to add OpenMP to the matrix multiplication function to reduce the computation time.
(The program runs on a cluster whose computers have 2 cores each, so I create 2 threads.)
The problem is that the run time is no different with and without OpenMP: with OpenMP it is sometimes the same and sometimes even longer.
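To check whether an OpenMP version is actually faster, both variants should be timed with a wall-clock timer. A minimal sketch, assuming plain row-major `double` arrays instead of `LOCAL_MATRIX_T`: the hypothetical helper `time_matmul` runs the same row-parallel multiplication with OpenMP switched on or off through the `if` clause, using `omp_get_wtime()` when compiled with OpenMP and `clock_gettime()` otherwise. `clock()` is avoided on purpose, because on POSIX systems such as Linux it reports processor time, which adds up the time of all threads and can hide a speedup.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Elapsed wall-clock time in seconds from an arbitrary origin. */
static double wall_seconds(void) {
#ifdef _OPENMP
    return omp_get_wtime();
#else
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
#endif
}

/* C = A * B for n x n row-major matrices. Rows of C are split across
 * threads only when use_omp is nonzero (the if clause otherwise makes
 * the region run on a single thread). */
void matmul_rows(int n, const double *A, const double *B, double *C, int use_omp) {
    #pragma omp parallel for schedule(static) if(use_omp)
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++)
            C[i * n + j] = 0.0;
        for (int k = 0; k < n; k++) {
            double a = A[i * n + k];
            for (int j = 0; j < n; j++)
                C[i * n + j] += a * B[k * n + j];
        }
    }
}

/* Times one multiplication and returns the elapsed wall-clock seconds. */
double time_matmul(int n, const double *A, const double *B, double *C, int use_omp) {
    assert(n >= 0);
    double t0 = wall_seconds();
    matmul_rows(n, A, B, C, use_omp);
    return wall_seconds() - t0;
}
```

For the 600x600 case, one would allocate three arrays of 600*600 doubles with `malloc` and compare `time_matmul(600, A, B, C, 0)` against `time_matmul(600, A, B, C, 1)`, compiling with `-fopenmp` (GCC/Clang).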
I tested it by multiplying two 600x600 matrices. Here is the modified function:
```c
void Local_matrix_multiply(
        LOCAL_MATRIX_T* local_A  /* in  */,
        LOCAL_MATRIX_T* local_B  /* in  */,
        LOCAL_MATRIX_T* local_C  /* out */) {
    int i, j, k;
    int tid, nthreads;            /* used only by the commented-out debug code */
    int chunk = CHUNKSIZE;        /* CHUNKSIZE is 100 */

    #pragma omp parallel shared(local_A, local_B, local_C, chunk, nthreads) private(i, j, k, tid) num_threads(2)
    {
        /* Debug output (needs #include <omp.h>):
        tid = omp_get_thread_num();
        if (tid == 0) {
            nthreads = omp_get_num_threads();
            printf("Matrix multiplication starts with %d threads\n", nthreads);
        }
        printf("Thread %d uses the matrix:\n", tid);
        */

        /* Rows i of local_C are handed to the threads in chunks of `chunk` rows. */
        #pragma omp for schedule(static, chunk)
        for (i = 0; i < Order(local_A); i++)
            for (j = 0; j < Order(local_A); j++)
                for (k = 0; k < Order(local_B); k++)
                    Entry(local_C,i,j) = Entry(local_C,i,j)
                        + Entry(local_A,i,k)*Entry(local_B,k,j);
    } /* end omp parallel */
}  /* Local_matrix_multiply */
```
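For comparison, one possible variant of the local multiply, sketched under assumptions: `LOCAL_MATRIX_T`, `Order` and `Entry` below are simplified stand-ins modeled on the macros the function uses (not copied from fox.c), `MAX_ENTRIES` is a made-up capacity, and `Local_matrix_multiply_omp` is a hypothetical name. It keeps the accumulate semantics (local_C += local_A * local_B) but uses the default static schedule and i-k-j loop order.

```c
#include <assert.h>

/* Hypothetical stand-in for the program's local block type:
 * a square n_bar x n_bar block stored row-major. */
#define MAX_ENTRIES 65536
typedef struct {
    int n_bar;
    float entries[MAX_ENTRIES];
} LOCAL_MATRIX_T;
#define Order(A) ((A)->n_bar)
#define Entry(A, i, j) ((A)->entries[(A)->n_bar * (i) + (j)])

/* local_C += local_A * local_B, with the rows of local_C split across threads. */
void Local_matrix_multiply_omp(const LOCAL_MATRIX_T *local_A,
                               const LOCAL_MATRIX_T *local_B,
                               LOCAL_MATRIX_T *local_C) {
    const int n = Order(local_A);
    assert(Order(local_B) == n && Order(local_C) == n);
    /* No chunk size: each thread gets one roughly equal, contiguous range of rows. */
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; i++)
        for (int k = 0; k < n; k++) {
            const float a = Entry(local_A, i, k);
            /* i-k-j order: the inner loop walks rows of B and C with unit stride. */
            for (int j = 0; j < n; j++)
                Entry(local_C, i, j) += a * Entry(local_B, k, j);
        }
}
```

On the design choice: with `schedule(static, chunk)`, OpenMP deals out chunks of `chunk` iterations to the threads round-robin in thread order, so if `Order(local_A)` is 100 or less, the whole `i` loop forms a single chunk and only one thread does any work. Without a chunk size, the iterations are divided into roughly equal pieces, at most one per thread.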