openmp - Page 2 - Developer IT

OpenCL: does it play well with OpenMP, can I connect other languages to it, etc.

- by Cem Karan

The 1.0 spec for OpenCL just came out a few days ago (Spec is here) and I've just started to read through it. I want to know if it plays well with other high performance multiprocessing APIs like OpenMP (spec) and I want to know what I should learn. So, here are my basic questions: If I am already using OpenMP, will that break OpenCL or vice-versa? Is OpenCL more powerful than OpenMP? Or are they intended to be complementary? Is there a standard way of connecting an OpenCL program to a standard C99 program (or any other language)? What is it? Does anyone know if anyone is writing an OpenCL book? I'm reading the spec, but I've found books to be more helpful.

Read the article

Parallel Programming. Boost's MPI, OpenMP, TBB, or something else?

- by unknownthreat

Hello, I am totally a novice in parallel programming, but I do know how to program C++. Now, I am looking around for parallel programming library. I just want to give it a try, just for fun, and right now, I found 3 APIs, but I am not sure which one should I stick with. Right now, I see Boost's MPI, OpenMP and TBB. For anyone who have experienced with any of these 3 API (or any other parallelism API), could you please tell me the difference between these? Are there any factor to consider, like AMD or Intel architecture?

Read the article

Multi-Core Programming. Boost's MPI, OpenMP, TBB, or something else?

- by unknownthreat

Hello, I am totally a novice in Multi-Core Programming, but I do know how to program C++. Now, I am looking around for Multi-Core Programming library. I just want to give it a try, just for fun, and right now, I found 3 APIs, but I am not sure which one should I stick with. Right now, I see Boost's MPI, OpenMP and TBB. For anyone who have experienced with any of these 3 API (or any other API), could you please tell me the difference between these? Are there any factor to consider, like AMD or Intel architecture?

Read the article

Time with and without OpenMP

- by was

I have a question.. I tried to improve a well known program algorithm in C, FOX algorithm for matrix multiplication.. relative link without openMP: (http://web.mst.edu/~ercal/387/MPI/ppmpi_c/chap07/fox.c). The initial program had only MPI and I tried to insert openMP in the matrix multiplication method, in order to improve the time of computation: (This program runs in a cluster and computers have 2 cores, thus I created 2 threads.) The problem is that there is no difference of time, with and without openMP. I observed that using openMP sometimes, time is equivalent or greater than the time without openMP. I tried to multiply two 600x600 matrices. void Local_matrix_multiply( LOCAL_MATRIX_T* local_A /* in */, LOCAL_MATRIX_T* local_B /* in */, LOCAL_MATRIX_T* local_C /* out */) { int i, j, k; chunk = CHUNKSIZE; // 100 #pragma omp parallel shared(local_A, local_B, local_C, chunk, nthreads) private(i,j,k,tid) num_threads(2) { /* tid = omp_get_thread_num(); if(tid == 0){ nthreads = omp_get_num_threads(); printf("O Pollaplasiamos pinakwn ksekina me %d threads\n", nthreads); } printf("Thread %d use the matrix: \n", tid); */ #pragma omp for schedule(static, chunk) for (i = 0; i < Order(local_A); i++) for (j = 0; j < Order(local_A); j++) for (k = 0; k < Order(local_B); k++) Entry(local_C,i,j) = Entry(local_C,i,j) + Entry(local_A,i,k)*Entry(local_B,k,j); } //end pragma omp parallel } /* Local_matrix_multiply */

Read the article

How to break out of a nested parallel (OpenMP) Fortran loop idiomatically?

- by J.F. Sebastian

Here's sequential code: do i = 1, n do j = i+1, n if ("some_condition") then result = "here's result" return end if end do end do Is there a cleaner way to execute iterations of the outer loop concurrently other than: !$OMP PARALLEL private(i,j) !$OMP DO do i = 1, n if (found) goto 10 do j = i+1, n if (found) goto 10 if ("some_condition") then !$OMP CRITICAL !$OMP FLUSH if (.not.found) then found = .true. result = "here's result" end if !$OMP FLUSH !$OMP END CRITICAL goto 10 end if end do 10 continue end do !$OMP END DO NOWAIT !$OMP END PARALLEL

Read the article

What is a good method to use with multithreading to simulate this?

- by user1504257

I am writing a program in c++ in visual studio to be able to handle a line at a park. I have all of my customers in a line at the park and I want to be able to service them using multithreading with openmp. When I put the pragma and such in, I have multiple threads servicing the same customer at the same time for each and every customer I create, not what I want. I want for example, if I had two threads and four customers, that thread one to do customer one while thread 2 does customer 2. Then I would like thread 1 to do customer 3 and thread 2 customer 4 at the same time. I don't know if its possible or if there is a better way, but I need to use openmp. Thanks for your input.

Read the article

The correct usage of nested #pragma omp for directives

- by GoldenLee

The following code runs like a charm before OpenMP parallelization was applied. In fact, the following code was in a state of endless loop! I'm sure that's result from my incorrect use to the OpenMP directives. Would you please show me the correct way? Thank you very much. #pragma omp parallel for for (int nY = nYTop; nY <= nYBottom; nY++) { for (int nX = nXLeft; nX <= nXRight; nX++) { // Use look-up table for performance dLon = theApp.m_LonLatLUT.LonGrid()[nY][nX] + m_FavoriteSVISSRParams.m_dNadirLon; dLat = theApp.m_LonLatLUT.LatGrid()[nY][nX]; // If you don't want to use longitude/latitude look-up table, uncomment the following line //NOMGeoLocate.XYToGEO(dLon, dLat, nX, nY); if (dLon > 180 || dLat > 180) { continue; } if (Navigation.GeoToXY(dX, dY, dLon, dLat, 0) > 0) { continue; } // Skip void data scanline dY = dY - nScanlineOffset; // Compute coefficients as well as its four neighboring points' values nX1 = int(dX); nX2 = nX1 + 1; nY1 = int(dY); nY2 = nY1 + 1; dCx = dX - nX1; dCy = dY - nY1; dP1 = pIRChannelData->operator [](nY1)[nX1]; dP2 = pIRChannelData->operator [](nY1)[nX2]; dP3 = pIRChannelData->operator [](nY2)[nX1]; dP4 = pIRChannelData->operator [](nY2)[nX2]; // Bilinear interpolation usNomDataBlock[nY][nX] = (unsigned short)BilinearInterpolation(dCx, dCy, dP1, dP2, dP3, dP4); } }

Read the article

Running multiprocess applications from MATLAB

- by Jacob

I've written a multitprocess application in VC++ and tried to execute it with command line arguments with the system command. It runs, but only on one core --- any suggestions? Update:In fact, it doesn't even see the second core. I used OpenMP and used omp_get_max_threads() and omp_get_thread_num() to check and omp_get_max_threads() seems to be 1 when I execute the application from MATLAB but it's 2 (as is expected) if I run it from the command window.

Read the article

how to implement a "soft barrier" in multithreaded c++

- by Jason

I have some multithreaded c++ code with the following structure: do_thread_specific_work(); update_shared_variables(); //checkpoint A do_thread_specific_work_not_modifying_shared_variables(); //checkpoint B do_thread_specific_work_requiring_all_threads_have_updated_shared_variables(); What follows checkpoint B is work that could have started if all threads have reached only checkpoint A, hence my notion of a "soft barrier". Typically multithreading libraries only provide "hard barriers" in which all threads must reach some point before any can continue. Obviously a hard barrier could be used at checkpoint B. Using a soft barrier can lead to better execution time, especially since the work between checkpoints A and B may not be load-balanced between the threads (i.e. 1 slow thread who has reached checkpoint A but not B could be causing all the others to wait at the barrier just before checkpoint B). I've tried using atomics to synchronize things and I know with 100% certainty that is it NOT guaranteed to work. For example using openmp syntax, before the parallel section start with: shared_thread_counter = num_threads; //known at compile time #pragma omp flush Then at checkpoint A: #pragma omp atomic shared_thread_counter--; Then at checkpoint B (using polling): #pragma omp flush while (shared_thread_counter > 0) { usleep(1); //can be removed, but better to limit memory bandwidth #pragma omp flush } I've designed some experiments in which I use an atomic to indicate that some operation before it is finished. The experiment would work with 2 threads most of the time but consistently fail when I have lots of threads (like 20 or 30). I suspect this is because of the caching structure of modern CPUs. Even if one thread updates some other value before doing the atomic decrement, it is not guaranteed to be read by another thread in that order. Consider the case when the other value is a cache miss and the atomic decrement is a cache hit. So back to my question, how to CORRECTLY implement this "soft barrier"? Is there any built-in feature that guarantees such functionality? I'd prefer openmp but I'm familiar with most of the other common multithreading libraries. As a workaround right now, I'm using a hard barrier at checkpoint B and I've restructured my code to make the work between checkpoint A and B automatically load-balancing between the threads (which has been rather difficult at times). Thanks for any advice/insight :)

Read the article

parallelize using openmp

- by riham

I am trying to parallelize region in this form while(condition) { //work for loop //work } how could I do this

Read the article

Why is my computer not showing a speedup when I use parallel code?

- by Jared P

So I realize this question sounds stupid (and yes I am using a dual core), but I have tried two different libraries (Grand Central Dispatch and OpenMP), and when using clock() to time the code with and without the lines that make it parallel, the speed is the same. (for the record they were both using their own form of parallel for). They report being run on different threads, but perhaps they are running on the same core? Is there any way to check? (Both libraries are for C, I'm uncomfortable at lower layers.) This is super weird. Any ideas?

Read the article

How to printf a time_t variable as a floating point number?

- by soneangel

Hi guys, I'm using a time_t variable in C (openMP enviroment) to keep cpu execution time...I define a float value sum_tot_time to sum time for all cpu's...I mean sum_tot_time is the sum of cpu's time_t values. The problem is that printing the value sum_tot_time it appear as an integer or long, by the way without its decimal part! I tried in these ways: to printf sum_tot_time as a double being a double value to printf sum_tot_time as float being a float value to printf sum_tot_time as double being a time_t value to printf sum_tot_time as float being a time_t value Please help me!!

Read the article

Thread-safty of boost RNG

- by Maciej Piechotka

I have a loop which should be nicely pararellized by insering one openmp pragma: boost::normal_distribution<double> ddist(0, pow(retention, i - 1)); boost::variate_generator<gen &, BOOST_TYPEOF(ddist)> dgen(rng, ddist); // Diamond const std::uint_fast32_t dno = 1 << i - 1; // #pragma omp parallel for for (std::uint_fast32_t x = 0; x < dno; x++) for (std::uint_fast32_t y = 0; y < dno; y++) { const std::uint_fast32_t diff = size/dno; const std::uint_fast32_t x1 = x*diff, x2 = (x + 1)*diff; const std::uint_fast32_t y1 = y*diff, y2 = (y + 1)*diff; double avg = (arr[x1][y1] + arr[x1][y2] + arr[x2][y1] + arr[x2][y2])/4; arr[(x1 + x2)/2][(y1 + y2)/2] = avg + dgen(); } (unless I make an error each execution does not depend on others at all. Sorry that not all of code is inserted). However my question is - are boost RNG thread-safe? They seems to refer to gcc code for gcc so even if gcc code is thread-safe it may not be the case for other platforms.

Read the article

crash in calloc

- by mmd

I'm trying to debug a program I wrote. I ran it inside gdb and I managed to catch a SIGABRT from inside calloc(). I'm completely confused about how this can arise. Can it be a bug in gcc or even libc?? More details: My program uses OpenMP. I ran it through valgrind in single-threaded mode with no errors. I also use mmap() to load a 40GB file, but I doubt that is relevant. Inside gdb, I'm running with 30 threads. Several identical runs (same input&CL) finished correctly, until the problematic one that I caught. On the surface this suggests there might be a race condition of some type. However, the SIGABRT comes from calloc() which is out of my control. Here is some relevant gdb output: (gdb) info threads [...] * 11 Thread 0x7ffff0056700 (LWP 73449) 0x00007ffff6a948a5 in raise () from /lib64/libc.so.6 [...] (gdb) thread 11 [Switching to thread 11 (Thread 0x7ffff0056700 (LWP 73449))]#0 0x00007ffff6a948a5 in raise () from /lib64/libc.so.6 (gdb) bt #0 0x00007ffff6a948a5 in raise () from /lib64/libc.so.6 #1 0x00007ffff6a96085 in abort () from /lib64/libc.so.6 #2 0x00007ffff6ad1fe7 in __libc_message () from /lib64/libc.so.6 #3 0x00007ffff6ad7916 in malloc_printerr () from /lib64/libc.so.6 #4 0x00007ffff6adb79f in _int_malloc () from /lib64/libc.so.6 #5 0x00007ffff6adbdd6 in calloc () from /lib64/libc.so.6 #6 0x000000000040e87f in my_calloc (re=0x7fff2867ef10, st=0, options=0x632020) at gmapper/../gmapper/../common/my-alloc.h:286 #7 read_get_hit_list_per_strand (re=0x7fff2867ef10, st=0, options=0x632020) at gmapper/mapping.c:1046 #8 0x000000000041308a in read_get_hit_list (re=<value optimized out>, options=0x632010, n_options=1) at gmapper/mapping.c:1239 #9 handle_read (re=<value optimized out>, options=0x632010, n_options=1) at gmapper/mapping.c:1806 #10 0x0000000000404f35 in launch_scan_threads (.omp_data_i=<value optimized out>) at gmapper/gmapper.c:557 #11 0x00007ffff7230502 in ?? () from /usr/lib64/libgomp.so.1 #12 0x00007ffff6dfc851 in start_thread () from /lib64/libpthread.so.0 #13 0x00007ffff6b4a11d in clone () from /lib64/libc.so.6 (gdb) f 6 #6 0x000000000040e87f in my_calloc (re=0x7fff2867ef10, st=0, options=0x632020) at gmapper/../gmapper/../common/my-alloc.h:286 286 res = calloc(size, 1); (gdb) p size $2 = 814080 (gdb) The function my_calloc() is just a wrapper, but the problem is not in there, as the real calloc() call looks legit. These are the limits set in the shell: $ ulimit -a core file size (blocks, -c) 0 data seg size (kbytes, -d) unlimited scheduling priority (-e) 0 file size (blocks, -f) unlimited pending signals (-i) 2067285 max locked memory (kbytes, -l) 64 max memory size (kbytes, -m) unlimited open files (-n) 1024 pipe size (512 bytes, -p) 8 POSIX message queues (bytes, -q) 819200 real-time priority (-r) 0 stack size (kbytes, -s) 10240 cpu time (seconds, -t) unlimited max user processes (-u) 1024 virtual memory (kbytes, -v) unlimited file locks (-x) unlimited The program is not out of memory, it's using 41GB on a machine with 256GB available: $ top -b -n 1 | grep gmapper 73437 user 20 0 41.5g 16g 15g T 0.0 6.6 55:17.24 gmapper-ls $ free -m total used free shared buffers cached Mem: 258437 195567 62869 0 82 189677 -/+ buffers/cache: 5807 252629 Swap: 0 0 0 I compiled using gcc (GCC) 4.4.6 20120305 (Red Hat 4.4.6-4), with flags -g -O2 -DNDEBUG -mmmx -msse -msse2 -fopenmp -Wall -Wno-deprecated -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS.

Read the article

deleted......................

- by STingRaySC

deleted.............................

Read the article

Synchronisation construct inside pragma for

- by Sayan Ghosh

Hi, I have a program block like: for (iIndex1=0; iIndex1 < iSize; iIndex1++) { for (iIndex2=iIndex1+1; iIndex2 < iSize; iIndex2++) { iCount++; fDist =(*this)[iIndex1].distance( (*this)[iIndex2] ); m_oPDF.addPairDistance( fDist ); if ((bShowProgress) && (iCount % 1000000 == 0)) xyz_exception::ui()->progress( iCount, (size()-1)*((size()-1))/2 ); } } } } I have tried parallelising the inner and outer loop and by putting iCount in a critical region. What would be the best approach to parallelise this? If I wrap iCount with omp single or omp atomic then the code gives an error and I figured out that would be invalid inside omp for. I guess I am adding many extraneous stuffs to paralellise this. Need some advice... Thanks, Sayan

Read the article

Is there something like clock() that works better for parallel code?

- by Jared P

So I know that clock() measures clock cycles, and thus isn't very good for measuring time, and I know there are functions like omp_get_wtime() for getting the wall time, but it is frustrating for me that the wall time varies so much, and was wondering if there was some way to measure distinct clock cycles (only one cycle even if more than one thread executed in it). It has to be something relatively simple/native. Thanks

Read the article

Parallelism Patterns in Seismic Duck: Part 1

Applying parallel patterns to real programs Parallel computing - Programming - High-performance computing - Microsoft Visual Studio - OpenMP

Read the article

Are there deprecated practices for multithread and multiprocessor programming that I should no longer use?

- by DeveloperDon

In the early days of FORTRAN and BASIC, essentially all programs were written with GOTO statements. The result was spaghetti code and the solution was structured programming. Similarly, pointers can have difficult to control characteristics in our programs. C++ started with plenty of pointers, but use of references are recommended. Libraries like STL can reduce some of our dependency. There are also idioms to create smart pointers that have better characteristics, and some version of C++ permit references and managed code. Programming practices like inheritance and polymorphism use a lot of pointers behind the scenes (just as for, while, do structured programming generates code filled with branch instructions). Languages like Java eliminate pointers and use garbage collection to manage dynamically allocated data instead of depending on programmers to match all their new and delete statements. In my reading, I have seen examples of multi-process and multi-thread programming that don't seem to use semaphores. Do they use the same thing with different names or do they have new ways of structuring protection of resources from concurrent use? For example, a specific example of a system for multithread programming with multicore processors is OpenMP. It represents a critical region as follows, without the use of semaphores, which seem not to be included in the environment. th_id = omp_get_thread_num(); #pragma omp critical { cout << "Hello World from thread " << th_id << '\n'; } This example is an excerpt from: http://en.wikipedia.org/wiki/OpenMP Alternatively, similar protection of threads from each other using semaphores with functions wait() and signal() might look like this: wait(sem); th_id = get_thread_num(); cout << "Hello World from thread " << th_id << '\n'; signal(sem); In this example, things are pretty simple, and just a simple review is enough to show the wait() and signal() calls are matched and even with a lot of concurrency, thread safety is provided. But other algorithms are more complicated and use multiple semaphores (both binary and counting) spread across multiple functions with complex conditions that can be called by many threads. The consequences of creating deadlock or failing to make things thread safe can be hard to manage. Do these systems like OpenMP eliminate the problems with semaphores? Do they move the problem somewhere else? How do I transform my favorite semaphore using algorithm to not use semaphores anymore?

Read the article

Helping install mrcwa and solve problems with f2py in Ubuntu 14.04 LTS

- by user288160

I am sorry if this is the wrong section but I am starting to get desperate, please someone help me... I need to install the program mrcwa-20080820 (sourceforge.net/projects/mrcwa/) because a summer project that I am involved. I need to use it together with anaconda (store.continuum.io/cshop/anaconda/), I already installed Anaconda and apparently it is working. When I type: conda --version I got the expected answer. conda 3.5.2 If I tried to import numpy or scipy with python or simple type f2py there are no errors. So far so good. But when I tried to install this program sudo python setup.py install I got these errors: running install running build sh: 1: f2py: not found cp: cannot stat ‘mrcwaf.so’: No such file or directory running build_py running install_lib running install_egg_info Removing /usr/local/lib/python2.7/dist-packages/mrcwa-20080820.egg-info Writing /usr/local/lib/python2.7/dist-packages/mrcwa-20080820.egg-info Obs: I am trying to use intel fortran 64-bits and Ubuntu 14.04 LTS. So I was checking f2py and tried to execute the program hello world f2py -c -m hello hello.f from here: cens.ioc.ee/projects/f2py2e/index.html#usage and I had some problems too: running build running config_cc unifing config_cc, config, build_clib, build_ext, build commands --compiler options running config_fc unifing config_fc, config, build_clib, build_ext, build commands --fcompiler options running build_src build_src building extension "hello" sources f2py options: [] f2py:> /tmp/tmpf8P4Y3/src.linux-x86_64-2.7/hellomodule.c creating /tmp/tmpf8P4Y3/src.linux-x86_64-2.7 Reading fortran codes... Reading file 'hello.f' (format:fix,strict) Post-processing... Block: hello Block: foo Post-processing (stage 2)... Building modules... Building module "hello"... Constructing wrapper function "foo"... foo(a) Wrote C/API module "hello" to file "/tmp/tmpf8P4Y3/src.linux-x86_64-2.7 /hellomodule.c" adding '/tmp/tmpf8P4Y3/src.linux-x86_64-2.7/fortranobject.c' to sources. adding '/tmp/tmpf8P4Y3/src.linux-x86_64-2.7' to include_dirs. copying /home/felipe/.local/lib/python2.7/site-packages/numpy/f2py/src/fortranobject.c -> /tmp/tmpf8P4Y3/src.linux-x86_64-2.7 copying /home/felipe/.local/lib/python2.7/site-packages/numpy/f2py/src/fortranobject.h -> /tmp/tmpf8P4Y3/src.linux-x86_64-2.7 build_src: building npy-pkg config files running build_ext customize UnixCCompiler customize UnixCCompiler using build_ext customize Gnu95FCompiler Could not locate executable gfortran Could not locate executable f95 customize IntelFCompiler Found executable /opt/intel/composer_xe_2013_sp1.3.174/bin/intel64/ifort customize LaheyFCompiler Could not locate executable lf95 customize PGroupFCompiler Could not locate executable pgfortran customize AbsoftFCompiler Could not locate executable f90 Could not locate executable f77 customize NAGFCompiler customize VastFCompiler customize CompaqFCompiler Could not locate executable fort customize IntelItaniumFCompiler customize IntelEM64TFCompiler customize IntelEM64TFCompiler customize IntelEM64TFCompiler using build_ext building 'hello' extension compiling C sources C compiler: gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC creating /tmp/tmpf8P4Y3/tmp creating /tmp/tmpf8P4Y3/tmp/tmpf8P4Y3 creating /tmp/tmpf8P4Y3/tmp/tmpf8P4Y3/src.linux-x86_64-2.7 compile options: '-I/tmp/tmpf8P4Y3/src.linux-x86_64-2.7 -I/home/felipe/.local/lib/python2.7/site-packages/numpy/core/include -I/home/felipe/anaconda/include/python2.7 -c' gcc: /tmp/tmpf8P4Y3/src.linux-x86_64-2.7/hellomodule.c In file included from /home/felipe/.local/lib/python2.7/site-packages/numpy/core/include/numpy/ndarraytypes.h:1761:0, from /home/felipe/.local/lib/python2.7/site-packages/numpy/core/include/numpy/ndarrayobject.h:17, from /home/felipe/.local/lib/python2.7/site-packages/numpy/core/include/numpy/arrayobject.h:4, from /tmp/tmpf8P4Y3/src.linux-x86_64-2.7/fortranobject.h:13, from /tmp/tmpf8P4Y3/src.linux-x86_64-2.7/hellomodule.c:17: /home/felipe/.local/lib/python2.7/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:15:2: warning: #warning "Using deprecated NumPy API, disable it by " "#defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp] #warning "Using deprecated NumPy API, disable it by " \ ^ gcc: /tmp/tmpf8P4Y3/src.linux-x86_64-2.7/fortranobject.c In file included from /home/felipe/.local/lib/python2.7/site-packages/numpy/core/include/numpy/ndarraytypes.h:1761:0, from /home/felipe/.local/lib/python2.7/site-packages/numpy/core/include/numpy/ndarrayobject.h:17, from /home/felipe/.local/lib/python2.7/site-packages/numpy/core/include/numpy/arrayobject.h:4, from /tmp/tmpf8P4Y3/src.linux-x86_64-2.7/fortranobject.h:13, from /tmp/tmpf8P4Y3/src.linux-x86_64-2.7/fortranobject.c:2: /home/felipe/.local/lib/python2.7/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:15:2: warning: #warning "Using deprecated NumPy API, disable it by " "#defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp] #warning "Using deprecated NumPy API, disable it by " \ ^ compiling Fortran sources Fortran f77 compiler: /opt/intel/composer_xe_2013_sp1.3.174/bin/intel64/ifort -FI -fPIC -xhost -openmp -fp-model strict Fortran f90 compiler: /opt/intel/composer_xe_2013_sp1.3.174/bin/intel64/ifort -FR -fPIC -xhost -openmp -fp-model strict Fortran fix compiler: /opt/intel/composer_xe_2013_sp1.3.174/bin/intel64/ifort -FI -fPIC -xhost -openmp -fp-model strict compile options: '-I/tmp/tmpf8P4Y3/src.linux-x86_64-2.7 -I/home/felipe/.local /lib/python2.7/site-packages/numpy/core/include -I/home/felipe/anaconda/include/python2.7 -c' ifort:f77: hello.f /opt/intel/composer_xe_2013_sp1.3.174/bin/intel64/ifort -shared -shared -nofor_main /tmp/tmpf8P4Y3/tmp/tmpf8P4Y3/src.linux-x86_64-2.7/hellomodule.o /tmp/tmpf8P4Y3 /tmp/tmpf8P4Y3/src.linux-x86_64-2.7/fortranobject.o /tmp/tmpf8P4Y3/hello.o -L/home/felipe /anaconda/lib -lpython2.7 -o ./hello.so Removing build directory /tmp/tmpf8P4Y3 Please help me I am new in ubuntu and python. I really need this program, my advisor is waiting an answer. Thank you very much, Felipe Oliveira.

Read the article

Why aren't we programming on the GPU???

- by Chris

So I finally took the time to learn CUDA and get it installed and configured on my computer and I have to say, I'm quite impressed! Here's how it does rendering the Mandelbrot set at 1280 x 678 pixels on my home PC with a Q6600 and a GeForce 8800GTS (max of 1000 iterations): Maxing out all 4 CPU cores with OpenMP: 2.23 fps Running the same algorithm on my GPU: 104.7 fps And here's how fast I got it to render the whole set at 8192 x 8192 with a max of 1000 iterations: Serial implemetation on my home PC: 81.2 seconds All 4 CPU cores on my home PC (OpenMP): 24.5 seconds 32 processors on my school's super computer (MPI with master-worker): 1.92 seconds My home GPU (CUDA): 0.310 seconds 4 GPUs on my school's super computer (CUDA with static domain decomposition): 0.0547 seconds So here's my question - if we can get such huge speedups by programming the GPU instead of the CPU, why is nobody doing it??? I can think of so many things we could speed up like this, and yet I don't know of many commercial apps that are actually doing it. Also, what kinds of other speedups have you seen by offloading your computations to the GPU?

Read the article

Avoding multiple thread spawns in pthreads

- by madman

Hi StackOverflow, I have an application that is parallellized using pthreads. The application has a iterative routine call and a thread spawn within the rountine (pthread_create and pthread_join) to parallelize the computation intensive section in the routine. When I use an instrumenting tool like PIN to collect the statistics the tool reports statistics for several threads(no of threads x no of iterations). I beleive it is because it is spawning new set of threads each time the routine is called. How can I ensure that I create the thread only once and all successive calls use the threads that have been created first. When I do the same with OpenMP and then try to collect the statistics, I see that the threads are created only once. Is it beacause of the OpenMP runtime ? Thanks.

Read the article

Sortie de GCC 4.7 : pour ses 25 ans, le compilateur expérimente la gestion de la mémoire transactionnelle

Richard Stallman vient d'annoncer la sortie de gcc 4.7, qui coïncide avec l'anniversaire des 25 ans de ce célèbre compilateur. Cette nouvelle version propose de nombreuses nouveautés : - l'intégration (expérimentale) de la gestion de la mémoire transactionnelle - la prise en charge de nouvelles architectures (Haswell avec AVX2, Piledriver, ARM et Cortext-A7, SPARC, CR16, C6X, TILE-Gx et TILEPro) - l'amélioration de plusieurs langages et bibliothèques : C++11 (modèle de mémoire et atomics, initializer pour les données membres non-static, littérales définies par l'utilisateur, alias-declarations, delegating constructors, explicit override et syntaxe étendue de friend), C11, Fortran, OpenMP 3.1, amélioration du link-time optimization (LTO) - ...

Read the article

The way cores, processes, and threads work exactly?

- by unknownthreat

I need a bit of an advice for understanding how this whole procedure work exactly. If I am incorrect in any part described below, please correct me. In a single core CPU, it runs each process in the OS, jumping around from one process to another to utilize the best of itself. A process can also have many threads, in which the CPU core runs through these threads when it is running on the respective process. Now, on a multiple core CPU, Do the cores run in every process together, or can the cores run separately in different processes at one particular point of time? For instance, you have program A running two threads, can a duo core CPU run both threads of this program? I think the answer should be yes if we are using something like OpenMP. But while the cores are running in this OpenMP-embedded process, can one of the core simply switch to other process? For programs that are created for single core, when running at 100%, why the CPU utilization of each core are distributed? (ex. A duo core CPU of 80% and 20%. The utilization percentage of all cores always add up to 100% for this case.) Do the cores try help each other run each thread of each process in some ways? Frankly, I'm not sure how this works exactly. Any advice is appreciated.

Read the article

Unable to locate essential development tools Ubuntu 11.04

- by Anita 7

I'm using Ubuntu 11.04 (VMware). I aim to implement OpenMP. Im using gcc 4.5 compiler. I tried to install it by using the command sudo apt-get install gcc 4.5. Afterwards I proceed with gcc -fopenmp foo.c BUT the output was: gcc: foo.c: No such file or directory gcc: no input files –. Now I tried to install the package by using : ubuntu@ubuntu:~$ sudo apt-get install essential Reading package lists... Done Building dependency tree Reading state information... Done E: Unable to locate package essential. I also tried apt-cache search essential and after that sudo apt-get install essential-dev But the same error again, E: Unable to locate package essential-dev Any solution,please? Do I need to download any package? What should I do? Thank you in advance :))

Search Results

Search found 73 results on 3 pages for 'openmp'.

Page 2/3 | < Previous Page | 1 2 3 | Next Page >

- by Cem Karan

- by unknownthreat

- by unknownthreat

- by was

- by J.F. Sebastian

- by user1504257

- by GoldenLee

- by Jacob

- by Jason

- by riham

- by Jared P

- by soneangel

- by Maciej Piechotka

- by mmd

- by STingRaySC

- by Sayan Ghosh

- by Jared P

- by DeveloperDon

- by user288160

- by Chris

- by madman

- by unknownthreat

- by Anita 7

< Previous Page | 1 2 3 | Next Page >