parallel foreach - Page 31

MPIexec.exe Access denide

- by shake

I have installed microsoft compute cluster and MPI.net, now i have trouble to run program using mpiexec.exe on cluster. When i try to run it on console i get message: "Access Denied", and pop up: "mpiexec.exe is not valid win32 application". I tried google it, but found nothing. Pls help. :)

Read the article

.net 4.0 concurrent queue dictionary

- by freddy smith

I would like to use the new concurrent collections in .NET 4.0 to solve the following problem. The basic data structure I want to have is a producer consumer queue, there will be a single consumer and multiple producers. There are items of type A,B,C,D,E that will be added to this queue. Items of type A,B,C are added to the queue in the normal manner and processed in order. However items of type D or E can only exist in the queue zero or once. If one of these is to be added and there already exists another of the same type that has not yet been processed then this should update that other one in-place in the queue. The queue position would not change (i.e. would not go to the back of the queue) after the update. Which .NET 4.0 classes would be best for this?

Read the article

I don't understand how work call_once

- by SABROG

Please help me understand how work call_once Here is thread-safe code. I don't understand why this need Thread Local Storage and global_epoch variables. Variable _fast_pthread_once_per_thread_epoch can be changed to constant/enum like {FAST_PTHREAD_ONCE_INIT, BEING_INITIALIZED, FINISH_INITIALIZED}. Why needed count calls in global_epoch? I think this code can be rewriting with logc: if flag FINISH_INITIALIZED do nothing, else go to block with mutexes and this all. #ifndef FAST_PTHREAD_ONCE_H #define FAST_PTHREAD_ONCE_H #include #include typedef sig_atomic_t fast_pthread_once_t; #define FAST_PTHREAD_ONCE_INIT SIG_ATOMIC_MAX extern __thread fast_pthread_once_t _fast_pthread_once_per_thread_epoch; #ifdef __cplusplus extern "C" { #endif extern void fast_pthread_once( pthread_once_t *once, void (*func)(void) ); inline static void fast_pthread_once_inline( fast_pthread_once_t *once, void (*func)(void) ) { fast_pthread_once_t x = *once; /* unprotected access */ if ( x _fast_pthread_once_per_thread_epoch ) { fast_pthread_once( once, func ); } } #ifdef __cplusplus } #endif #endif FAST_PTHREAD_ONCE_H Source fast_pthread_once.c The source is written in C. The lines of the primary function are numbered for reference in the subsequent correctness argument. #include "fast_pthread_once.h" #include static pthread_mutex_t mu = PTHREAD_MUTEX_INITIALIZER; /* protects global_epoch and all fast_pthread_once_t writes */ static pthread_cond_t cv = PTHREAD_COND_INITIALIZER; /* signalled whenever a fast_pthread_once_t is finalized */ #define BEING_INITIALIZED (FAST_PTHREAD_ONCE_INIT - 1) static fast_pthread_once_t global_epoch = 0; /* under mu */ __thread fast_pthread_once_t _fast_pthread_once_per_thread_epoch; static void check( int x ) { if ( x == 0 ) abort(); } void fast_pthread_once( fast_pthread_once_t *once, void (*func)(void) ) { /*01*/ fast_pthread_once_t x = *once; /* unprotected access */ /*02*/ if ( x _fast_pthread_once_per_thread_epoch ) { /*03*/ check( pthread_mutex_lock(µ) == 0 ); /*04*/ if ( *once == FAST_PTHREAD_ONCE_INIT ) { /*05*/ *once = BEING_INITIALIZED; /*06*/ check( pthread_mutex_unlock(µ) == 0 ); /*07*/ (*func)(); /*08*/ check( pthread_mutex_lock(µ) == 0 ); /*09*/ global_epoch++; /*10*/ *once = global_epoch; /*11*/ check( pthread_cond_broadcast(&cv;) == 0 ); /*12*/ } else { /*13*/ while ( *once == BEING_INITIALIZED ) { /*14*/ check( pthread_cond_wait(&cv;, µ) == 0 ); /*15*/ } /*16*/ } /*17*/ _fast_pthread_once_per_thread_epoch = global_epoch; /*18*/ check (pthread_mutex_unlock(µ) == 0); } } This code from BOOST: #ifndef BOOST_THREAD_PTHREAD_ONCE_HPP #define BOOST_THREAD_PTHREAD_ONCE_HPP // once.hpp // // (C) Copyright 2007-8 Anthony Williams // // Distributed under the Boost Software License, Version 1.0. (See // accompanying file LICENSE_1_0.txt or copy at // http://www.boost.org/LICENSE_1_0.txt) #include #include #include #include "pthread_mutex_scoped_lock.hpp" #include #include #include namespace boost { struct once_flag { boost::uintmax_t epoch; }; namespace detail { BOOST_THREAD_DECL boost::uintmax_t& get_once_per_thread_epoch(); BOOST_THREAD_DECL extern boost::uintmax_t once_global_epoch; BOOST_THREAD_DECL extern pthread_mutex_t once_epoch_mutex; BOOST_THREAD_DECL extern pthread_cond_t once_epoch_cv; } #define BOOST_ONCE_INITIAL_FLAG_VALUE 0 #define BOOST_ONCE_INIT {BOOST_ONCE_INITIAL_FLAG_VALUE} // Based on Mike Burrows fast_pthread_once algorithm as described in // http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2444.html template void call_once(once_flag& flag,Function f) { static boost::uintmax_t const uninitialized_flag=BOOST_ONCE_INITIAL_FLAG_VALUE; static boost::uintmax_t const being_initialized=uninitialized_flag+1; boost::uintmax_t const epoch=flag.epoch; boost::uintmax_t& this_thread_epoch=detail::get_once_per_thread_epoch(); if(epoch #endif I right understand, boost don't use atomic operation, so code from boost not thread-safe?

Read the article

Parallelism in Python

- by fmark

What are the options for achieving parallelism in Python? I want to perform a bunch of CPU bound calculations over some very large rasters, and would like to parallelise them. Coming from a C background, I am familiar with three approaches to parallelism: Message passing processes, possibly distributed across a cluster, e.g. MPI. Explicit shared memory parallelism, either using pthreads or fork(), pipe(), et. al Implicit shared memory parallelism, using OpenMP. Deciding on an approach to use is an exercise in trade-offs. In Python, what approaches are available and what are their characteristics? Is there a clusterable MPI clone? What are the preferred ways of achieving shared memory parallelism? I have heard reference to problems with the GIL, as well as references to tasklets. In short, what do I need to know about the different parallelization strategies in Python before choosing between them?

Read the article

How does Batcher Merge work at a high level?

- by Mike

I'm trying to grasp the concept of a Batcher Sort. However, most resources I've found online focus on proof entirely or on low-level pseudocode. Before I look at proofs, I'd like to understand how Batcher Sort works. Can someone give a high level overview of how Batcher Sort works(particularly the merge) without overly verbose pseudocode(I want to get the idea behind the Batcher Sort, not implement it)? Thanks!

Read the article

Python: Plot some data (matplotlib) without GIL

- by BandGap

Hello all, my problem is the GIL of course. While I'm analysing data it would be nice to present some plots in between (so it's not too boring waiting for results) But the GIL prevents this (and this is bringing me to the point of asking myself if Python was such a good idea in the first place). I can only display the plot, wait till the user closes it and commence calculations after that. A waste of time obviously. I already tried the subprocess and multiprocessing modules but can't seem to get them to work. Any thoughts on this one? Thanks

Read the article

programmatically controlling power sockets in the UK

- by cartoonfox

It's very simple. I want to plug a lamp into the UK mains supply. I want to be able to power it on and off from software - say from serial port commands, or by running a command-line or something I can get to from ruby or Java. I see lots written about how to do this with X10 with American power systems - but has anybody actually tried doing this in the UK? If you got this working: 1) Exactly what hardware did you use? 2) How do you control it from software? Thanks!

Read the article

parallelizing code using openmp

- by anubhav

Hi, The function below contains nested for loops. There are 3 of them. I have given the whole function below for easy understanding. I want to parallelize the code in the innermost for loop as it takes maximum CPU time. Then i can think about outer 2 for loops. I can see dependencies and internal inline functions in the innermost for loop . Can the innermost for loop be rewritten to enable parallelization using openmp pragmas. Please tell how. I am writing just the loop which i am interested in first and then the full function where this loop exists for referance. Interested in parallelizing the loop mentioned below. //* LOOP WHICH I WANT TO PARALLELIZE *// for (y = 0; y < 4; y++) { refptr = PelYline_11 (ref_pic, abs_y++, abs_x, img_height, img_width); LineSadBlk0 += byte_abs [*refptr++ - *orgptr++]; LineSadBlk0 += byte_abs [*refptr++ - *orgptr++]; LineSadBlk0 += byte_abs [*refptr++ - *orgptr++]; LineSadBlk0 += byte_abs [*refptr++ - *orgptr++]; LineSadBlk1 += byte_abs [*refptr++ - *orgptr++]; LineSadBlk1 += byte_abs [*refptr++ - *orgptr++]; LineSadBlk1 += byte_abs [*refptr++ - *orgptr++]; LineSadBlk1 += byte_abs [*refptr++ - *orgptr++]; LineSadBlk2 += byte_abs [*refptr++ - *orgptr++]; LineSadBlk2 += byte_abs [*refptr++ - *orgptr++]; LineSadBlk2 += byte_abs [*refptr++ - *orgptr++]; LineSadBlk2 += byte_abs [*refptr++ - *orgptr++]; LineSadBlk3 += byte_abs [*refptr++ - *orgptr++]; LineSadBlk3 += byte_abs [*refptr++ - *orgptr++]; LineSadBlk3 += byte_abs [*refptr++ - *orgptr++]; LineSadBlk3 += byte_abs [*refptr++ - *orgptr++]; } The full function where this loop exists is below for referance. /*! *********************************************************************** * \brief * Setup the fast search for an macroblock *********************************************************************** */ void SetupFastFullPelSearch (short ref, int list) // <-- reference frame parameter, list0 or 1 { short pmv[2]; pel_t orig_blocks[256], *orgptr=orig_blocks, *refptr, *tem; // created pointer tem int offset_x, offset_y, x, y, range_partly_outside, ref_x, ref_y, pos, abs_x, abs_y, bindex, blky; int LineSadBlk0, LineSadBlk1, LineSadBlk2, LineSadBlk3; int max_width, max_height; int img_width, img_height; StorablePicture *ref_picture; pel_t *ref_pic; int** block_sad = BlockSAD[list][ref][7]; int search_range = max_search_range[list][ref]; int max_pos = (2*search_range+1) * (2*search_range+1); int list_offset = ((img->MbaffFrameFlag)&&(img->mb_data[img->current_mb_nr].mb_field))? img->current_mb_nr%2 ? 4 : 2 : 0; int apply_weights = ( (active_pps->weighted_pred_flag && (img->type == P_SLICE || img->type == SP_SLICE)) || (active_pps->weighted_bipred_idc && (img->type == B_SLICE))); ref_picture = listX[list+list_offset][ref]; //===== Use weighted Reference for ME ==== if (apply_weights && input->UseWeightedReferenceME) ref_pic = ref_picture->imgY_11_w; else ref_pic = ref_picture->imgY_11; max_width = ref_picture->size_x - 17; max_height = ref_picture->size_y - 17; img_width = ref_picture->size_x; img_height = ref_picture->size_y; //===== get search center: predictor of 16x16 block ===== SetMotionVectorPredictor (pmv, enc_picture->ref_idx, enc_picture->mv, ref, list, 0, 0, 16, 16); search_center_x[list][ref] = pmv[0] / 4; search_center_y[list][ref] = pmv[1] / 4; if (!input->rdopt) { //--- correct center so that (0,0) vector is inside --- search_center_x[list][ref] = max(-search_range, min(search_range, search_center_x[list][ref])); search_center_y[list][ref] = max(-search_range, min(search_range, search_center_y[list][ref])); } search_center_x[list][ref] += img->opix_x; search_center_y[list][ref] += img->opix_y; offset_x = search_center_x[list][ref]; offset_y = search_center_y[list][ref]; //===== copy original block for fast access ===== for (y = img->opix_y; y < img->opix_y+16; y++) for (x = img->opix_x; x < img->opix_x+16; x++) *orgptr++ = imgY_org [y][x]; //===== check if whole search range is inside image ===== if (offset_x >= search_range && offset_x <= max_width - search_range && offset_y >= search_range && offset_y <= max_height - search_range ) { range_partly_outside = 0; PelYline_11 = FastLine16Y_11; } else { range_partly_outside = 1; } //===== determine position of (0,0)-vector ===== if (!input->rdopt) { ref_x = img->opix_x - offset_x; ref_y = img->opix_y - offset_y; for (pos = 0; pos < max_pos; pos++) { if (ref_x == spiral_search_x[pos] && ref_y == spiral_search_y[pos]) { pos_00[list][ref] = pos; break; } } } //===== loop over search range (spiral search): get blockwise SAD ===== **// =====THIS IS THE PART WHERE NESTED FOR STARTS=====** for (pos = 0; pos < max_pos; pos++) // OUTERMOST FOR LOOP { abs_y = offset_y + spiral_search_y[pos]; abs_x = offset_x + spiral_search_x[pos]; if (range_partly_outside) { if (abs_y >= 0 && abs_y <= max_height && abs_x >= 0 && abs_x <= max_width ) { PelYline_11 = FastLine16Y_11; } else { PelYline_11 = UMVLine16Y_11; } } orgptr = orig_blocks; bindex = 0; for (blky = 0; blky < 4; blky++) // SECOND FOR LOOP { LineSadBlk0 = LineSadBlk1 = LineSadBlk2 = LineSadBlk3 = 0; for (y = 0; y < 4; y++) //INNERMOST FOR LOOP WHICH I WANT TO PARALLELIZE { refptr = PelYline_11 (ref_pic, abs_y++, abs_x, img_height, img_width); LineSadBlk0 += byte_abs [*refptr++ - *orgptr++]; LineSadBlk0 += byte_abs [*refptr++ - *orgptr++]; LineSadBlk0 += byte_abs [*refptr++ - *orgptr++]; LineSadBlk0 += byte_abs [*refptr++ - *orgptr++]; LineSadBlk1 += byte_abs [*refptr++ - *orgptr++]; LineSadBlk1 += byte_abs [*refptr++ - *orgptr++]; LineSadBlk1 += byte_abs [*refptr++ - *orgptr++]; LineSadBlk1 += byte_abs [*refptr++ - *orgptr++]; LineSadBlk2 += byte_abs [*refptr++ - *orgptr++]; LineSadBlk2 += byte_abs [*refptr++ - *orgptr++]; LineSadBlk2 += byte_abs [*refptr++ - *orgptr++]; LineSadBlk2 += byte_abs [*refptr++ - *orgptr++]; LineSadBlk3 += byte_abs [*refptr++ - *orgptr++]; LineSadBlk3 += byte_abs [*refptr++ - *orgptr++]; LineSadBlk3 += byte_abs [*refptr++ - *orgptr++]; LineSadBlk3 += byte_abs [*refptr++ - *orgptr++]; } block_sad[bindex++][pos] = LineSadBlk0; block_sad[bindex++][pos] = LineSadBlk1; block_sad[bindex++][pos] = LineSadBlk2; block_sad[bindex++][pos] = LineSadBlk3; } } //===== combine SAD's for larger block types ===== SetupLargerBlocks (list, ref, max_pos); //===== set flag marking that search setup have been done ===== search_setup_done[list][ref] = 1; } #endif // _FAST_FULL_ME_

Read the article

How to generate makefile targets from variables?

- by Ketil

I currently have a makefile to process some data. The makefile gets the inputs to the data processing by sourcing a CONFIG file, which defines the input data in a variable. Currently, I symlink the input files to a local directory, i.e. the makefile contains: tmp/%.txt: tmp ln -fs $(shell echo $(INPUTS) | tr ' ' '\n' | grep $(patsubst tmp/%,%,$@)) $@ This is not terribly elegant, but appears to work. Is there a better way? Basically, given INPUTS = /foo/bar.txt /zot/snarf.txt I would like to be able to have e.g. %.out: %.txt some command As well as targets to merge results depending on all $(INPUT) files. Also, apart from the kludgosity, the makefile doesn't work correctly with -j, something that is crucial for the analysis to complete in reasonable time. I guess that's a bug in GNU make, but any hints welcome.

Read the article

Is there an existing solution to the multithreaded data structure problem?

- by thr

I've had the need for a multi-threaded data structure that supports these claims: Allows multiple concurrent readers and writers Is sorted Is easy to reason about Fulfilling multiple readers and one writer is a lot easier, but I really would wan't to allow multiple writers. I've been doing research into this area, and I'm aware of ConcurrentSkipList (by Lea based on work by Fraser and Harris) as it's implemented in Java SE 6. I've also implemented my own version of a concurrent Skip List based on A Provably Correct Scalable Concurrent Skip List by Herlihy, Lev, Luchangco and Shavit. These two implementations are developed by people that are light years smarter then me, but I still (somewhat ashamed, because it is amazing work) have to ask the question if these are the two only viable implementations of a concurrent multi reader/writer data structures available today?

Read the article

Generate and merge data with python multiprocessing

- by Bobby

I have a list of starting data. I want to apply a function to the starting data that creates a few pieces of new data for each element in the starting data. Some pieces of the new data are the same and I want to remove them. The sequential version is essentially: def create_new_data_for(datum): """make a list of new data from some old datum""" return [datum.modified_copy(k) for k in datum.k_list] data = [some list of data] #some data to start with #generate a list of new data from the old data, we'll reduce it next newdata = [] for d in data: newdata.extend(create_new_data_for(d)) #now reduce the data under ".matches(other)" reduced = [] for d in newdata: for seen in reduced: if d.matches(seen): break #so we haven't seen anything like d yet seen.append(d) #now reduced is finished and is what we want! I want to speed this up with multiprocessing. I was thinking that I could use a multiprocessing.Queue for the generation. Each process would just put the stuff it creates on, and when the processes are reducing the data, they can just get the data from the Queue. But I'm not sure how to have the different process loop over reduced and modify it without any race conditions or other issues. What is the best way to do this safely? or is there a different way to accomplish this goal better?

Read the article

How do I retrieve the number of processors on C / Linux ?

- by Andrei Ciobanu

I am writing a small C application that use some threads for processing data. I want to be able to know the number of processors on a certain machine, without using system() & in combination to a small script. The only way i can think of is to parse /proc/cpuinfo. Any other useful suggestions ?

Read the article

How to Force an Exception from a Task to be Observed in a Continuation Task?

- by Richard

I have a task to perform an HttpWebRequest using Task<WebResponse>.Factory.FromAsync(req.BeginGetRespone, req.EndGetResponse) which can obviously fail with a WebException. To the caller I want to return a Task<HttpResult> where HttpResult is a helper type to encapsulate the response (or not). In this case a 4xx or 5xx response is not an exception. Therefore I've attached two continuations to the request task. One with TaskContinuationOptions OnlyOnRanToCompletion and the other with OnlyOnOnFaulted. And then wrapped the whole thing in a Task<HttpResult> to pick up the one result whichever continuation completes. Each of the three child tasks (request plus two continuations) is created with the AttachedToParent option. But when the caller waits on the returned outer task, an AggregateException is thrown is the request failed. I want to, in the on faulted continuation, observe the WebException so the client code can just look at the result. Adding a Wait in the on fault continuation throws, but a try-catch around this doesn't help. Nor does looking at the Exception property (as section "Observing Exceptions By Using the Task.Exception Property" hints here). I could install a UnobservedTaskException event handler to filter, but as the event offers no direct link to the faulted task this will likely interact outside this part of the application and is a case of a sledgehammer to crack a nut. Given an instance of a faulted Task<T> is there any means of flagging it as "fault handled"? Simplified code: public static Task<HttpResult> Start(Uri url) { var webReq = BuildHttpWebRequest(url); var result = new HttpResult(); var taskOuter = Task<HttpResult>.Factory.StartNew(() => { var tRequest = Task<WebResponse>.Factory.FromAsync( webReq.BeginGetResponse, webReq.EndGetResponse, null, TaskCreationOptions.AttachedToParent); var tError = tRequest.ContinueWith<HttpResult>( t => HandleWebRequestError(t, result), TaskContinuationOptions.AttachedToParent |TaskContinuationOptions.OnlyOnFaulted); var tSuccess = tRequest.ContinueWith<HttpResult>( t => HandleWebRequestSuccess(t, result), TaskContinuationOptions.AttachedToParent |TaskContinuationOptions.OnlyOnRanToCompletion); return result; }); return taskOuter; } with: private static HttpDownloaderResult HandleWebRequestError( Task<WebResponse> respTask, HttpResult result) { Debug.Assert(respTask.Status == TaskStatus.Faulted); Debug.Assert(respTask.Exception.InnerException is WebException); // Try and observe the fault: Doesn't help. try { respTask.Wait(); } catch (AggregateException e) { Log("HandleWebRequestError: waiting on antecedent task threw inner: " + e.InnerException.Message); } // ... populate result with details of the failure for the client ... return result; } (HandleWebRequestSuccess will eventually spin off further tasks to get the content of the response...) The client should be able to wait on the task and then look at its result, without it throwing due to a fault that is expected and already handled.

Read the article

LinqToSql - Parralel - DataContext and Parralel

- by Gregoire

In .NET 4 and multicore environment, does the linq to sql datacontext object take advantage of the new parallels if we use DataLoadOptions.LoadWith?

Read the article

How to do parrallel processing in Unix Shell script?

- by Bikram Agarwal

I have a shell script that transfers a build.xml file to a remote unix machine (devrsp02) and executes the ANT task wldeploy on that machine (devrsp02). Now, this wldeploy task takes around 15 minutes to complete and while this is running, the last line at the unix console is - "task {some digit} initialized". Once this task is complete, we get a "task Completed" msg and the next task in the script is executed only after that. But sometimes, there might be a problem with the weblogic domain and the deployment might be failing internally, with no effect on the status of the wldeploy task. The unix console will still be stuck at "task {some digit} initialized". The error of the deployment will be getting logged in a file called output.a So, what I want now is - Start a time counter before running wldeploy. If the wldeploy runs for more than 15 minutes, the following command should be run - tail -f output.a ## without terminating the wldeploy or cat output.a ## after terminating the wldeploy forcefully Point to be noted here is - I can't run the wldeploy task in background, as in that case the user won't get to know when the task is complete, which is crucial for this script. Could you please suggest anything to achieve this?

Read the article

Minimal "Task Queue" with stock Linux tools to leverage Multicore CPU

- by Manuel

What is the best/easiest way to build a minimal task queue system for Linux using bash and common tools? I have a file with 9'000 lines, each line has a bash command line, the commands are completely independent. command 1 > Logs/1.log command 2 > Logs/2.log command 3 > Logs/3.log ... My box has more than one core and I want to execute X tasks at the same time. I searched the web for a good way to do this. Apparently, a lot of people have this problem but nobody has a good solution so far. It would be nice if the solution had the following features: can interpret more than one command (e.g. command; command) can interpret stream redirects on the lines (e.g. ls > /tmp/ls.txt) only uses common Linux tools Bonus points if it works on other Unix-clones without too exotic requirements.

Read the article

What is the "task" in twitter Storm parallelism

- by John Wang

I'm trying to learn twitter storm by following the great article "Understanding the parallelism of a Storm topology" However I'm a bit confused by the concept of "task". Is a task an running instance of the component(spout or bolt) ? A executor having multiple tasks actually is saying the same component is executed for multiple times by the executor, am I correct ? Moreover in a general parallelism sense, Storm will spawn a dedicated thread(executor) for a spout or bolt, but what is contributed to the parallelism by an executor(thread) having multiple tasks ? I think having multiple tasks in a thread, since a thread executes sequentially, only make the thread a kind of "cached" resource, which avoids spawning new thread for next task run. Am I correct? I may clear those confusion by myself after taking more time to investigate, but you know, we both love stackoverflow ;-) Thanks in advance.

Read the article

Multicore programming: the hard parts

- by Jon Harrop

I'm writing a book on multicore programming using .NET 4 and I'm curious to know what parts of multicore programming people have found difficult to grok or anticipate being difficult to grok?

Read the article

MPI4Py Scatter sendbuf Argument Type?

- by Noel

I'm having trouble with the Scatter function in the MPI4Py Python module. My assumption is that I should be able to pass it a single list for the sendbuffer. However, I'm getting a consistent error message when I do that, or indeed add the other two arguments, recvbuf and root: File "code/step3.py", line 682, in subbox_grid i = mpi_communicator.Scatter(station_range, station_data) File "Comm.pyx", line 427, in mpi4py.MPI.Comm.Scatter (src/ mpi4py_MPI.c:44993) File "message.pxi", line 321, in mpi4py.MPI._p_msg_cco.for_scatter (src/mpi4py_MPI.c:14497) File "message.pxi", line 232, in mpi4py.MPI._p_msg_cco.for_cco_send (src/mpi4py_MPI.c:13630) File "message.pxi", line 36, in mpi4py.MPI.message_simple (src/ mpi4py_MPI.c:11904) ValueError: message: expecting 2 or 3 items Here is the relevant code snipped, starting a few lines above 682 mentioned above. for station in stations #snip--do some stuff with station station_data = [] station_range = range(1,len(station)) mpi_communicator = MPI.COMM_WORLD i = mpi_communicator.Scatter(station_range, nsm) #snip--do some stuff with station[i] nsm = combine(avg, wt, dnew, nf1, nl1, wti[i], wtm, station[i].id) station_data = mpi_communicator.Gather(station_range, nsm) I've tried a number of combinations initializing station_range, but I must not be understanding the Scatter argument types properly. Does a Python/MPI guru have a clarification this?

Read the article

python multiprocess update dictionary synchronously

- by user1050325

I am trying to update one common dictionary through multiple processes. Could you please help me find out what is the problem with this code? I get the following output: inside function {1: 1, 2: -1} comes here inside function {1: 0, 2: 2} comes here {1: 0, 2: -1} Thanks. from multiprocessing import Lock, Process, Manager l= Lock() def computeCopyNum(test,val): l.acquire() test[val]=val print "inside function" print test l.release() return a=dict({1: 0, 2: -1}) procs=list() for i in range(1,3): p = Process(target=computeCopyNum, args=(a,i)) procs.append(p) p.start() for p in procs: p.join() print "comes here" print a

Read the article

simple process rollback question

- by OckhamsRazor

hi folks! while revising for an exam, i came across this simple question asking about rollbacks in processes. i understand how rollbacks occur, but i need some validation on my answer. The question: my confusion results from the fact that there is interprocess communication between the processes. does that change anything in terms of where to rollback? my answer would be R13, R23, R32 and R43. any help is greatly appreciated! thanks!

Read the article

How do you save the state of a computation in while using SNOW (or Multicore or ...)

- by James

From hard experience I've found it useful to occasionally save the state of my long computations to disk to start them up later if something fails. Can I do this in a distributed computation package in R (like SNOW or multicore)? It does not seem clear how this could be done since the master is collecting things from the slaves in a non-transparent way.

Read the article

Yet another C# Deadlock Debugging Question

- by Roo

Hi All, I have a multi-threaded application build in C# using VS2010 Professional. It's quite a large application and we've experienced the classing GUI cross-threading and deadlock issues before, but in the past month we've noticed the appears to lock up when left idle for around 20-30 minutes. The application is irresponsive and although it will repaint itself when other windows are dragged in front of the application and over it, the GUI still appears to be locked... interstingly (unlike if the GUI thread is being used for a considerable amount of time) the Close, Maximise and minimise buttons are also irresponsive and when clicked the little (Not Responding...) text is not displayed in the title of the application i.e. Windows still seems to think it's running fine. If I break/pause the application using the debugger, and view the threads that are running. There are 3 threads of our managed code that are running, and a few other worker threads whom the source code cannot be displayed for. The 3 threads that run are: The main/GUI thread A thread that loops indefinitely A thread that loops indefinitely If I step into threads 2 and 3, they appear to be looping correctly. They do not share locks (even with the main GUI thread) and they are not using the GUI thread at all. When stepping into the main/GUI thread however, it's broken on Application.Run... This problem screams deadlock to me, but what I don't understand is if it's deadlock, why can't I see the line of code the main/GUI thread is hanging on? Any help will be greatly appreciated! Let me know if you need more information... Cheers, Roo -----------------------------------------------------SOLUTION-------------------------------------------------- Okay, so the problem is now solved. Thanks to everyone for their suggestions! Much appreciated! I've marked the answer that solved my initial problem of determining where on the main/UI thread the application hangs (I handn't turned off the "Enable Just My Code" option). The overall issue I was experiencing was indeed Deadlock, however. After obtaining the call-stack and popping the top half of it into Google I came across this which explains exactly what I was experiencing... http://timl.net/ This references a lovely guide to debugging the issue... http://www.aaronlerch.com/blog/2008/12/15/debugging-ui/ This identified a control I was constructing off the GUI thread. I did know this, however, and was marshalling calls correctly, but what I didn't realise was that behind the scenes this Control was subscribing to an event or set of events that are triggered when e.g. a Windows session is unlocked or the screensaver exits. These calls are always made on the main/UI thread and were blocking when it saw the call was made on the incorrect thread. Kim explains in more detail here... http://krgreenlee.blogspot.com/2007/09/onuserpreferencechanged-hang.html In the end I found an alternative solution which did not require this Control off the main/UI thread. That appears to have solved the problem and the application no longer hangs. I hope this helps anyone who's confronted by a similar problem. Thanks again to everyone on here who helped! (and indirectly, the delightful bloggers I've referenced above!) Roo -----------------------------------------------------SOLUTION II-------------------------------------------------- Aren't threading issues delightful...you think you've solved it, and a month down the line it pops back up again. I still believe the solution above resolved an issue that would cause simillar behaviour, but we encountered the problem again. As we spent a while debugging this, I thought I'd update this question with our (hopefully) final solution: The problem appears to have been a bug in the Infragistics components in the WinForms 2010.1 release (no hot fixes). We had been running from around the time the freeze issue appeared (but had also added a bunch of other stuff too). After upgrading to WinForms 2010.3, we've yet to reproduce the issue (deja vu). See my question here for a bit more information: 'http://stackoverflow.com/questions/4077822/net-4-0-and-the-dreaded-onuserpreferencechanged-hang'. Hans has given a nice summary of the general issue. I hope this adds a little to the suggestions/information surrounding the nutorious OnUserPreferenceChanged Hang (or whatever you'd like to call it). Cheers, Roo

Read the article

What is the best MPI implementation

- by pvsnp

I have to implement MPI system in a cluster. If anyone here has any experience with MPI (MPICH/OpenMPI), I'd like to know which is better and how the performance can be boosted on a cluster of x86_64 boxes.

Read the article

How to synchronize cuda threads when they are in the same loop and we need to synchronize them to ex

- by Vickey

Hi all, I have written a code and Now I want to implement this on cuda GPU but I'm new to synchronization so please help me with this, It's little urgent to me. Below I'm presenting the code and I want to that LOOP1 to be executed by all threads (heance I want to this portion to take advantage of cuda and the remaining portion (the portion other from the LOOP1) is to be executed by only a single thread. do{ point_set = master_Q[(*num_mas) - 1].q; List* temp = point_set; List* pa = point_set; if(master_Q[num_mas[0] - 1].max) max_level = (int) (ceilf(il2 * log(master_Q[num_mas[0] - 1].max))); *num_mas = (*num_mas) - 1; while(point_set){ List* insert_ele = temp; while(temp){ insert_ele = temp; if((insert_ele->dist[insert_ele->dist_index-1] <= pow(2, max_level-1)) || (top_level == max_level)){ if(point_set == temp){ point_set = temp->next; pa = temp->next; } else{ pa->next = temp->next; } temp = NULL; List* new_point_set = point_set; float maximum_dist = 0; if(parent->p_index != insert_ele->point_index){ List* tmp = new_point_set; float *b = &(data[(insert_ele->point_index)*point_len]); **LOOP 1:** while(tmp){ float *c = &(data[(tmp->point_index)*point_len]); float sum = 0.; for(int j = 0; j < point_len; j+=2){ float d1 = b[j] - c[j]; float d2 = b[j+1] - c[j+1]; d1 *= d1; d2 *= d2; sum = sum + d1 + d2; } tmp->dist[tmp->dist_index] = sqrt(sum); if(maximum_dist < tmp->dist[tmp->dist_index]) maximum_dist = tmp->dist[tmp->dist_index]; tmp->dist_index = tmp->dist_index+1; tmp = tmp->next; } max_distance = maximum_dist; } while(new_point_set || insert_ele){ List* far, *par, *tmp, *tmp_new; far = NULL; tmp = new_point_set; tmp_new = NULL; float level_dist = pow(2, max_level-1); float maxdist = 0, maxp = 0; while(tmp){ if(tmp->dist[(tmp->dist_index)-1] > level_dist){ if(maxdist < tmp->dist[tmp->dist_index-1]) maxdist = tmp->dist[tmp->dist_index-1]; if(tmp == new_point_set){ new_point_set = tmp->next; par = tmp->next; } else{ par->next = tmp->next; } if(far == NULL){ far = tmp; tmp_new = far; } else{ tmp_new->next = tmp; tmp_new = tmp; } if(parent->p_index != insert_ele->point_index) tmp->dist_index = tmp->dist_index - 1; tmp = tmp->next; tmp_new->next = NULL; } else{ par = tmp; if(maxp < tmp->dist[(tmp->dist_index)-1]) maxp = tmp->dist[(tmp->dist_index)-1]; tmp = tmp->next; } } if(0 == maxp){ tmp = new_point_set; aloc_mem[*tree_index].p_index = insert_ele->point_index; aloc_mem[*tree_index].no_child = 0; aloc_mem[*tree_index].level = max_level--; parent->children_index[parent->no_child++] = *tree_index; parent = &(aloc_mem[*tree_index]); tree_index[0] = tree_index[0]+1; while(tmp){ aloc_mem[*tree_index].p_index = tmp->point_index; aloc_mem[(*tree_index)].no_child = 0; aloc_mem[(*tree_index)].level = master_Q[(*cur_count_Q)-1].level; parent->children_index[parent->no_child] = *tree_index; parent->no_child = parent->no_child + 1; (*tree_index)++; tmp = tmp->next; } cur_count_Q[0] = cur_count_Q[0]-1; new_point_set = NULL; } master_Q[*num_mas].q = far; master_Q[*num_mas].parent = parent; master_Q[*num_mas].valid = true; master_Q[*num_mas].max = maxdist; master_Q[*num_mas].level = max_level; num_mas[0] = num_mas[0]+1; if(0 != maxp){ aloc_mem[*tree_index].p_index = insert_ele->point_index; aloc_mem[*tree_index].no_child = 0; aloc_mem[*tree_index].level = max_level; parent->children_index[parent->no_child++] = *tree_index; parent = &(aloc_mem[*tree_index]); tree_index[0] = tree_index[0]+1; if(maxp){ int new_level = ((int) (ceilf(il2 * log(maxp)))) +1; if (new_level < (max_level-1)) max_level = new_level; else max_level--; } else max_level--; } if( 0 == maxp ) insert_ele = NULL; } } else{ if(NULL == temp->next){ master_Q[*num_mas].q = point_set; master_Q[*num_mas].parent = parent; master_Q[*num_mas].valid = true; master_Q[*num_mas].level = max_level; num_mas[0] = num_mas[0]+1; } pa = temp; temp = temp->next; } } if((*num_mas) > 1){ List *temp2 = master_Q[(*num_mas)-1].q; while(temp2){ List* temp3 = master_Q[(*num_mas)-2].q; master_Q[(*num_mas)-2].q = temp2; if((master_Q[(*num_mas)-1].parent)->p_index != (master_Q[(*num_mas)-2].parent)->p_index){ temp2->dist_index = temp2->dist_index - 1; } temp2 = temp2->next; master_Q[(*num_mas)-2].q->next = temp3; } num_mas[0] = num_mas[0]-1; } point_set = master_Q[(*num_mas)-1].q; temp = point_set; pa = point_set; parent = master_Q[(*num_mas)-1].parent; max_level = master_Q[(*num_mas)-1].level; if(master_Q[(*num_mas)-1].max) if( max_level > ((int) (ceilf(il2 * log(master_Q[(*num_mas)-1].max)))) +1) max_level = ((int) (ceilf(il2 * log(master_Q[(*num_mas)-1].max)))) +1; num_mas[0] = num_mas[0]-1; } }while(*num_mas > 0);

Search Results

Search found 6559 results on 263 pages for 'parallel foreach'.

Page 31/263 | < Previous Page | 27 28 29 30 31 32 33 34 35 36 37 38 | Next Page >

- by shake

- by freddy smith

- by SABROG

- by fmark

- by Mike

- by BandGap

- by cartoonfox

- by anubhav

- by Ketil

- by thr

- by Bobby

- by Andrei Ciobanu

- by Richard

- by Gregoire

- by Bikram Agarwal

- by Manuel

- by John Wang

- by Jon Harrop

- by Noel

- by user1050325

- by OckhamsRazor

- by James

- by Roo

- by pvsnp

- by Vickey

< Previous Page | 27 28 29 30 31 32 33 34 35 36 37 38 | Next Page >