parallel for - Page 19 - Developer IT

How to generate makefile targets from variables?

- by Ketil

I currently have a makefile to process some data. The makefile gets the inputs to the data processing by sourcing a CONFIG file, which defines the input data in a variable. Currently, I symlink the input files to a local directory, i.e. the makefile contains: tmp/%.txt: tmp ln -fs $(shell echo $(INPUTS) | tr ' ' '\n' | grep $(patsubst tmp/%,%,$@)) $@ This is not terribly elegant, but appears to work. Is there a better way? Basically, given INPUTS = /foo/bar.txt /zot/snarf.txt I would like to be able to have e.g. %.out: %.txt some command As well as targets to merge results depending on all $(INPUT) files. Also, apart from the kludgosity, the makefile doesn't work correctly with -j, something that is crucial for the analysis to complete in reasonable time. I guess that's a bug in GNU make, but any hints welcome.

Read the article

Generate and merge data with python multiprocessing

- by Bobby

I have a list of starting data. I want to apply a function to the starting data that creates a few pieces of new data for each element in the starting data. Some pieces of the new data are the same and I want to remove them. The sequential version is essentially: def create_new_data_for(datum): """make a list of new data from some old datum""" return [datum.modified_copy(k) for k in datum.k_list] data = [some list of data] #some data to start with #generate a list of new data from the old data, we'll reduce it next newdata = [] for d in data: newdata.extend(create_new_data_for(d)) #now reduce the data under ".matches(other)" reduced = [] for d in newdata: for seen in reduced: if d.matches(seen): break #so we haven't seen anything like d yet seen.append(d) #now reduced is finished and is what we want! I want to speed this up with multiprocessing. I was thinking that I could use a multiprocessing.Queue for the generation. Each process would just put the stuff it creates on, and when the processes are reducing the data, they can just get the data from the Queue. But I'm not sure how to have the different process loop over reduced and modify it without any race conditions or other issues. What is the best way to do this safely? or is there a different way to accomplish this goal better?

Read the article

MPI4Py Scatter sendbuf Argument Type?

- by Noel

I'm having trouble with the Scatter function in the MPI4Py Python module. My assumption is that I should be able to pass it a single list for the sendbuffer. However, I'm getting a consistent error message when I do that, or indeed add the other two arguments, recvbuf and root: File "code/step3.py", line 682, in subbox_grid i = mpi_communicator.Scatter(station_range, station_data) File "Comm.pyx", line 427, in mpi4py.MPI.Comm.Scatter (src/ mpi4py_MPI.c:44993) File "message.pxi", line 321, in mpi4py.MPI._p_msg_cco.for_scatter (src/mpi4py_MPI.c:14497) File "message.pxi", line 232, in mpi4py.MPI._p_msg_cco.for_cco_send (src/mpi4py_MPI.c:13630) File "message.pxi", line 36, in mpi4py.MPI.message_simple (src/ mpi4py_MPI.c:11904) ValueError: message: expecting 2 or 3 items Here is the relevant code snipped, starting a few lines above 682 mentioned above. for station in stations #snip--do some stuff with station station_data = [] station_range = range(1,len(station)) mpi_communicator = MPI.COMM_WORLD i = mpi_communicator.Scatter(station_range, nsm) #snip--do some stuff with station[i] nsm = combine(avg, wt, dnew, nf1, nl1, wti[i], wtm, station[i].id) station_data = mpi_communicator.Gather(station_range, nsm) I've tried a number of combinations initializing station_range, but I must not be understanding the Scatter argument types properly. Does a Python/MPI guru have a clarification this?

Read the article

How do you save the state of a computation in while using SNOW (or Multicore or ...)

- by James

From hard experience I've found it useful to occasionally save the state of my long computations to disk to start them up later if something fails. Can I do this in a distributed computation package in R (like SNOW or multicore)? It does not seem clear how this could be done since the master is collecting things from the slaves in a non-transparent way.

Read the article

How to do parrallel processing in Unix Shell script?

- by Bikram Agarwal

I have a shell script that transfers a build.xml file to a remote unix machine (devrsp02) and executes the ANT task wldeploy on that machine (devrsp02). Now, this wldeploy task takes around 15 minutes to complete and while this is running, the last line at the unix console is - "task {some digit} initialized". Once this task is complete, we get a "task Completed" msg and the next task in the script is executed only after that. But sometimes, there might be a problem with the weblogic domain and the deployment might be failing internally, with no effect on the status of the wldeploy task. The unix console will still be stuck at "task {some digit} initialized". The error of the deployment will be getting logged in a file called output.a So, what I want now is - Start a time counter before running wldeploy. If the wldeploy runs for more than 15 minutes, the following command should be run - tail -f output.a ## without terminating the wldeploy or cat output.a ## after terminating the wldeploy forcefully Point to be noted here is - I can't run the wldeploy task in background, as in that case the user won't get to know when the task is complete, which is crucial for this script. Could you please suggest anything to achieve this?

Read the article

Minimal "Task Queue" with stock Linux tools to leverage Multicore CPU

- by Manuel

What is the best/easiest way to build a minimal task queue system for Linux using bash and common tools? I have a file with 9'000 lines, each line has a bash command line, the commands are completely independent. command 1 > Logs/1.log command 2 > Logs/2.log command 3 > Logs/3.log ... My box has more than one core and I want to execute X tasks at the same time. I searched the web for a good way to do this. Apparently, a lot of people have this problem but nobody has a good solution so far. It would be nice if the solution had the following features: can interpret more than one command (e.g. command; command) can interpret stream redirects on the lines (e.g. ls > /tmp/ls.txt) only uses common Linux tools Bonus points if it works on other Unix-clones without too exotic requirements.

Read the article

multi-core processing in R on windows XP - via doMC and foreach

- by Jan

Hi guys, I'm posting this question to ask for advice on how to optimize the use of multiple processors from R on a Windows XP machine. At the moment I'm creating 4 scripts (each script with e.g. for (i in 1:100) and (i in 101:200), etc) which I run in 4 different R sessions at the same time. This seems to use all the available cpu. I however would like to do this a bit more efficient. One solution could be to use the "doMC" and the "foreach" package but this is not possible in R on a Windows machine. e.g. library("foreach") library("strucchange") library("doMC") # would this be possible on a windows machine? registerDoMC(2) # for a computer with two cores (processors) ## Nile data with one breakpoint: the annual flows drop in 1898 ## because the first Ashwan dam was built data("Nile") plot(Nile) ## F statistics indicate one breakpoint fs.nile <- Fstats(Nile ~ 1) plot(fs.nile) breakpoints(fs.nile) # , hpc = "foreach" --> It would be great to test this. lines(breakpoints(fs.nile)) Any solutions or advice? Thanks, Jan

Read the article

What is the best MPI implementation

- by pvsnp

I have to implement MPI system in a cluster. If anyone here has any experience with MPI (MPICH/OpenMPI), I'd like to know which is better and how the performance can be boosted on a cluster of x86_64 boxes.

Read the article

Multicore programming: the hard parts

- by Jon Harrop

I'm writing a book on multicore programming using .NET 4 and I'm curious to know what parts of multicore programming people have found difficult to grok or anticipate being difficult to grok?

Read the article

simple process rollback question

- by OckhamsRazor

hi folks! while revising for an exam, i came across this simple question asking about rollbacks in processes. i understand how rollbacks occur, but i need some validation on my answer. The question: my confusion results from the fact that there is interprocess communication between the processes. does that change anything in terms of where to rollback? my answer would be R13, R23, R32 and R43. any help is greatly appreciated! thanks!

Read the article

Yet another C# Deadlock Debugging Question

- by Roo

Hi All, I have a multi-threaded application build in C# using VS2010 Professional. It's quite a large application and we've experienced the classing GUI cross-threading and deadlock issues before, but in the past month we've noticed the appears to lock up when left idle for around 20-30 minutes. The application is irresponsive and although it will repaint itself when other windows are dragged in front of the application and over it, the GUI still appears to be locked... interstingly (unlike if the GUI thread is being used for a considerable amount of time) the Close, Maximise and minimise buttons are also irresponsive and when clicked the little (Not Responding...) text is not displayed in the title of the application i.e. Windows still seems to think it's running fine. If I break/pause the application using the debugger, and view the threads that are running. There are 3 threads of our managed code that are running, and a few other worker threads whom the source code cannot be displayed for. The 3 threads that run are: The main/GUI thread A thread that loops indefinitely A thread that loops indefinitely If I step into threads 2 and 3, they appear to be looping correctly. They do not share locks (even with the main GUI thread) and they are not using the GUI thread at all. When stepping into the main/GUI thread however, it's broken on Application.Run... This problem screams deadlock to me, but what I don't understand is if it's deadlock, why can't I see the line of code the main/GUI thread is hanging on? Any help will be greatly appreciated! Let me know if you need more information... Cheers, Roo -----------------------------------------------------SOLUTION-------------------------------------------------- Okay, so the problem is now solved. Thanks to everyone for their suggestions! Much appreciated! I've marked the answer that solved my initial problem of determining where on the main/UI thread the application hangs (I handn't turned off the "Enable Just My Code" option). The overall issue I was experiencing was indeed Deadlock, however. After obtaining the call-stack and popping the top half of it into Google I came across this which explains exactly what I was experiencing... http://timl.net/ This references a lovely guide to debugging the issue... http://www.aaronlerch.com/blog/2008/12/15/debugging-ui/ This identified a control I was constructing off the GUI thread. I did know this, however, and was marshalling calls correctly, but what I didn't realise was that behind the scenes this Control was subscribing to an event or set of events that are triggered when e.g. a Windows session is unlocked or the screensaver exits. These calls are always made on the main/UI thread and were blocking when it saw the call was made on the incorrect thread. Kim explains in more detail here... http://krgreenlee.blogspot.com/2007/09/onuserpreferencechanged-hang.html In the end I found an alternative solution which did not require this Control off the main/UI thread. That appears to have solved the problem and the application no longer hangs. I hope this helps anyone who's confronted by a similar problem. Thanks again to everyone on here who helped! (and indirectly, the delightful bloggers I've referenced above!) Roo -----------------------------------------------------SOLUTION II-------------------------------------------------- Aren't threading issues delightful...you think you've solved it, and a month down the line it pops back up again. I still believe the solution above resolved an issue that would cause simillar behaviour, but we encountered the problem again. As we spent a while debugging this, I thought I'd update this question with our (hopefully) final solution: The problem appears to have been a bug in the Infragistics components in the WinForms 2010.1 release (no hot fixes). We had been running from around the time the freeze issue appeared (but had also added a bunch of other stuff too). After upgrading to WinForms 2010.3, we've yet to reproduce the issue (deja vu). See my question here for a bit more information: 'http://stackoverflow.com/questions/4077822/net-4-0-and-the-dreaded-onuserpreferencechanged-hang'. Hans has given a nice summary of the general issue. I hope this adds a little to the suggestions/information surrounding the nutorious OnUserPreferenceChanged Hang (or whatever you'd like to call it). Cheers, Roo

Read the article

python multiprocess update dictionary synchronously

- by user1050325

I am trying to update one common dictionary through multiple processes. Could you please help me find out what is the problem with this code? I get the following output: inside function {1: 1, 2: -1} comes here inside function {1: 0, 2: 2} comes here {1: 0, 2: -1} Thanks. from multiprocessing import Lock, Process, Manager l= Lock() def computeCopyNum(test,val): l.acquire() test[val]=val print "inside function" print test l.release() return a=dict({1: 0, 2: -1}) procs=list() for i in range(1,3): p = Process(target=computeCopyNum, args=(a,i)) procs.append(p) p.start() for p in procs: p.join() print "comes here" print a

Read the article

MongoDB: What's the point of using MapReduce without parallelism?

- by netvope

Quoting http://www.mongodb.org/display/DOCS/MapReduce#MapReduce-Parallelism As of right now, MapReduce jobs on a single mongod process are single threaded Without parallelism, what are the benefits of MapReduce compared to simpler or more traditional methods for queries and data aggregation? To avoid confusion: the question is NOT "what are the benefits of document-oriented DB over traditional relational DB"

Read the article

How to synchronize cuda threads when they are in the same loop and we need to synchronize them to ex

- by Vickey

Hi all, I have written a code and Now I want to implement this on cuda GPU but I'm new to synchronization so please help me with this, It's little urgent to me. Below I'm presenting the code and I want to that LOOP1 to be executed by all threads (heance I want to this portion to take advantage of cuda and the remaining portion (the portion other from the LOOP1) is to be executed by only a single thread. do{ point_set = master_Q[(*num_mas) - 1].q; List* temp = point_set; List* pa = point_set; if(master_Q[num_mas[0] - 1].max) max_level = (int) (ceilf(il2 * log(master_Q[num_mas[0] - 1].max))); *num_mas = (*num_mas) - 1; while(point_set){ List* insert_ele = temp; while(temp){ insert_ele = temp; if((insert_ele->dist[insert_ele->dist_index-1] <= pow(2, max_level-1)) || (top_level == max_level)){ if(point_set == temp){ point_set = temp->next; pa = temp->next; } else{ pa->next = temp->next; } temp = NULL; List* new_point_set = point_set; float maximum_dist = 0; if(parent->p_index != insert_ele->point_index){ List* tmp = new_point_set; float *b = &(data[(insert_ele->point_index)*point_len]); **LOOP 1:** while(tmp){ float *c = &(data[(tmp->point_index)*point_len]); float sum = 0.; for(int j = 0; j < point_len; j+=2){ float d1 = b[j] - c[j]; float d2 = b[j+1] - c[j+1]; d1 *= d1; d2 *= d2; sum = sum + d1 + d2; } tmp->dist[tmp->dist_index] = sqrt(sum); if(maximum_dist < tmp->dist[tmp->dist_index]) maximum_dist = tmp->dist[tmp->dist_index]; tmp->dist_index = tmp->dist_index+1; tmp = tmp->next; } max_distance = maximum_dist; } while(new_point_set || insert_ele){ List* far, *par, *tmp, *tmp_new; far = NULL; tmp = new_point_set; tmp_new = NULL; float level_dist = pow(2, max_level-1); float maxdist = 0, maxp = 0; while(tmp){ if(tmp->dist[(tmp->dist_index)-1] > level_dist){ if(maxdist < tmp->dist[tmp->dist_index-1]) maxdist = tmp->dist[tmp->dist_index-1]; if(tmp == new_point_set){ new_point_set = tmp->next; par = tmp->next; } else{ par->next = tmp->next; } if(far == NULL){ far = tmp; tmp_new = far; } else{ tmp_new->next = tmp; tmp_new = tmp; } if(parent->p_index != insert_ele->point_index) tmp->dist_index = tmp->dist_index - 1; tmp = tmp->next; tmp_new->next = NULL; } else{ par = tmp; if(maxp < tmp->dist[(tmp->dist_index)-1]) maxp = tmp->dist[(tmp->dist_index)-1]; tmp = tmp->next; } } if(0 == maxp){ tmp = new_point_set; aloc_mem[*tree_index].p_index = insert_ele->point_index; aloc_mem[*tree_index].no_child = 0; aloc_mem[*tree_index].level = max_level--; parent->children_index[parent->no_child++] = *tree_index; parent = &(aloc_mem[*tree_index]); tree_index[0] = tree_index[0]+1; while(tmp){ aloc_mem[*tree_index].p_index = tmp->point_index; aloc_mem[(*tree_index)].no_child = 0; aloc_mem[(*tree_index)].level = master_Q[(*cur_count_Q)-1].level; parent->children_index[parent->no_child] = *tree_index; parent->no_child = parent->no_child + 1; (*tree_index)++; tmp = tmp->next; } cur_count_Q[0] = cur_count_Q[0]-1; new_point_set = NULL; } master_Q[*num_mas].q = far; master_Q[*num_mas].parent = parent; master_Q[*num_mas].valid = true; master_Q[*num_mas].max = maxdist; master_Q[*num_mas].level = max_level; num_mas[0] = num_mas[0]+1; if(0 != maxp){ aloc_mem[*tree_index].p_index = insert_ele->point_index; aloc_mem[*tree_index].no_child = 0; aloc_mem[*tree_index].level = max_level; parent->children_index[parent->no_child++] = *tree_index; parent = &(aloc_mem[*tree_index]); tree_index[0] = tree_index[0]+1; if(maxp){ int new_level = ((int) (ceilf(il2 * log(maxp)))) +1; if (new_level < (max_level-1)) max_level = new_level; else max_level--; } else max_level--; } if( 0 == maxp ) insert_ele = NULL; } } else{ if(NULL == temp->next){ master_Q[*num_mas].q = point_set; master_Q[*num_mas].parent = parent; master_Q[*num_mas].valid = true; master_Q[*num_mas].level = max_level; num_mas[0] = num_mas[0]+1; } pa = temp; temp = temp->next; } } if((*num_mas) > 1){ List *temp2 = master_Q[(*num_mas)-1].q; while(temp2){ List* temp3 = master_Q[(*num_mas)-2].q; master_Q[(*num_mas)-2].q = temp2; if((master_Q[(*num_mas)-1].parent)->p_index != (master_Q[(*num_mas)-2].parent)->p_index){ temp2->dist_index = temp2->dist_index - 1; } temp2 = temp2->next; master_Q[(*num_mas)-2].q->next = temp3; } num_mas[0] = num_mas[0]-1; } point_set = master_Q[(*num_mas)-1].q; temp = point_set; pa = point_set; parent = master_Q[(*num_mas)-1].parent; max_level = master_Q[(*num_mas)-1].level; if(master_Q[(*num_mas)-1].max) if( max_level > ((int) (ceilf(il2 * log(master_Q[(*num_mas)-1].max)))) +1) max_level = ((int) (ceilf(il2 * log(master_Q[(*num_mas)-1].max)))) +1; num_mas[0] = num_mas[0]-1; } }while(*num_mas > 0);

Read the article

What is the "task" in twitter Storm parallelism

- by John Wang

I'm trying to learn twitter storm by following the great article "Understanding the parallelism of a Storm topology" However I'm a bit confused by the concept of "task". Is a task an running instance of the component(spout or bolt) ? A executor having multiple tasks actually is saying the same component is executed for multiple times by the executor, am I correct ? Moreover in a general parallelism sense, Storm will spawn a dedicated thread(executor) for a spout or bolt, but what is contributed to the parallelism by an executor(thread) having multiple tasks ? I think having multiple tasks in a thread, since a thread executes sequentially, only make the thread a kind of "cached" resource, which avoids spawning new thread for next task run. Am I correct? I may clear those confusion by myself after taking more time to investigate, but you know, we both love stackoverflow ;-) Thanks in advance.

Read the article

How can one manage to fully use the newly enhanced Parallelism features in .NET 4.0?

- by Will Marcouiller

I am pretty much interested into using the newly enhanced Parallelism features in .NET 4.0. I have also seen some possibilities of using it in F#, as much as in C#. Despite, I can only see what PLINQ has to offer with, for example, the following: var query = from c in Customers.AsParallel() where (c.Name.Contains("customerNameLike") select c; There must for sure be some other use of this parallelism thing. Have you any other examples of using it? Is this particularly turned toward PLINQ, or are there other usage as easy as PLINQ? Thanks! =)

Read the article

What's the status of multicore programming in Haskell?

- by Don Stewart

What's the status of multicore programming in Haskell? What projects, tools, and libraries are available now? What experience reports have there been?

Read the article

How to printf a time_t variable as a floating point number?

- by soneangel

Hi guys, I'm using a time_t variable in C (openMP enviroment) to keep cpu execution time...I define a float value sum_tot_time to sum time for all cpu's...I mean sum_tot_time is the sum of cpu's time_t values. The problem is that printing the value sum_tot_time it appear as an integer or long, by the way without its decimal part! I tried in these ways: to printf sum_tot_time as a double being a double value to printf sum_tot_time as float being a float value to printf sum_tot_time as double being a time_t value to printf sum_tot_time as float being a time_t value Please help me!!

Read the article

Erlang message loops

- by Roger Alsing

How does message loops in erlang work, are they sync when it comes to processing messages? As far as I understand, the loop will start by "receive"ing a message and then perform something and hit another iteration of the loop. So that has to be sync? right? If multiple clients send messages to the same message loop, then all those messages are queued and performed one after another, or? To process multiple messages in parallell, you would have to spawn multiple message loops in different processes, right? Or did I misunderstand all of it?

Read the article

help me understand cuda

- by scatman

i am having some troubles understanding threads in NVIDIA gpu architecture with cuda. please could anybody clarify these info: an 8800 gpu has 16 SMs with 8 SPs each. so we have 128 SPs. i was viewing stanford's video presentation and it was saying that every SP is capable of running 96 threads cuncurrently. does this mean that it (SP) can run 96/32=3 warps concurrently? moreover, since every SP can run 96 threads and we have 8 SPs in every SM. does this mean that every SM can run 96*8=768 threads concurrently?? but if every SM can run a single Block at a time, and the maximum number of threads in a block is 512, so what is the purpose of running 768 threads concurrently and have a max of 512 threads? a more general question is:how are blocks,threads,and warps distributed to SMs and SPs? i read that every SM gets a single block to execute at a time and threads in a block is divided into warps (32 threads), and SPs execute warps.

Read the article

How do I make an already written concurrent program run on a GPU array?

- by memius

I have a neural network written in Erlang, and I just bought a GeForce GTX 260 card with a 240 core GPU on it. Is it trivial to use CUDA as glue to run this on the graphics card?

Read the article

Subdomain forwarding using .htaccess

- by RJ

I want to redirect a praticular subdomain to the main domain http(s)://dl.example.com/par1/par2 to http(s)://www.example.com/par1/par2 How to achieve the above using .htaccess Why i want to do this: Whenever any user download a file from my server, if the file is huge , then user cannot do any other operation until the file is downloaded completely...so the solution that i have thought is to forward the download request through subdomain so that the browser may continue with rest of the operation. Thanks

Read the article

Windows Azure: Parallelization of the code

- by veda

I have some matrix multiplication operation. I want to parallelize the execution of those operations through multiple processors.. This can be done on high performance computing cluster using MPI (Message Passing Interface). Like wise, can I do some parallelization in the cloud using multiple worker roles. Is there any means for doing that.

Read the article

MPI Large Data all to all transfer

- by csslayer

My application of MPI has some process that generate some large data. Say we have N+1 process (one for master control, others are workers), each of worker processes generate large data, which is now simply write to normal file, named file1, file2, ..., fileN. The size of each file may be quite different. Now I need to send all fileM to rank M process to do the next job, So it's just like all to all data transfer. My problem is how should I use MPI API to send these files efficiently? I used to use windows share folder to transfer these before, but I think it's not a good idea. I have think about MPI_file and MPI_All_to_all, but these functions seems not to be so suitable for my case. Simple MPI_Send and MPI_Recv seems hard to be used because every process need to transfer large data, and I don't want to use distributed file system for now.

Read the article

add uchar values in ushort array with sse2 or sse3

- by pompolus

i have an unsigned short dst[16][16] matrix and a larger unsigned char src[m][n] matrix. Now i have to access in the src matrix and add a 16x16 submatrix to dst, using sse2 or ss3. In a my older implementation, I was sure that my summed values ??were never greater than 256, so i could do this: for (int row = 0; row < 16; ++row) { __m128i subMat = _mm_lddqu_si128(reinterpret_cast<const __m128i*>(src)); dst[row] = _mm_add_epi8(dst[row], subMat); src += W; // Step to next row i need to add } where W is an offset to reach the desired rows. This code works, but now my values in src are larger and summed could be greater than 256, so i need to store them as ushort. i've tried this: for (int row = 0; row < 16; ++row) { __m128i subMat = _mm_lddqu_si128(reinterpret_cast<const __m128i*>(src)); dst[row] = _mm_add_epi16(dst[row], subMat); src += W; // Step to next row i need to add } but it doesn't work. I'm not so good with sse, so any help will be appreciated.

Search Results

Search found 1774 results on 71 pages for 'parallel for'.

Page 19/71 | < Previous Page | 15 16 17 18 19 20 21 22 23 24 25 26 | Next Page >

- by Ketil

- by Bobby

- by Noel

- by James

- by Bikram Agarwal

- by Manuel

- by Jan

- by pvsnp

- by Jon Harrop

- by OckhamsRazor

- by Roo

- by user1050325

- by netvope

- by Vickey

- by John Wang

- by Will Marcouiller

- by Don Stewart

- by soneangel

- by Roger Alsing

- by scatman

- by memius

- by RJ

- by veda

- by csslayer

- by pompolus

< Previous Page | 15 16 17 18 19 20 21 22 23 24 25 26 | Next Page >