Search Results

Search found 6559 results on 263 pages for 'parallel foreach'.

Page 29/263 | < Previous Page | 25 26 27 28 29 30 31 32 33 34 35 36  | Next Page >

  • Solving Combinatory Problems with LINQ /.NET4

    - by slf
    I saw this article pop-up in my MSDN RSS feed, and after reading through it, and the sourced article here I began to wonder about the solution. The rules are simple: Find a number consisting of 9 digits in which each of the digits from 1 to 9 appears only once. This number must also satisfy these divisibility requirements: The number should be divisible by 9. If the rightmost digit is removed, the remaining number should be divisible by 8. If the rightmost digit of the new number is removed, the remaining number should be divisible by 7. And so on, until there's only one digit (which will necessarily be divisible by 1). This is his proposed monster LINQ query: // C# and LINQ solution to the numeric problem presented in: // http://software.intel.com/en-us/blogs/2009/12/07/intel-parallel-studio-great-for-serial-code-too-episode-1/ int[] oneToNine = new int[] { 1, 2, 3, 4, 5, 6, 7, 8, 9 }; // the query var query = from i1 in oneToNine from i2 in oneToNine where i2 != i1 && (i1 * 10 + i2) % 2 == 0 from i3 in oneToNine where i3 != i2 && i3 != i1 && (i1 * 100 + i2 * 10 + i3) % 3 == 0 from i4 in oneToNine where i4 != i3 && i4 != i2 && i4 != i1 && (i1 * 1000 + i2 * 100 + i3 * 10 + i4) % 4 == 0 from i5 in oneToNine where i5 != i4 && i5 != i3 && i5 != i2 && i5 != i1 && (i1 * 10000 + i2 * 1000 + i3 * 100 + i4 * 10 + i5) % 5 == 0 from i6 in oneToNine where i6 != i5 && i6 != i4 && i6 != i3 && i6 != i2 && i6 != i1 && (i1 * 100000 + i2 * 10000 + i3 * 1000 + i4 * 100 + i5 * 10 + i6) % 6 == 0 from i7 in oneToNine where i7 != i6 && i7 != i5 && i7 != i4 && i7 != i3 && i7 != i2 && i7 != i1 && (i1 * 1000000 + i2 * 100000 + i3 * 10000 + i4 * 1000 + i5 * 100 + i6 * 10 + i7) % 7 == 0 from i8 in oneToNine where i8 != i7 && i8 != i6 && i8 != i5 && i8 != i4 && i8 != i3 && i8 != i2 && i8 != i1 && (i1 * 10000000 + i2 * 1000000 + i3 * 100000 + i4 * 10000 + i5 * 1000 + i6 * 100 + i7 * 10 + i8) % 8 == 0 from i9 in oneToNine where i9 != i8 && i9 != i7 && i9 != i6 && i9 != i5 && i9 != i4 && i9 != i3 && i9 != i2 && i9 != i1 let number = i1 * 100000000 + i2 * 10000000 + i3 * 1000000 + i4 * 100000 + i5 * 10000 + i6 * 1000 + i7 * 100 + i8 * 10 + i9 * 1 where number % 9 == 0 select number; // run it! foreach (int n in query) Console.WriteLine(n); Octavio states "Note that no attempt at all has been made to optimize the code", what I'd like to know is what if we DID attempt to optimize this code. Is this really the best this code can get? I'd like to know how we can do this best with .NET4, in particular doing as much in parallel as we possibly can. I'm not necessarily looking for an answer in pure LINQ, assume .NET4 in any form (managed c++, c#, etc all acceptable).

    Read the article

  • USB-to-Parallel adapter rated as USB 1.1 - will its driver lower speed of USB 2.0 bus?

    - by cumminjm
    I need to connect a printer with a parallel port to a computer with only a USB 2.0 bus. I'd like to do so without lowering the USB 2.0 bus speed to USB 1.1. All the USB-to-Parallel adapter cable found so far seem to be rated as USB 1.1 and they come with a CD which presumably contains a necessary driver. I haven't purchased one and tested it out yet, but if so, would the cable's driver lower speed of USB 2.0 bus?

    Read the article

  • Why does this script not open parallel gnome-terminals on a server?

    - by broiyan
    Why am I not able to have parallel gnome-terminals on my server while I can on my client. Here is a test that illustrates the problem. #!/bin/bash # this is the parent script gnome-terminal --command "./left.sh" sleep 10 gnome-terminal --command "./right.sh" #!/bin/bash echo "this is the left script" read -p "press any key to close this terminal" key #!/bin/bash echo "this is the right script" read -p "press any key to close this terminal" key When I run this on a regular ubuntu desktop (maverick) I see two terminals after 10 seconds. When I run this on a maverick server at a server farm, the second window does not appear until after I close the first one and wait 10 seconds. I am using tightvncserver to view the server desktop. (I could have simplified a bit more. The 10 second sleep is extraneous to the problem. In my real world application I need the first terminal to do some real work before starting the second. The problem probably still exists even if there is no sleep.)

    Read the article

  • Is it bad practice to run Node.js and apache in parallel?

    - by Camil Staps
    I have an idea in mind and would like to know if that's the way to go for my end application. Think of my application as a social networking system in which I want to implement chat functionality. For that, I'd like to push data from the server to the client. I have heard I could use Node.js for that. In the meanwhile, I want a 'regular' system for posting status updates and such, for which I'd like to use PHP and an apache server. The only way I can think of is having Node.js and apache running parallel. But is that the way to go here? I'd think there would be a somewhat neater solution for this.

    Read the article

  • How can I write a makefile to auto-detect and parallelize the build with GNU Make?

    - by xyld
    Not sure if this is possible in one Makefile alone, but I was hoping to write a Makefile in a way such that trying to build any target in the file auto-magically detects the number of processors on the current system and builds the target in parallel for the number of processors. Something like the below "pseudo-code" examples, but much cleaner? all: @make -j$(NUM_PROCESSORS) all Or: all: .inparallel ... build all here ... .inparallel: @make -j$(NUM_PROCESSORS) $(ORIGINAL_TARGET) In both cases, all you would have to type is: % make all Hopefully that makes sense.

    Read the article

  • Killing a deadlocked Task in .NET 4 TPL

    - by Dan Bryant
    I'd like to start using the Task Parallel Library, as this is the recommended framework going forward for performing asynchronous operations. One thing I haven't been able to find is any means of forcible Abort, such as what Thread.Abort provides. My particular concern is that I schedule tasks running code that I don't wish to completely trust. In particular, I can't be sure this untrusted code won't deadlock and therefore I can't be certain if a Task I schedule using this code will ever complete. I want to stay away from true AppDomain isolation (due to the overhead and complexity of marshaling), but I also don't want to leave a Task thread hanging around, deadlocked. Is there a way to do this in TPL?

    Read the article

  • No speed-up with useless printf's using OpenMP

    - by t2k32316
    I just wrote my first OpenMP program that parallelizes a simple for loop. I ran the code on my dual core machine and saw some speed up when going from 1 thread to 2 threads. However, I ran the same code on a school linux server and saw no speed-up. After trying different things, I finally realized that removing some useless printf statements caused the code to have significant speed-up. Below is the main part of the code that I parallelized: #pragma omp parallel for private(i) for(i = 2; i <= n; i++) { printf("useless statement"); prime[i-2] = is_prime(i); } I guess that the implementation of printf has significant overhead that OpenMP must be duplicating with each thread. What causes this overhead and why can OpenMP not overcome it?

    Read the article

  • object won't die (still references to it that I can't find)

    - by user288558
    I'm using parallel-python and start a new job server in a function. after the functions ends it still exists even though I didn't return it out of the function (I used weakref to test this). I guess there's still some references to this object somewhere. My two theories: It starts threads and it logs to root logger. My questions: can I somehow findout in which namespace there is still a reference to this object. I have the weakref reference. Does anyone know how to detach a logger? What other debug suggestions do people have? here is my testcode: def pptester(): js=pp.Server(ppservers=nodes) js.set_ncpus(0) fh=file('tmp.tmp.tmp','w') tmp=[] for i in range(200): tmp.append(js.submit(ppworktest,(),(),('os','subprocess'))) js.print_stats() return weakref.ref(js) thanks in advance Wolfgang

    Read the article

  • Running Awk command on a cluster

    - by alex
    How do you execute a Unix shell command (awk script, a pipe etc) on a cluster in parallel (step 1) and collect the results back to a central node (step 2) Hadoop seems to be a huge overkill with its 600k LOC and its performance is terrible (takes minutes just to initialize the job) i don't need shared memory, or - something like MPI/openMP as i dont need to synchronize or share anything, don't need a distributed VM or anything as complex Google's SawZall seems to work only with Google proprietary MapReduce API some distributed shell packages i found failed to compile, but there must be a simple way to run a data-centric batch job on a cluster, something as close as possible to native OS, may be using unix RPC calls i liked rsync simplicity but it seem to update remote notes sequentially, and you cant use it for executing scripts as afar as i know switching to Plan 9 or some other network oriented OS looks like another overkill i'm looking for a simple, distributed way to run awk scripts or similar - as close as possible to data with a minimal initialization overhead, in a nothing-shared, nothing-synchronized fashion Thanks Alex

    Read the article

  • RabbitMQ serializing messages from queue with multiple consumers

    - by Refefer
    Hi there, I'm having a problem where I have a queue set up in shared mode and multiple consumers bound to it. The issue is that it appears that rabbitmq is serializing the messages, that is, only one consumer at a time is able to run. I need this to be parallel, however, I can't seem to figure out how. Each consumer is running in its own process. There are plenty of messages in the queue. I'm using py-amqplib to interface with RabbitMQ. Any thoughts?

    Read the article

  • Linker library for OpenMP for Snow Leopard?

    - by unknownthreat
    Currently, I am trying out OpenMP on XCode 3.2.2 on Snow Leopard: #include <omp.h> #include <iostream> #include <stdio.h> int main (int argc, char * const argv[]) { #pragma omp parallel printf("Hello from thread %d, nthreads %d\n", omp_get_thread_num(), omp_get_num_threads()); return 0; } I didn't include any linking libraries yet, so the linker complains: "_omp_get_thread_num", referenced from: _main in main.o "_omp_get_num_threads", referenced from: _main in main.o OK, fine, no problem, I take a look in the existing framework, looking for keywords such as openmp or omp... here comes the problem, where is the linking library? Or should I say, what is the name of the linking library for openMP? Is it dylib, framework or what? Or do I need to get it from somewhere first?

    Read the article

  • Expert system for writing programs?

    - by aaa
    I am brainstorming an idea of developing a high level software to manipulate matrix algebra equations, tensor manipulations to be exact, to produce optimized C++ code using several criteria such as sizes of dimensions, available memory on the system, etc. Something which is similar in spirit to tensor contraction engine, TCE, but specifically oriented towards producing optimized rather than general code. The end result desired is software which is expert in producing parallel program in my domain. Does this sort of development fall on the category of expert systems? What other projects out there work in the same area of producing code given the constraints?

    Read the article

  • Cilk or Cilk++ or OpenMP

    - by Aman Deep Gautam
    I'm creating a multi-threaded application in Linux. here is the scenario: Suppose I am having x instance of a class BloomFilter and I have some y GB of data(greater than memory available). I need to test membership for this y GB of data in each of the bloom filter instance. It is pretty much clear that parallel programming will help to speed up the task moreover since I am only reading the data so it can be shared across all processes or threads. Now I am confused about which one to use Cilk, Cilk++ or OpenMP(which one is better). Also I am confused about which one to go for Multithreading or Multiprocessing

    Read the article

  • What's a good algorithm for searching arrays N and M, in order to find elements in N that also exist

    - by GenTiradentes
    I have two arrays, N and M. they are both arbitrarily sized, though N is usually smaller than M. I want to find out what elements in N also exist in M, in the fastest way possible. To give you an example of one possible instance of the program, N is an array 12 units in size, and M is an array 1,000 units in size. I want to find which elements in N also exist in M. (There may not be any matches.) The more parallel the solution, the better. I used to use a hash map for this, but it's not quite as efficient as I'd like it to be. Typing this out, I just thought of running a binary search of M on sizeof(N) independent threads. (Using CUDA) I'll see how this works, though other suggestions are welcome.

    Read the article

  • Physical Cores vs Virtual Cores in Parallelism

    - by Code Curiosity
    When it comes to virtualization, I have been deliberating on the relationship between the physical cores and the virtual cores, especially in how it effects applications employing parallelism. For example, in a VM scenario, if there are less physical cores than there are virtual cores, if that's possible, what's the effect or limits placed on the application's parallel processing? I'm asking, because in my environment, it's not disclosed as to what the physical architecture is. Is there still much advantage to parallelizing if the application lives on a dual core VM hosted on a single core physical machine?

    Read the article

  • Activate thread synchronically

    - by mayap
    Hi All, I'm using .Net 4.0 parallel library. The tasks I execute, ask to run some other task, sometimes synchronically and somethimes asynchronically, dependending on some conditions which are not known in advanced. For async call, i simply create new tasks and that's it. I don't know how to handly sync call: how to run it from the same thread, maybe that sync tasks will also ask to execute sync tasks recursively. all this issue is pretty new to me. thanks in advance.

    Read the article

  • Can I remove items from a ConcurrentDictionary from within an enumeration loop of that dictionary?

    - by the-locster
    So for example: ConcurrentDictionary<string,Payload> itemCache = GetItems(); foreach(KeyValuePair<string,Payload> kvPair in itemCache) { if(TestItemExpiry(kvPair.Value)) { // Remove expired item. Payload removedItem; itemCache.TryRemove(kvPair.Key, out removedItem); } } Obviously with an ordinary Dictionary this will throw an exception because removing items changes the dictionary's internal state during the life of the enumeration. It's my understanding that this is not the case for a ConcurrentDictionary as the provided IEnumerable handles internal state changing. Am I understanding this right? Is there a better pattern to use?

    Read the article

  • for vs. foreach vs. LINQ

    - by beccoblu
    When I write code in Visual Studio, ReSharper (God bless it!) often suggests me to change my old-school for loop in the more compact foreach form. And often, when I accept this change, ReSharper goes a step forward, and suggests me to change it again, in a shiny LINQ form. So, I wonder: are there some real advantages, in these improvements? In pretty simple code execution, I cannot see any speed boost (obviously), but I can see the code becoming less and less readable... So I wonder: is it worth it?

    Read the article

  • Ideas on frameworks in .NET that can be used for job processing and notifications

    - by Rajat Mehta
    Scenario: We have one instance of WCF windows service which exposes contracts like: AddNewJob(Job job), GetJobs(JobQuery query) etc. This service is consumed by 70-100 instances of client which is Windows Form based .NET app. Typically the service has 50-100 inward calls/minute to add or query jobs that are stored in a table on Sql Server. The same service is also responsible for processing these jobs in real time. It queries database every 5 seconds picks up the queued jobs and starts processing them. A job has 6 states. Queued, Pre-processing, Processing, Post-processing, Completed, Failed, Locked. Another responsibility on this service is to update all clients on every state change of every job. This means almost 200+ callbacks to clients per second. Question: This whole implementation is done using WCF Duplex bindings and works perfectly fine on small number of parallel jobs. Problem arises when we scale it up to 1000 jobs at a time. The notifications don't work as expected, it leads to memory overflow etc. Is there any standard framework that can provide a clean infrastructure for handling this scenario?? Apologies for the long explanation!

    Read the article

  • Stack usage with MMX intrinsics and Microsoft C++

    - by arik-funke
    I have an inline assembler loop that cumulatively adds elements from an int32 data array with MMX instructions. In particular, it uses the fact that the MMX registers can accommodate 16 int32s to calculate 16 different cumulative sums in parallel. I would now like to convert this piece of code to MMX intrinsics but I am afraid that I will suffer a performance penalty because one cannot explicitly intruct the compiler to use the 8 MMX registers to accomulate 16 independent sums. Can anybody comment on this and maybe propose a solution on how to convert the piece of code below to use intrinsics? == inline assembler (only part within the loop) == paddd mm0, [esi+edx+8*0] ; add first & second pair of int32 elements paddd mm1, [esi+edx+8*1] ; add third & fourth pair of int32 elements ... paddd mm2, [esi+edx+8*2] paddd mm3, [esi+edx+8*3] paddd mm4, [esi+edx+8*4] paddd mm5, [esi+edx+8*5] paddd mm6, [esi+edx+8*6] paddd mm7, [esi+edx+8*7] ; add 15th & 16th pair of int32 elements esi points to the beginning of the data array edx provides the offset in the data array for the current loop iteration the data array is arranged such that the elements for the 16 independent sums are interleaved.

    Read the article

  • How do I optimize this postfix expression tree for speed?

    - by Peter Stewart
    Thanks to the help I received in this post: I have a nice, concise recursive function to traverse a tree in postfix order: deque <char*> d; void Node::postfix() { if (left != __nullptr) { left->postfix(); } if (right != __nullptr) { right->postfix(); } d.push_front(cargo); return; }; This is an expression tree. The branch nodes are operators randomly selected from an array, and the leaf nodes are values or the variable 'x', also randomly selected from an array. char *values[10]={"1.0","2.0","3.0","4.0","5.0","6.0","7.0","8.0","9.0","x"}; char *ops[4]={"+","-","*","/"}; As this will be called billions of times during a run of the genetic algorithm of which it is a part, I'd like to optimize it for speed. I have a number of questions on this topic which I will ask in separate postings. The first is: how can I get access to each 'cargo' as it is found. That is: instead of pushing 'cargo' onto a deque, and then processing the deque to get the value, I'd like to start processing it right away. I don't yet know about parallel processing in c++, but this would ideally be done concurrently on two different processors. In python, I'd make the function a generator and access succeeding 'cargo's using .next(). But I'm using c++ to speed up the python implementation. I'm thinking that this kind of tree has been around for a long time, and somebody has probably optimized it already. Any Ideas? Thanks

    Read the article

  • Thread management advice - Is TPL a good idea?

    - by Ian
    I'm hoping to get some advice on the use of thread managment and hopefully the task parallel library, because I'm not sure I've been going down the correct route. Probably best is that I give an outline of what I'm trying to do. Given a Problem I need to generate a Solution using a heuristic based algorithm. I start of by calculating a base solution, this operation I don't think can be parallelised so we don't need to worry about. Once the inital solution has been generated, I want to trigger n threads, which attempt to find a better solution. These threads need to do a couple of things: They need to be initalized with a different 'optimization metric'. In other words they are attempting to optimize different things, with a precedence level set within code. This means they all run slightly different calculation engines. I'm not sure if I can do this with the TPL.. If one of the threads finds a better solution that the currently best known solution (which needs to be shared across all threads) then it needs to update the best solution, and force a number of other threads to restart (again this depends on precedence levels of the optimization metrics). I may also wish to combine certain calculations across threads (e.g. keep a union of probabilities for a certain approach to the problem). This is probably more optional though. The whole system needs to be thread safe obviously and I want it to be running as fast as possible. I tried quite an implementation that involved managing my own threads and shutting them down etc, but it started getting quite complicated, and I'm now wondering if the TPL might be better. I'm wondering if anyone can offer any general guidance? Thanks...

    Read the article

  • What hash algorithms are paralellizable? Optimizing the hashing of large files utilizing on mult-co

    - by DanO
    I'm interested in optimizing the hashing of some large files (optimizing wall clock time). The I/O has been optimized well enough already and the I/O device (local SSD) is only tapped at about 25% of capacity, while one of the CPU cores is completely maxed-out. I have more cores available, and in the future will likely have even more cores. So far I've only been able to tap into more cores if I happen to need multiple hashes of the same file, say an MD5 AND a SHA256 at the same time. I can use the same I/O stream to feed two or more hash algorithms, and I get the faster algorithms done for free (as far as wall clock time). As I understand most hash algorithms, each new bit changes the entire result, and it is inherently challenging/impossible to do in parallel. Are any of the mainstream hash algorithms parallelizable? Are there any non-mainstream hashes that are parallelizable (and that have at least a sample implementation available)? As future CPUs will trend toward more cores and a leveling off in clock speed, is there any way to improve the performance of file hashing? (other than liquid nitrogen cooled overclocking?) or is it inherently non-parallelizable?

    Read the article

  • OpenMP: Get total number of running threads

    - by Konrad Rudolph
    I need to know the total number of threads that my application has spawned via OpenMP. Unfortunately, the omp_get_num_threads() function does not work here since it only yields the number of threads in the current team. However, my code runs recursively (divide and conquer, basically) and I want to spawn new threads as long as there are still idle processors, but no more. Is there a way to get around the limitations of omp_get_num_threads and get the total number of running threads? If more detail is required, consider the following pseudo-code that models my workflow quite closely: function divide_and_conquer(Job job, int total_num_threads): if job.is_leaf(): # Recurrence base case. job.process() return left, right = job.divide() current_num_threads = omp_get_num_threads() if current_num_threads < total_num_threads: # (1) #pragma omp parallel num_threads(2) #pragma omp section divide_and_conquer(left, total_num_threads) #pragma omp section divide_and_conquer(right, total_num_threads) else: divide_and_conquer(left, total_num_threads) divide_and_conquer(right, total_num_threads) job = merge(left, right) If I call this code with a total_num_threads value of 4, the conditional annotated with (1) will always evaluate to true (because each thread team will contain at most two threads) and thus the code will always spawn two new threads, no matter how many threads are already running at a higher level. I am searching for a platform-independent way of determining the total number of threads that are currently running in my application.

    Read the article

  • What hash algorithms are parallelizable? Optimizing the hashing of large files utilizing on multi-co

    - by DanO
    I'm interested in optimizing the hashing of some large files (optimizing wall clock time). The I/O has been optimized well enough already and the I/O device (local SSD) is only tapped at about 25% of capacity, while one of the CPU cores is completely maxed-out. I have more cores available, and in the future will likely have even more cores. So far I've only been able to tap into more cores if I happen to need multiple hashes of the same file, say an MD5 AND a SHA256 at the same time. I can use the same I/O stream to feed two or more hash algorithms, and I get the faster algorithms done for free (as far as wall clock time). As I understand most hash algorithms, each new bit changes the entire result, and it is inherently challenging/impossible to do in parallel. Are any of the mainstream hash algorithms parallelizable? Are there any non-mainstream hashes that are parallelizable (and that have at least a sample implementation available)? As future CPUs will trend toward more cores and a leveling off in clock speed, is there any way to improve the performance of file hashing? (other than liquid nitrogen cooled overclocking?) or is it inherently non-parallelizable?

    Read the article

< Previous Page | 25 26 27 28 29 30 31 32 33 34 35 36  | Next Page >