Search Results

Search found 1774 results on 71 pages for 'parallel'.

Page 15/71

How are you taking advantage of Multicore?

- by tgamblin

As someone in the world of HPC who came from the world of enterprise web development, I'm always curious to see how developers back in the "real world" are taking advantage of parallel computing. This is much more relevant now that all chips are going multicore, and it'll be even more relevant when there are thousands of cores on a chip instead of just a few. My questions are:
1. How does this affect your software roadmap? I'm particularly interested in real stories about how multicore is affecting different software domains, so specify what kind of development you do in your answer (e.g. server-side, client-side apps, scientific computing, etc.).
2. What are you doing with your existing code to take advantage of multicore machines, and what challenges have you faced?
3. Are you using OpenMP, Erlang, Haskell, CUDA, TBB, UPC or something else?
4. What do you plan to do as concurrency levels continue to increase, and how will you deal with hundreds or thousands of cores?
5. If your domain doesn't easily benefit from parallel computation, then explaining why is interesting, too.
Finally, I've framed this as a multicore question, but feel free to talk about other types of parallel computing. If you're porting part of your app to use MapReduce, or if MPI on large clusters is the paradigm for you, then definitely mention that, too.
Update: If you do answer #5, mention whether you think things will change if there come to be more cores (100, 1000, etc.) than you can feed with available memory bandwidth (since bandwidth per core keeps shrinking). Can you still use the remaining cores for your application?

How to decide between using PLINQ and LINQ at runtime?

- by Hamish Grubijan

Or decide between a parallel and a sequential operation in general. It is hard to know without testing whether the parallel or the sequential implementation is best, due to overhead. Obviously it will take some time to train "the decider" which method to use. I would say that this method cannot be perfect, so it is probabilistic in nature. The x, y, z do influence "the decider". I think a very naive implementation would be to give each a 1/2 chance at the beginning and then start favoring them according to past performance. This disregards x, y, z, however. I suspect that this question would be better answered by academics than practitioners. Anyhow, please share your heuristic, your experience if any, and your tips on this. Sample code:
public interface IComputer
{
    decimal Compute(decimal x, decimal y, decimal z);
}
public class SequentialComputer : IComputer
{
    public decimal Compute( ... // sequential implementation
}
public class ParallelComputer : IComputer
{
    public decimal Compute( ... // parallel implementation
}
public class HybridComputer : IComputer
{
    private SequentialComputer sc;
    private ParallelComputer pc;
    private TheDecider td; // Helps to decide between the two.
    public HybridComputer()
    {
        sc = new SequentialComputer();
        pc = new ParallelComputer();
        td = new TheDecider();
    }
    public decimal Compute(decimal x, decimal y, decimal z)
    {
        decimal result;
        decimal time;
        if (td.PickOneOfTwo() == 0)
        {
            // Time this and save result into time.
            result = sc.Compute(...);
        }
        else
        {
            // Time this and save result into time.
            result = pc.Compute(...);
        }
        td.Train(time);
        return result;
    }
}
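The "favor whichever has performed better so far" idea is essentially an epsilon-greedy chooser. Below is a minimal Python sketch, not PLINQ and not the asker's code; `TheDecider`, `pick_one_of_two`, `train`, `hybrid_compute` and the `epsilon` parameter are hypothetical names. One design difference from the C# sample: `train` is told which option was timed, since the decider cannot learn otherwise. Like the naive scheme in the question, it ignores x, y, z.

```python
import random
import time
from typing import Callable, List


class TheDecider:
    """Hypothetical epsilon-greedy chooser between two implementations.

    Tries each option once, then mostly picks the one with the lowest
    average observed running time, exploring the other with probability
    `epsilon`. It ignores the inputs (x, y, z).
    """

    def __init__(self, epsilon: float = 0.1, seed: int = 0) -> None:
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.totals = [0.0, 0.0]  # summed running times per option
        self.counts = [0, 0]      # number of timed runs per option

    def pick_one_of_two(self) -> int:
        # Time each option at least once before trusting the averages.
        for option in (0, 1):
            if self.counts[option] == 0:
                return option
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(2)  # explore
        averages = [self.totals[i] / self.counts[i] for i in (0, 1)]
        return 0 if averages[0] <= averages[1] else 1  # exploit

    def train(self, option: int, elapsed: float) -> None:
        self.totals[option] += elapsed
        self.counts[option] += 1


def hybrid_compute(decider: TheDecider,
                   impls: List[Callable[[float, float, float], float]],
                   x: float, y: float, z: float) -> float:
    # impls[0] is the sequential version, impls[1] the parallel one.
    option = decider.pick_one_of_two()
    start = time.perf_counter()
    result = impls[option](x, y, z)
    decider.train(option, time.perf_counter() - start)
    return result
```

Keeping a small nonzero `epsilon` means the decider keeps re-measuring the losing option, which matters if the relative cost changes over time.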

rsync to multiple destinations using same filelist?

- by Dylan B.

I'm wondering if it's possible for rsync to copy one directory to multiple remote destinations all in one go, or even in parallel (not necessary, but it would be useful). Normally, something like the following would work just fine:
$ rsync -Pav /junk user@host1:/backup
$ rsync -Pav /junk user@host2:/backup
$ rsync -Pav /junk user@host3:/backup
And if that's the only option, I'll use that. However, /junk is located on a slow drive with quite a few files, and rebuilding the file list of some ~12,000 files each time is agonizingly slow (~5 minutes) compared to the actual transfer/updating. Is it possible to do something like this to accomplish the same thing:
$ rsync -Pav /junk user@host1:/backup user@host2:/backup user@host3:/backup
Thanks for looking!

Building a new cluster for mathematical calculations (Win/Lin)

- by Muhammad Farhan

I would like to build a new cluster to perform heavy mathematical calculations in Matlab and Abaqus. One of my friends told me that distributed computing is way faster than parallel computing, which seems true from what I've read online. However, I have never set up a cluster before. Current workstation I own:
Dell Precision T5400
2 x Intel Xeon 2.5 GHz
16 GB RAM (2 GB x 8)
1 x Western Digital 1 TB HDD, 7200 rpm
1 x nVidia Quadro FX4600 768 MB GPU
1 x 870 W PSU
OS: Windows 7 Ultimate 64-bit
2nd WS: I can buy another workstation with a similar configuration to the one I own.
I am not bothered about the OS; I am willing to cluster with either Windows or Linux. However, my software is compatible with 64-bit Windows only. Please help me set up a cluster. Thank you.

How to execute a command on multiple hosts using IPv6 only?

- by math

First of all, there is pdsh, which is essentially a parallel distributed shell that can execute commands on a list of given hosts. However, I find myself in an IPv6-only setting. It seems that pdsh is not able to use IPv6, as I am getting error messages:
pdsh -w ^hostnames my_command
pdsh@myhost: gethostbyname("foobar") failed
I also tried to use IPv6 addresses only, which also didn't work. So how do you run a single shell script for administrative purposes (no SGE stuff, or similar) on a bunch of hosts that are reachable only via IPv6?

WinXP: Error 1167 -- Device (LPT1) not connected

- by Thomas Matthews

I am writing a program that opens LPT1 and writes a value to it. The WriteFile function is returning error code 1167, "The device is not connected". The Device Manager shows that LPT1 is present. I have a cable connected between a development board and the PC. The cable converts JTAG pin signals to signals on the parallel port. Power is applied and the cable is connected between the development board and the PC. The development board is powered on. I am using:
- Windows XP
- MS Visual Studio 2008, C language, console application, debug environment
Here are the relevant code fragments:
HANDLE parallel_port_handle;

void initializePort(void)
{
    TCHAR * port_name = TEXT("LPT1:");
    parallel_port_handle = CreateFile( port_name,
                                       GENERIC_READ | GENERIC_WRITE,
                                       0,             // must be opened with exclusive-access
                                       NULL,          // default security attributes
                                       OPEN_EXISTING, // must use OPEN_EXISTING
                                       0,             // not overlapped I/O
                                       NULL           // hTemplate must be NULL for comm devices
                                     );
    if (parallel_port_handle == INVALID_HANDLE_VALUE)
    {
        // Handle the error.
        printf("CreateFile failed with error %lu.\n", GetLastError());
        Pause();
        exit(1);
    }
    return;
}

void writePort( unsigned char a_ucPins, unsigned char a_ucValue )
{
    DWORD dwResult;
    if ( a_ucValue )
    {
        g_siIspPins = (unsigned char) (a_ucPins | g_siIspPins);
    }
    else
    {
        g_siIspPins = (unsigned char) (~a_ucPins & g_siIspPins);
    }
    /* This is a sample code for Windows/DOS without Windows Driver. */
    // _outp( g_usOutPort, g_siIspPins );
    //----------------------------------------------------------------------
    // For Windows XP and later
    //----------------------------------------------------------------------
    if (!WriteFile(parallel_port_handle, &g_siIspPins, 1, &dwResult, NULL))
    {
        printf("Could not write to LPT1 (error %lu)\n", GetLastError());
        Pause();
        return;
    }
}
If you believe this should be posted on Stack Overflow, please migrate it over (thanks).

How do I get Java to use my multi-core processor?

- by Rudiger

I'm using a GZIPInputStream in my program, and I know that performance would improve if I could get Java to run my program in parallel. In general, is there a command-line option for the standard VM to run on many cores? It's running on just one as it is. Thanks!
Edit: I'm running plain ol' Java SE 6 update 17 on Windows XP. Would putting the GZIPInputStream on a separate thread explicitly help? No! Do not put the GZIPInputStream on a separate thread! Do NOT multithread I/O!
Edit 2: I suppose I/O is the bottleneck, as I'm reading and writing to the same disk... In general, though, is there a way to make GZIPInputStream faster? Or a replacement for GZIPInputStream that runs in parallel?
Edit 3: Code snippet I used:
GZIPInputStream gzip = new GZIPInputStream(new FileInputStream(INPUT_FILENAME));
DataInputStream in = new DataInputStream(new BufferedInputStream(gzip));

Does the chunk of the System.Collections.Concurrent.Partitioner need to be thread safe?

- by Scott Chamberlain

I am working with the parallel libraries in .NET 4 and am creating a Partitioner. The example shown on MSDN only has a chunk size of 1 (every time a new result is retrieved, it hits the data source instead of a local cache). The version I am writing will pull 10,000 SQL rows at a time, feed the rows from the cache until it is empty, and then pull another batch. Each partition in the Partitioner has its own chunk. I know that every call to the IEnumerator from the SQL data source needs to be thread safe, but for use in a Parallel.ForEach, do I need to make every call to the cache for the chunking thread safe?
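The split described here (a shared source that must be synchronized, plus a per-partition cache) can be sketched in Python with threads. This is a hypothetical analogue, not the .NET Partitioner API: `SharedChunkSource`, `next_chunk`, `partition` and `run` are invented names, and the sketch assumes, as is typical, that each partition's enumerator is consumed by only one thread at a time. Under that assumption only the shared source is locked; each partition drains its own cache without locking.

```python
import threading
from typing import Iterator, List


class SharedChunkSource:
    """Hands out fixed-size chunks from one shared iterator.

    Stands in for the SQL data source; only this object is locked.
    """

    def __init__(self, items: Iterator[int], chunk_size: int) -> None:
        self._items = items
        self._chunk_size = chunk_size
        self._lock = threading.Lock()

    def next_chunk(self) -> List[int]:
        with self._lock:  # one thread at a time touches the shared source
            chunk: List[int] = []
            for _ in range(self._chunk_size):
                try:
                    chunk.append(next(self._items))
                except StopIteration:
                    break
            return chunk


def partition(source: SharedChunkSource) -> Iterator[int]:
    """Yields items for one worker from its own private chunk cache."""
    while True:
        cache = source.next_chunk()  # the only synchronized call
        if not cache:
            return
        # `cache` belongs to this generator, which a single thread
        # consumes, so draining it needs no lock.
        yield from cache


def run(num_items: int, chunk_size: int, workers: int) -> List[int]:
    source = SharedChunkSource(iter(range(num_items)), chunk_size)
    seen: List[List[int]] = [[] for _ in range(workers)]

    def worker(idx: int) -> None:
        for item in partition(source):
            seen[idx].append(item)  # each worker writes only its own list

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(x for part in seen for x in part)
```

`run(25000, 10000, 4)` should return every item exactly once, which is the property the locking has to preserve.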

pipelined function

- by user289429

Can someone provide an example of how to use a parallel table function in Oracle PL/SQL? We need to run massive queries for each of 15 years and combine the results.
SELECT * FROM Table(TableFunction(cursor(SELECT * FROM year_table)))
...is effectively what we want. The innermost select will give all the years, and the table function will take each year, run a massive query, and return a collection. The problem is that all years are being fed to a single invocation of the table function; we would prefer the table function to be called in parallel for each year. We tried all sorts of partitioning by hash and range, and it didn't help. Also, can we drop the keyword PIPELINED from the function declaration, since we are not performing any transformation and just need the aggregate of the result set?

Do you know any build systems with decent support for parallelization?

- by dahpgjgamgan

Hi, I am looking for a build system (running on MS Windows) that has good support for parallelization of tasks/targets (or whatever you call them). To be more specific: during the build (which is initiated on an MS Windows machine) I need to copy source files to a number of different machines (which are not necessarily running Windows) and start a remote job on each of them, and I would really like to do that on all machines at once. Does anyone know a build system that's capable of executing such a task in parallel? From what I googled, the options currently available are:
- the -j switch in make, but I don't know if nmake supports this
- some custom NAnt tasks
- msbuild has some form of support for parallelization; it seems similar to make (meaning you don't specify what to do in parallel, you just specify that it would be nice to build things that way)
- fake (F# Make) is written in a functional programming language, and such languages are known to have good parallelization support, but I'm not very skilled in functional programming
Any other solutions I could explore?

Why does Clojure hang after having performed my calculations?

- by Thomas

Hi all, I'm experimenting with filtering elements in parallel. For each element, I need to perform a distance calculation to see if it is close enough to a target point. Never mind that data structures already exist for doing this; I'm just doing initial experiments for now. Anyway, I wanted to run some very basic experiments where I generate random vectors and filter them. Here's my implementation that does all of this:
(defn pfilter [pred coll]
  (map second
       (filter first
               (pmap (fn [item] [(pred item) item]) coll))))

(defn random-n-vector [n]
  (take n (repeatedly rand)))

(defn distance [u v]
  (Math/sqrt (reduce + (map #(Math/pow (- %1 %2) 2) u v))))

(defn -main [& args]
  (let [[n-str vectors-str threshold-str] args
        n (Integer/parseInt n-str)
        vectors (Integer/parseInt vectors-str)
        threshold (Double/parseDouble threshold-str)
        random-vector (partial random-n-vector n)
        u (random-vector)]
    (time (println n vectors
                   (count (pfilter (fn [v] (< (distance u v) threshold))
                                   (take vectors (repeatedly random-vector))))))))
The code executes and returns what I expect, that is, the parameter n (length of vectors), vectors (the number of vectors) and the number of vectors that are closer than a threshold to the target vector. What I don't understand is why the program hangs for an additional minute before terminating. Here is the output of a run which demonstrates the problem:
$ time lein run 10 100000 1.0
[null] 10 100000 12283
[null] "Elapsed time: 3300.856 msecs"
real 1m6.336s
user 0m7.204s
sys 0m1.495s
Any comments on how to filter in parallel in general are also more than welcome, as I haven't yet confirmed that pfilter actually works.
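The pfilter idea (evaluate the predicate in parallel, pair each item with its flag, keep the flagged items) can be sketched in Python for comparison; this is not Clojure's API, and `pfilter`, `distance` and `random_n_vector` here are illustrative re-creations. On the hang: Clojure's pmap runs its work on futures backed by the agent thread pool, and those idle threads can keep the JVM alive for roughly a minute after -main returns unless (shutdown-agents) is called, which is the likely cause of the delay shown. The sketch avoids the analogous problem by shutting its pool down at the end of the `with` block. With CPython's GIL, threads mainly illustrate the structure; CPU-bound speedups would need processes.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Sequence, TypeVar

T = TypeVar("T")


def pfilter(pred: Callable[[T], bool], coll: Iterable[T],
            max_workers: int = 4) -> List[T]:
    """Evaluates pred on each item in parallel; keeps items where it is true.

    Mirrors the Clojure version: pair each item with its predicate result,
    keep pairs whose flag is true, then drop the flag. Executor.map yields
    results in input order, so the output order matches coll.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        flagged = pool.map(lambda item: (pred(item), item), coll)
        # Consume inside the `with`, before the pool is shut down.
        return [item for keep, item in flagged if keep]


def distance(u: Sequence[float], v: Sequence[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def random_n_vector(n: int, rng: random.Random) -> List[float]:
    return [rng.random() for _ in range(n)]
```

A quick way to confirm that a parallel filter "actually works" is to compare it against the built-in sequential filter on the same input.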

How exactly do MbUnit's [Parallelizable] and DegreeOfParallelism work?

- by BenA

I thought I understood how MbUnit's parallel test execution worked, but the behaviour I'm seeing differs enough from my expectations that I suspect I'm missing something! I have a set of UI tests that I wish to run concurrently. All of the tests are in the same assembly, split across three different namespaces. All of the tests are completely independent of one another, so I'd like all of them to be eligible for parallel execution. To that end, I put the following in the AssemblyInfo.cs:
[assembly: DegreeOfParallelism(8)]
[assembly: Parallelizable(TestScope.All)]
My understanding was that this combination of assembly attributes should cause all of the tests to be considered [Parallelizable], and that the test runner should use 8 threads during execution. My individual tests are marked with the [Test] attribute, and nothing else. None of them are data-driven. However, what I actually see is at most 5-6 threads being used, meaning that my test runs take longer than they should. Am I missing something? Do I need to do anything else to ensure that all 8 of my threads are used by the runner? N.B. The behaviour is the same irrespective of which runner I use: the GUI, command-line and TD.Net runners all behave as described above, again leading me to think I've missed something.
EDIT: As pointed out in the comments, I'm running v3.1 of MbUnit (update 2 build 397). The documentation suggests that the assembly-level [Parallelizable] attribute is available, but it also seems to reference v3.2 of the framework, even though that is not yet available.
EDIT 2: To further clarify, the structure of my assembly is as follows:
assembly
  - namespace
    - fixture
      - tests (each carrying only the [Test] attribute)
    - fixture
      - tests (each carrying only the [Test] attribute)
  - namespace
    - fixture
      - tests (each carrying only the [Test] attribute)
    - fixture
      - tests (each carrying only the [Test] attribute)
  - namespace
    - fixture
      - tests (each carrying only the [Test] attribute)
    - fixture
      - tests (each carrying only the [Test] attribute)

[C++][OpenMP] Proper use of "atomic directive" to lock STL container

- by conradlee

I have a large number of sets of integers, which I have, in turn, put into a vector of pointers. I need to be able to update these sets of integers in parallel without causing a race condition. More specifically, I am using OpenMP's "parallel for" construct. For dealing with shared resources, OpenMP offers a handy "atomic directive", which allows one to avoid a race condition on a specific piece of memory without using locks. It would be convenient if I could use the "atomic directive" to prevent simultaneous updates to my integer sets; however, I'm not sure whether this is possible. Basically, I want to know whether the following code could lead to a race condition:
vector< set<int>* > membershipDirectory(numSets, new set<int>);
#pragma omp for schedule(guided,expandChunksize)
for(int i=0; i<100; i++)
{
    set<int>* sp = membershipDirectory[5];
    #pragma omp atomic
    sp->insert(45);
}
(Apologies for any syntax errors in the code; I hope you get the point.) I have seen a similar example of this for incrementing an integer, but I'm not sure whether it works with a pointer to a container, as in my case.
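Two points bear on this. OpenMP's atomic construct applies only to simple scalar updates (forms like x++ or x += expr), so a member call such as sp->insert(45) is not something it can protect; the usual alternatives are a critical section or one lock per set (for example an omp_lock_t each). Also, vector<set<int>*>(numSets, new set<int>) evaluates `new` once, so every slot points to the same set. The per-set-lock design is sketched below in Python as a hypothetical analogue; `MembershipDirectory` and `fill` are invented names. CPython's set.add is already safe under the GIL, so the lock here models what the C++ version needs rather than being required in Python.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import List, Set, Tuple


class MembershipDirectory:
    """One set per index, each guarded by its own lock."""

    def __init__(self, num_sets: int) -> None:
        # Build a distinct set for every slot; [set()] * num_sets would
        # repeat one shared set, like vector(numSets, new set<int>).
        self.sets: List[Set[int]] = [set() for _ in range(num_sets)]
        self.locks = [threading.Lock() for _ in range(num_sets)]

    def insert(self, index: int, value: int) -> None:
        # Only updates to the same set contend; other sets proceed freely.
        with self.locks[index]:
            self.sets[index].add(value)


def fill(directory: MembershipDirectory,
         updates: List[Tuple[int, int]], workers: int = 8) -> None:
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for every task and re-raises any exception.
        list(pool.map(lambda u: directory.insert(*u), updates))
```

Per-set locks keep contention local: two threads block each other only when they update the same set, whereas a single critical section would serialize all inserts.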

Parallelizing for loop

- by vman049

I have MATLAB code which I'm trying to parallelize with a simple change from "for" to "parfor". I'm unable to do so because of an error I'm receiving on the variable "votes", which states:
Valid indices for 'votes' are restricted in PARFOR loops.
Explanation: For MATLAB to execute parfor loops efficiently, the amount of data sent to the MATLAB workers must be minimal. One of the ways MATLAB achieves this is by restricting the way variables can be indexed in parfor iterations. The indicated variable is indexed in a way that is incompatible with parfor.
Suggested Action: Fix the indexing. For a description of the indexing restrictions, see “Sliced Variables” in the Parallel Computing Toolbox documentation.
Below is my code:
votes = zeros(num_layers, size(spikes, 1), size(SVMs_layer1, 1));
predDir = zeros(size(spikes, 1), 1);
chronProb = zeros([num_layers, size(chronDists)]);
for i = 1:num_layers
    switch i
        case 1
            B = B1; k_elem_temp = k_elem1; rest_elem_temp = rest_elem1;
        case 2
            B = B2; k_elem_temp = k_elem2; rest_elem_temp = rest_elem2;
        case 3
            B = B3; k_elem_temp = k_elem3; rest_elem_temp = rest_elem3;
    end
    for j = 1:length(chronPred)
        if chronDists(i, j, :) ~= 0
            parfor k = 1:8
                chronProb(i, j, k) = logistic(B{k}(1) + chronDists(i, j, k).*(B{k}(2)));
                votes(i, j, k_elem_temp(k, :)) = votes(i, j, k_elem_temp(k, :)) + chronProb(i, j, k)/num_k(i)/num_layers;
                votes(i, j, rest_elem_temp(k, :)) = votes(i, j, rest_elem_temp(k, :)) + (1 - chronProb(i, j, k))/num_rest(i)/num_layers;
            end
        end
    end
end
Do you have any suggestions as to how I could adjust my code so that it runs in parallel? Thank you!
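The restriction arises because iteration k writes to votes at positions chosen by k_elem_temp(k, :), which is not a slice indexed by the loop variable. A common restructuring is to let each iteration build its own full-length contribution vector and add the vectors together afterwards (in MATLAB, via a reduction variable of the form v = v + expr). The Python sketch below shows that structure for one (i, j) cell; it is an illustration under assumptions, not MATLAB code: indices are 0-based, `logistic` is assumed to be the standard logistic function, `votes_for_cell` is a hypothetical name, and the indices within each row of k_elem / rest_elem are assumed distinct.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple


def logistic(t: float) -> float:
    # Assumed definition: the standard logistic function.
    return 1.0 / (1.0 + math.exp(-t))


def votes_for_cell(B: Sequence[Tuple[float, float]],
                   dist: Sequence[float],
                   k_elem: Sequence[Sequence[int]],
                   rest_elem: Sequence[Sequence[int]],
                   num_k: float, num_rest: float, num_layers: int,
                   num_classes: int, workers: int = 4) -> List[float]:
    """Votes for one (i, j) cell, with the k loop run in parallel.

    Each k writes only to its own private vector, so no iteration touches
    data owned by another; the per-k vectors are summed afterwards.
    """

    def contribution(k: int) -> List[float]:
        p = logistic(B[k][0] + dist[k] * B[k][1])
        local = [0.0] * num_classes
        for c in k_elem[k]:
            local[c] += p / num_k / num_layers
        for c in rest_elem[k]:
            local[c] += (1.0 - p) / num_rest / num_layers
        return local

    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(contribution, range(len(B))))

    # Reduction step: add the independent per-k vectors element-wise.
    return [sum(column) for column in zip(*parts)]
```

Because every iteration owns its output, the parallel loop needs no coordination, and the only sequential step is the final element-wise sum.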

Why don't xUnit frameworks allow tests to run in parallel?

- by Xavier Nodet

Do you know of any xUnit framework that allows running tests in parallel, to make use of the multiple cores in today's machines? I don't... If none (or so few) of them do it, maybe there is a reason... Is it that tests are usually so quick that people simply don't feel the need to parallelize them? Is there something deeper that precludes distributing (at least some of) the tests over multiple threads? Thanks!

Relationship between "Task Parallel Library" and "Task-based Asynchronous Pattern"?

- by Sid

In the context of C# and .NET 4/4.5, used for an application running on a web server, what is the relationship between the "Task Parallel Library" and the "Task-based Asynchronous Pattern"? I understand one is a library and the other is a pattern. But to dig deeper, is it like "the library is used by the pattern to enforce good practices"? I'm also not clear whether both are supported in .NET 4.0 (with the await and async keywords). Edit: It seems that await and async are only in .NET 4.5...

Branching strategy for parallel development that won't be in the same release?

- by Telastyn

My team is working on a product, which for business reasons needs to be released on a regular schedule. An issue has arisen where we want to do development in parallel for the upcoming release, as well as the 'next' release. This is to become standard practice, so it's not as straightforward as cutting a feature branch for the new work. We'll continually have 2+ teams working on different releases of the same product. Is there an SCM best practice for this sort of arrangement?

STL algorithms and concurrent programming

- by Andrew

Hello everyone, can any STL algorithms or container operations, like std::fill or std::transform, be executed in parallel if I enable OpenMP in my compiler? I am working with MSVC 2008 at the moment. Or are there other ways to make them concurrent? Thanks.

How can I implement MapReduce using shell commands?

- by alex

How do you execute a Unix shell command (e.g. an awk one-liner) on a cluster in parallel (step 1) and collect the results back to a central node (step 2)? Update: I've just found http://blog.last.fm/2009/04/06/mapreduce-bash-script, and it seems to do exactly what I need.

Number of threads and thread numbers in Grand Central Dispatch

- by raphgott

I am using C and Grand Central Dispatch to parallelize some heavy computations. How can I get the number of threads used by GCD? Also, is it possible to know which thread a piece of code is currently running on? Basically, I'd like to use SPRNG (parallel random numbers) with multiple streams, and for that I need to know which stream id to use (and therefore which thread is being used).

The simplest concurrency pattern

- by Ilya Kogan

Could you please help remind me of one of the simplest parallel programming techniques? How do I do the following in C#:
Initial state: semaphore counter = 0
Thread 1:
// Block until semaphore is signalled
semaphore.Wait(); // wait until semaphore counter is 1
Thread 2:
// Allow thread 1 to run:
semaphore.Signal(); // increments from 0 to 1
It's not a mutex, because there is no critical section, or rather you could say there is an infinite critical section. So what is it?
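What the pseudocode describes is a semaphore used for signaling: its count starts at 0, one thread blocks on it, and another releases it. In .NET this is commonly done with a SemaphoreSlim created with an initial count of 0 (Wait/Release) or with an event object. The same pattern is shown below in Python, where Wait/Signal are spelled acquire/release; `run_signal_demo` is an illustrative name.

```python
import threading
from typing import List


def run_signal_demo() -> List[str]:
    """Thread 1 blocks until thread 2 signals it through a semaphore."""
    semaphore = threading.Semaphore(0)  # initial counter = 0
    events: List[str] = []

    def thread1() -> None:
        semaphore.acquire()  # blocks while the counter is 0
        events.append("thread 1 resumed")

    def thread2() -> None:
        events.append("thread 2 signalling")
        semaphore.release()  # counter 0 -> 1, wakes thread 1

    t1 = threading.Thread(target=thread1)
    t2 = threading.Thread(target=thread2)
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return events
```

Whichever thread starts first, thread 1 can only append after thread 2 has released the semaphore, so the returned order is always signal first, then resume.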

Can .NET Task instances go out of scope during run?

- by Henry Jackson

If I have the following block of code in a method (using .NET 4 and the Task Parallel Library):
var task = new Task(() => DoSomethingLongRunning());
task.Start();
and the method returns, will that task go out of scope and be garbage collected, or will it run to completion? I haven't noticed any issues with GCing, but want to make sure I'm not setting myself up for a race condition with the GC.

openmp program elapsed time not scaling with increased threads

- by Griff

I've got this OpenMP Fortran program solving an embarrassingly parallel problem: a do loop over 512^3 elements. See the output below. Why would there be such strange behavior in the elapsed time as a function of threads? I thought it would peak at a sweet spot and then slowly degrade. This clearly isn't happening. Perhaps I misunderstand something about OpenMP.
Threads, elapsed time in seconds (omp_get_wtime)
1, 103.76298500015400
2, 65.346454000100493
4, 45.923643999965861
7, 38.074195000110194
8, 36.968765000114217
9, 39.45981499995105
10, 40.753379000118002
12, 39.577559999888763
14, 37.909950000001118
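These timings are easier to read as speedup S(p) = T(1)/T(p) and parallel efficiency E(p) = S(p)/p. The short Python script below computes both from the numbers in the question; `TIMINGS` and `speedup_and_efficiency` are illustrative names. By this measure the best speedup is about 2.8x at 8 threads (efficiency roughly 35%), and every thread count above 8 is slower than 8.

```python
from typing import Dict, Tuple

# Timings from the question: threads -> elapsed seconds (omp_get_wtime).
TIMINGS: Dict[int, float] = {
    1: 103.76298500015400,
    2: 65.346454000100493,
    4: 45.923643999965861,
    7: 38.074195000110194,
    8: 36.968765000114217,
    9: 39.45981499995105,
    10: 40.753379000118002,
    12: 39.577559999888763,
    14: 37.909950000001118,
}


def speedup_and_efficiency(
        timings: Dict[int, float]) -> Dict[int, Tuple[float, float]]:
    """Maps thread count p to (T(1)/T(p), T(1)/T(p)/p)."""
    base = timings[1]
    return {p: (base / t, base / t / p) for p, t in timings.items()}


if __name__ == "__main__":
    for p, (s, e) in speedup_and_efficiency(TIMINGS).items():
        print(f"{p:2d} threads: speedup {s:5.2f}, efficiency {e:6.1%}")
```

Plotting efficiency rather than raw time makes it obvious that the loop stops scaling well before 8 threads, even though the elapsed time keeps shrinking until then.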

openmp sections running sequentially

- by chi42

I have the following code:
#pragma omp parallel sections private(x,y,cpsrcptr) firstprivate(srcptr) lastprivate(srcptr)
{
    #pragma omp section
    {
        //stuff
    }
    #pragma omp section
    {
        //stuff
    }
}
According to the Zoom profiler, two threads are created; one thread executes both sections, and the other thread simply blocks! Has anyone encountered anything like this before? (And yes, I do have a dual-core machine.)

Reading info from printer on USB port

- by Hein du Plessis

In the days of parallel printers, one could send a command on LPT1 and receive back standard info such as the life count, etc. Now, with USB devices, have we lost that capability? Or is there still a way to read the info?

