Search Results

Search found 3521 results on 141 pages for 'parallel computing'.

Page 36/141 | < Previous Page | 32 33 34 35 36 37 38 39 40 41 42 43  | Next Page >

  • RabbitMQ serializing messages from queue with multiple consumers

    - by Refefer
    Hi there, I'm having a problem where I have a queue set up in shared mode and multiple consumers bound to it. The issue is that it appears that RabbitMQ is serializing the messages; that is, only one consumer at a time is able to run. I need this to be parallel, but I can't seem to figure out how. Each consumer is running in its own process. There are plenty of messages in the queue. I'm using py-amqplib to interface with RabbitMQ. Any thoughts?

    Read the article

  • Raspberry Pi cluster, neuron networks and brain simulation

    - by jokoon
    Since the RBPI (Raspberry Pi) has very low power consumption and a very low production price, one could build a very big cluster with them. I'm not sure, but a cluster of 100,000 RBPIs would take little power and little room. Now, I think it might not be as powerful as existing supercomputers in terms of FLOPS or other sorts of computing measurements, but could it allow better neural network simulation? I'm not sure if saying "1 CPU = 1 neuron" is a reasonable statement, but it seems valid enough. So would such a cluster be more efficient for neural network simulation, since it's far more parallel than other classical clusters?

    Read the article

  • what is a data serialization system?

    - by Yang
    According to the Apache Avro project, "Avro is a serialization system". By calling it a data serialization system, does that mean Avro is a product or an API? Also, I am not quite sure what a data serialization system is. For now, my understanding is that it is a protocol that defines how a data object is passed over the network. Can anyone explain it in an intuitive way that is easier for people with a limited distributed computing background to understand? Thanks in advance!

    Read the article

  • Linker library for OpenMP for Snow Leopard?

    - by unknownthreat
    Currently, I am trying out OpenMP in Xcode 3.2.2 on Snow Leopard:

        #include <omp.h>
        #include <iostream>
        #include <stdio.h>

        int main (int argc, char * const argv[]) {
            #pragma omp parallel
            printf("Hello from thread %d, nthreads %d\n",
                   omp_get_thread_num(), omp_get_num_threads());
            return 0;
        }

    I didn't include any linking libraries yet, so the linker complains:

        "_omp_get_thread_num", referenced from: _main in main.o
        "_omp_get_num_threads", referenced from: _main in main.o

    OK, fine, no problem. I take a look at the existing frameworks, looking for keywords such as openmp or omp... and here comes the problem: where is the linking library? Or should I say, what is the name of the linking library for OpenMP? Is it a dylib, a framework, or something else? Or do I need to get it from somewhere first?
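
    A minimal sketch, assuming the GCC 4.2 toolchain that ships with Snow Leopard's Xcode: with GCC there is no separate framework to add by hand; the -fopenmp compiler flag both enables the #pragma omp directives and links GCC's OpenMP runtime (libgomp). Guarding the calls behind the standard _OPENMP macro keeps the file building either way:

        // Sketch: builds with or without OpenMP enabled.  With GCC, compile
        // and link with -fopenmp, which also pulls in the libgomp runtime.
        #include <cstdio>
        #ifdef _OPENMP
        #include <omp.h>
        #endif

        int main()
        {
        #ifdef _OPENMP
            #pragma omp parallel
            std::printf("Hello from thread %d of %d\n",
                        omp_get_thread_num(), omp_get_num_threads());
        #else
            std::printf("built without OpenMP support\n");
        #endif
            return 0;
        }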

    Read the article

  • What considerations should be made for a web app to be released on a cloud hosted system?

    - by Rhubarb
    I have a web app that is primarily a WordPress app, but it pulls content from a Django app, simply by calling a service that uses Django models. My understanding of cloud computing is a bit vague. If the site needs to scale up with short notice, does the cloud provider (Amazon, Rackspace, whomever) simply spin up new instances (copies) of my initially configured server? How is state managed between all of them? Are there any good primers on this subject? It's hard to find much out there without getting caught up in the marketing.

    Read the article

  • Expert system for writing programs?

    - by aaa
    I am brainstorming an idea for high-level software to manipulate matrix algebra equations (tensor manipulations, to be exact) and produce optimized C++ code using several criteria such as the sizes of dimensions, the available memory on the system, etc. Something similar in spirit to the Tensor Contraction Engine (TCE), but specifically oriented towards producing optimized rather than general code. The desired end result is software which is expert at producing parallel programs in my domain. Does this sort of development fall into the category of expert systems? What other projects out there work in the same area of producing code given the constraints?

    Read the article

  • Cilk or Cilk++ or OpenMP

    - by Aman Deep Gautam
    I'm creating a multi-threaded application on Linux. Here is the scenario: suppose I have x instances of a class BloomFilter and some y GB of data (more than the available memory). I need to test membership for this y GB of data against each of the bloom filter instances. It is pretty clear that parallel programming will help speed up the task; moreover, since I am only reading the data, it can be shared across all processes or threads. Now I am confused about which one to use: Cilk, Cilk++ or OpenMP (which one is better)? I am also unsure whether to go for multithreading or multiprocessing.
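
    Framework choice aside, here is an illustrative OpenMP sketch of one possible decomposition: the keys are read-only and shared, and each thread owns one filter. The BloomFilter below is a hypothetical stand-in for the poster's class, and in practice the y GB of data would be streamed in blocks rather than held in a single vector as it is here for brevity:

        #include <cstddef>
        #include <cstdint>
        #include <vector>

        // Hypothetical stand-in for the poster's class; only a membership
        // test is assumed.  The body is a placeholder so the sketch links.
        struct BloomFilter {
            bool contains(uint64_t key) const { return (key & 1u) != 0; }
        };

        // For each filter, count how many of the shared, read-only keys it
        // reports as members.  Threads read `keys` in place, with no copies.
        std::vector<std::size_t> test_membership(
                const std::vector<BloomFilter>& filters,
                const std::vector<uint64_t>& keys)
        {
            std::vector<std::size_t> hits(filters.size(), 0);

            #pragma omp parallel for schedule(static)
            for (long f = 0; f < static_cast<long>(filters.size()); ++f) {
                std::size_t count = 0;
                for (std::size_t i = 0; i < keys.size(); ++i)
                    if (filters[f].contains(keys[i]))
                        ++count;
                hits[f] = count;
            }
            return hits;
        }

    Because the work only reads the data, threads (one address space) avoid the per-process copies or explicit shared-memory setup that a multiprocessing design would need.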

    Read the article

  • What's a good algorithm for searching arrays N and M, in order to find elements in N that also exist in M?

    - by GenTiradentes
    I have two arrays, N and M. They are both arbitrarily sized, though N is usually smaller than M. I want to find out what elements in N also exist in M, in the fastest way possible. To give you an example of one possible instance of the program, N is an array 12 units in size, and M is an array 1,000 units in size. I want to find which elements in N also exist in M. (There may not be any matches.) The more parallel the solution, the better. I used to use a hash map for this, but it's not quite as efficient as I'd like it to be. Typing this out, I just thought of running a binary search of M on sizeof(N) independent threads (using CUDA). I'll see how this works, though other suggestions are welcome.
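
    A plain CPU sketch of the idea mentioned at the end (sort M once, then binary-search each element of N): the lookups are independent, so they parallelize trivially, although for a 12-element N the sort dominates and the parallelism barely matters:

        #include <algorithm>
        #include <vector>

        // Elements of n that also occur in m: sort m once (O(|M| log |M|)),
        // then each lookup is O(log |M|) and independent of the others.
        std::vector<int> in_both(std::vector<int> m, const std::vector<int>& n)
        {
            std::sort(m.begin(), m.end());

            std::vector<int> found;
            #pragma omp parallel for
            for (long i = 0; i < static_cast<long>(n.size()); ++i) {
                if (std::binary_search(m.begin(), m.end(), n[i])) {
                    #pragma omp critical   // guard the shared result vector
                    found.push_back(n[i]);
                }
            }
            return found;
        }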

    Read the article

  • Physical Cores vs Virtual Cores in Parallelism

    - by Code Curiosity
    When it comes to virtualization, I have been deliberating on the relationship between physical cores and virtual cores, especially in how it affects applications employing parallelism. For example, in a VM scenario, if there are fewer physical cores than virtual cores (if that's possible), what effect or limits does that place on the application's parallel processing? I'm asking because, in my environment, it's not disclosed what the physical architecture is. Is there still much advantage to parallelizing if the application lives on a dual-core VM hosted on a single-core physical machine?

    Read the article

  • c++ programming for clusters and HPC

    - by Abruzzo Forte e Gentile
    Hi all, I need to write a scientific application in C++ that does a lot of computation and uses a lot of memory. I have part of the job done, but due to the high resource requirements I was thinking of moving to OpenMPI. Before doing that I have a simple question: if I understood the principle of OpenMPI correctly, it is the developer who has the task of splitting the jobs over the different nodes, calling SEND and RECEIVE based on the nodes available at that time. Do you know if there exists some library or OS or whatever that has this capability, letting my code remain as it is now? Basically something that connects all the computers and lets them share their memory and CPU as one? I am a bit confused because of the amount of material available on the topic. Should I look at cloud computing? Or distributed shared memory? Can you help me or point me in the right direction? Thanks
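
    For reference, a minimal sketch of the explicit SEND/RECEIVE pattern the question describes, using standard MPI calls; whether a distributed-shared-memory layer or some framework can hide this entirely is the open question, and this only shows what the manual version looks like:

        #include <mpi.h>
        #include <cstdio>

        int main(int argc, char** argv)
        {
            MPI_Init(&argc, &argv);

            int rank = 0, size = 0;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);   // who am I?
            MPI_Comm_size(MPI_COMM_WORLD, &size);   // how many processes in total?

            if (rank == 0) {
                // Master: hand a chunk id to every worker.
                for (int worker = 1; worker < size; ++worker)
                    MPI_Send(&worker, 1, MPI_INT, worker, 0, MPI_COMM_WORLD);
            } else {
                int chunk = -1;
                MPI_Recv(&chunk, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                std::printf("rank %d got chunk %d\n", rank, chunk);
                // ... compute on the chunk, then MPI_Send the result back ...
            }

            MPI_Finalize();
            return 0;
        }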

    Read the article

  • what changes when your input is giga/terabyte sized?

    - by Wang
    I took my first baby step today into real scientific computing when I was shown a data set where the smallest file is 48000 fields by 1600 rows (haplotypes for several people, for chromosome 22). And this is considered tiny. I write Python, so I've spent the last few hours reading about HDF5, and NumPy, and PyTables, but I still feel like I'm not really grokking what a terabyte-sized data set actually means for me as a programmer. For example, someone pointed out that with larger data sets, it becomes impossible to read the whole thing into memory, not because the machine has insufficient RAM, but because the architecture has insufficient address space! It blew my mind. What other assumptions have I been relying on in the classroom that just don't work with input this big? What kinds of things do I need to start doing or thinking about differently? (This doesn't have to be Python specific.)

    Read the article

  • Activate thread synchronically

    - by mayap
    Hi all, I'm using the .NET 4.0 parallel library. The tasks I execute ask to run some other task, sometimes synchronously and sometimes asynchronously, depending on conditions which are not known in advance. For an async call, I simply create a new task and that's it. I don't know how to handle a sync call: how do I run it on the same thread, given that sync tasks may in turn ask to execute other sync tasks recursively? This whole issue is pretty new to me. Thanks in advance.

    Read the article

  • Why are Asynchronous processes not called Synchronous?

    - by Balk
    So I'm a little confused by this terminology. Everyone refers to "asynchronous" computing as running different processes on separate threads, which gives the illusion that these processes are running at the same time. This is not the definition of the word asynchronous. a·syn·chro·nous –adjective 1. not occurring at the same time. 2. (of a computer or other electrical machine) having each operation started only after the preceding operation is completed. What am I not understanding here?

    Read the article

  • How can I call an executable to run on a separate machine within a program on my own machine (win xp)?

    - by Mr. H.
    My objective is to write a program which will call another executable on a separate computer (all with Win XP), with parameters determined at run-time, then repeat for several more computers, and then collect the results. In short, I'm working on a grid-computing project. The algorithm itself is already coded in FORTRAN, but we are looking for an efficient way to run it on many computers at once. I suppose one way to accomplish this would be to upload a script to each computer and then run said script on each computer, all automatically and dependent on my own parameters. But how can I write a program which will write to, upload, and run a script on a separate computer? I had considered GridGain, but the algorithm is already coded and in a different language, so that is ruled out. My current guess at accomplishing this task is using Expect (wiki/Expect), but I have no knowledge of the tool. Any advice appreciated.

    Read the article

  • "cloud architecture" concepts in a system architecture diagrams

    - by markus
    If you design a distributed application for easy scale-out, or you just want to make use of any of the new “cloud computing” offerings by Amazon, Google or Microsoft, there are some typical concepts or components you usually end up using: distributed blob storage (aka S3); asynchronous, durable message queues (aka SQS); non-relational/non-transactional databases (like SimpleDB, Google BigTable, Azure SQL Services); a distributed background worker pool; load-balanced, edge-service processes handling user requests (often virtualized); distributed caches (like memcached); and a CDN (content delivery network, like Akamai). Now when it comes to designing and sketching an architecture that makes use of such patterns, are there any commonly used symbols I could use? Or even a download with some cool Visio stencils? :) It doesn’t have to be a formal system like UML, but I think it would be great if there were symbols that everyone knows and understands, like the commonly used shapes for databases or documents, for example. I think it would be important not to mix it up with traditional concepts like a normal file system (local or network server/SAN), or a relational database. Simply speaking, I want to be able to draw some conclusions about an application’s scalability or data consistency issues by just looking at the system architecture overview diagram. Update: Thank you very much for your answers. I like the idea of putting a small "cloud symbol" on the traditional symbols. However, I'll leave this thread open just in case someone finds specific symbols (maybe in a book or so) - or uploads some pimped up Visio stencils ;)

    Read the article

  • When should one use the following: Amazon EC2, Google App Engine, Microsoft Azure and Salesforce.com?

    - by vicky21
    I am asking this in a very general sense, both from the cloud provider's and the cloud consumer's perspective. Also, the question is not about any specific kind of application (in fact the intention is to know which types of applications/domains fit into which cloud slab: SaaS, PaaS or IaaS). My understanding so far is: IaaS: raw hardware (processors, networks, storage). PaaS: OS, system software, development framework, virtual machines. SaaS: software applications. It would be great if Stackoverflowers could share their understanding and experiences of the cloud computing concept. EDIT: OK, I will put it in a more specific way. Amazon EC2: You don't have control over the hardware layer, but you can take your choice of OS image, dev framework (.NET, J2EE, LAMP) and application and put it on EC2 hardware. Can you deploy applications built with Google App Engine or Azure on EC2? Google App Engine: You don't have control over the hardware and OS, and you get a specific dev framework to build your application. Can you take any existing Java or Python application and port it to GAE? Or vice versa, can applications that were built on GAE be taken out of GAE and ported to an application server like WebSphere or WebLogic? Azure: You don't have control over the hardware and OS, and you get a specific dev framework to build your application. Can you take any existing .NET application and port it to Azure? Or vice versa, can applications that were built on Azure be taken out of Azure and ported to an application server like BizTalk?

    Read the article

  • RMI no such object in table, Server communication error

    - by ben-casey
    My goal is to create a distributed computing program that launches a server and a client at the same time. I need it to be installable on a couple of machines and have all the machines communicating with each other, i.e. a master node and 5 slave nodes, all from one application. My problem is that I cannot properly use UnicastRef; I'm thinking that it is a problem with launching everything on the same port. Is there a better way that I am overlooking? This is the part of my code that matters (from my main class):

        try {
            RMIServer obj = new RMIServer();
            obj.start(5225);
        } catch (Exception e) {
            e.printStackTrace();
        }
        try {
            System.out.println("We are slave's ");
            Registry rr = LocateRegistry.getRegistry("127.0.0.1", Store.PORT, new RClient());
            Call ss = (Call) rr.lookup("FILLER");
            System.out.println(ss.getHello());
        } catch (Exception e) {
            e.printStackTrace();
        }
        }

    and this is the server class:

        public RMIServer() {
        }

        public void start(int port) throws Exception {
            try {
                Registry registry = LocateRegistry.createRegistry(port, new RClient(), new RServer());
                Call stuff = new Call();
                registry.bind("FILLER", stuff);
                System.out.println("Server ready");
            } catch (Exception e) {
                System.err.println("Server exception: " + e.toString());
                e.printStackTrace();
            }
        }

    I don't know what I am missing or what I am overlooking, but the output looks like this:

        Listen on 5225
        Listen on 8776
        Server ready
        We are slave's
        Listen on 8776
        java.rmi.NoSuchObjectException: no such object in table
            at sun.rmi.transport.StreamRemoteCall.exceptionReceivedFromServer(StreamRemoteCall.java:255)
            at sun.rmi.transport.StreamRemoteCall.executeCall(StreamRemoteCall.java:233)
            at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:359)
            at sun.rmi.registry.RegistryImpl_Stub.lookup(Unknown Source)
            at Main.main(Main.java:62)

    Line 62 is this: Call ss = (Call) rr.lookup("FILLER");

    Read the article

  • Moving a ASP.NET application to the cloud

    - by user102533
    I am new to cloud computing, so please bear with me here. I have an existing ASP.NET application with SQL Server 2008 hosted on a virtual private server. Here's what it briefly does: The front end accepts users' requests and adds them to a DB table. A Windows service running in the background picks up the request, processes it and sets a flag. The Windows service also creates a file for the user to download. The user downloads the file. I'd like to move this web application, with the service, to the cloud. The architecture I envision is that I'll have one web server on which I will install the front end and the Windows service. I'll also have a cloud file server for file storage. The Windows service should somehow create a file and transfer it to the cloud file server (I assume this is possible?). My questions: Does the architecture look like I am going in the right direction? I know Amazon has been providing cloud services for a long time. If I want to make minimal changes to my application, should I go with Amazon, Rackspace, Azure or some other provider? I understand that I would not only pay for file storage and the web server but also for the bandwidth of users downloading the file and the Windows service uploading the file to the cloud server. Can I assume these costs are negligible? Should I go with a VPS + cloud files combination to begin with? Any other thoughts/suggestions?

    Read the article

  • Ideas on frameworks in .NET that can be used for job processing and notifications

    - by Rajat Mehta
    Scenario: We have one instance of a WCF Windows service which exposes contracts like AddNewJob(Job job), GetJobs(JobQuery query), etc. This service is consumed by 70-100 instances of a client, which is a Windows Forms based .NET app. Typically the service receives 50-100 inward calls/minute to add or query jobs that are stored in a table on SQL Server. The same service is also responsible for processing these jobs in real time. It queries the database every 5 seconds, picks up the queued jobs and starts processing them. A job has the following states: Queued, Pre-processing, Processing, Post-processing, Completed, Failed, Locked. Another responsibility of this service is to update all clients on every state change of every job. This means 200+ callbacks to clients per second. Question: This whole implementation is done using WCF duplex bindings and works perfectly fine on a small number of parallel jobs. The problem arises when we scale it up to 1000 jobs at a time. The notifications don't work as expected, and it leads to memory overflow, etc. Is there any standard framework that can provide a clean infrastructure for handling this scenario? Apologies for the long explanation!

    Read the article

  • Stack usage with MMX intrinsics and Microsoft C++

    - by arik-funke
    I have an inline assembler loop that cumulatively adds elements from an int32 data array with MMX instructions. In particular, it uses the fact that the eight 64-bit MMX registers can hold 16 int32s between them to calculate 16 different cumulative sums in parallel. I would now like to convert this piece of code to MMX intrinsics, but I am afraid that I will suffer a performance penalty because one cannot explicitly instruct the compiler to use the 8 MMX registers to accumulate the 16 independent sums. Can anybody comment on this and maybe propose a solution for how to convert the piece of code below to use intrinsics?

    Inline assembler (only the part within the loop):

        paddd mm0, [esi+edx+8*0] ; add first & second pair of int32 elements
        paddd mm1, [esi+edx+8*1] ; add third & fourth pair of int32 elements
        paddd mm2, [esi+edx+8*2]
        paddd mm3, [esi+edx+8*3]
        paddd mm4, [esi+edx+8*4]
        paddd mm5, [esi+edx+8*5]
        paddd mm6, [esi+edx+8*6]
        paddd mm7, [esi+edx+8*7] ; add 15th & 16th pair of int32 elements

    esi points to the beginning of the data array, edx provides the offset in the data array for the current loop iteration, and the data array is arranged such that the elements for the 16 independent sums are interleaved.
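
    A possible intrinsics version of that inner loop (a sketch, not the poster's code): eight __m64 accumulators, each holding two of the 16 interleaved partial sums. Whether the compiler actually keeps all eight in MMX registers is up to its register allocator, which is exactly the concern raised above:

        #include <cstddef>
        #include <mmintrin.h>   // MMX intrinsics: __m64, _mm_add_pi32, _mm_empty

        // data points to interleaved int32 pairs; each "block" is 8 x 64 bits,
        // i.e. one 64-bit lane per accumulator, 16 int32s in total.
        void accumulate(const __m64* data, std::size_t blocks, __m64 acc[8])
        {
            for (std::size_t b = 0; b < blocks; ++b) {
                const __m64* p = data + 8 * b;
                for (int r = 0; r < 8; ++r)
                    acc[r] = _mm_add_pi32(acc[r], p[r]);   // packed 32-bit add (paddd)
            }
            _mm_empty();   // clear MMX state before any x87 floating-point code runs
        }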

    Read the article

  • How do I optimize this postfix expression tree for speed?

    - by Peter Stewart
    Thanks to the help I received in this post: I have a nice, concise recursive function to traverse a tree in postfix order:

        deque <char*> d;

        void Node::postfix()
        {
            if (left != __nullptr) { left->postfix(); }
            if (right != __nullptr) { right->postfix(); }
            d.push_front(cargo);
            return;
        };

    This is an expression tree. The branch nodes are operators randomly selected from an array, and the leaf nodes are values or the variable 'x', also randomly selected from an array:

        char *values[10]={"1.0","2.0","3.0","4.0","5.0","6.0","7.0","8.0","9.0","x"};
        char *ops[4]={"+","-","*","/"};

    As this will be called billions of times during a run of the genetic algorithm of which it is a part, I'd like to optimize it for speed. I have a number of questions on this topic, which I will ask in separate postings. The first is: how can I get access to each 'cargo' as it is found? That is, instead of pushing 'cargo' onto a deque and then processing the deque to get the values, I'd like to start processing it right away. I don't yet know about parallel processing in C++, but this would ideally be done concurrently on two different processors. In Python, I'd make the function a generator and access succeeding 'cargo's using .next(). But I'm using C++ to speed up the Python implementation. I'm thinking that this kind of tree has been around for a long time, and somebody has probably optimized it already. Any ideas? Thanks
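
    One common way to get at each 'cargo' as it is produced, without the intermediate deque, is to pass a callback (visitor) into the traversal. This is a sketch against a simplified stand-in for the Node class above, not the poster's exact class:

        #include <cstdio>

        // Simplified stand-in for the poster's Node; only what the sketch needs.
        struct Node {
            Node*       left;
            Node*       right;
            const char* cargo;

            // Postfix traversal that hands each cargo to the caller immediately
            // instead of queuing it for a second pass.
            template <typename Visitor>
            void postfix(Visitor& visit) const {
                if (left)  left->postfix(visit);
                if (right) right->postfix(visit);
                visit(cargo);
            }
        };

        // Example visitor: print each token as soon as it is produced.
        struct Print {
            void operator()(const char* cargo) const { std::printf("%s ", cargo); }
        };

        // Usage:  Print p;  root->postfix(p);

    For the concurrent variant mentioned in the question, the same visitor could push into a thread-safe queue that a second thread consumes.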

    Read the article

  • Thread management advice - Is TPL a good idea?

    - by Ian
    I'm hoping to get some advice on the use of thread management and hopefully the Task Parallel Library, because I'm not sure I've been going down the correct route. It's probably best that I give an outline of what I'm trying to do. Given a Problem, I need to generate a Solution using a heuristic-based algorithm. I start off by calculating a base solution; I don't think this operation can be parallelised, so we don't need to worry about it. Once the initial solution has been generated, I want to trigger n threads, which attempt to find a better solution. These threads need to do a couple of things: They need to be initialized with a different 'optimization metric'. In other words, they are attempting to optimize different things, with a precedence level set within code. This means they all run slightly different calculation engines. I'm not sure if I can do this with the TPL. If one of the threads finds a better solution than the currently best known solution (which needs to be shared across all threads), then it needs to update the best solution and force a number of other threads to restart (again, this depends on the precedence levels of the optimization metrics). I may also wish to combine certain calculations across threads (e.g. keep a union of probabilities for a certain approach to the problem). This is probably more optional though. The whole system obviously needs to be thread safe, and I want it to run as fast as possible. I tried an implementation that involved managing my own threads and shutting them down etc., but it started getting quite complicated, and I'm now wondering if the TPL might be better. I'm wondering if anyone can offer any general guidance? Thanks...

    Read the article

  • What hash algorithms are parallelizable? Optimizing the hashing of large files on multi-core CPUs

    - by DanO
    I'm interested in optimizing the hashing of some large files (optimizing wall clock time). The I/O has been optimized well enough already and the I/O device (local SSD) is only tapped at about 25% of capacity, while one of the CPU cores is completely maxed-out. I have more cores available, and in the future will likely have even more cores. So far I've only been able to tap into more cores if I happen to need multiple hashes of the same file, say an MD5 AND a SHA256 at the same time. I can use the same I/O stream to feed two or more hash algorithms, and I get the faster algorithms done for free (as far as wall clock time). As I understand most hash algorithms, each new bit changes the entire result, and it is inherently challenging/impossible to do in parallel. Are any of the mainstream hash algorithms parallelizable? Are there any non-mainstream hashes that are parallelizable (and that have at least a sample implementation available)? As future CPUs will trend toward more cores and a leveling off in clock speed, is there any way to improve the performance of file hashing? (other than liquid nitrogen cooled overclocking?) or is it inherently non-parallelizable?
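
    The mainstream hashes (MD5, the SHA family) are indeed sequential by construction, but one workaround, where you control both ends, is a Merkle-style tree hash: hash fixed-size chunks in parallel, then hash the concatenated chunk digests. The result is a different digest than a single-pass hash of the same file. An illustrative sketch using OpenSSL's one-shot SHA256() and std::thread, with the chunks held in one buffer here for brevity rather than streamed from the file:

        #include <openssl/sha.h>   // SHA256(), SHA256_DIGEST_LENGTH
        #include <algorithm>
        #include <cstdint>
        #include <thread>
        #include <vector>

        // Tree hash: per-chunk SHA-256 digests computed in parallel, then one
        // SHA-256 over the concatenated digests.  Not equal to a plain
        // single-pass SHA-256 of the same bytes.  num_threads must be >= 1.
        std::vector<uint8_t> tree_sha256(const std::vector<uint8_t>& data,
                                         unsigned num_threads)
        {
            const std::size_t chunk = (data.size() + num_threads - 1) / num_threads;
            std::vector<uint8_t> digests(num_threads * SHA256_DIGEST_LENGTH);

            std::vector<std::thread> workers;
            for (unsigned t = 0; t < num_threads; ++t) {
                workers.emplace_back([&, t] {
                    const std::size_t begin = std::min(data.size(), t * chunk);
                    const std::size_t end   = std::min(data.size(), begin + chunk);
                    SHA256(data.data() + begin, end - begin,
                           digests.data() + t * SHA256_DIGEST_LENGTH);
                });
            }
            for (std::size_t i = 0; i < workers.size(); ++i)
                workers[i].join();

            // Root digest over the concatenated per-chunk digests.
            std::vector<uint8_t> root(SHA256_DIGEST_LENGTH);
            SHA256(digests.data(), digests.size(), root.data());
            return root;
        }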

    Read the article

  • OpenMP: Get total number of running threads

    - by Konrad Rudolph
    I need to know the total number of threads that my application has spawned via OpenMP. Unfortunately, the omp_get_num_threads() function does not work here since it only yields the number of threads in the current team. However, my code runs recursively (divide and conquer, basically) and I want to spawn new threads as long as there are still idle processors, but no more. Is there a way to get around the limitations of omp_get_num_threads and get the total number of running threads? If more detail is required, consider the following pseudo-code that models my workflow quite closely:

        function divide_and_conquer(Job job, int total_num_threads):
            if job.is_leaf():  # Recurrence base case.
                job.process()
                return
            left, right = job.divide()
            current_num_threads = omp_get_num_threads()
            if current_num_threads < total_num_threads:  # (1)
                #pragma omp parallel num_threads(2)
                #pragma omp section
                divide_and_conquer(left, total_num_threads)
                #pragma omp section
                divide_and_conquer(right, total_num_threads)
            else:
                divide_and_conquer(left, total_num_threads)
                divide_and_conquer(right, total_num_threads)
            job = merge(left, right)

    If I call this code with a total_num_threads value of 4, the conditional annotated with (1) will always evaluate to true (because each thread team will contain at most two threads) and thus the code will always spawn two new threads, no matter how many threads are already running at a higher level. I am searching for a platform-independent way of determining the total number of threads that are currently running in my application.
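
    One portable workaround (a sketch with minimal stand-ins for the poster's Job type, not their code) is to keep an application-wide count of busy threads and branch on that instead of omp_get_num_threads(). The count is only an approximate cap, and the inner regions only get real threads if nested parallelism is enabled via omp_set_nested(1):

        #include <omp.h>

        // Minimal stand-ins so the sketch compiles; the real Job is the poster's.
        struct Job {
            bool is_leaf() const { return true; }
            void process() {}
            void divide(Job& l, Job& r) { (void)l; (void)r; }
        };
        Job merge(const Job&, const Job&) { return Job(); }

        void init_threading() { omp_set_nested(1); }   // inner regions need nesting enabled

        static int g_active_threads = 1;   // application-wide; starts with the master thread

        void divide_and_conquer(Job job, int total_num_threads)
        {
            if (job.is_leaf()) { job.process(); return; }

            Job left, right;
            job.divide(left, right);

            int active;
            #pragma omp critical(thread_count)
            active = g_active_threads;

            if (active < total_num_threads) {      // approximate: may race briefly
                #pragma omp critical(thread_count)
                g_active_threads += 1;             // the new team adds one extra worker

                #pragma omp parallel sections num_threads(2)
                {
                    #pragma omp section
                    divide_and_conquer(left, total_num_threads);
                    #pragma omp section
                    divide_and_conquer(right, total_num_threads);
                }

                #pragma omp critical(thread_count)
                g_active_threads -= 1;
            } else {
                divide_and_conquer(left, total_num_threads);
                divide_and_conquer(right, total_num_threads);
            }

            job = merge(left, right);
        }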

    Read the article

< Previous Page | 32 33 34 35 36 37 38 39 40 41 42 43  | Next Page >