parallel - Page 17 - Developer IT

Physical Cores vs Virtual Cores in Parallelism

- by Code Curiosity

When it comes to virtualization, I have been deliberating on the relationship between the physical cores and the virtual cores, especially in how it effects applications employing parallelism. For example, in a VM scenario, if there are less physical cores than there are virtual cores, if that's possible, what's the effect or limits placed on the application's parallel processing? I'm asking, because in my environment, it's not disclosed as to what the physical architecture is. Is there still much advantage to parallelizing if the application lives on a dual core VM hosted on a single core physical machine?

Read the article

Activate thread synchronically

- by mayap

Hi All, I'm using .Net 4.0 parallel library. The tasks I execute, ask to run some other task, sometimes synchronically and somethimes asynchronically, dependending on some conditions which are not known in advanced. For async call, i simply create new tasks and that's it. I don't know how to handly sync call: how to run it from the same thread, maybe that sync tasks will also ask to execute sync tasks recursively. all this issue is pretty new to me. thanks in advance.

Read the article

Stack usage with MMX intrinsics and Microsoft C++

- by arik-funke

I have an inline assembler loop that cumulatively adds elements from an int32 data array with MMX instructions. In particular, it uses the fact that the MMX registers can accommodate 16 int32s to calculate 16 different cumulative sums in parallel. I would now like to convert this piece of code to MMX intrinsics but I am afraid that I will suffer a performance penalty because one cannot explicitly intruct the compiler to use the 8 MMX registers to accomulate 16 independent sums. Can anybody comment on this and maybe propose a solution on how to convert the piece of code below to use intrinsics? == inline assembler (only part within the loop) == paddd mm0, [esi+edx+8*0] ; add first & second pair of int32 elements paddd mm1, [esi+edx+8*1] ; add third & fourth pair of int32 elements ... paddd mm2, [esi+edx+8*2] paddd mm3, [esi+edx+8*3] paddd mm4, [esi+edx+8*4] paddd mm5, [esi+edx+8*5] paddd mm6, [esi+edx+8*6] paddd mm7, [esi+edx+8*7] ; add 15th & 16th pair of int32 elements esi points to the beginning of the data array edx provides the offset in the data array for the current loop iteration the data array is arranged such that the elements for the 16 independent sums are interleaved.

Read the article

Ideas on frameworks in .NET that can be used for job processing and notifications

- by Rajat Mehta

Scenario: We have one instance of WCF windows service which exposes contracts like: AddNewJob(Job job), GetJobs(JobQuery query) etc. This service is consumed by 70-100 instances of client which is Windows Form based .NET app. Typically the service has 50-100 inward calls/minute to add or query jobs that are stored in a table on Sql Server. The same service is also responsible for processing these jobs in real time. It queries database every 5 seconds picks up the queued jobs and starts processing them. A job has 6 states. Queued, Pre-processing, Processing, Post-processing, Completed, Failed, Locked. Another responsibility on this service is to update all clients on every state change of every job. This means almost 200+ callbacks to clients per second. Question: This whole implementation is done using WCF Duplex bindings and works perfectly fine on small number of parallel jobs. Problem arises when we scale it up to 1000 jobs at a time. The notifications don't work as expected, it leads to memory overflow etc. Is there any standard framework that can provide a clean infrastructure for handling this scenario?? Apologies for the long explanation!

Read the article

How do I optimize this postfix expression tree for speed?

- by Peter Stewart

Thanks to the help I received in this post: I have a nice, concise recursive function to traverse a tree in postfix order: deque <char*> d; void Node::postfix() { if (left != __nullptr) { left->postfix(); } if (right != __nullptr) { right->postfix(); } d.push_front(cargo); return; }; This is an expression tree. The branch nodes are operators randomly selected from an array, and the leaf nodes are values or the variable 'x', also randomly selected from an array. char *values[10]={"1.0","2.0","3.0","4.0","5.0","6.0","7.0","8.0","9.0","x"}; char *ops[4]={"+","-","*","/"}; As this will be called billions of times during a run of the genetic algorithm of which it is a part, I'd like to optimize it for speed. I have a number of questions on this topic which I will ask in separate postings. The first is: how can I get access to each 'cargo' as it is found. That is: instead of pushing 'cargo' onto a deque, and then processing the deque to get the value, I'd like to start processing it right away. I don't yet know about parallel processing in c++, but this would ideally be done concurrently on two different processors. In python, I'd make the function a generator and access succeeding 'cargo's using .next(). But I'm using c++ to speed up the python implementation. I'm thinking that this kind of tree has been around for a long time, and somebody has probably optimized it already. Any Ideas? Thanks

Read the article

Thread management advice - Is TPL a good idea?

- by Ian

I'm hoping to get some advice on the use of thread managment and hopefully the task parallel library, because I'm not sure I've been going down the correct route. Probably best is that I give an outline of what I'm trying to do. Given a Problem I need to generate a Solution using a heuristic based algorithm. I start of by calculating a base solution, this operation I don't think can be parallelised so we don't need to worry about. Once the inital solution has been generated, I want to trigger n threads, which attempt to find a better solution. These threads need to do a couple of things: They need to be initalized with a different 'optimization metric'. In other words they are attempting to optimize different things, with a precedence level set within code. This means they all run slightly different calculation engines. I'm not sure if I can do this with the TPL.. If one of the threads finds a better solution that the currently best known solution (which needs to be shared across all threads) then it needs to update the best solution, and force a number of other threads to restart (again this depends on precedence levels of the optimization metrics). I may also wish to combine certain calculations across threads (e.g. keep a union of probabilities for a certain approach to the problem). This is probably more optional though. The whole system needs to be thread safe obviously and I want it to be running as fast as possible. I tried quite an implementation that involved managing my own threads and shutting them down etc, but it started getting quite complicated, and I'm now wondering if the TPL might be better. I'm wondering if anyone can offer any general guidance? Thanks...

Read the article

What hash algorithms are paralellizable? Optimizing the hashing of large files utilizing on mult-co

- by DanO

I'm interested in optimizing the hashing of some large files (optimizing wall clock time). The I/O has been optimized well enough already and the I/O device (local SSD) is only tapped at about 25% of capacity, while one of the CPU cores is completely maxed-out. I have more cores available, and in the future will likely have even more cores. So far I've only been able to tap into more cores if I happen to need multiple hashes of the same file, say an MD5 AND a SHA256 at the same time. I can use the same I/O stream to feed two or more hash algorithms, and I get the faster algorithms done for free (as far as wall clock time). As I understand most hash algorithms, each new bit changes the entire result, and it is inherently challenging/impossible to do in parallel. Are any of the mainstream hash algorithms parallelizable? Are there any non-mainstream hashes that are parallelizable (and that have at least a sample implementation available)? As future CPUs will trend toward more cores and a leveling off in clock speed, is there any way to improve the performance of file hashing? (other than liquid nitrogen cooled overclocking?) or is it inherently non-parallelizable?

Read the article

OpenMP: Get total number of running threads

- by Konrad Rudolph

I need to know the total number of threads that my application has spawned via OpenMP. Unfortunately, the omp_get_num_threads() function does not work here since it only yields the number of threads in the current team. However, my code runs recursively (divide and conquer, basically) and I want to spawn new threads as long as there are still idle processors, but no more. Is there a way to get around the limitations of omp_get_num_threads and get the total number of running threads? If more detail is required, consider the following pseudo-code that models my workflow quite closely: function divide_and_conquer(Job job, int total_num_threads): if job.is_leaf(): # Recurrence base case. job.process() return left, right = job.divide() current_num_threads = omp_get_num_threads() if current_num_threads < total_num_threads: # (1) #pragma omp parallel num_threads(2) #pragma omp section divide_and_conquer(left, total_num_threads) #pragma omp section divide_and_conquer(right, total_num_threads) else: divide_and_conquer(left, total_num_threads) divide_and_conquer(right, total_num_threads) job = merge(left, right) If I call this code with a total_num_threads value of 4, the conditional annotated with (1) will always evaluate to true (because each thread team will contain at most two threads) and thus the code will always spawn two new threads, no matter how many threads are already running at a higher level. I am searching for a platform-independent way of determining the total number of threads that are currently running in my application.

Read the article

What hash algorithms are parallelizable? Optimizing the hashing of large files utilizing on multi-co

- by DanO

I'm interested in optimizing the hashing of some large files (optimizing wall clock time). The I/O has been optimized well enough already and the I/O device (local SSD) is only tapped at about 25% of capacity, while one of the CPU cores is completely maxed-out. I have more cores available, and in the future will likely have even more cores. So far I've only been able to tap into more cores if I happen to need multiple hashes of the same file, say an MD5 AND a SHA256 at the same time. I can use the same I/O stream to feed two or more hash algorithms, and I get the faster algorithms done for free (as far as wall clock time). As I understand most hash algorithms, each new bit changes the entire result, and it is inherently challenging/impossible to do in parallel. Are any of the mainstream hash algorithms parallelizable? Are there any non-mainstream hashes that are parallelizable (and that have at least a sample implementation available)? As future CPUs will trend toward more cores and a leveling off in clock speed, is there any way to improve the performance of file hashing? (other than liquid nitrogen cooled overclocking?) or is it inherently non-parallelizable?

Read the article

The best way to predict performance without actually porting the code?

- by ardiyu07

I believe there are people with the same experience with me, where he/she must give a (estimated) performance report of porting a program from sequential to parallel with some designated multicore hardwares, with a very few amount of time given. For instance, if a 10K LoC sequential program was given and executes on Intel i7-3770k (not vectorized) in 100 ms, how long would it take to run if one parallelizes the code to a Tesla C2075 with NVIDIA CUDA, given that all kinds of parallelizing optimization techniques were done? (but you're only given 2-4 days to report the performance? assume that you didn't know the algorithm at all. Or perhaps it'd be safer if we just assume that it's an impossible situation to finish the job) Therefore, I'm wondering, what most likely be the fastest way to give such performance report? Is it safe to calculate solely by the hardware's capability, such as GFLOPs peak and memory bandwidth rate? Is there a mathematical way to calculate it? If there is, please prove your method with the corresponding problem description and the algorithm, and also the target hardwares' specifications. Or perhaps there already exists such tool to (roughly) estimate code porting? (Please don't the answer: 'kill yourself is the fastest way.')

Read the article

Performing a Depth First Search iteratively using async/parallel processing?

- by Prabhu

Here is a method that does a DFS search and returns a list of all items given a top level item id. How could I modify this to take advantage of parallel processing? Currently, the call to get the sub items is made one by one for each item in the stack. It would be nice if I could get the sub items for multiple items in the stack at the same time, and populate my return list faster. How could I do this (either using async/await or TPL, or anything else) in a thread safe manner? private async Task<IList<Item>> GetItemsAsync(string topItemId) { var items = new List<Item>(); var topItem = await GetItemAsync(topItemId); Stack<Item> stack = new Stack<Item>(); stack.Push(topItem); while (stack.Count > 0) { var item = stack.Pop(); items.Add(item); var subItems = await GetSubItemsAsync(item.SubId); foreach (var subItem in subItems) { stack.Push(subItem); } } return items; } I was thinking of something along these lines, but it's not coming together: var tasks = stack.Select(async item => { items.Add(item); var subItems = await GetSubItemsAsync(item.SubId); foreach (var subItem in subItems) { stack.Push(subItem); } }).ToList(); if (tasks.Any()) await Task.WhenAll(tasks); The language I'm using is C#.

Read the article

Does my TPL partitioner cause a deadlock?

- by Scott Chamberlain

I am starting to write my first parallel applications. This partitioner will enumerate over a IDataReader pulling chunkSize records at a time from the data-source. protected class DataSourcePartitioner<object[]> : System.Collections.Concurrent.Partitioner<object[]> { private readonly System.Data.IDataReader _Input; private readonly int _ChunkSize; public DataSourcePartitioner(System.Data.IDataReader input, int chunkSize = 10000) : base() { if (chunkSize < 1) throw new ArgumentOutOfRangeException("chunkSize"); _Input = input; _ChunkSize = chunkSize; } public override bool SupportsDynamicPartitions { get { return true; } } public override IList<IEnumerator<object[]>> GetPartitions(int partitionCount) { var dynamicPartitions = GetDynamicPartitions(); var partitions = new IEnumerator<object[]>[partitionCount]; for (int i = 0; i < partitionCount; i++) { partitions[i] = dynamicPartitions.GetEnumerator(); } return partitions; } public override IEnumerable<object[]> GetDynamicPartitions() { return new ListDynamicPartitions(_Input, _ChunkSize); } private class ListDynamicPartitions : IEnumerable<object[]> { private System.Data.IDataReader _Input; int _ChunkSize; private object _ChunkLock = new object(); public ListDynamicPartitions(System.Data.IDataReader input, int chunkSize) { _Input = input; _ChunkSize = chunkSize; } public IEnumerator<object[]> GetEnumerator() { while (true) { List<object[]> chunk = new List<object[]>(_ChunkSize); lock(_Input) { for (int i = 0; i < _ChunkSize; ++i) { if (!_Input.Read()) break; var values = new object[_Input.FieldCount]; _Input.GetValues(values); chunk.Add(values); } if (chunk.Count == 0) yield break; } var chunkEnumerator = chunk.GetEnumerator(); lock(_ChunkLock) //Will this cause a deadlock? { while (chunkEnumerator.MoveNext()) { yield return chunkEnumerator.Current; } } } } IEnumerator IEnumerable.GetEnumerator() { return ((IEnumerable<object[]>)this).GetEnumerator(); } } } I wanted IEnumerable object it passed back to be thread safe (the .Net example was so I am assuming PLINQ and TPL could need it) will the lock on _ChunkLock near the bottom help provide thread safety or will it cause a deadlock? From the documentation I could not tell if the lock would be released on the yeld return. Also if there is built in functionality to .net that will do what I am trying to do I would much rather use that. And if you find any other problems with the code I would appreciate it.

Read the article

Does my GetEnumerator cause a deadlock?

- by Scott Chamberlain

I am starting to write my first parallel applications. This partitioner will enumerate over a IDataReader pulling chunkSize records at a time from the data-source. TLDR; version private object _Lock = new object(); public IEnumerator GetEnumerator() { var infoSource = myInforSource.GetEnumerator(); //Will this cause a deadlock if two threads lock (_Lock) //use the enumator at the same time? { while (infoSource.MoveNext()) { yield return infoSource.Current; } } } full code protected class DataSourcePartitioner<object[]> : System.Collections.Concurrent.Partitioner<object[]> { private readonly System.Data.IDataReader _Input; private readonly int _ChunkSize; public DataSourcePartitioner(System.Data.IDataReader input, int chunkSize = 10000) : base() { if (chunkSize < 1) throw new ArgumentOutOfRangeException("chunkSize"); _Input = input; _ChunkSize = chunkSize; } public override bool SupportsDynamicPartitions { get { return true; } } public override IList<IEnumerator<object[]>> GetPartitions(int partitionCount) { var dynamicPartitions = GetDynamicPartitions(); var partitions = new IEnumerator<object[]>[partitionCount]; for (int i = 0; i < partitionCount; i++) { partitions[i] = dynamicPartitions.GetEnumerator(); } return partitions; } public override IEnumerable<object[]> GetDynamicPartitions() { return new ListDynamicPartitions(_Input, _ChunkSize); } private class ListDynamicPartitions : IEnumerable<object[]> { private System.Data.IDataReader _Input; int _ChunkSize; private object _ChunkLock = new object(); public ListDynamicPartitions(System.Data.IDataReader input, int chunkSize) { _Input = input; _ChunkSize = chunkSize; } public IEnumerator<object[]> GetEnumerator() { while (true) { List<object[]> chunk = new List<object[]>(_ChunkSize); lock(_Input) { for (int i = 0; i < _ChunkSize; ++i) { if (!_Input.Read()) break; var values = new object[_Input.FieldCount]; _Input.GetValues(values); chunk.Add(values); } if (chunk.Count == 0) yield break; } var chunkEnumerator = chunk.GetEnumerator(); lock(_ChunkLock) //Will this cause a deadlock? { while (chunkEnumerator.MoveNext()) { yield return chunkEnumerator.Current; } } } } IEnumerator IEnumerable.GetEnumerator() { return ((IEnumerable<object[]>)this).GetEnumerator(); } } } I wanted IEnumerable object it passed back to be thread safe (the MSDN example was so I am assuming PLINQ and TPL could need it) will the lock on _ChunkLock near the bottom help provide thread safety or will it cause a deadlock? From the documentation I could not tell if the lock would be released on the yeld return. Also if there is built in functionality to .net that will do what I am trying to do I would much rather use that. And if you find any other problems with the code I would appreciate it.

Read the article

Applying the Knuth-Plass algorithm (or something better?) to read two books with different length and amount of chapters in parallel

- by user147133

I have a Bible reading plan that covers the whole Bible in 180 days. For the most of the time, I read 5 chapters in the Old Testament and 1 or 2 (1.5) chapters in the New Testament each day. The problem is that some chapters are longer than others (for example Psalm 119 which is 7 times longer than a average chapter in the Bible), and the plan I'm following doesn't take that in count. I end up with some days having a lot more to read than others. I thought I could use programming to make myself a better plan. I have a datastructure with a list of all chapters in the bible and their length in number of lines. (I found that the number of lines is the best criteria, but it could have been number of verses or number of words as well) I then started to think about this problem as a line wrap problem. Think of a chapter like a word, a day like a line and the whole plan as a paragraph. The "length" of a word (a chapter) is the number of lines in that chapter. I could then generate the best possible reading plan by applying a simplified Knuth-Plass algorithm to find the best breakpoints. This works well if I want to read the Bible from beginning to end. But I want to read a little from the new testament each day in parallel with the old testament. Of course I can run the Knuth-Plass algorithm on the Old Testament first, then on the New Testament and get two separate plans. But those plans merged is not a optimal plan. Worst-case days (days with extra much reading) in the New Testament plan will randomly occur on the same days as the worst-case days in the Old Testament. Since the New Testament have about 180*1.5 chapters, the plan is generally to read one chapter the first day, two the second, one the third etc... And I would like the plan for the Old Testament to compensate for this alternating length. So I will need a new and better algorithm, or I will have to use the Knuth-Plass algorithm in a way that I've not figured out. I think this could be a interesting and challenging nut for people interested in algorithms, so therefore I wanted to see if any of you have a good solution in mind.

Read the article

How to add correct cancellation when downloading a file with the example in the samples of the new P

- by Mike

Hello everybody, I have downloaded the last samples of the Parallel Programming team, and I don't succeed in adding correctly the possibility to cancel the download of a file. Here is the code I ended to have: var wreq = (HttpWebRequest)WebRequest.Create(uri); // Fire start event DownloadStarted(this, new DownloadStartedEventArgs(remoteFilePath)); long totalBytes = 0; wreq.DownloadDataInFileAsync(tmpLocalFile, cancellationTokenSource.Token, allowResume, totalBytesAction => { totalBytes = totalBytesAction; }, readBytes => { Log.Debug("Progression : {0} / {1} => {2}%", readBytes, totalBytes, 100 * (double)readBytes / totalBytes); DownloadProgress(this, new DownloadProgressEventArgs(remoteFilePath, readBytes, totalBytes, (int)(100 * readBytes / totalBytes))); }) .ContinueWith( (antecedent ) => { if (antecedent.IsFaulted) Log.Debug(antecedent.Exception.Message); //Fire end event SetEndDownload(antecedent.IsCanceled, antecedent.Exception, tmpLocalFile, 0); }, cancellationTokenSource.Token); I want to fire an end event after the download is finished, hence the ContinueWith. I slightly changed the code of the samples to add the CancellationToken and the 2 delegates to get the size of the file to download, and the progression of the download: return webRequest.GetResponseAsync() .ContinueWith(response => { if (totalBytesAction != null) totalBytesAction(response.Result.ContentLength); response.Result.GetResponseStream().WriteAllBytesAsync(filePath, ct, resumeDownload, progressAction).Wait(ct); }, ct); I had to add the call to the Wait function, because if I don't, the method exits and the end event is fired too early. Here are the modified method extensions (lot of code, apologies :p) public static Task WriteAllBytesAsync(this Stream stream, string filePath, CancellationToken ct, bool resumeDownload = false, Action<long> progressAction = null) { if (stream == null) throw new ArgumentNullException("stream"); // Copy from the source stream to the memory stream and return the copied data return stream.CopyStreamToFileAsync(filePath, ct, resumeDownload, progressAction); } public static Task CopyStreamToFileAsync(this Stream source, string destinationPath, CancellationToken ct, bool resumeDownload = false, Action<long> progressAction = null) { if (source == null) throw new ArgumentNullException("source"); if (destinationPath == null) throw new ArgumentNullException("destinationPath"); // Open the output file for writing var destinationStream = FileAsync.OpenWrite(destinationPath); // Copy the source to the destination stream, then close the output file. return CopyStreamToStreamAsync(source, destinationStream, ct, progressAction).ContinueWith(t => { var e = t.Exception; destinationStream.Close(); if (e != null) throw e; }, ct, TaskContinuationOptions.ExecuteSynchronously, TaskScheduler.Current); } public static Task CopyStreamToStreamAsync(this Stream source, Stream destination, CancellationToken ct, Action<long> progressAction = null) { if (source == null) throw new ArgumentNullException("source"); if (destination == null) throw new ArgumentNullException("destination"); return Task.Factory.Iterate(CopyStreamIterator(source, destination, ct, progressAction)); } private static IEnumerable<Task> CopyStreamIterator(Stream input, Stream output, CancellationToken ct, Action<long> progressAction = null) { // Create two buffers. One will be used for the current read operation and one for the current // write operation. We'll continually swap back and forth between them. byte[][] buffers = new byte[2][] { new byte[BUFFER_SIZE], new byte[BUFFER_SIZE] }; int filledBufferNum = 0; Task writeTask = null; int readBytes = 0; // Until there's no more data to be read or cancellation while (true) { ct.ThrowIfCancellationRequested(); // Read from the input asynchronously var readTask = input.ReadAsync(buffers[filledBufferNum], 0, buffers[filledBufferNum].Length); // If we have no pending write operations, just yield until the read operation has // completed. If we have both a pending read and a pending write, yield until both the read // and the write have completed. yield return writeTask == null ? readTask : Task.Factory.ContinueWhenAll(new[] { readTask, writeTask }, tasks => tasks.PropagateExceptions()); // If no data was read, nothing more to do. if (readTask.Result <= 0) break; readBytes += readTask.Result; if (progressAction != null) progressAction(readBytes); // Otherwise, write the written data out to the file writeTask = output.WriteAsync(buffers[filledBufferNum], 0, readTask.Result); // Swap buffers filledBufferNum ^= 1; } } So basically, at the end of the chain of called methods, I let the CancellationToken throw an OperationCanceledException if a Cancel has been requested. What I hoped was to get IsFaulted == true in the appealing code and to fire the end event with the canceled flags and the correct exception. But what I get is an unhandled exception on the line response.Result.GetResponseStream().WriteAllBytesAsync(filePath, ct, resumeDownload, progressAction).Wait(ct); telling me that I don't catch an AggregateException. I've tried various things, but I don't succeed to make the whole thing work properly. Does anyone of you have played enough with that library and may help me? Thanks in advance Mike

Read the article

How to declare a(n) vector/array of reducer objects in Cilk++?

- by Jin

Hi All, I had a problem when I am using Cilk++, an extension to C++ for parallel computing. I found that I can't declare a vector of reducer objects: typedef cilk::reducer_opadd<int> T_reducer; vector<T_reducer> bitmiss_vec; for (int i = 0; i < 24; ++i) { T_reducer r; bitmiss_vec.push_back(r); } However, when I compile the code with Cilk++, it complains at the push_back() line: cilk++ geneAttack.cilk -O1 -g -lcilkutil -o geneAttack /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/cilk++/reducer_opadd.h: In member function ‘void __gnu_cxx::new_allocator<_Tp>::construct(_Tp*, const _Tp&) [with _Tp = cilk::reducer_opadd<int>]’: /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/c++/4.2.4/bits/stl_vector.h:601: instantiated from ‘void std::vector<_Tp, _Alloc>::push_back(const _Tp&) [with _Tp = cilk::reducer_opadd<int>, _Alloc = std::allocator<cilk::reducer_opadd<int> >]’ geneAttack.cilk:667: instantiated from here /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/cilk++/reducer_opadd.h:229: error: ‘cilk::reducer_opadd<Type>::reducer_opadd(const cilk::reducer_opadd<Type>&) [with Type = int]’ is private /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/c++/4.2.4/ext/new_allocator.h:107: error: within this context /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/cilk++/reducer_opadd.h: In member function ‘void std::vector<_Tp, _Alloc>::_M_insert_aux(__gnu_cxx::__normal_iterator<typename std::_Vector_base<_Tp, _Alloc>::_Tp_alloc_type::pointer, std::vector<_Tp, _Alloc> >, const _Tp&) [with _Tp = cilk::reducer_opadd<int>, _Alloc = std::allocator<cilk::reducer_opadd<int> >]’: /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/c++/4.2.4/bits/stl_vector.h:605: instantiated from ‘void std::vector<_Tp, _Alloc>::push_back(const _Tp&) [with _Tp = cilk::reducer_opadd<int>, _Alloc = std::allocator<cilk::reducer_opadd<int> >]’ geneAttack.cilk:667: instantiated from here /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/cilk++/reducer_opadd.h:229: error: ‘cilk::reducer_opadd<Type>::reducer_opadd(const cilk::reducer_opadd<Type>&) [with Type = int]’ is private /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/c++/4.2.4/bits/vector.tcc:252: error: within this context /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/c++/4.2.4/bits/stl_vector.h:605: instantiated from ‘void std::vector<_Tp, _Alloc>::push_back(const _Tp&) [with _Tp = cilk::reducer_opadd<int>, _Alloc = std::allocator<cilk::reducer_opadd<int> >]’ geneAttack.cilk:667: instantiated from here /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/cilk++/reducer_opadd.h:230: error: ‘cilk::reducer_opadd<Type>& cilk::reducer_opadd<Type>::operator=(const cilk::reducer_opadd<Type>&) [with Type = int]’ is private /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/c++/4.2.4/bits/vector.tcc:256: error: within this context /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/cilk++/reducer_opadd.h: In static member function ‘static _BI2 std::__copy_backward<_BoolType, std::random_access_iterator_tag>::__copy_b(_BI1, _BI1, _BI2) [with _BI1 = cilk::reducer_opadd<int>*, _BI2 = cilk::reducer_opadd<int>*, bool _BoolType = false]’: /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/c++/4.2.4/bits/stl_algobase.h:465: instantiated from ‘_BI2 std::__copy_backward_aux(_BI1, _BI1, _BI2) [with _BI1 = cilk::reducer_opadd<int>*, _BI2 = cilk::reducer_opadd<int>*]’ /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/c++/4.2.4/bits/stl_algobase.h:474: instantiated from ‘static _BI2 std::__copy_backward_normal<<anonymous>, <anonymous> >::__copy_b_n(_BI1, _BI1, _BI2) [with _BI1 = cilk::reducer_opadd<int>*, _BI2 = cilk::reducer_opadd<int>*, bool <anonymous> = false, bool <anonymous> = false]’ /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/c++/4.2.4/bits/stl_algobase.h:540: instantiated from ‘_BI2 std::copy_backward(_BI1, _BI1, _BI2) [with _BI1 = cilk::reducer_opadd<int>*, _BI2 = cilk::reducer_opadd<int>*]’ /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/c++/4.2.4/bits/vector.tcc:253: instantiated from ‘void std::vector<_Tp, _Alloc>::_M_insert_aux(__gnu_cxx::__normal_iterator<typename std::_Vector_base<_Tp, _Alloc>::_Tp_alloc_type::pointer, std::vector<_Tp, _Alloc> >, const _Tp&) [with _Tp = cilk::reducer_opadd<int>, _Alloc = std::allocator<cilk::reducer_opadd<int> >]’ /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/c++/4.2.4/bits/stl_vector.h:605: instantiated from ‘void std::vector<_Tp, _Alloc>::push_back(const _Tp&) [with _Tp = cilk::reducer_opadd<int>, _Alloc = std::allocator<cilk::reducer_opadd<int> >]’ geneAttack.cilk:667: instantiated from here /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/cilk++/reducer_opadd.h:230: error: ‘cilk::reducer_opadd<Type>& cilk::reducer_opadd<Type>::operator=(const cilk::reducer_opadd<Type>&) [with Type = int]’ is private /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/c++/4.2.4/bits/stl_algobase.h:433: error: within this context /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/cilk++/reducer_opadd.h: In function ‘void std::_Construct(_T1*, const _T2&) [with _T1 = cilk::reducer_opadd<int>, _T2 = cilk::reducer_opadd<int>]’: /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/c++/4.2.4/bits/stl_uninitialized.h:87: instantiated from ‘_ForwardIterator std::__uninitialized_copy_aux(_InputIterator, _InputIterator, _ForwardIterator, std::__false_type) [with _InputIterator = cilk::reducer_opadd<int>*, _ForwardIterator = cilk::reducer_opadd<int>*]’ /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/c++/4.2.4/bits/stl_uninitialized.h:114: instantiated from ‘_ForwardIterator std::uninitialized_copy(_InputIterator, _InputIterator, _ForwardIterator) [with _InputIterator = cilk::reducer_opadd<int>*, _ForwardIterator = cilk::reducer_opadd<int>*]’ /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/c++/4.2.4/bits/stl_uninitialized.h:254: instantiated from ‘_ForwardIterator std::__uninitialized_copy_a(_InputIterator, _InputIterator, _ForwardIterator, std::allocator<_Tp>) [with _InputIterator = cilk::reducer_opadd<int>*, _ForwardIterator = cilk::reducer_opadd<int>*, _Tp = cilk::reducer_opadd<int>]’ /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/c++/4.2.4/bits/vector.tcc:275: instantiated from ‘void std::vector<_Tp, _Alloc>::_M_insert_aux(__gnu_cxx::__normal_iterator<typename std::_Vector_base<_Tp, _Alloc>::_Tp_alloc_type::pointer, std::vector<_Tp, _Alloc> >, const _Tp&) [with _Tp = cilk::reducer_opadd<int>, _Alloc = std::allocator<cilk::reducer_opadd<int> >]’ /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/c++/4.2.4/bits/stl_vector.h:605: instantiated from ‘void std::vector<_Tp, _Alloc>::push_back(const _Tp&) [with _Tp = cilk::reducer_opadd<int>, _Alloc = std::allocator<cilk::reducer_opadd<int> >]’ geneAttack.cilk:667: instantiated from here /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/cilk++/reducer_opadd.h:229: error: ‘cilk::reducer_opadd<Type>::reducer_opadd(const cilk::reducer_opadd<Type>&) [with Type = int]’ is private /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/c++/4.2.4/bits/stl_construct.h:81: error: within this context make: *** [geneAttack] Error 1 jinchen@galactica:~/workspace/biometrics/genAttack$ make cilk++ geneAttack.cilk -O1 -g -lcilkutil -o geneAttack geneAttack.cilk: In function ‘int cilk cilk_main(int, char**)’: geneAttack.cilk:670: error: expected primary-expression before ‘,’ token geneAttack.cilk:670: error: expected primary-expression before ‘}’ token geneAttack.cilk:674: error: ‘bitmiss_vec’ was not declared in this scope make: *** [geneAttack] Error 1 The Cilk++ manule says it supports array/vector of reducers, although there are performance issues to consider: "If you create a large number of reducers (for example, an array or vector of reducers) you must be aware that there is an overhead at steal and reduce that is proportional to the number of reducers in the program. " Anyone knows what is going on? How should I declare/use vector of reducers? Thank you

Read the article

How to declare a vector or array of reducer objects in Cilk++?

- by Jin

Hi All, I had a problem when I am using Cilk++, an extension to C++ for parallel computing. I found that I can't declare a vector of reducer objects: typedef cilk::reducer_opadd<int> T_reducer; vector<T_reducer> bitmiss_vec; for (int i = 0; i < 24; ++i) { T_reducer r; bitmiss_vec.push_back(r); } However, when I compile the code with Cilk++, it complains at the push_back() line: cilk++ geneAttack.cilk -O1 -g -lcilkutil -o geneAttack /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/cilk++/reducer_opadd.h: In member function ‘void __gnu_cxx::new_allocator<_Tp>::construct(_Tp*, const _Tp&) [with _Tp = cilk::reducer_opadd<int>]’: /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/c++/4.2.4/bits/stl_vector.h:601: instantiated from ‘void std::vector<_Tp, _Alloc>::push_back(const _Tp&) [with _Tp = cilk::reducer_opadd<int>, _Alloc = std::allocator<cilk::reducer_opadd<int> >]’ geneAttack.cilk:667: instantiated from here /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/cilk++/reducer_opadd.h:229: error: ‘cilk::reducer_opadd<Type>::reducer_opadd(const cilk::reducer_opadd<Type>&) [with Type = int]’ is private /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/c++/4.2.4/ext/new_allocator.h:107: error: within this context /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/cilk++/reducer_opadd.h: In member function ‘void std::vector<_Tp, _Alloc>::_M_insert_aux(__gnu_cxx::__normal_iterator<typename std::_Vector_base<_Tp, _Alloc>::_Tp_alloc_type::pointer, std::vector<_Tp, _Alloc> >, const _Tp&) [with _Tp = cilk::reducer_opadd<int>, _Alloc = std::allocator<cilk::reducer_opadd<int> >]’: /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/c++/4.2.4/bits/stl_vector.h:605: instantiated from ‘void std::vector<_Tp, _Alloc>::push_back(const _Tp&) [with _Tp = cilk::reducer_opadd<int>, _Alloc = std::allocator<cilk::reducer_opadd<int> >]’ geneAttack.cilk:667: instantiated from here /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/cilk++/reducer_opadd.h:229: error: ‘cilk::reducer_opadd<Type>::reducer_opadd(const cilk::reducer_opadd<Type>&) [with Type = int]’ is private /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/c++/4.2.4/bits/vector.tcc:252: error: within this context /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/c++/4.2.4/bits/stl_vector.h:605: instantiated from ‘void std::vector<_Tp, _Alloc>::push_back(const _Tp&) [with _Tp = cilk::reducer_opadd<int>, _Alloc = std::allocator<cilk::reducer_opadd<int> >]’ geneAttack.cilk:667: instantiated from here /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/cilk++/reducer_opadd.h:230: error: ‘cilk::reducer_opadd<Type>& cilk::reducer_opadd<Type>::operator=(const cilk::reducer_opadd<Type>&) [with Type = int]’ is private /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/c++/4.2.4/bits/vector.tcc:256: error: within this context /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/cilk++/reducer_opadd.h: In static member function ‘static _BI2 std::__copy_backward<_BoolType, std::random_access_iterator_tag>::__copy_b(_BI1, _BI1, _BI2) [with _BI1 = cilk::reducer_opadd<int>*, _BI2 = cilk::reducer_opadd<int>*, bool _BoolType = false]’: /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/c++/4.2.4/bits/stl_algobase.h:465: instantiated from ‘_BI2 std::__copy_backward_aux(_BI1, _BI1, _BI2) [with _BI1 = cilk::reducer_opadd<int>*, _BI2 = cilk::reducer_opadd<int>*]’ /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/c++/4.2.4/bits/stl_algobase.h:474: instantiated from ‘static _BI2 std::__copy_backward_normal<<anonymous>, <anonymous> >::__copy_b_n(_BI1, _BI1, _BI2) [with _BI1 = cilk::reducer_opadd<int>*, _BI2 = cilk::reducer_opadd<int>*, bool <anonymous> = false, bool <anonymous> = false]’ /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/c++/4.2.4/bits/stl_algobase.h:540: instantiated from ‘_BI2 std::copy_backward(_BI1, _BI1, _BI2) [with _BI1 = cilk::reducer_opadd<int>*, _BI2 = cilk::reducer_opadd<int>*]’ /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/c++/4.2.4/bits/vector.tcc:253: instantiated from ‘void std::vector<_Tp, _Alloc>::_M_insert_aux(__gnu_cxx::__normal_iterator<typename std::_Vector_base<_Tp, _Alloc>::_Tp_alloc_type::pointer, std::vector<_Tp, _Alloc> >, const _Tp&) [with _Tp = cilk::reducer_opadd<int>, _Alloc = std::allocator<cilk::reducer_opadd<int> >]’ /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/c++/4.2.4/bits/stl_vector.h:605: instantiated from ‘void std::vector<_Tp, _Alloc>::push_back(const _Tp&) [with _Tp = cilk::reducer_opadd<int>, _Alloc = std::allocator<cilk::reducer_opadd<int> >]’ geneAttack.cilk:667: instantiated from here /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/cilk++/reducer_opadd.h:230: error: ‘cilk::reducer_opadd<Type>& cilk::reducer_opadd<Type>::operator=(const cilk::reducer_opadd<Type>&) [with Type = int]’ is private /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/c++/4.2.4/bits/stl_algobase.h:433: error: within this context /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/cilk++/reducer_opadd.h: In function ‘void std::_Construct(_T1*, const _T2&) [with _T1 = cilk::reducer_opadd<int>, _T2 = cilk::reducer_opadd<int>]’: /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/c++/4.2.4/bits/stl_uninitialized.h:87: instantiated from ‘_ForwardIterator std::__uninitialized_copy_aux(_InputIterator, _InputIterator, _ForwardIterator, std::__false_type) [with _InputIterator = cilk::reducer_opadd<int>*, _ForwardIterator = cilk::reducer_opadd<int>*]’ /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/c++/4.2.4/bits/stl_uninitialized.h:114: instantiated from ‘_ForwardIterator std::uninitialized_copy(_InputIterator, _InputIterator, _ForwardIterator) [with _InputIterator = cilk::reducer_opadd<int>*, _ForwardIterator = cilk::reducer_opadd<int>*]’ /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/c++/4.2.4/bits/stl_uninitialized.h:254: instantiated from ‘_ForwardIterator std::__uninitialized_copy_a(_InputIterator, _InputIterator, _ForwardIterator, std::allocator<_Tp>) [with _InputIterator = cilk::reducer_opadd<int>*, _ForwardIterator = cilk::reducer_opadd<int>*, _Tp = cilk::reducer_opadd<int>]’ /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/c++/4.2.4/bits/vector.tcc:275: instantiated from ‘void std::vector<_Tp, _Alloc>::_M_insert_aux(__gnu_cxx::__normal_iterator<typename std::_Vector_base<_Tp, _Alloc>::_Tp_alloc_type::pointer, std::vector<_Tp, _Alloc> >, const _Tp&) [with _Tp = cilk::reducer_opadd<int>, _Alloc = std::allocator<cilk::reducer_opadd<int> >]’ /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/c++/4.2.4/bits/stl_vector.h:605: instantiated from ‘void std::vector<_Tp, _Alloc>::push_back(const _Tp&) [with _Tp = cilk::reducer_opadd<int>, _Alloc = std::allocator<cilk::reducer_opadd<int> >]’ geneAttack.cilk:667: instantiated from here /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/cilk++/reducer_opadd.h:229: error: ‘cilk::reducer_opadd<Type>::reducer_opadd(const cilk::reducer_opadd<Type>&) [with Type = int]’ is private /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/c++/4.2.4/bits/stl_construct.h:81: error: within this context make: *** [geneAttack] Error 1 jinchen@galactica:~/workspace/biometrics/genAttack$ make cilk++ geneAttack.cilk -O1 -g -lcilkutil -o geneAttack geneAttack.cilk: In function ‘int cilk cilk_main(int, char**)’: geneAttack.cilk:670: error: expected primary-expression before ‘,’ token geneAttack.cilk:670: error: expected primary-expression before ‘}’ token geneAttack.cilk:674: error: ‘bitmiss_vec’ was not declared in this scope make: *** [geneAttack] Error 1 The Cilk++ manule says it supports array/vector of reducers, although there are performance issues to consider: "If you create a large number of reducers (for example, an array or vector of reducers) you must be aware that there is an overhead at steal and reduce that is proportional to the number of reducers in the program. " Anyone knows what is going on? How should I declare/use vector of reducers? Thank you

Read the article

Visual Studio 2010 Brings Parallelism Mainstream

Visual Studio 2010 contains innovations in the parallel computing space to enable developers to cope with parallel applications Microsoft Visual Studio - Parallel computing - Microsoft - Programming - Tools

Read the article

Why aren't my JQuery .ajax requests being made in parallel?

- by Ryan Olson

I am trying to make two ajax requests in parallel using jQuery like this: var sources = ["source1", "source2"]; $(sources).each(function() { var source = this; $.ajax({ async: true, type: "POST", data: {post: "data", in: "here"}, url: "/my/url/" + source, success: function(data) { process_result(data); } }); }); I got the basic structure from this question, but my requests still aren't being made in parallel. "source1" takes a while to complete, and I can see on the server that the second request isn't made until the first is completed. As far as I can tell, I don't have any other active requests, so I don't think it's a problem with the maximum number of parallel requests for the browser. Am I missing something else here?

Read the article

How can I use foreach and fork together to do something in parallel?

- by kaushalmodi

This question is not UVM specific but the example that I am working on is UVM related. I have an array of agents in my UVM environment and I would like to launch a sequence on all of them in parallel. If I do the below: foreach (env.agt[i]) begin seq.start(env.agt[i].sqr); end , the sequence seq first executes on env.agt[0].sqr. Once that gets over, it then executes on env.agt[1].sqr and so on. I want to implement a foreach-fork statement so that seq is executed in parallel on all agt[i] sequencers. No matter how I order the fork-join and foreach, I am not able to achieve that. Can you please help me get that parallel sequence launching behavior? Thanks.

Read the article

Which parallel sorting algorithm has the best average case performance?

- by Craig P. Motlin

Sorting takes O(n log n) in the serial case. If we have O(n) processors we would hope for a linear speedup. O(log n) parallel algorithms exist but they have a very high constant. They also aren't applicable on commodity hardware which doesn't have anywhere near O(n) processors. With p processors, reasonable algorithms should take O(n/p log n/p) time. In the serial case, quick sort has the best runtime complexity on average. A parallel quick sort algorithm is easy to implement (see here and here). However it doesn't perform well since the very first step is to partition the whole collection on a single core. I have found information on many parallel sort algorithms but so far I have not seen anything pointing to a clear winner. I'm looking to sort lists of 1 million to 100 million elements in a JVM language running on 8 to 32 cores.

Read the article

Upload File to Windows Azure Blob in Chunks through ASP.NET MVC, JavaScript and HTML5

- by Shaun

Originally posted on: http://geekswithblogs.net/shaunxu/archive/2013/07/01/upload-file-to-windows-azure-blob-in-chunks-through-asp.net.aspxMany people are using Windows Azure Blob Storage to store their data in the cloud. Blob storage provides 99.9% availability with easy-to-use API through .NET SDK and HTTP REST. For example, we can store JavaScript files, images, documents in blob storage when we are building an ASP.NET web application on a Web Role in Windows Azure. Or we can store our VHD files in blob and mount it as a hard drive in our cloud service. If you are familiar with Windows Azure, you should know that there are two kinds of blob: page blob and block blob. The page blob is optimized for random read and write, which is very useful when you need to store VHD files. The block blob is optimized for sequential/chunk read and write, which has more common usage. Since we can upload block blob in blocks through BlockBlob.PutBlock, and them commit them as a whole blob with invoking the BlockBlob.PutBlockList, it is very powerful to upload large files, as we can upload blocks in parallel, and provide pause-resume feature. There are many documents, articles and blog posts described on how to upload a block blob. Most of them are focus on the server side, which means when you had received a big file, stream or binaries, how to upload them into blob storage in blocks through .NET SDK. But the problem is, how can we upload these large files from client side, for example, a browser. This questioned to me when I was working with a Chinese customer to help them build a network disk production on top of azure. The end users upload their files from the web portal, and then the files will be stored in blob storage from the Web Role. My goal is to find the best way to transform the file from client (end user’s machine) to the server (Web Role) through browser. In this post I will demonstrate and describe what I had done, to upload large file in chunks with high speed, and save them as blocks into Windows Azure Blob Storage. Traditional Upload, Works with Limitation The simplest way to implement this requirement is to create a web page with a form that contains a file input element and a submit button. 1: @using (Html.BeginForm("About", "Index", FormMethod.Post, new { enctype = "multipart/form-data" })) 2: { 3: <input type="file" name="file" /> 4: <input type="submit" value="upload" /> 5: } And then in the backend controller, we retrieve the whole content of this file and upload it in to the blob storage through .NET SDK. We can split the file in blocks and upload them in parallel and commit. The code had been well blogged in the community. 1: [HttpPost] 2: public ActionResult About(HttpPostedFileBase file) 3: { 4: var container = _client.GetContainerReference("test"); 5: container.CreateIfNotExists(); 6: var blob = container.GetBlockBlobReference(file.FileName); 7: var blockDataList = new Dictionary<string, byte[]>(); 8: using (var stream = file.InputStream) 9: { 10: var blockSizeInKB = 1024; 11: var offset = 0; 12: var index = 0; 13: while (offset < stream.Length) 14: { 15: var readLength = Math.Min(1024 * blockSizeInKB, (int)stream.Length - offset); 16: var blockData = new byte[readLength]; 17: offset += stream.Read(blockData, 0, readLength); 18: blockDataList.Add(Convert.ToBase64String(BitConverter.GetBytes(index)), blockData); 19: 20: index++; 21: } 22: } 23: 24: Parallel.ForEach(blockDataList, (bi) => 25: { 26: blob.PutBlock(bi.Key, new MemoryStream(bi.Value), null); 27: }); 28: blob.PutBlockList(blockDataList.Select(b => b.Key).ToArray()); 29: 30: return RedirectToAction("About"); 31: } This works perfect if we selected an image, a music or a small video to upload. But if I selected a large file, let’s say a 6GB HD-movie, after upload for about few minutes the page will be shown as below and the upload will be terminated. In ASP.NET there is a limitation of request length and the maximized request length is defined in the web.config file. It’s a number which less than about 4GB. So if we want to upload a really big file, we cannot simply implement in this way. Also, in Windows Azure, a cloud service network load balancer will terminate the connection if exceed the timeout period. From my test the timeout looks like 2 - 3 minutes. Hence, when we need to upload a large file we cannot just use the basic HTML elements. Besides the limitation mentioned above, the simple HTML file upload cannot provide rich upload experience such as chunk upload, pause and pause-resume. So we need to find a better way to upload large file from the client to the server. Upload in Chunks through HTML5 and JavaScript In order to break those limitation mentioned above we will try to upload the large file in chunks. This takes some benefit to us such as - No request size limitation: Since we upload in chunks, we can define the request size for each chunks regardless how big the entire file is. - No timeout problem: The size of chunks are controlled by us, which means we should be able to make sure request for each chunk upload will not exceed the timeout period of both ASP.NET and Windows Azure load balancer. It was a big challenge to upload big file in chunks until we have HTML5. There are some new features and improvements introduced in HTML5 and we will use them to implement our solution. In HTML5, the File interface had been improved with a new method called “slice”. It can be used to read part of the file by specifying the start byte index and the end byte index. For example if the entire file was 1024 bytes, file.slice(512, 768) will read the part of this file from the 512nd byte to 768th byte, and return a new object of interface called "Blob”, which you can treat as an array of bytes. In fact, a Blob object represents a file-like object of immutable, raw data. The File interface is based on Blob, inheriting blob functionality and expanding it to support files on the user's system. For more information about the Blob please refer here. File and Blob is very useful to implement the chunk upload. We will use File interface to represent the file the user selected from the browser and then use File.slice to read the file in chunks in the size we wanted. For example, if we wanted to upload a 10MB file with 512KB chunks, then we can read it in 512KB blobs by using File.slice in a loop. Assuming we have a web page as below. User can select a file, an input box to specify the block size in KB and a button to start upload. 1: <div> 2: <input type="file" id="upload_files" name="files[]" /><br /> 3: Block Size: <input type="number" id="block_size" value="512" name="block_size" />KB<br /> 4: <input type="button" id="upload_button_blob" name="upload" value="upload (blob)" /> 5: </div> Then we can have the JavaScript function to upload the file in chunks when user clicked the button. 1: <script type="text/javascript"> 1: 2: $(function () { 3: $("#upload_button_blob").click(function () { 4: }); 5: });</script> Firstly we need to ensure the client browser supports the interfaces we are going to use. Just try to invoke the File, Blob and FormData from the “window” object. If any of them is “undefined” the condition result will be “false” which means your browser doesn’t support these premium feature and it’s time for you to get your browser updated. FormData is another new feature we are going to use in the future. It could generate a temporary form for us. We will use this interface to create a form with chunk and associated metadata when invoked the service through ajax. 1: $("#upload_button_blob").click(function () { 2: // assert the browser support html5 3: if (window.File && window.Blob && window.FormData) { 4: alert("Your brwoser is awesome, let's rock!"); 5: } 6: else { 7: alert("Oh man plz update to a modern browser before try is cool stuff out."); 8: return; 9: } 10: }); Each browser supports these interfaces by their own implementation and currently the Blob, File and File.slice are supported by Chrome 21, FireFox 13, IE 10, Opera 12 and Safari 5.1 or higher. After that we worked on the files the user selected one by one since in HTML5, user can select multiple files in one file input box. 1: var files = $("#upload_files")[0].files; 2: for (var i = 0; i < files.length; i++) { 3: var file = files[i]; 4: var fileSize = file.size; 5: var fileName = file.name; 6: } Next, we calculated the start index and end index for each chunks based on the size the user specified from the browser. We put them into an array with the file name and the index, which will be used when we upload chunks into Windows Azure Blob Storage as blocks since we need to specify the target blob name and the block index. At the same time we will store the list of all indexes into another variant which will be used to commit blocks into blob in Azure Storage once all chunks had been uploaded successfully. 1: $("#upload_button_blob").click(function () { 2: // assert the browser support html5 3: ... ... 4: // start to upload each files in chunks 5: var files = $("#upload_files")[0].files; 6: for (var i = 0; i < files.length; i++) { 7: var file = files[i]; 8: var fileSize = file.size; 9: var fileName = file.name; 10: 11: // calculate the start and end byte index for each blocks(chunks) 12: // with the index, file name and index list for future using 13: var blockSizeInKB = $("#block_size").val(); 14: var blockSize = blockSizeInKB * 1024; 15: var blocks = []; 16: var offset = 0; 17: var index = 0; 18: var list = ""; 19: while (offset < fileSize) { 20: var start = offset; 21: var end = Math.min(offset + blockSize, fileSize); 22: 23: blocks.push({ 24: name: fileName, 25: index: index, 26: start: start, 27: end: end 28: }); 29: list += index + ","; 30: 31: offset = end; 32: index++; 33: } 34: } 35: }); Now we have all chunks’ information ready. The next step should be upload them one by one to the server side, and at the server side when received a chunk it will upload as a block into Blob Storage, and finally commit them with the index list through BlockBlobClient.PutBlockList. But since all these invokes are ajax calling, which means not synchronized call. So we need to introduce a new JavaScript library to help us coordinate the asynchronize operation, which named “async.js”. You can download this JavaScript library here, and you can find the document here. I will not explain this library too much in this post. We will put all procedures we want to execute as a function array, and pass into the proper function defined in async.js to let it help us to control the execution sequence, in series or in parallel. Hence we will define an array and put the function for chunk upload into this array. 1: $("#upload_button_blob").click(function () { 2: // assert the browser support html5 3: ... ... 4: 5: // start to upload each files in chunks 6: var files = $("#upload_files")[0].files; 7: for (var i = 0; i < files.length; i++) { 8: var file = files[i]; 9: var fileSize = file.size; 10: var fileName = file.name; 11: // calculate the start and end byte index for each blocks(chunks) 12: // with the index, file name and index list for future using 13: ... ... 14: 15: // define the function array and push all chunk upload operation into this array 16: blocks.forEach(function (block) { 17: putBlocks.push(function (callback) { 18: }); 19: }); 20: } 21: }); 22: }); As you can see, I used File.slice method to read each chunks based on the start and end byte index we calculated previously, and constructed a temporary HTML form with the file name, chunk index and chunk data through another new feature in HTML5 named FormData. Then post this form to the backend server through jQuery.ajax. This is the key part of our solution. 1: $("#upload_button_blob").click(function () { 2: // assert the browser support html5 3: ... ... 4: // start to upload each files in chunks 5: var files = $("#upload_files")[0].files; 6: for (var i = 0; i < files.length; i++) { 7: var file = files[i]; 8: var fileSize = file.size; 9: var fileName = file.name; 10: // calculate the start and end byte index for each blocks(chunks) 11: // with the index, file name and index list for future using 12: ... ... 13: // define the function array and push all chunk upload operation into this array 14: blocks.forEach(function (block) { 15: putBlocks.push(function (callback) { 16: // load blob based on the start and end index for each chunks 17: var blob = file.slice(block.start, block.end); 18: // put the file name, index and blob into a temporary from 19: var fd = new FormData(); 20: fd.append("name", block.name); 21: fd.append("index", block.index); 22: fd.append("file", blob); 23: // post the form to backend service (asp.net mvc controller action) 24: $.ajax({ 25: url: "/Home/UploadInFormData", 26: data: fd, 27: processData: false, 28: contentType: "multipart/form-data", 29: type: "POST", 30: success: function (result) { 31: if (!result.success) { 32: alert(result.error); 33: } 34: callback(null, block.index); 35: } 36: }); 37: }); 38: }); 39: } 40: }); Then we will invoke these functions one by one by using the async.js. And once all functions had been executed successfully I invoked another ajax call to the backend service to commit all these chunks (blocks) as the blob in Windows Azure Storage. 1: $("#upload_button_blob").click(function () { 2: // assert the browser support html5 3: ... ... 4: // start to upload each files in chunks 5: var files = $("#upload_files")[0].files; 6: for (var i = 0; i < files.length; i++) { 7: var file = files[i]; 8: var fileSize = file.size; 9: var fileName = file.name; 10: // calculate the start and end byte index for each blocks(chunks) 11: // with the index, file name and index list for future using 12: ... ... 13: // define the function array and push all chunk upload operation into this array 14: ... ... 15: // invoke the functions one by one 16: // then invoke the commit ajax call to put blocks into blob in azure storage 17: async.series(putBlocks, function (error, result) { 18: var data = { 19: name: fileName, 20: list: list 21: }; 22: $.post("/Home/Commit", data, function (result) { 23: if (!result.success) { 24: alert(result.error); 25: } 26: else { 27: alert("done!"); 28: } 29: }); 30: }); 31: } 32: }); That’s all in the client side. The outline of our logic would be - Calculate the start and end byte index for each chunks based on the block size. - Defined the functions of reading the chunk form file and upload the content to the backend service through ajax. - Execute the functions defined in previous step with “async.js”. - Commit the chunks by invoking the backend service in Windows Azure Storage finally. Save Chunks as Blocks into Blob Storage In above we finished the client size JavaScript code. It uploaded the file in chunks to the backend service which we are going to implement in this step. We will use ASP.NET MVC as our backend service, and it will receive the chunks, upload into Windows Azure Bob Storage in blocks, then finally commit as one blob. As in the client side we uploaded chunks by invoking the ajax call to the URL "/Home/UploadInFormData", I created a new action under the Index controller and it only accepts HTTP POST request. 1: [HttpPost] 2: public JsonResult UploadInFormData() 3: { 4: var error = string.Empty; 5: try 6: { 7: } 8: catch (Exception e) 9: { 10: error = e.ToString(); 11: } 12: 13: return new JsonResult() 14: { 15: Data = new 16: { 17: success = string.IsNullOrWhiteSpace(error), 18: error = error 19: } 20: }; 21: } Then I retrieved the file name, index and the chunk content from the Request.Form object, which was passed from our client side. And then, used the Windows Azure SDK to create a blob container (in this case we will use the container named “test”.) and create a blob reference with the blob name (same as the file name). Then uploaded the chunk as a block of this blob with the index, since in Blob Storage each block must have an index (ID) associated with so that finally we can put all blocks as one blob by specifying their block ID list. 1: [HttpPost] 2: public JsonResult UploadInFormData() 3: { 4: var error = string.Empty; 5: try 6: { 7: var name = Request.Form["name"]; 8: var index = int.Parse(Request.Form["index"]); 9: var file = Request.Files[0]; 10: var id = Convert.ToBase64String(BitConverter.GetBytes(index)); 11: 12: var container = _client.GetContainerReference("test"); 13: container.CreateIfNotExists(); 14: var blob = container.GetBlockBlobReference(name); 15: blob.PutBlock(id, file.InputStream, null); 16: } 17: catch (Exception e) 18: { 19: error = e.ToString(); 20: } 21: 22: return new JsonResult() 23: { 24: Data = new 25: { 26: success = string.IsNullOrWhiteSpace(error), 27: error = error 28: } 29: }; 30: } Next, I created another action to commit the blocks into blob once all chunks had been uploaded. Similarly, I retrieved the blob name from the Request.Form. I also retrieved the chunks ID list, which is the block ID list from the Request.Form in a string format, split them as a list, then invoked the BlockBlob.PutBlockList method. After that our blob will be shown in the container and ready to be download. 1: [HttpPost] 2: public JsonResult Commit() 3: { 4: var error = string.Empty; 5: try 6: { 7: var name = Request.Form["name"]; 8: var list = Request.Form["list"]; 9: var ids = list 10: .Split(',') 11: .Where(id => !string.IsNullOrWhiteSpace(id)) 12: .Select(id => Convert.ToBase64String(BitConverter.GetBytes(int.Parse(id)))) 13: .ToArray(); 14: 15: var container = _client.GetContainerReference("test"); 16: container.CreateIfNotExists(); 17: var blob = container.GetBlockBlobReference(name); 18: blob.PutBlockList(ids); 19: } 20: catch (Exception e) 21: { 22: error = e.ToString(); 23: } 24: 25: return new JsonResult() 26: { 27: Data = new 28: { 29: success = string.IsNullOrWhiteSpace(error), 30: error = error 31: } 32: }; 33: } Now we finished all code we need. The whole process of uploading would be like this below. Below is the full client side JavaScript code. 1: <script type="text/javascript" src="~/Scripts/async.js"></script> 2: <script type="text/javascript"> 3: $(function () { 4: $("#upload_button_blob").click(function () { 5: // assert the browser support html5 6: if (window.File && window.Blob && window.FormData) { 7: alert("Your brwoser is awesome, let's rock!"); 8: } 9: else { 10: alert("Oh man plz update to a modern browser before try is cool stuff out."); 11: return; 12: } 13: 14: // start to upload each files in chunks 15: var files = $("#upload_files")[0].files; 16: for (var i = 0; i < files.length; i++) { 17: var file = files[i]; 18: var fileSize = file.size; 19: var fileName = file.name; 20: 21: // calculate the start and end byte index for each blocks(chunks) 22: // with the index, file name and index list for future using 23: var blockSizeInKB = $("#block_size").val(); 24: var blockSize = blockSizeInKB * 1024; 25: var blocks = []; 26: var offset = 0; 27: var index = 0; 28: var list = ""; 29: while (offset < fileSize) { 30: var start = offset; 31: var end = Math.min(offset + blockSize, fileSize); 32: 33: blocks.push({ 34: name: fileName, 35: index: index, 36: start: start, 37: end: end 38: }); 39: list += index + ","; 40: 41: offset = end; 42: index++; 43: } 44: 45: // define the function array and push all chunk upload operation into this array 46: var putBlocks = []; 47: blocks.forEach(function (block) { 48: putBlocks.push(function (callback) { 49: // load blob based on the start and end index for each chunks 50: var blob = file.slice(block.start, block.end); 51: // put the file name, index and blob into a temporary from 52: var fd = new FormData(); 53: fd.append("name", block.name); 54: fd.append("index", block.index); 55: fd.append("file", blob); 56: // post the form to backend service (asp.net mvc controller action) 57: $.ajax({ 58: url: "/Home/UploadInFormData", 59: data: fd, 60: processData: false, 61: contentType: "multipart/form-data", 62: type: "POST", 63: success: function (result) { 64: if (!result.success) { 65: alert(result.error); 66: } 67: callback(null, block.index); 68: } 69: }); 70: }); 71: }); 72: 73: // invoke the functions one by one 74: // then invoke the commit ajax call to put blocks into blob in azure storage 75: async.series(putBlocks, function (error, result) { 76: var data = { 77: name: fileName, 78: list: list 79: }; 80: $.post("/Home/Commit", data, function (result) { 81: if (!result.success) { 82: alert(result.error); 83: } 84: else { 85: alert("done!"); 86: } 87: }); 88: }); 89: } 90: }); 91: }); 92: </script> And below is the full ASP.NET MVC controller code. 1: public class HomeController : Controller 2: { 3: private CloudStorageAccount _account; 4: private CloudBlobClient _client; 5: 6: public HomeController() 7: : base() 8: { 9: _account = CloudStorageAccount.Parse(CloudConfigurationManager.GetSetting("DataConnectionString")); 10: _client = _account.CreateCloudBlobClient(); 11: } 12: 13: public ActionResult Index() 14: { 15: ViewBag.Message = "Modify this template to jump-start your ASP.NET MVC application."; 16: 17: return View(); 18: } 19: 20: [HttpPost] 21: public JsonResult UploadInFormData() 22: { 23: var error = string.Empty; 24: try 25: { 26: var name = Request.Form["name"]; 27: var index = int.Parse(Request.Form["index"]); 28: var file = Request.Files[0]; 29: var id = Convert.ToBase64String(BitConverter.GetBytes(index)); 30: 31: var container = _client.GetContainerReference("test"); 32: container.CreateIfNotExists(); 33: var blob = container.GetBlockBlobReference(name); 34: blob.PutBlock(id, file.InputStream, null); 35: } 36: catch (Exception e) 37: { 38: error = e.ToString(); 39: } 40: 41: return new JsonResult() 42: { 43: Data = new 44: { 45: success = string.IsNullOrWhiteSpace(error), 46: error = error 47: } 48: }; 49: } 50: 51: [HttpPost] 52: public JsonResult Commit() 53: { 54: var error = string.Empty; 55: try 56: { 57: var name = Request.Form["name"]; 58: var list = Request.Form["list"]; 59: var ids = list 60: .Split(',') 61: .Where(id => !string.IsNullOrWhiteSpace(id)) 62: .Select(id => Convert.ToBase64String(BitConverter.GetBytes(int.Parse(id)))) 63: .ToArray(); 64: 65: var container = _client.GetContainerReference("test"); 66: container.CreateIfNotExists(); 67: var blob = container.GetBlockBlobReference(name); 68: blob.PutBlockList(ids); 69: } 70: catch (Exception e) 71: { 72: error = e.ToString(); 73: } 74: 75: return new JsonResult() 76: { 77: Data = new 78: { 79: success = string.IsNullOrWhiteSpace(error), 80: error = error 81: } 82: }; 83: } 84: } And if we selected a file from the browser we will see our application will upload chunks in the size we specified to the server through ajax call in background, and then commit all chunks in one blob. Then we can find the blob in our Windows Azure Blob Storage. Optimized by Parallel Upload In previous example we just uploaded our file in chunks. This solved the problem that ASP.NET MVC request content size limitation as well as the Windows Azure load balancer timeout. But it might introduce the performance problem since we uploaded chunks in sequence. In order to improve the upload performance we could modify our client side code a bit to make the upload operation invoked in parallel. The good news is that, “async.js” library provides the parallel execution function. If you remembered the code we invoke the service to upload chunks, it utilized “async.series” which means all functions will be executed in sequence. Now we will change this code to “async.parallel”. This will invoke all functions in parallel. 1: $("#upload_button_blob").click(function () { 2: // assert the browser support html5 3: ... ... 4: // start to upload each files in chunks 5: var files = $("#upload_files")[0].files; 6: for (var i = 0; i < files.length; i++) { 7: var file = files[i]; 8: var fileSize = file.size; 9: var fileName = file.name; 10: // calculate the start and end byte index for each blocks(chunks) 11: // with the index, file name and index list for future using 12: ... ... 13: // define the function array and push all chunk upload operation into this array 14: ... ... 15: // invoke the functions one by one 16: // then invoke the commit ajax call to put blocks into blob in azure storage 17: async.parallel(putBlocks, function (error, result) { 18: var data = { 19: name: fileName, 20: list: list 21: }; 22: $.post("/Home/Commit", data, function (result) { 23: if (!result.success) { 24: alert(result.error); 25: } 26: else { 27: alert("done!"); 28: } 29: }); 30: }); 31: } 32: }); In this way all chunks will be uploaded to the server side at the same time to maximize the bandwidth usage. This should work if the file was not very large and the chunk size was not very small. But for large file this might introduce another problem that too many ajax calls are sent to the server at the same time. So the best solution should be, upload the chunks in parallel with maximum concurrency limitation. The code below specified the concurrency limitation to 4, which means at the most only 4 ajax calls could be invoked at the same time. 1: $("#upload_button_blob").click(function () { 2: // assert the browser support html5 3: ... ... 4: // start to upload each files in chunks 5: var files = $("#upload_files")[0].files; 6: for (var i = 0; i < files.length; i++) { 7: var file = files[i]; 8: var fileSize = file.size; 9: var fileName = file.name; 10: // calculate the start and end byte index for each blocks(chunks) 11: // with the index, file name and index list for future using 12: ... ... 13: // define the function array and push all chunk upload operation into this array 14: ... ... 15: // invoke the functions one by one 16: // then invoke the commit ajax call to put blocks into blob in azure storage 17: async.parallelLimit(putBlocks, 4, function (error, result) { 18: var data = { 19: name: fileName, 20: list: list 21: }; 22: $.post("/Home/Commit", data, function (result) { 23: if (!result.success) { 24: alert(result.error); 25: } 26: else { 27: alert("done!"); 28: } 29: }); 30: }); 31: } 32: }); Summary In this post we discussed how to upload files in chunks to the backend service and then upload them into Windows Azure Blob Storage in blocks. We focused on the frontend side and leverage three new feature introduced in HTML 5 which are - File.slice: Read part of the file by specifying the start and end byte index. - Blob: File-like interface which contains the part of the file content. - FormData: Temporary form element that we can pass the chunk alone with some metadata to the backend service. Then we discussed the performance consideration of chunk uploading. Sequence upload cannot provide maximized upload speed, but the unlimited parallel upload might crash the browser and server if too many chunks. So we finally came up with the solution to upload chunks in parallel with the concurrency limitation. We also demonstrated how to utilize “async.js” JavaScript library to help us control the asynchronize call and the parallel limitation. Regarding the chunk size and the parallel limitation value there is no “best” value. You need to test vary composition and find out the best one for your particular scenario. It depends on the local bandwidth, client machine cores and the server side (Windows Azure Cloud Service Virtual Machine) cores, memory and bandwidth. Below is one of my performance test result. The client machine was Windows 8 IE 10 with 4 cores. I was using Microsoft Cooperation Network. The web site was hosted on Windows Azure China North data center (in Beijing) with one small web role (1.7GB 1 core CPU, 1.75GB memory with 100Mbps bandwidth). The test cases were - Chunk size: 512KB, 1MB, 2MB, 4MB. - Upload Mode: Sequence, parallel (unlimited), parallel with limit (4 threads, 8 threads). - Chunk Format: base64 string, binaries. - Target file: 100MB. - Each case was tested 3 times. Below is the test result chart. Some thoughts, but not guidance or best practice: - Parallel gets better performance than series. - No significant performance improvement between parallel 4 threads and 8 threads. - Transform with binaries provides better performance than base64. - In all cases, chunk size in 1MB - 2MB gets better performance. Hope this helps, Shaun All documents and related graphics, codes are provided "AS IS" without warranty of any kind. Copyright © Shaun Ziyan Xu. This work is licensed under the Creative Commons License.

Read the article

Any task-control algorithms programming practices?

- by NumberFour

Hi, I was just wondering if there's any field which concerns the task-control programming (or at least that's the way I call it). For a better explanation of task-control consider the following scenario: An application (master-thread) waits for a command - which might be a particular action or a set of actions the application should perform. When a command is received the master-thread creates a task (= spawns an independent thread which actually does the action) and adds a record in it's task-list - thus keeping track of the time of execution, thread handle, task priority...etc. The master-thread awaits for any other incoming commands while taking care of all the tasks - e.g: kills tasks running too long, prioritizes tasks with higher priorities, kills a task on a request of another task, limits the number of currently running tasks, allows task scheduling, cleans finished tasks (threads) and so on. The model is pretty similar to what we can see in OS dealing with running processes. Are there any good practices programming such task-models or is there some theoretical work done in this field? Maybe my question is too generalized, but at least I wanted to know whether there are any experiences working on such models or if there's a better approach. Thanks for any answers.

Read the article

Why aren't we programming on the GPU???

- by Chris

So I finally took the time to learn CUDA and get it installed and configured on my computer and I have to say, I'm quite impressed! Here's how it does rendering the Mandelbrot set at 1280 x 678 pixels on my home PC with a Q6600 and a GeForce 8800GTS (max of 1000 iterations): Maxing out all 4 CPU cores with OpenMP: 2.23 fps Running the same algorithm on my GPU: 104.7 fps And here's how fast I got it to render the whole set at 8192 x 8192 with a max of 1000 iterations: Serial implemetation on my home PC: 81.2 seconds All 4 CPU cores on my home PC (OpenMP): 24.5 seconds 32 processors on my school's super computer (MPI with master-worker): 1.92 seconds My home GPU (CUDA): 0.310 seconds 4 GPUs on my school's super computer (CUDA with static domain decomposition): 0.0547 seconds So here's my question - if we can get such huge speedups by programming the GPU instead of the CPU, why is nobody doing it??? I can think of so many things we could speed up like this, and yet I don't know of many commercial apps that are actually doing it. Also, what kinds of other speedups have you seen by offloading your computations to the GPU?

Read the article

Segmentation fault on MPI, runs properly on OpenMP

- by Bellman

Hi, I am trying to run a program on a computer cluster. The structure of the program is the following: PROGRAM something ... CALL subroutine1(...) ... END PROGRAM SUBROUTINE subroutine1(...) ... DO i=1,n CALL subroutine2(...) ENDDO ... END SUBROUTINE SUBROUTINE subroutine2(...) ... CALL subroutine3(...) CALL subroutine4(...) ... END SUBROUTINE The idea is to parallelize the loop that calls subroutine2. Main program basically only makes the call to subroutine1 and only its arguments are declared. I use two alternatives. On the one hand, I write OpenMP clauses arround the loop. On the other hand, I add an IF conditional branch arround the call and I use MPI to share the results. In the OpenMP case, I add CALL KMP_SET_STACKSIZE(402653184) at the beginning of the main program and I can run it with 8 threads on an 8 core machine. When I run it (on the same 8 core machine) with MPI (either using 8 or 1 processors) it crashes just when makes the call to subroutine3 with a segmentation fault (signal 11) error. If I comment subroutine4, then it doesn't crash (notice that it crashed just when calling subroutine3 and it works when commenting subroutine4). I compile with mpif90 using MPICH2 libraries and the following flags: -O3 -fpscomp logicals -openmp -threads -m64 -xS. The machine has EM64T architecture and I use a Debian Linux distribution. I set ulimit -s hard before running the program. Any ideas on what is going on? Has it something to do with stack size? Thanks in advance

Search Results

Search found 1774 results on 71 pages for 'parallel'.

Page 17/71 | < Previous Page | 13 14 15 16 17 18 19 20 21 22 23 24 | Next Page >

- by Code Curiosity

- by mayap

- by arik-funke

- by Rajat Mehta

- by Peter Stewart

- by Ian

- by DanO

- by Konrad Rudolph

- by DanO

- by ardiyu07

- by Prabhu

- by Scott Chamberlain

- by Scott Chamberlain

- by user147133

- by Mike

- by Jin

- by Jin

- by Ryan Olson

- by kaushalmodi

- by Craig P. Motlin

- by Shaun

- by NumberFour

- by Chris

- by Bellman

< Previous Page | 13 14 15 16 17 18 19 20 21 22 23 24 | Next Page >