parallel foreach - Page 30

The best way to predict performance without actually porting the code?

- by ardiyu07

I believe there are people with the same experience with me, where he/she must give a (estimated) performance report of porting a program from sequential to parallel with some designated multicore hardwares, with a very few amount of time given. For instance, if a 10K LoC sequential program was given and executes on Intel i7-3770k (not vectorized) in 100 ms, how long would it take to run if one parallelizes the code to a Tesla C2075 with NVIDIA CUDA, given that all kinds of parallelizing optimization techniques were done? (but you're only given 2-4 days to report the performance? assume that you didn't know the algorithm at all. Or perhaps it'd be safer if we just assume that it's an impossible situation to finish the job) Therefore, I'm wondering, what most likely be the fastest way to give such performance report? Is it safe to calculate solely by the hardware's capability, such as GFLOPs peak and memory bandwidth rate? Is there a mathematical way to calculate it? If there is, please prove your method with the corresponding problem description and the algorithm, and also the target hardwares' specifications. Or perhaps there already exists such tool to (roughly) estimate code porting? (Please don't the answer: 'kill yourself is the fastest way.')

Read the article

Does my TPL partitioner cause a deadlock?

- by Scott Chamberlain

I am starting to write my first parallel applications. This partitioner will enumerate over a IDataReader pulling chunkSize records at a time from the data-source. protected class DataSourcePartitioner<object[]> : System.Collections.Concurrent.Partitioner<object[]> { private readonly System.Data.IDataReader _Input; private readonly int _ChunkSize; public DataSourcePartitioner(System.Data.IDataReader input, int chunkSize = 10000) : base() { if (chunkSize < 1) throw new ArgumentOutOfRangeException("chunkSize"); _Input = input; _ChunkSize = chunkSize; } public override bool SupportsDynamicPartitions { get { return true; } } public override IList<IEnumerator<object[]>> GetPartitions(int partitionCount) { var dynamicPartitions = GetDynamicPartitions(); var partitions = new IEnumerator<object[]>[partitionCount]; for (int i = 0; i < partitionCount; i++) { partitions[i] = dynamicPartitions.GetEnumerator(); } return partitions; } public override IEnumerable<object[]> GetDynamicPartitions() { return new ListDynamicPartitions(_Input, _ChunkSize); } private class ListDynamicPartitions : IEnumerable<object[]> { private System.Data.IDataReader _Input; int _ChunkSize; private object _ChunkLock = new object(); public ListDynamicPartitions(System.Data.IDataReader input, int chunkSize) { _Input = input; _ChunkSize = chunkSize; } public IEnumerator<object[]> GetEnumerator() { while (true) { List<object[]> chunk = new List<object[]>(_ChunkSize); lock(_Input) { for (int i = 0; i < _ChunkSize; ++i) { if (!_Input.Read()) break; var values = new object[_Input.FieldCount]; _Input.GetValues(values); chunk.Add(values); } if (chunk.Count == 0) yield break; } var chunkEnumerator = chunk.GetEnumerator(); lock(_ChunkLock) //Will this cause a deadlock? { while (chunkEnumerator.MoveNext()) { yield return chunkEnumerator.Current; } } } } IEnumerator IEnumerable.GetEnumerator() { return ((IEnumerable<object[]>)this).GetEnumerator(); } } } I wanted IEnumerable object it passed back to be thread safe (the .Net example was so I am assuming PLINQ and TPL could need it) will the lock on _ChunkLock near the bottom help provide thread safety or will it cause a deadlock? From the documentation I could not tell if the lock would be released on the yeld return. Also if there is built in functionality to .net that will do what I am trying to do I would much rather use that. And if you find any other problems with the code I would appreciate it.

Read the article

Does my GetEnumerator cause a deadlock?

- by Scott Chamberlain

I am starting to write my first parallel applications. This partitioner will enumerate over a IDataReader pulling chunkSize records at a time from the data-source. TLDR; version private object _Lock = new object(); public IEnumerator GetEnumerator() { var infoSource = myInforSource.GetEnumerator(); //Will this cause a deadlock if two threads lock (_Lock) //use the enumator at the same time? { while (infoSource.MoveNext()) { yield return infoSource.Current; } } } full code protected class DataSourcePartitioner<object[]> : System.Collections.Concurrent.Partitioner<object[]> { private readonly System.Data.IDataReader _Input; private readonly int _ChunkSize; public DataSourcePartitioner(System.Data.IDataReader input, int chunkSize = 10000) : base() { if (chunkSize < 1) throw new ArgumentOutOfRangeException("chunkSize"); _Input = input; _ChunkSize = chunkSize; } public override bool SupportsDynamicPartitions { get { return true; } } public override IList<IEnumerator<object[]>> GetPartitions(int partitionCount) { var dynamicPartitions = GetDynamicPartitions(); var partitions = new IEnumerator<object[]>[partitionCount]; for (int i = 0; i < partitionCount; i++) { partitions[i] = dynamicPartitions.GetEnumerator(); } return partitions; } public override IEnumerable<object[]> GetDynamicPartitions() { return new ListDynamicPartitions(_Input, _ChunkSize); } private class ListDynamicPartitions : IEnumerable<object[]> { private System.Data.IDataReader _Input; int _ChunkSize; private object _ChunkLock = new object(); public ListDynamicPartitions(System.Data.IDataReader input, int chunkSize) { _Input = input; _ChunkSize = chunkSize; } public IEnumerator<object[]> GetEnumerator() { while (true) { List<object[]> chunk = new List<object[]>(_ChunkSize); lock(_Input) { for (int i = 0; i < _ChunkSize; ++i) { if (!_Input.Read()) break; var values = new object[_Input.FieldCount]; _Input.GetValues(values); chunk.Add(values); } if (chunk.Count == 0) yield break; } var chunkEnumerator = chunk.GetEnumerator(); lock(_ChunkLock) //Will this cause a deadlock? { while (chunkEnumerator.MoveNext()) { yield return chunkEnumerator.Current; } } } } IEnumerator IEnumerable.GetEnumerator() { return ((IEnumerable<object[]>)this).GetEnumerator(); } } } I wanted IEnumerable object it passed back to be thread safe (the MSDN example was so I am assuming PLINQ and TPL could need it) will the lock on _ChunkLock near the bottom help provide thread safety or will it cause a deadlock? From the documentation I could not tell if the lock would be released on the yeld return. Also if there is built in functionality to .net that will do what I am trying to do I would much rather use that. And if you find any other problems with the code I would appreciate it.

Read the article

Applying the Knuth-Plass algorithm (or something better?) to read two books with different length and amount of chapters in parallel

- by user147133

I have a Bible reading plan that covers the whole Bible in 180 days. For the most of the time, I read 5 chapters in the Old Testament and 1 or 2 (1.5) chapters in the New Testament each day. The problem is that some chapters are longer than others (for example Psalm 119 which is 7 times longer than a average chapter in the Bible), and the plan I'm following doesn't take that in count. I end up with some days having a lot more to read than others. I thought I could use programming to make myself a better plan. I have a datastructure with a list of all chapters in the bible and their length in number of lines. (I found that the number of lines is the best criteria, but it could have been number of verses or number of words as well) I then started to think about this problem as a line wrap problem. Think of a chapter like a word, a day like a line and the whole plan as a paragraph. The "length" of a word (a chapter) is the number of lines in that chapter. I could then generate the best possible reading plan by applying a simplified Knuth-Plass algorithm to find the best breakpoints. This works well if I want to read the Bible from beginning to end. But I want to read a little from the new testament each day in parallel with the old testament. Of course I can run the Knuth-Plass algorithm on the Old Testament first, then on the New Testament and get two separate plans. But those plans merged is not a optimal plan. Worst-case days (days with extra much reading) in the New Testament plan will randomly occur on the same days as the worst-case days in the Old Testament. Since the New Testament have about 180*1.5 chapters, the plan is generally to read one chapter the first day, two the second, one the third etc... And I would like the plan for the Old Testament to compensate for this alternating length. So I will need a new and better algorithm, or I will have to use the Knuth-Plass algorithm in a way that I've not figured out. I think this could be a interesting and challenging nut for people interested in algorithms, so therefore I wanted to see if any of you have a good solution in mind.

Read the article

How to add correct cancellation when downloading a file with the example in the samples of the new P

- by Mike

Hello everybody, I have downloaded the last samples of the Parallel Programming team, and I don't succeed in adding correctly the possibility to cancel the download of a file. Here is the code I ended to have: var wreq = (HttpWebRequest)WebRequest.Create(uri); // Fire start event DownloadStarted(this, new DownloadStartedEventArgs(remoteFilePath)); long totalBytes = 0; wreq.DownloadDataInFileAsync(tmpLocalFile, cancellationTokenSource.Token, allowResume, totalBytesAction => { totalBytes = totalBytesAction; }, readBytes => { Log.Debug("Progression : {0} / {1} => {2}%", readBytes, totalBytes, 100 * (double)readBytes / totalBytes); DownloadProgress(this, new DownloadProgressEventArgs(remoteFilePath, readBytes, totalBytes, (int)(100 * readBytes / totalBytes))); }) .ContinueWith( (antecedent ) => { if (antecedent.IsFaulted) Log.Debug(antecedent.Exception.Message); //Fire end event SetEndDownload(antecedent.IsCanceled, antecedent.Exception, tmpLocalFile, 0); }, cancellationTokenSource.Token); I want to fire an end event after the download is finished, hence the ContinueWith. I slightly changed the code of the samples to add the CancellationToken and the 2 delegates to get the size of the file to download, and the progression of the download: return webRequest.GetResponseAsync() .ContinueWith(response => { if (totalBytesAction != null) totalBytesAction(response.Result.ContentLength); response.Result.GetResponseStream().WriteAllBytesAsync(filePath, ct, resumeDownload, progressAction).Wait(ct); }, ct); I had to add the call to the Wait function, because if I don't, the method exits and the end event is fired too early. Here are the modified method extensions (lot of code, apologies :p) public static Task WriteAllBytesAsync(this Stream stream, string filePath, CancellationToken ct, bool resumeDownload = false, Action<long> progressAction = null) { if (stream == null) throw new ArgumentNullException("stream"); // Copy from the source stream to the memory stream and return the copied data return stream.CopyStreamToFileAsync(filePath, ct, resumeDownload, progressAction); } public static Task CopyStreamToFileAsync(this Stream source, string destinationPath, CancellationToken ct, bool resumeDownload = false, Action<long> progressAction = null) { if (source == null) throw new ArgumentNullException("source"); if (destinationPath == null) throw new ArgumentNullException("destinationPath"); // Open the output file for writing var destinationStream = FileAsync.OpenWrite(destinationPath); // Copy the source to the destination stream, then close the output file. return CopyStreamToStreamAsync(source, destinationStream, ct, progressAction).ContinueWith(t => { var e = t.Exception; destinationStream.Close(); if (e != null) throw e; }, ct, TaskContinuationOptions.ExecuteSynchronously, TaskScheduler.Current); } public static Task CopyStreamToStreamAsync(this Stream source, Stream destination, CancellationToken ct, Action<long> progressAction = null) { if (source == null) throw new ArgumentNullException("source"); if (destination == null) throw new ArgumentNullException("destination"); return Task.Factory.Iterate(CopyStreamIterator(source, destination, ct, progressAction)); } private static IEnumerable<Task> CopyStreamIterator(Stream input, Stream output, CancellationToken ct, Action<long> progressAction = null) { // Create two buffers. One will be used for the current read operation and one for the current // write operation. We'll continually swap back and forth between them. byte[][] buffers = new byte[2][] { new byte[BUFFER_SIZE], new byte[BUFFER_SIZE] }; int filledBufferNum = 0; Task writeTask = null; int readBytes = 0; // Until there's no more data to be read or cancellation while (true) { ct.ThrowIfCancellationRequested(); // Read from the input asynchronously var readTask = input.ReadAsync(buffers[filledBufferNum], 0, buffers[filledBufferNum].Length); // If we have no pending write operations, just yield until the read operation has // completed. If we have both a pending read and a pending write, yield until both the read // and the write have completed. yield return writeTask == null ? readTask : Task.Factory.ContinueWhenAll(new[] { readTask, writeTask }, tasks => tasks.PropagateExceptions()); // If no data was read, nothing more to do. if (readTask.Result <= 0) break; readBytes += readTask.Result; if (progressAction != null) progressAction(readBytes); // Otherwise, write the written data out to the file writeTask = output.WriteAsync(buffers[filledBufferNum], 0, readTask.Result); // Swap buffers filledBufferNum ^= 1; } } So basically, at the end of the chain of called methods, I let the CancellationToken throw an OperationCanceledException if a Cancel has been requested. What I hoped was to get IsFaulted == true in the appealing code and to fire the end event with the canceled flags and the correct exception. But what I get is an unhandled exception on the line response.Result.GetResponseStream().WriteAllBytesAsync(filePath, ct, resumeDownload, progressAction).Wait(ct); telling me that I don't catch an AggregateException. I've tried various things, but I don't succeed to make the whole thing work properly. Does anyone of you have played enough with that library and may help me? Thanks in advance Mike

Read the article

How to declare a(n) vector/array of reducer objects in Cilk++?

- by Jin

Hi All, I had a problem when I am using Cilk++, an extension to C++ for parallel computing. I found that I can't declare a vector of reducer objects: typedef cilk::reducer_opadd<int> T_reducer; vector<T_reducer> bitmiss_vec; for (int i = 0; i < 24; ++i) { T_reducer r; bitmiss_vec.push_back(r); } However, when I compile the code with Cilk++, it complains at the push_back() line: cilk++ geneAttack.cilk -O1 -g -lcilkutil -o geneAttack /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/cilk++/reducer_opadd.h: In member function ‘void __gnu_cxx::new_allocator<_Tp>::construct(_Tp*, const _Tp&) [with _Tp = cilk::reducer_opadd<int>]’: /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/c++/4.2.4/bits/stl_vector.h:601: instantiated from ‘void std::vector<_Tp, _Alloc>::push_back(const _Tp&) [with _Tp = cilk::reducer_opadd<int>, _Alloc = std::allocator<cilk::reducer_opadd<int> >]’ geneAttack.cilk:667: instantiated from here /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/cilk++/reducer_opadd.h:229: error: ‘cilk::reducer_opadd<Type>::reducer_opadd(const cilk::reducer_opadd<Type>&) [with Type = int]’ is private /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/c++/4.2.4/ext/new_allocator.h:107: error: within this context /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/cilk++/reducer_opadd.h: In member function ‘void std::vector<_Tp, _Alloc>::_M_insert_aux(__gnu_cxx::__normal_iterator<typename std::_Vector_base<_Tp, _Alloc>::_Tp_alloc_type::pointer, std::vector<_Tp, _Alloc> >, const _Tp&) [with _Tp = cilk::reducer_opadd<int>, _Alloc = std::allocator<cilk::reducer_opadd<int> >]’: /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/c++/4.2.4/bits/stl_vector.h:605: instantiated from ‘void std::vector<_Tp, _Alloc>::push_back(const _Tp&) [with _Tp = cilk::reducer_opadd<int>, _Alloc = std::allocator<cilk::reducer_opadd<int> >]’ geneAttack.cilk:667: instantiated from here /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/cilk++/reducer_opadd.h:229: error: ‘cilk::reducer_opadd<Type>::reducer_opadd(const cilk::reducer_opadd<Type>&) [with Type = int]’ is private /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/c++/4.2.4/bits/vector.tcc:252: error: within this context /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/c++/4.2.4/bits/stl_vector.h:605: instantiated from ‘void std::vector<_Tp, _Alloc>::push_back(const _Tp&) [with _Tp = cilk::reducer_opadd<int>, _Alloc = std::allocator<cilk::reducer_opadd<int> >]’ geneAttack.cilk:667: instantiated from here /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/cilk++/reducer_opadd.h:230: error: ‘cilk::reducer_opadd<Type>& cilk::reducer_opadd<Type>::operator=(const cilk::reducer_opadd<Type>&) [with Type = int]’ is private /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/c++/4.2.4/bits/vector.tcc:256: error: within this context /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/cilk++/reducer_opadd.h: In static member function ‘static _BI2 std::__copy_backward<_BoolType, std::random_access_iterator_tag>::__copy_b(_BI1, _BI1, _BI2) [with _BI1 = cilk::reducer_opadd<int>*, _BI2 = cilk::reducer_opadd<int>*, bool _BoolType = false]’: /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/c++/4.2.4/bits/stl_algobase.h:465: instantiated from ‘_BI2 std::__copy_backward_aux(_BI1, _BI1, _BI2) [with _BI1 = cilk::reducer_opadd<int>*, _BI2 = cilk::reducer_opadd<int>*]’ /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/c++/4.2.4/bits/stl_algobase.h:474: instantiated from ‘static _BI2 std::__copy_backward_normal<<anonymous>, <anonymous> >::__copy_b_n(_BI1, _BI1, _BI2) [with _BI1 = cilk::reducer_opadd<int>*, _BI2 = cilk::reducer_opadd<int>*, bool <anonymous> = false, bool <anonymous> = false]’ /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/c++/4.2.4/bits/stl_algobase.h:540: instantiated from ‘_BI2 std::copy_backward(_BI1, _BI1, _BI2) [with _BI1 = cilk::reducer_opadd<int>*, _BI2 = cilk::reducer_opadd<int>*]’ /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/c++/4.2.4/bits/vector.tcc:253: instantiated from ‘void std::vector<_Tp, _Alloc>::_M_insert_aux(__gnu_cxx::__normal_iterator<typename std::_Vector_base<_Tp, _Alloc>::_Tp_alloc_type::pointer, std::vector<_Tp, _Alloc> >, const _Tp&) [with _Tp = cilk::reducer_opadd<int>, _Alloc = std::allocator<cilk::reducer_opadd<int> >]’ /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/c++/4.2.4/bits/stl_vector.h:605: instantiated from ‘void std::vector<_Tp, _Alloc>::push_back(const _Tp&) [with _Tp = cilk::reducer_opadd<int>, _Alloc = std::allocator<cilk::reducer_opadd<int> >]’ geneAttack.cilk:667: instantiated from here /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/cilk++/reducer_opadd.h:230: error: ‘cilk::reducer_opadd<Type>& cilk::reducer_opadd<Type>::operator=(const cilk::reducer_opadd<Type>&) [with Type = int]’ is private /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/c++/4.2.4/bits/stl_algobase.h:433: error: within this context /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/cilk++/reducer_opadd.h: In function ‘void std::_Construct(_T1*, const _T2&) [with _T1 = cilk::reducer_opadd<int>, _T2 = cilk::reducer_opadd<int>]’: /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/c++/4.2.4/bits/stl_uninitialized.h:87: instantiated from ‘_ForwardIterator std::__uninitialized_copy_aux(_InputIterator, _InputIterator, _ForwardIterator, std::__false_type) [with _InputIterator = cilk::reducer_opadd<int>*, _ForwardIterator = cilk::reducer_opadd<int>*]’ /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/c++/4.2.4/bits/stl_uninitialized.h:114: instantiated from ‘_ForwardIterator std::uninitialized_copy(_InputIterator, _InputIterator, _ForwardIterator) [with _InputIterator = cilk::reducer_opadd<int>*, _ForwardIterator = cilk::reducer_opadd<int>*]’ /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/c++/4.2.4/bits/stl_uninitialized.h:254: instantiated from ‘_ForwardIterator std::__uninitialized_copy_a(_InputIterator, _InputIterator, _ForwardIterator, std::allocator<_Tp>) [with _InputIterator = cilk::reducer_opadd<int>*, _ForwardIterator = cilk::reducer_opadd<int>*, _Tp = cilk::reducer_opadd<int>]’ /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/c++/4.2.4/bits/vector.tcc:275: instantiated from ‘void std::vector<_Tp, _Alloc>::_M_insert_aux(__gnu_cxx::__normal_iterator<typename std::_Vector_base<_Tp, _Alloc>::_Tp_alloc_type::pointer, std::vector<_Tp, _Alloc> >, const _Tp&) [with _Tp = cilk::reducer_opadd<int>, _Alloc = std::allocator<cilk::reducer_opadd<int> >]’ /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/c++/4.2.4/bits/stl_vector.h:605: instantiated from ‘void std::vector<_Tp, _Alloc>::push_back(const _Tp&) [with _Tp = cilk::reducer_opadd<int>, _Alloc = std::allocator<cilk::reducer_opadd<int> >]’ geneAttack.cilk:667: instantiated from here /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/cilk++/reducer_opadd.h:229: error: ‘cilk::reducer_opadd<Type>::reducer_opadd(const cilk::reducer_opadd<Type>&) [with Type = int]’ is private /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/c++/4.2.4/bits/stl_construct.h:81: error: within this context make: *** [geneAttack] Error 1 jinchen@galactica:~/workspace/biometrics/genAttack$ make cilk++ geneAttack.cilk -O1 -g -lcilkutil -o geneAttack geneAttack.cilk: In function ‘int cilk cilk_main(int, char**)’: geneAttack.cilk:670: error: expected primary-expression before ‘,’ token geneAttack.cilk:670: error: expected primary-expression before ‘}’ token geneAttack.cilk:674: error: ‘bitmiss_vec’ was not declared in this scope make: *** [geneAttack] Error 1 The Cilk++ manule says it supports array/vector of reducers, although there are performance issues to consider: "If you create a large number of reducers (for example, an array or vector of reducers) you must be aware that there is an overhead at steal and reduce that is proportional to the number of reducers in the program. " Anyone knows what is going on? How should I declare/use vector of reducers? Thank you

Read the article

How to declare a vector or array of reducer objects in Cilk++?

- by Jin

Hi All, I had a problem when I am using Cilk++, an extension to C++ for parallel computing. I found that I can't declare a vector of reducer objects: typedef cilk::reducer_opadd<int> T_reducer; vector<T_reducer> bitmiss_vec; for (int i = 0; i < 24; ++i) { T_reducer r; bitmiss_vec.push_back(r); } However, when I compile the code with Cilk++, it complains at the push_back() line: cilk++ geneAttack.cilk -O1 -g -lcilkutil -o geneAttack /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/cilk++/reducer_opadd.h: In member function ‘void __gnu_cxx::new_allocator<_Tp>::construct(_Tp*, const _Tp&) [with _Tp = cilk::reducer_opadd<int>]’: /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/c++/4.2.4/bits/stl_vector.h:601: instantiated from ‘void std::vector<_Tp, _Alloc>::push_back(const _Tp&) [with _Tp = cilk::reducer_opadd<int>, _Alloc = std::allocator<cilk::reducer_opadd<int> >]’ geneAttack.cilk:667: instantiated from here /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/cilk++/reducer_opadd.h:229: error: ‘cilk::reducer_opadd<Type>::reducer_opadd(const cilk::reducer_opadd<Type>&) [with Type = int]’ is private /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/c++/4.2.4/ext/new_allocator.h:107: error: within this context /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/cilk++/reducer_opadd.h: In member function ‘void std::vector<_Tp, _Alloc>::_M_insert_aux(__gnu_cxx::__normal_iterator<typename std::_Vector_base<_Tp, _Alloc>::_Tp_alloc_type::pointer, std::vector<_Tp, _Alloc> >, const _Tp&) [with _Tp = cilk::reducer_opadd<int>, _Alloc = std::allocator<cilk::reducer_opadd<int> >]’: /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/c++/4.2.4/bits/stl_vector.h:605: instantiated from ‘void std::vector<_Tp, _Alloc>::push_back(const _Tp&) [with _Tp = cilk::reducer_opadd<int>, _Alloc = std::allocator<cilk::reducer_opadd<int> >]’ geneAttack.cilk:667: instantiated from here /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/cilk++/reducer_opadd.h:229: error: ‘cilk::reducer_opadd<Type>::reducer_opadd(const cilk::reducer_opadd<Type>&) [with Type = int]’ is private /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/c++/4.2.4/bits/vector.tcc:252: error: within this context /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/c++/4.2.4/bits/stl_vector.h:605: instantiated from ‘void std::vector<_Tp, _Alloc>::push_back(const _Tp&) [with _Tp = cilk::reducer_opadd<int>, _Alloc = std::allocator<cilk::reducer_opadd<int> >]’ geneAttack.cilk:667: instantiated from here /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/cilk++/reducer_opadd.h:230: error: ‘cilk::reducer_opadd<Type>& cilk::reducer_opadd<Type>::operator=(const cilk::reducer_opadd<Type>&) [with Type = int]’ is private /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/c++/4.2.4/bits/vector.tcc:256: error: within this context /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/cilk++/reducer_opadd.h: In static member function ‘static _BI2 std::__copy_backward<_BoolType, std::random_access_iterator_tag>::__copy_b(_BI1, _BI1, _BI2) [with _BI1 = cilk::reducer_opadd<int>*, _BI2 = cilk::reducer_opadd<int>*, bool _BoolType = false]’: /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/c++/4.2.4/bits/stl_algobase.h:465: instantiated from ‘_BI2 std::__copy_backward_aux(_BI1, _BI1, _BI2) [with _BI1 = cilk::reducer_opadd<int>*, _BI2 = cilk::reducer_opadd<int>*]’ /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/c++/4.2.4/bits/stl_algobase.h:474: instantiated from ‘static _BI2 std::__copy_backward_normal<<anonymous>, <anonymous> >::__copy_b_n(_BI1, _BI1, _BI2) [with _BI1 = cilk::reducer_opadd<int>*, _BI2 = cilk::reducer_opadd<int>*, bool <anonymous> = false, bool <anonymous> = false]’ /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/c++/4.2.4/bits/stl_algobase.h:540: instantiated from ‘_BI2 std::copy_backward(_BI1, _BI1, _BI2) [with _BI1 = cilk::reducer_opadd<int>*, _BI2 = cilk::reducer_opadd<int>*]’ /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/c++/4.2.4/bits/vector.tcc:253: instantiated from ‘void std::vector<_Tp, _Alloc>::_M_insert_aux(__gnu_cxx::__normal_iterator<typename std::_Vector_base<_Tp, _Alloc>::_Tp_alloc_type::pointer, std::vector<_Tp, _Alloc> >, const _Tp&) [with _Tp = cilk::reducer_opadd<int>, _Alloc = std::allocator<cilk::reducer_opadd<int> >]’ /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/c++/4.2.4/bits/stl_vector.h:605: instantiated from ‘void std::vector<_Tp, _Alloc>::push_back(const _Tp&) [with _Tp = cilk::reducer_opadd<int>, _Alloc = std::allocator<cilk::reducer_opadd<int> >]’ geneAttack.cilk:667: instantiated from here /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/cilk++/reducer_opadd.h:230: error: ‘cilk::reducer_opadd<Type>& cilk::reducer_opadd<Type>::operator=(const cilk::reducer_opadd<Type>&) [with Type = int]’ is private /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/c++/4.2.4/bits/stl_algobase.h:433: error: within this context /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/cilk++/reducer_opadd.h: In function ‘void std::_Construct(_T1*, const _T2&) [with _T1 = cilk::reducer_opadd<int>, _T2 = cilk::reducer_opadd<int>]’: /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/c++/4.2.4/bits/stl_uninitialized.h:87: instantiated from ‘_ForwardIterator std::__uninitialized_copy_aux(_InputIterator, _InputIterator, _ForwardIterator, std::__false_type) [with _InputIterator = cilk::reducer_opadd<int>*, _ForwardIterator = cilk::reducer_opadd<int>*]’ /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/c++/4.2.4/bits/stl_uninitialized.h:114: instantiated from ‘_ForwardIterator std::uninitialized_copy(_InputIterator, _InputIterator, _ForwardIterator) [with _InputIterator = cilk::reducer_opadd<int>*, _ForwardIterator = cilk::reducer_opadd<int>*]’ /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/c++/4.2.4/bits/stl_uninitialized.h:254: instantiated from ‘_ForwardIterator std::__uninitialized_copy_a(_InputIterator, _InputIterator, _ForwardIterator, std::allocator<_Tp>) [with _InputIterator = cilk::reducer_opadd<int>*, _ForwardIterator = cilk::reducer_opadd<int>*, _Tp = cilk::reducer_opadd<int>]’ /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/c++/4.2.4/bits/vector.tcc:275: instantiated from ‘void std::vector<_Tp, _Alloc>::_M_insert_aux(__gnu_cxx::__normal_iterator<typename std::_Vector_base<_Tp, _Alloc>::_Tp_alloc_type::pointer, std::vector<_Tp, _Alloc> >, const _Tp&) [with _Tp = cilk::reducer_opadd<int>, _Alloc = std::allocator<cilk::reducer_opadd<int> >]’ /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/c++/4.2.4/bits/stl_vector.h:605: instantiated from ‘void std::vector<_Tp, _Alloc>::push_back(const _Tp&) [with _Tp = cilk::reducer_opadd<int>, _Alloc = std::allocator<cilk::reducer_opadd<int> >]’ geneAttack.cilk:667: instantiated from here /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/cilk++/reducer_opadd.h:229: error: ‘cilk::reducer_opadd<Type>::reducer_opadd(const cilk::reducer_opadd<Type>&) [with Type = int]’ is private /usr/local/cilk/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.2.4/../../../../include/c++/4.2.4/bits/stl_construct.h:81: error: within this context make: *** [geneAttack] Error 1 jinchen@galactica:~/workspace/biometrics/genAttack$ make cilk++ geneAttack.cilk -O1 -g -lcilkutil -o geneAttack geneAttack.cilk: In function ‘int cilk cilk_main(int, char**)’: geneAttack.cilk:670: error: expected primary-expression before ‘,’ token geneAttack.cilk:670: error: expected primary-expression before ‘}’ token geneAttack.cilk:674: error: ‘bitmiss_vec’ was not declared in this scope make: *** [geneAttack] Error 1 The Cilk++ manule says it supports array/vector of reducers, although there are performance issues to consider: "If you create a large number of reducers (for example, an array or vector of reducers) you must be aware that there is an overhead at steal and reduce that is proportional to the number of reducers in the program. " Anyone knows what is going on? How should I declare/use vector of reducers? Thank you

Read the article

Visual Studio 2010 Brings Parallelism Mainstream

Visual Studio 2010 contains innovations in the parallel computing space to enable developers to cope with parallel applications Microsoft Visual Studio - Parallel computing - Microsoft - Programming - Tools

Read the article

Which parallel sorting algorithm has the best average case performance?

- by Craig P. Motlin

Sorting takes O(n log n) in the serial case. If we have O(n) processors we would hope for a linear speedup. O(log n) parallel algorithms exist but they have a very high constant. They also aren't applicable on commodity hardware which doesn't have anywhere near O(n) processors. With p processors, reasonable algorithms should take O(n/p log n/p) time. In the serial case, quick sort has the best runtime complexity on average. A parallel quick sort algorithm is easy to implement (see here and here). However it doesn't perform well since the very first step is to partition the whole collection on a single core. I have found information on many parallel sort algorithms but so far I have not seen anything pointing to a clear winner. I'm looking to sort lists of 1 million to 100 million elements in a JVM language running on 8 to 32 cores.

Read the article

Why aren't my JQuery .ajax requests being made in parallel?

- by Ryan Olson

I am trying to make two ajax requests in parallel using jQuery like this: var sources = ["source1", "source2"]; $(sources).each(function() { var source = this; $.ajax({ async: true, type: "POST", data: {post: "data", in: "here"}, url: "/my/url/" + source, success: function(data) { process_result(data); } }); }); I got the basic structure from this question, but my requests still aren't being made in parallel. "source1" takes a while to complete, and I can see on the server that the second request isn't made until the first is completed. As far as I can tell, I don't have any other active requests, so I don't think it's a problem with the maximum number of parallel requests for the browser. Am I missing something else here?

Read the article

Upload File to Windows Azure Blob in Chunks through ASP.NET MVC, JavaScript and HTML5

- by Shaun

Originally posted on: http://geekswithblogs.net/shaunxu/archive/2013/07/01/upload-file-to-windows-azure-blob-in-chunks-through-asp.net.aspxMany people are using Windows Azure Blob Storage to store their data in the cloud. Blob storage provides 99.9% availability with easy-to-use API through .NET SDK and HTTP REST. For example, we can store JavaScript files, images, documents in blob storage when we are building an ASP.NET web application on a Web Role in Windows Azure. Or we can store our VHD files in blob and mount it as a hard drive in our cloud service. If you are familiar with Windows Azure, you should know that there are two kinds of blob: page blob and block blob. The page blob is optimized for random read and write, which is very useful when you need to store VHD files. The block blob is optimized for sequential/chunk read and write, which has more common usage. Since we can upload block blob in blocks through BlockBlob.PutBlock, and them commit them as a whole blob with invoking the BlockBlob.PutBlockList, it is very powerful to upload large files, as we can upload blocks in parallel, and provide pause-resume feature. There are many documents, articles and blog posts described on how to upload a block blob. Most of them are focus on the server side, which means when you had received a big file, stream or binaries, how to upload them into blob storage in blocks through .NET SDK. But the problem is, how can we upload these large files from client side, for example, a browser. This questioned to me when I was working with a Chinese customer to help them build a network disk production on top of azure. The end users upload their files from the web portal, and then the files will be stored in blob storage from the Web Role. My goal is to find the best way to transform the file from client (end user’s machine) to the server (Web Role) through browser. In this post I will demonstrate and describe what I had done, to upload large file in chunks with high speed, and save them as blocks into Windows Azure Blob Storage. Traditional Upload, Works with Limitation The simplest way to implement this requirement is to create a web page with a form that contains a file input element and a submit button. 1: @using (Html.BeginForm("About", "Index", FormMethod.Post, new { enctype = "multipart/form-data" })) 2: { 3: <input type="file" name="file" /> 4: <input type="submit" value="upload" /> 5: } And then in the backend controller, we retrieve the whole content of this file and upload it in to the blob storage through .NET SDK. We can split the file in blocks and upload them in parallel and commit. The code had been well blogged in the community. 1: [HttpPost] 2: public ActionResult About(HttpPostedFileBase file) 3: { 4: var container = _client.GetContainerReference("test"); 5: container.CreateIfNotExists(); 6: var blob = container.GetBlockBlobReference(file.FileName); 7: var blockDataList = new Dictionary<string, byte[]>(); 8: using (var stream = file.InputStream) 9: { 10: var blockSizeInKB = 1024; 11: var offset = 0; 12: var index = 0; 13: while (offset < stream.Length) 14: { 15: var readLength = Math.Min(1024 * blockSizeInKB, (int)stream.Length - offset); 16: var blockData = new byte[readLength]; 17: offset += stream.Read(blockData, 0, readLength); 18: blockDataList.Add(Convert.ToBase64String(BitConverter.GetBytes(index)), blockData); 19: 20: index++; 21: } 22: } 23: 24: Parallel.ForEach(blockDataList, (bi) => 25: { 26: blob.PutBlock(bi.Key, new MemoryStream(bi.Value), null); 27: }); 28: blob.PutBlockList(blockDataList.Select(b => b.Key).ToArray()); 29: 30: return RedirectToAction("About"); 31: } This works perfect if we selected an image, a music or a small video to upload. But if I selected a large file, let’s say a 6GB HD-movie, after upload for about few minutes the page will be shown as below and the upload will be terminated. In ASP.NET there is a limitation of request length and the maximized request length is defined in the web.config file. It’s a number which less than about 4GB. So if we want to upload a really big file, we cannot simply implement in this way. Also, in Windows Azure, a cloud service network load balancer will terminate the connection if exceed the timeout period. From my test the timeout looks like 2 - 3 minutes. Hence, when we need to upload a large file we cannot just use the basic HTML elements. Besides the limitation mentioned above, the simple HTML file upload cannot provide rich upload experience such as chunk upload, pause and pause-resume. So we need to find a better way to upload large file from the client to the server. Upload in Chunks through HTML5 and JavaScript In order to break those limitation mentioned above we will try to upload the large file in chunks. This takes some benefit to us such as - No request size limitation: Since we upload in chunks, we can define the request size for each chunks regardless how big the entire file is. - No timeout problem: The size of chunks are controlled by us, which means we should be able to make sure request for each chunk upload will not exceed the timeout period of both ASP.NET and Windows Azure load balancer. It was a big challenge to upload big file in chunks until we have HTML5. There are some new features and improvements introduced in HTML5 and we will use them to implement our solution. In HTML5, the File interface had been improved with a new method called “slice”. It can be used to read part of the file by specifying the start byte index and the end byte index. For example if the entire file was 1024 bytes, file.slice(512, 768) will read the part of this file from the 512nd byte to 768th byte, and return a new object of interface called "Blob”, which you can treat as an array of bytes. In fact, a Blob object represents a file-like object of immutable, raw data. The File interface is based on Blob, inheriting blob functionality and expanding it to support files on the user's system. For more information about the Blob please refer here. File and Blob is very useful to implement the chunk upload. We will use File interface to represent the file the user selected from the browser and then use File.slice to read the file in chunks in the size we wanted. For example, if we wanted to upload a 10MB file with 512KB chunks, then we can read it in 512KB blobs by using File.slice in a loop. Assuming we have a web page as below. User can select a file, an input box to specify the block size in KB and a button to start upload. 1: <div> 2: <input type="file" id="upload_files" name="files[]" /><br /> 3: Block Size: <input type="number" id="block_size" value="512" name="block_size" />KB<br /> 4: <input type="button" id="upload_button_blob" name="upload" value="upload (blob)" /> 5: </div> Then we can have the JavaScript function to upload the file in chunks when user clicked the button. 1: <script type="text/javascript"> 1: 2: $(function () { 3: $("#upload_button_blob").click(function () { 4: }); 5: });</script> Firstly we need to ensure the client browser supports the interfaces we are going to use. Just try to invoke the File, Blob and FormData from the “window” object. If any of them is “undefined” the condition result will be “false” which means your browser doesn’t support these premium feature and it’s time for you to get your browser updated. FormData is another new feature we are going to use in the future. It could generate a temporary form for us. We will use this interface to create a form with chunk and associated metadata when invoked the service through ajax. 1: $("#upload_button_blob").click(function () { 2: // assert the browser support html5 3: if (window.File && window.Blob && window.FormData) { 4: alert("Your brwoser is awesome, let's rock!"); 5: } 6: else { 7: alert("Oh man plz update to a modern browser before try is cool stuff out."); 8: return; 9: } 10: }); Each browser supports these interfaces by their own implementation and currently the Blob, File and File.slice are supported by Chrome 21, FireFox 13, IE 10, Opera 12 and Safari 5.1 or higher. After that we worked on the files the user selected one by one since in HTML5, user can select multiple files in one file input box. 1: var files = $("#upload_files")[0].files; 2: for (var i = 0; i < files.length; i++) { 3: var file = files[i]; 4: var fileSize = file.size; 5: var fileName = file.name; 6: } Next, we calculated the start index and end index for each chunks based on the size the user specified from the browser. We put them into an array with the file name and the index, which will be used when we upload chunks into Windows Azure Blob Storage as blocks since we need to specify the target blob name and the block index. At the same time we will store the list of all indexes into another variant which will be used to commit blocks into blob in Azure Storage once all chunks had been uploaded successfully. 1: $("#upload_button_blob").click(function () { 2: // assert the browser support html5 3: ... ... 4: // start to upload each files in chunks 5: var files = $("#upload_files")[0].files; 6: for (var i = 0; i < files.length; i++) { 7: var file = files[i]; 8: var fileSize = file.size; 9: var fileName = file.name; 10: 11: // calculate the start and end byte index for each blocks(chunks) 12: // with the index, file name and index list for future using 13: var blockSizeInKB = $("#block_size").val(); 14: var blockSize = blockSizeInKB * 1024; 15: var blocks = []; 16: var offset = 0; 17: var index = 0; 18: var list = ""; 19: while (offset < fileSize) { 20: var start = offset; 21: var end = Math.min(offset + blockSize, fileSize); 22: 23: blocks.push({ 24: name: fileName, 25: index: index, 26: start: start, 27: end: end 28: }); 29: list += index + ","; 30: 31: offset = end; 32: index++; 33: } 34: } 35: }); Now we have all chunks’ information ready. The next step should be upload them one by one to the server side, and at the server side when received a chunk it will upload as a block into Blob Storage, and finally commit them with the index list through BlockBlobClient.PutBlockList. But since all these invokes are ajax calling, which means not synchronized call. So we need to introduce a new JavaScript library to help us coordinate the asynchronize operation, which named “async.js”. You can download this JavaScript library here, and you can find the document here. I will not explain this library too much in this post. We will put all procedures we want to execute as a function array, and pass into the proper function defined in async.js to let it help us to control the execution sequence, in series or in parallel. Hence we will define an array and put the function for chunk upload into this array. 1: $("#upload_button_blob").click(function () { 2: // assert the browser support html5 3: ... ... 4: 5: // start to upload each files in chunks 6: var files = $("#upload_files")[0].files; 7: for (var i = 0; i < files.length; i++) { 8: var file = files[i]; 9: var fileSize = file.size; 10: var fileName = file.name; 11: // calculate the start and end byte index for each blocks(chunks) 12: // with the index, file name and index list for future using 13: ... ... 14: 15: // define the function array and push all chunk upload operation into this array 16: blocks.forEach(function (block) { 17: putBlocks.push(function (callback) { 18: }); 19: }); 20: } 21: }); 22: }); As you can see, I used File.slice method to read each chunks based on the start and end byte index we calculated previously, and constructed a temporary HTML form with the file name, chunk index and chunk data through another new feature in HTML5 named FormData. Then post this form to the backend server through jQuery.ajax. This is the key part of our solution. 1: $("#upload_button_blob").click(function () { 2: // assert the browser support html5 3: ... ... 4: // start to upload each files in chunks 5: var files = $("#upload_files")[0].files; 6: for (var i = 0; i < files.length; i++) { 7: var file = files[i]; 8: var fileSize = file.size; 9: var fileName = file.name; 10: // calculate the start and end byte index for each blocks(chunks) 11: // with the index, file name and index list for future using 12: ... ... 13: // define the function array and push all chunk upload operation into this array 14: blocks.forEach(function (block) { 15: putBlocks.push(function (callback) { 16: // load blob based on the start and end index for each chunks 17: var blob = file.slice(block.start, block.end); 18: // put the file name, index and blob into a temporary from 19: var fd = new FormData(); 20: fd.append("name", block.name); 21: fd.append("index", block.index); 22: fd.append("file", blob); 23: // post the form to backend service (asp.net mvc controller action) 24: $.ajax({ 25: url: "/Home/UploadInFormData", 26: data: fd, 27: processData: false, 28: contentType: "multipart/form-data", 29: type: "POST", 30: success: function (result) { 31: if (!result.success) { 32: alert(result.error); 33: } 34: callback(null, block.index); 35: } 36: }); 37: }); 38: }); 39: } 40: }); Then we will invoke these functions one by one by using the async.js. And once all functions had been executed successfully I invoked another ajax call to the backend service to commit all these chunks (blocks) as the blob in Windows Azure Storage. 1: $("#upload_button_blob").click(function () { 2: // assert the browser support html5 3: ... ... 4: // start to upload each files in chunks 5: var files = $("#upload_files")[0].files; 6: for (var i = 0; i < files.length; i++) { 7: var file = files[i]; 8: var fileSize = file.size; 9: var fileName = file.name; 10: // calculate the start and end byte index for each blocks(chunks) 11: // with the index, file name and index list for future using 12: ... ... 13: // define the function array and push all chunk upload operation into this array 14: ... ... 15: // invoke the functions one by one 16: // then invoke the commit ajax call to put blocks into blob in azure storage 17: async.series(putBlocks, function (error, result) { 18: var data = { 19: name: fileName, 20: list: list 21: }; 22: $.post("/Home/Commit", data, function (result) { 23: if (!result.success) { 24: alert(result.error); 25: } 26: else { 27: alert("done!"); 28: } 29: }); 30: }); 31: } 32: }); That’s all in the client side. The outline of our logic would be - Calculate the start and end byte index for each chunks based on the block size. - Defined the functions of reading the chunk form file and upload the content to the backend service through ajax. - Execute the functions defined in previous step with “async.js”. - Commit the chunks by invoking the backend service in Windows Azure Storage finally. Save Chunks as Blocks into Blob Storage In above we finished the client size JavaScript code. It uploaded the file in chunks to the backend service which we are going to implement in this step. We will use ASP.NET MVC as our backend service, and it will receive the chunks, upload into Windows Azure Bob Storage in blocks, then finally commit as one blob. As in the client side we uploaded chunks by invoking the ajax call to the URL "/Home/UploadInFormData", I created a new action under the Index controller and it only accepts HTTP POST request. 1: [HttpPost] 2: public JsonResult UploadInFormData() 3: { 4: var error = string.Empty; 5: try 6: { 7: } 8: catch (Exception e) 9: { 10: error = e.ToString(); 11: } 12: 13: return new JsonResult() 14: { 15: Data = new 16: { 17: success = string.IsNullOrWhiteSpace(error), 18: error = error 19: } 20: }; 21: } Then I retrieved the file name, index and the chunk content from the Request.Form object, which was passed from our client side. And then, used the Windows Azure SDK to create a blob container (in this case we will use the container named “test”.) and create a blob reference with the blob name (same as the file name). Then uploaded the chunk as a block of this blob with the index, since in Blob Storage each block must have an index (ID) associated with so that finally we can put all blocks as one blob by specifying their block ID list. 1: [HttpPost] 2: public JsonResult UploadInFormData() 3: { 4: var error = string.Empty; 5: try 6: { 7: var name = Request.Form["name"]; 8: var index = int.Parse(Request.Form["index"]); 9: var file = Request.Files[0]; 10: var id = Convert.ToBase64String(BitConverter.GetBytes(index)); 11: 12: var container = _client.GetContainerReference("test"); 13: container.CreateIfNotExists(); 14: var blob = container.GetBlockBlobReference(name); 15: blob.PutBlock(id, file.InputStream, null); 16: } 17: catch (Exception e) 18: { 19: error = e.ToString(); 20: } 21: 22: return new JsonResult() 23: { 24: Data = new 25: { 26: success = string.IsNullOrWhiteSpace(error), 27: error = error 28: } 29: }; 30: } Next, I created another action to commit the blocks into blob once all chunks had been uploaded. Similarly, I retrieved the blob name from the Request.Form. I also retrieved the chunks ID list, which is the block ID list from the Request.Form in a string format, split them as a list, then invoked the BlockBlob.PutBlockList method. After that our blob will be shown in the container and ready to be download. 1: [HttpPost] 2: public JsonResult Commit() 3: { 4: var error = string.Empty; 5: try 6: { 7: var name = Request.Form["name"]; 8: var list = Request.Form["list"]; 9: var ids = list 10: .Split(',') 11: .Where(id => !string.IsNullOrWhiteSpace(id)) 12: .Select(id => Convert.ToBase64String(BitConverter.GetBytes(int.Parse(id)))) 13: .ToArray(); 14: 15: var container = _client.GetContainerReference("test"); 16: container.CreateIfNotExists(); 17: var blob = container.GetBlockBlobReference(name); 18: blob.PutBlockList(ids); 19: } 20: catch (Exception e) 21: { 22: error = e.ToString(); 23: } 24: 25: return new JsonResult() 26: { 27: Data = new 28: { 29: success = string.IsNullOrWhiteSpace(error), 30: error = error 31: } 32: }; 33: } Now we finished all code we need. The whole process of uploading would be like this below. Below is the full client side JavaScript code. 1: <script type="text/javascript" src="~/Scripts/async.js"></script> 2: <script type="text/javascript"> 3: $(function () { 4: $("#upload_button_blob").click(function () { 5: // assert the browser support html5 6: if (window.File && window.Blob && window.FormData) { 7: alert("Your brwoser is awesome, let's rock!"); 8: } 9: else { 10: alert("Oh man plz update to a modern browser before try is cool stuff out."); 11: return; 12: } 13: 14: // start to upload each files in chunks 15: var files = $("#upload_files")[0].files; 16: for (var i = 0; i < files.length; i++) { 17: var file = files[i]; 18: var fileSize = file.size; 19: var fileName = file.name; 20: 21: // calculate the start and end byte index for each blocks(chunks) 22: // with the index, file name and index list for future using 23: var blockSizeInKB = $("#block_size").val(); 24: var blockSize = blockSizeInKB * 1024; 25: var blocks = []; 26: var offset = 0; 27: var index = 0; 28: var list = ""; 29: while (offset < fileSize) { 30: var start = offset; 31: var end = Math.min(offset + blockSize, fileSize); 32: 33: blocks.push({ 34: name: fileName, 35: index: index, 36: start: start, 37: end: end 38: }); 39: list += index + ","; 40: 41: offset = end; 42: index++; 43: } 44: 45: // define the function array and push all chunk upload operation into this array 46: var putBlocks = []; 47: blocks.forEach(function (block) { 48: putBlocks.push(function (callback) { 49: // load blob based on the start and end index for each chunks 50: var blob = file.slice(block.start, block.end); 51: // put the file name, index and blob into a temporary from 52: var fd = new FormData(); 53: fd.append("name", block.name); 54: fd.append("index", block.index); 55: fd.append("file", blob); 56: // post the form to backend service (asp.net mvc controller action) 57: $.ajax({ 58: url: "/Home/UploadInFormData", 59: data: fd, 60: processData: false, 61: contentType: "multipart/form-data", 62: type: "POST", 63: success: function (result) { 64: if (!result.success) { 65: alert(result.error); 66: } 67: callback(null, block.index); 68: } 69: }); 70: }); 71: }); 72: 73: // invoke the functions one by one 74: // then invoke the commit ajax call to put blocks into blob in azure storage 75: async.series(putBlocks, function (error, result) { 76: var data = { 77: name: fileName, 78: list: list 79: }; 80: $.post("/Home/Commit", data, function (result) { 81: if (!result.success) { 82: alert(result.error); 83: } 84: else { 85: alert("done!"); 86: } 87: }); 88: }); 89: } 90: }); 91: }); 92: </script> And below is the full ASP.NET MVC controller code. 1: public class HomeController : Controller 2: { 3: private CloudStorageAccount _account; 4: private CloudBlobClient _client; 5: 6: public HomeController() 7: : base() 8: { 9: _account = CloudStorageAccount.Parse(CloudConfigurationManager.GetSetting("DataConnectionString")); 10: _client = _account.CreateCloudBlobClient(); 11: } 12: 13: public ActionResult Index() 14: { 15: ViewBag.Message = "Modify this template to jump-start your ASP.NET MVC application."; 16: 17: return View(); 18: } 19: 20: [HttpPost] 21: public JsonResult UploadInFormData() 22: { 23: var error = string.Empty; 24: try 25: { 26: var name = Request.Form["name"]; 27: var index = int.Parse(Request.Form["index"]); 28: var file = Request.Files[0]; 29: var id = Convert.ToBase64String(BitConverter.GetBytes(index)); 30: 31: var container = _client.GetContainerReference("test"); 32: container.CreateIfNotExists(); 33: var blob = container.GetBlockBlobReference(name); 34: blob.PutBlock(id, file.InputStream, null); 35: } 36: catch (Exception e) 37: { 38: error = e.ToString(); 39: } 40: 41: return new JsonResult() 42: { 43: Data = new 44: { 45: success = string.IsNullOrWhiteSpace(error), 46: error = error 47: } 48: }; 49: } 50: 51: [HttpPost] 52: public JsonResult Commit() 53: { 54: var error = string.Empty; 55: try 56: { 57: var name = Request.Form["name"]; 58: var list = Request.Form["list"]; 59: var ids = list 60: .Split(',') 61: .Where(id => !string.IsNullOrWhiteSpace(id)) 62: .Select(id => Convert.ToBase64String(BitConverter.GetBytes(int.Parse(id)))) 63: .ToArray(); 64: 65: var container = _client.GetContainerReference("test"); 66: container.CreateIfNotExists(); 67: var blob = container.GetBlockBlobReference(name); 68: blob.PutBlockList(ids); 69: } 70: catch (Exception e) 71: { 72: error = e.ToString(); 73: } 74: 75: return new JsonResult() 76: { 77: Data = new 78: { 79: success = string.IsNullOrWhiteSpace(error), 80: error = error 81: } 82: }; 83: } 84: } And if we selected a file from the browser we will see our application will upload chunks in the size we specified to the server through ajax call in background, and then commit all chunks in one blob. Then we can find the blob in our Windows Azure Blob Storage. Optimized by Parallel Upload In previous example we just uploaded our file in chunks. This solved the problem that ASP.NET MVC request content size limitation as well as the Windows Azure load balancer timeout. But it might introduce the performance problem since we uploaded chunks in sequence. In order to improve the upload performance we could modify our client side code a bit to make the upload operation invoked in parallel. The good news is that, “async.js” library provides the parallel execution function. If you remembered the code we invoke the service to upload chunks, it utilized “async.series” which means all functions will be executed in sequence. Now we will change this code to “async.parallel”. This will invoke all functions in parallel. 1: $("#upload_button_blob").click(function () { 2: // assert the browser support html5 3: ... ... 4: // start to upload each files in chunks 5: var files = $("#upload_files")[0].files; 6: for (var i = 0; i < files.length; i++) { 7: var file = files[i]; 8: var fileSize = file.size; 9: var fileName = file.name; 10: // calculate the start and end byte index for each blocks(chunks) 11: // with the index, file name and index list for future using 12: ... ... 13: // define the function array and push all chunk upload operation into this array 14: ... ... 15: // invoke the functions one by one 16: // then invoke the commit ajax call to put blocks into blob in azure storage 17: async.parallel(putBlocks, function (error, result) { 18: var data = { 19: name: fileName, 20: list: list 21: }; 22: $.post("/Home/Commit", data, function (result) { 23: if (!result.success) { 24: alert(result.error); 25: } 26: else { 27: alert("done!"); 28: } 29: }); 30: }); 31: } 32: }); In this way all chunks will be uploaded to the server side at the same time to maximize the bandwidth usage. This should work if the file was not very large and the chunk size was not very small. But for large file this might introduce another problem that too many ajax calls are sent to the server at the same time. So the best solution should be, upload the chunks in parallel with maximum concurrency limitation. The code below specified the concurrency limitation to 4, which means at the most only 4 ajax calls could be invoked at the same time. 1: $("#upload_button_blob").click(function () { 2: // assert the browser support html5 3: ... ... 4: // start to upload each files in chunks 5: var files = $("#upload_files")[0].files; 6: for (var i = 0; i < files.length; i++) { 7: var file = files[i]; 8: var fileSize = file.size; 9: var fileName = file.name; 10: // calculate the start and end byte index for each blocks(chunks) 11: // with the index, file name and index list for future using 12: ... ... 13: // define the function array and push all chunk upload operation into this array 14: ... ... 15: // invoke the functions one by one 16: // then invoke the commit ajax call to put blocks into blob in azure storage 17: async.parallelLimit(putBlocks, 4, function (error, result) { 18: var data = { 19: name: fileName, 20: list: list 21: }; 22: $.post("/Home/Commit", data, function (result) { 23: if (!result.success) { 24: alert(result.error); 25: } 26: else { 27: alert("done!"); 28: } 29: }); 30: }); 31: } 32: }); Summary In this post we discussed how to upload files in chunks to the backend service and then upload them into Windows Azure Blob Storage in blocks. We focused on the frontend side and leverage three new feature introduced in HTML 5 which are - File.slice: Read part of the file by specifying the start and end byte index. - Blob: File-like interface which contains the part of the file content. - FormData: Temporary form element that we can pass the chunk alone with some metadata to the backend service. Then we discussed the performance consideration of chunk uploading. Sequence upload cannot provide maximized upload speed, but the unlimited parallel upload might crash the browser and server if too many chunks. So we finally came up with the solution to upload chunks in parallel with the concurrency limitation. We also demonstrated how to utilize “async.js” JavaScript library to help us control the asynchronize call and the parallel limitation. Regarding the chunk size and the parallel limitation value there is no “best” value. You need to test vary composition and find out the best one for your particular scenario. It depends on the local bandwidth, client machine cores and the server side (Windows Azure Cloud Service Virtual Machine) cores, memory and bandwidth. Below is one of my performance test result. The client machine was Windows 8 IE 10 with 4 cores. I was using Microsoft Cooperation Network. The web site was hosted on Windows Azure China North data center (in Beijing) with one small web role (1.7GB 1 core CPU, 1.75GB memory with 100Mbps bandwidth). The test cases were - Chunk size: 512KB, 1MB, 2MB, 4MB. - Upload Mode: Sequence, parallel (unlimited), parallel with limit (4 threads, 8 threads). - Chunk Format: base64 string, binaries. - Target file: 100MB. - Each case was tested 3 times. Below is the test result chart. Some thoughts, but not guidance or best practice: - Parallel gets better performance than series. - No significant performance improvement between parallel 4 threads and 8 threads. - Transform with binaries provides better performance than base64. - In all cases, chunk size in 1MB - 2MB gets better performance. Hope this helps, Shaun All documents and related graphics, codes are provided "AS IS" without warranty of any kind. Copyright © Shaun Ziyan Xu. This work is licensed under the Creative Commons License.

Read the article

Any task-control algorithms programming practices?

- by NumberFour

Hi, I was just wondering if there's any field which concerns the task-control programming (or at least that's the way I call it). For a better explanation of task-control consider the following scenario: An application (master-thread) waits for a command - which might be a particular action or a set of actions the application should perform. When a command is received the master-thread creates a task (= spawns an independent thread which actually does the action) and adds a record in it's task-list - thus keeping track of the time of execution, thread handle, task priority...etc. The master-thread awaits for any other incoming commands while taking care of all the tasks - e.g: kills tasks running too long, prioritizes tasks with higher priorities, kills a task on a request of another task, limits the number of currently running tasks, allows task scheduling, cleans finished tasks (threads) and so on. The model is pretty similar to what we can see in OS dealing with running processes. Are there any good practices programming such task-models or is there some theoretical work done in this field? Maybe my question is too generalized, but at least I wanted to know whether there are any experiences working on such models or if there's a better approach. Thanks for any answers.

Read the article

Why aren't we programming on the GPU???

- by Chris

So I finally took the time to learn CUDA and get it installed and configured on my computer and I have to say, I'm quite impressed! Here's how it does rendering the Mandelbrot set at 1280 x 678 pixels on my home PC with a Q6600 and a GeForce 8800GTS (max of 1000 iterations): Maxing out all 4 CPU cores with OpenMP: 2.23 fps Running the same algorithm on my GPU: 104.7 fps And here's how fast I got it to render the whole set at 8192 x 8192 with a max of 1000 iterations: Serial implemetation on my home PC: 81.2 seconds All 4 CPU cores on my home PC (OpenMP): 24.5 seconds 32 processors on my school's super computer (MPI with master-worker): 1.92 seconds My home GPU (CUDA): 0.310 seconds 4 GPUs on my school's super computer (CUDA with static domain decomposition): 0.0547 seconds So here's my question - if we can get such huge speedups by programming the GPU instead of the CPU, why is nobody doing it??? I can think of so many things we could speed up like this, and yet I don't know of many commercial apps that are actually doing it. Also, what kinds of other speedups have you seen by offloading your computations to the GPU?

Read the article

Segmentation fault on MPI, runs properly on OpenMP

- by Bellman

Hi, I am trying to run a program on a computer cluster. The structure of the program is the following: PROGRAM something ... CALL subroutine1(...) ... END PROGRAM SUBROUTINE subroutine1(...) ... DO i=1,n CALL subroutine2(...) ENDDO ... END SUBROUTINE SUBROUTINE subroutine2(...) ... CALL subroutine3(...) CALL subroutine4(...) ... END SUBROUTINE The idea is to parallelize the loop that calls subroutine2. Main program basically only makes the call to subroutine1 and only its arguments are declared. I use two alternatives. On the one hand, I write OpenMP clauses arround the loop. On the other hand, I add an IF conditional branch arround the call and I use MPI to share the results. In the OpenMP case, I add CALL KMP_SET_STACKSIZE(402653184) at the beginning of the main program and I can run it with 8 threads on an 8 core machine. When I run it (on the same 8 core machine) with MPI (either using 8 or 1 processors) it crashes just when makes the call to subroutine3 with a segmentation fault (signal 11) error. If I comment subroutine4, then it doesn't crash (notice that it crashed just when calling subroutine3 and it works when commenting subroutine4). I compile with mpif90 using MPICH2 libraries and the following flags: -O3 -fpscomp logicals -openmp -threads -m64 -xS. The machine has EM64T architecture and I use a Debian Linux distribution. I set ulimit -s hard before running the program. Any ideas on what is going on? Has it something to do with stack size? Thanks in advance

Read the article

Programming for Multi core Processors

- by Chathuranga Chandrasekara

As far as I know, the multi-core architecture in a processor does not effect the program. The actual instruction execution is handled in a lower layer. my question is, Given that you have a multicore environment, Can I use any programming practices to utilize the available resources more effectively? How should I change my code to gain more performance in multicore environments?

Read the article

local variable 'sresult' referenced before assignment

- by user288558

I have had multiple problems trying to use PP. I am running python2.6 and pp 1.6.0 rc3. Using the following test code: import pp nodes=('mosura02','mosura03','mosura04','mosura05','mosura06', 'mosura09','mosura10','mosura11','mosura12') def pptester(): js=pp.Server(ppservers=nodes) tmp=[] for i in range(200): tmp.append(js.submit(ppworktest,(),(),('os',))) return tmp def ppworktest(): return os.system("uname -a") gives me the following result: In [10]: Exception in thread run_local: Traceback (most recent call last): File "/usr/lib64/python2.6/threading.py", line 525, in __bootstrap_inner self.run() File "/usr/lib64/python2.6/threading.py", line 477, in run self.__target(*self.__args, **self.__kwargs) File "/home/wkerzend/python_coala/lib/python2.6/site-packages/pp.py", line 751, in _run_local job.finalize(sresult) UnboundLocalError: local variable 'sresult' referenced before assignment Exception in thread run_local: Traceback (most recent call last): File "/usr/lib64/python2.6/threading.py", line 525, in __bootstrap_inner self.run() File "/usr/lib64/python2.6/threading.py", line 477, in run self.__target(*self.__args, **self.__kwargs) File "/home/wkerzend/python_coala/lib/python2.6/site-packages/pp.py", line 751, in _run_local job.finalize(sresult) UnboundLocalError: local variable 'sresult' referenced before assignment Exception in thread run_local: Traceback (most recent call last): File "/usr/lib64/python2.6/threading.py", line 525, in __bootstrap_inner self.run() File "/usr/lib64/python2.6/threading.py", line 477, in run self.__target(*self.__args, **self.__kwargs) File "/home/wkerzend/python_coala/lib/python2.6/site-packages/pp.py", line 751, in _run_local job.finalize(sresult) UnboundLocalError: local variable 'sresult' referenced before assignment Exception in thread run_local: Traceback (most recent call last): File "/usr/lib64/python2.6/threading.py", line 525, in __bootstrap_inner self.run() File "/usr/lib64/python2.6/threading.py", line 477, in run self.__target(*self.__args, **self.__kwargs) File "/home/wkerzend/python_coala/lib/python2.6/site-packages/pp.py", line 751, in _run_local job.finalize(sresult) UnboundLocalError: local variable 'sresult' referenced before assignment any help greatly appreciated

Read the article

Concurrent cartesian product algorithm in Clojure

- by jqno

Is there a good algorithm to calculate the cartesian product of three seqs concurrently in Clojure? I'm working on a small hobby project in Clojure, mainly as a means to learn the language, and its concurrency features. In my project, I need to calculate the cartesian product of three seqs (and do something with the results). I found the cartesian-product function in clojure.contrib.combinatorics, which works pretty well. However, the calculation of the cartesian product turns out to be the bottleneck of the program. Therefore, I'd like to perform the calculation concurrently. Now, for the map function, there's a convenient pmap alternative that magically makes the thing concurrent. Which is cool :). Unfortunately, such a thing doesn't exist for cartesian-product. I've looked at the source code, but I can't find an easy way to make it concurrent myself. Also, I've tried to implement an algorithm myself using map, but I guess my algorithmic skills aren't what they used to be. I managed to come up with something ugly for two seqs, but three was definitely a bridge too far. So, does anyone know of an algorithm that's already concurrent, or one that I can parallelize myself?

Read the article

F# performance in scientific computing

- by aaa

hello. I am curious as to how F# performance compares to C++ performance? I asked a similar question with regards to Java, and the impression I got was that Java is not suitable for heavy numbercrunching. I have read that F# is supposed to be more scalable and more performant, but how is this real-world performance compares to C++? specific questions about current implementation are: How well does it do floating-point? Does it allow vector instructions how friendly is it towards optimizing compilers? How big a memory foot print does it have? Does it allow fine-grained control over memory locality? does it have capacity for distributed memory processors, for example Cray? what features does it have that may be of interest to computational science where heavy number processing is involved? Are there actual scientific computing implementations that use it? Thanks

Read the article

Hadooop map reduce

- by Aina Ari

Im very much new to map reduce and i completed hadoop wordcount example. In that example it produces unsorted file (with key value) of word counts. So is it possible to make it sorted according to the most number of word occurrences by combining another map reduce task to the earlier one. Thanks in Advance

Read the article

How to setup matlabpool for multiple processors?

- by JohnIdol

I just setup a Extra Large Heavy Computation EC2 instance to throw it at my Genetic Algorithms problem, hoping to speed up things. This instance has 8 Intel Xeon processors (around 2.4Ghz each) and 7 Gigs of RAM. On my machine I have an Intel Core Duo, and matlab is able to work with my two cores just fine by runinng: matlabpool open 2 On the EC2 instance though, matlab only is capable of detecting 1 out of 8 processors, and if I try running: matlabpool open 8 I get an error saying that the ClusterSize is 1 since there's only 1 core on my CPU. True, there is only 1 core on each CPU, but I have 8 CPUs on the given EC2 instance! So the difference from my machine and the ec2 instance is that I have my 2 cores on a single processor locally, while the EC2 instance has 8 distinct processors. My question is, how do I get matlab to work with those 8 processors? I found this paper, but it seems related to setting up matlab with multiple EC2 instances (not related to multiple processors on the same instance, EC2 or not), which is not my problem. Any help appreciated!

Read the article

Return data from subroutine while the subroutine is still processing

- by Perl QuestionAsker

Is there any way to have a subroutine send data back while still processing? For instance (this example used simply to illustrate) - a subroutine reads a file. While it is reading through the file, if some condition is met, then "return" that line and keep processing. I know there are those that will answer - why would you want to do that? and why don't you just ...?, but I really would like to know if this is possible. Thank you so much in advance.

Read the article

Strategies to use Database Sequences?

- by Bruno Brant

Hello all, I have a high-end architecture which receives many requests every second (in fact, it can receive many requests every millisecond). The architecture is designed so that some controls rely on a certain unique id assigned to each request. To create such UID we use a DB2 Sequence. Right now I already understand that this approach is flawed, since using the database is costly, but it makes sense to do so because this value will also be used to log information on the database. My team has just found out an increase of almost 1000% in elapsed time for each transaction, which we are assuming happened because of the sequence. Now I wonder, using sequences will serialize access to my application? Since they have to guarantee that increments works the way they should, they have to, right? So, are there better strategies when using sequences? Please assume that I have no other way of obtaining a unique id other than relying on the database.

Read the article

CPU Affinity Masks (Putting Threads on different CPUs)

- by hahuang65

I have 4 threads, and I am trying to set thread 1 to run on CPU 1, thread 2 on CPU 2, etc. However, when I run my code below, the affinity masks are returning the correct values, but when I do a sched_getcpu() on the threads, they all return that they are running on CPU 4. Anybody know what my problem here is? Thanks in advance! #define _GNU_SOURCE #include <stdio.h> #include <pthread.h> #include <stdlib.h> #include <sched.h> #include <errno.h> void *pthread_Message(char *message) { printf("%s is running on CPU %d\n", message, sched_getcpu()); } int main() { pthread_t thread1, thread2, thread3, thread4; pthread_t threadArray[4]; cpu_set_t cpu1, cpu2, cpu3, cpu4; char *thread1Msg = "Thread 1"; char *thread2Msg = "Thread 2"; char *thread3Msg = "Thread 3"; char *thread4Msg = "Thread 4"; int thread1Create, thread2Create, thread3Create, thread4Create, i, temp; CPU_ZERO(&cpu1); CPU_SET(1, &cpu1); temp = pthread_setaffinity_np(thread1, sizeof(cpu_set_t), &cpu1); printf("Set returned by pthread_getaffinity_np() contained:\n"); for (i = 0; i < CPU_SETSIZE; i++) if (CPU_ISSET(i, &cpu1)) printf("CPU1: CPU %d\n", i); CPU_ZERO(&cpu2); CPU_SET(2, &cpu2); temp = pthread_setaffinity_np(thread2, sizeof(cpu_set_t), &cpu2); for (i = 0; i < CPU_SETSIZE; i++) if (CPU_ISSET(i, &cpu2)) printf("CPU2: CPU %d\n", i); CPU_ZERO(&cpu3); CPU_SET(3, &cpu3); temp = pthread_setaffinity_np(thread3, sizeof(cpu_set_t), &cpu3); for (i = 0; i < CPU_SETSIZE; i++) if (CPU_ISSET(i, &cpu3)) printf("CPU3: CPU %d\n", i); CPU_ZERO(&cpu4); CPU_SET(4, &cpu4); temp = pthread_setaffinity_np(thread4, sizeof(cpu_set_t), &cpu4); for (i = 0; i < CPU_SETSIZE; i++) if (CPU_ISSET(i, &cpu4)) printf("CPU4: CPU %d\n", i); thread1Create = pthread_create(&thread1, NULL, (void *)pthread_Message, thread1Msg); thread2Create = pthread_create(&thread2, NULL, (void *)pthread_Message, thread2Msg); thread3Create = pthread_create(&thread3, NULL, (void *)pthread_Message, thread3Msg); thread4Create = pthread_create(&thread4, NULL, (void *)pthread_Message, thread4Msg); pthread_join(thread1, NULL); pthread_join(thread2, NULL); pthread_join(thread3, NULL); pthread_join(thread4, NULL); return 0; }

Read the article

chaining array of tasks with continuation

- by Andrei Cristof

I have a Task structure that is a little bit complex(for me at least). The structure is: (where T = Task) T1, T2, T3... Tn. There's an array (a list of files), and the T's represent tasks created for each file. Each T has always two subtasks that it must complete or fail: Tn.1, Tn.2 - download and install. For each download (Tn.1) there are always two subtasks to try, download from two paths(Tn.1.1, Tn.1.2). Execution would be: First, download file: Tn1.1. If Tn.1.1 fails, then Tn.1.2 executes. If either from download tasks returns OK - execute Tn.2. If Tn.2 executed or failed - go to next Tn. I figured the first thing to do, was to write all this structure with jagged arrays: private void CreateTasks() { //main array Task<int>[][][] mainTask = new Task<int>[_queuedApps.Count][][]; for (int i = 0; i < mainTask.Length; i++) { Task<int>[][] arr = GenerateOperationTasks(); mainTask[i] = arr; } } private Task<int>[][] GenerateOperationTasks() { //two download tasks Task<int>[] downloadTasks = new Task<int>[2]; downloadTasks[0] = new Task<int>(() => { return 0; }); downloadTasks[1] = new Task<int>(() => { return 0; }); //one installation task Task<int>[] installTask = new Task<int>[1] { new Task<int>(() => { return 0; }) }; //operations Task is jagged - keeps tasks above Task<int>[][] operationTasks = new Task<int>[2][]; operationTasks[0] = downloadTasks; operationTasks[1] = installTask; return operationTasks; } So now I got my mainTask array of tasks, containing nicely ordered tasks just as described above. However after reading the docs on ContinuationTasks, I realise this does not help me since I must call e.g. Task.ContinueWith(Task2). I'm stumped about doing this on my mainTask array. I can't write mainTask[0].ContinueWith(mainTask[1]) because I dont know the size of the array. If I could somehow reference the next task in the array (but without knowing its index), but cant figure out how. Any ideas? Thank you very much for your help. Regards,

Read the article

Submitting R jobs using PBS

- by Tony

I am submitting a job using qsub that runs parallelized R. My intention is to have R programme running on 4 different cores rather than 8 cores. Here are some of my settings in PBS file: #PBS -l nodes=1:ppn=4 .... time R --no-save < program1.R > program1.log I am issuing the command "ta job_id" and I'm seeing that 4 cores are listed. However, the job occupies a large amount of memory(31944900k used vs 32949628k total). If I were to use 8 cores, the jobs got hang due to memory limitation. top - 21:03:53 up 77 days, 11:54, 0 users, load average: 3.99, 3.75, 3.37 Tasks: 207 total, 5 running, 202 sleeping, 0 stopped, 0 zombie Cpu(s): 30.4%us, 1.6%sy, 0.0%ni, 66.8%id, 0.0%wa, 0.0%hi, 1.2%si, 0.0%st Mem: 32949628k total, 31944900k used, 1004728k free, 269812k buffers Swap: 2097136k total, 8360k used, 2088776k free, 6030856k cached Here is a snapshot when issuing command ta job_id PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 1794 x 25 0 6247m 6.0g 1780 R 99.2 19.1 8:14.37 R 1795 x 25 0 6332m 6.1g 1780 R 99.2 19.4 8:14.37 R 1796 x 25 0 6242m 6.0g 1784 R 99.2 19.1 8:14.37 R 1797 x 25 0 6322m 6.1g 1780 R 99.2 19.4 8:14.33 R 1714 x 18 0 65932 1504 1248 S 0.0 0.0 0:00.00 bash 1761 x 18 0 63840 1244 1052 S 0.0 0.0 0:00.00 20016.hpc 1783 x 18 0 133m 7096 1128 S 0.0 0.0 0:00.00 python 1786 x 18 0 137m 46m 2688 S 0.0 0.1 0:02.06 R How can I prevent other users to use the other 4 cores? I like to mask somehow that my job is using 8 cores with 4 cores idling. Could anyone kindly help me out on this? Can this be solved using pbs? Many Thanks

Search Results

Search found 6559 results on 263 pages for 'parallel foreach'.

Page 30/263 | < Previous Page | 26 27 28 29 30 31 32 33 34 35 36 37 | Next Page >

- by ardiyu07

- by Scott Chamberlain

- by Scott Chamberlain

- by user147133

- by Mike

- by Jin

- by Jin

- by Craig P. Motlin

- by Ryan Olson

- by Shaun

- by NumberFour

- by Chris

- by Bellman

- by Chathuranga Chandrasekara

- by user288558

- by jqno

- by aaa

- by Aina Ari

- by JohnIdol

- by Perl QuestionAsker

- by Bruno Brant

- by hahuang65

- by Andrei Cristof

- by Tony

< Previous Page | 26 27 28 29 30 31 32 33 34 35 36 37 | Next Page >