Search Results

Search found 1774 results on 71 pages for 'parallel'.

Page 24/71 | < Previous Page | 20 21 22 23 24 25 26 27 28 29 30 31 | Next Page >

GLES2.0 3D Android game performance and multi threading the update?

- by Ofer

I have profiled my mixed Java\C++ Android game and I got the following result: https://dl.dropbox.com/u/8025882/PompiDev/AndroidProfile.png As you can see, the pink think is a C++ functions that updates the game. It does things like updating the logic but it mostly it generates a "request list" for rendering. The thing is, I generate DrawLists on C++ and then send them to Java to process and draw using GLES2.0. Since then I was able to improve update from 9ms down to about 7ms, but I would like to ask if I would benefit from multi threading the update? As I understand from that diagram is that the function that takes the most time is the one you see it's color on the timeline. So the pink area is taken mostly by update. The other area has MainOpenGL.Handle as it's main contributor(whch is my java function), but since it's not drawn to the top of the diagram I can conclude other things are happening at the same time that use the CPU? Or even GPU stuff that isn't shown in this diagram. I am not sure how the GPU works on this. Does it calculate stuff in parallel to the CPU? Or is it part of the CPU usage as in SoC? I am not sure. Anyway, in case GPU things DO happen in parallel to CPU, then I would guess that if I do this C++ Update in parallel to the thread that makes the OpenGL calls, I might make use of "dead CPU time" due to GPU stalling or maybe have the GPU calls getting processed earlier because it won't have to wait for Update to finish? How do you suggest to improve performance based on that? Thanks.

Read the article
regex to break a string into "key" / "value" pairs when # of pairs is variable?

- by user141146

Hi, I'm using Ruby 1.9 and I'm wondering if there's a simple regex way to do this. I have many strings that look like some variation of this: str = "Allocation: Random, Control: Active Control, Endpoint Classification: Safety Study, Intervention Model: Parallel Assignment, Masking: Double Blind (Subject, Caregiver, Investigator, Outcomes Assessor), Primary Purpose: Treatment" The idea is that I'd like to break this string into its functional components Allocation: Random Control: Active Control Endpoint Classification: Safety Study Intervention Model: Parallel Assignment Masking: Double Blind (Subject, Caregiver, Investigator, Outcomes, Assessor) Primary Purpose: Treatment The "syntax" of the string is that there is a "key" which consists of one or more "words or other characters" (e.g. Intervention Model) followed by a colon (:). Each key has a corresponding "value" (e.g., Parallel Assignment) that immediately follows the colon (:)…The "value" consists of words, commas (whatever), but the end of the "value" is signaled by a comma. The # of key/value pairs is variable. I'm also assuming that colons (:) aren't allowed to be part of the "value" and that commas (,) aren't allowed to be part of the "key". One would think that there is a "regexy" way to break this into its component pieces, but my attempt at making an appropriate matching regex only picks up the first key/value pair and I'm not sure how to capture the others. Any thoughts on how to capture the other matches? regex = /(([^,]+?): ([^:]+?,))+?/ => /(([^,]+?): ([^:]+?,))+?/ irb(main):139:0> str = "Allocation: Random, Control: Active Control, Endpoint Classification: Safety Study, Intervention Model: Parallel Assignment, Masking: Double Blind (Subject, Caregiver, Investigator, Outcomes Assessor), Primary Purpose: Treatment" => "Allocation: Random, Control: Active Control, Endpoint Classification: Safety Study, Intervention Model: Parallel Assignment, Masking: Double Blind (Subject, Caregiver, Investigator, Outcomes Assessor), Primary Purpose: Treatment" irb(main):140:0> str.match regex => #<MatchData "Allocation: Random," 1:"Allocation: Random," 2:"Allocation" 3:" Random,"> irb(main):141:0> $1 => "Allocation: Random," irb(main):142:0> $2 => "Allocation" irb(main):143:0> $3 => " Random," irb(main):144:0> $4 => nil

Read the article
Understanding G1 GC Logs

- by poonam

The purpose of this post is to explain the meaning of GC logs generated with some tracing and diagnostic options for G1 GC. We will take a look at the output generated with PrintGCDetails which is a product flag and provides the most detailed level of information. Along with that, we will also look at the output of two diagnostic flags that get enabled with -XX:+UnlockDiagnosticVMOptions option - G1PrintRegionLivenessInfo that prints the occupancy and the amount of space used by live objects in each region at the end of the marking cycle and G1PrintHeapRegions that provides detailed information on the heap regions being allocated and reclaimed. We will be looking at the logs generated with JDK 1.7.0_04 using these options. Option -XX:+PrintGCDetails Here's a sample log of G1 collection generated with PrintGCDetails. 0.522: [GC pause (young), 0.15877971 secs] [Parallel Time: 157.1 ms] [GC Worker Start (ms): 522.1 522.2 522.2 522.2 Avg: 522.2, Min: 522.1, Max: 522.2, Diff: 0.1] [Ext Root Scanning (ms): 1.6 1.5 1.6 1.9 Avg: 1.7, Min: 1.5, Max: 1.9, Diff: 0.4] [Update RS (ms): 38.7 38.8 50.6 37.3 Avg: 41.3, Min: 37.3, Max: 50.6, Diff: 13.3] [Processed Buffers : 2 2 3 2 Sum: 9, Avg: 2, Min: 2, Max: 3, Diff: 1] [Scan RS (ms): 9.9 9.7 0.0 9.7 Avg: 7.3, Min: 0.0, Max: 9.9, Diff: 9.9] [Object Copy (ms): 106.7 106.8 104.6 107.9 Avg: 106.5, Min: 104.6, Max: 107.9, Diff: 3.3] [Termination (ms): 0.0 0.0 0.0 0.0 Avg: 0.0, Min: 0.0, Max: 0.0, Diff: 0.0] [Termination Attempts : 1 4 4 6 Sum: 15, Avg: 3, Min: 1, Max: 6, Diff: 5] [GC Worker End (ms): 679.1 679.1 679.1 679.1 Avg: 679.1, Min: 679.1, Max: 679.1, Diff: 0.1] [GC Worker (ms): 156.9 157.0 156.9 156.9 Avg: 156.9, Min: 156.9, Max: 157.0, Diff: 0.1] [GC Worker Other (ms): 0.3 0.3 0.3 0.3 Avg: 0.3, Min: 0.3, Max: 0.3, Diff: 0.0] [Clear CT: 0.1 ms] [Other: 1.5 ms] [Choose CSet: 0.0 ms] [Ref Proc: 0.3 ms] [Ref Enq: 0.0 ms] [Free CSet: 0.3 ms] [Eden: 12M(12M)->0B(10M) Survivors: 0B->2048K Heap: 13M(64M)->9739K(64M)] [Times: user=0.59 sys=0.02, real=0.16 secs] This is the typical log of an Evacuation Pause (G1 collection) in which live objects are copied from one set of regions (young OR young+old) to another set. It is a stop-the-world activity and all the application threads are stopped at a safepoint during this time. This pause is made up of several sub-tasks indicated by the indentation in the log entries. Here's is the top most line that gets printed for the Evacuation Pause. 0.522: [GC pause (young), 0.15877971 secs] This is the highest level information telling us that it is an Evacuation Pause that started at 0.522 secs from the start of the process, in which all the regions being evacuated are Young i.e. Eden and Survivor regions. This collection took 0.15877971 secs to finish. Evacuation Pauses can be mixed as well. In which case the set of regions selected include all of the young regions as well as some old regions. 1.730: [GC pause (mixed), 0.32714353 secs] Let's take a look at all the sub-tasks performed in this Evacuation Pause. [Parallel Time: 157.1 ms] Parallel Time is the total elapsed time spent by all the parallel GC worker threads. The following lines correspond to the parallel tasks performed by these worker threads in this total parallel time, which in this case is 157.1 ms. [GC Worker Start (ms): 522.1 522.2 522.2 522.2Avg: 522.2, Min: 522.1, Max: 522.2, Diff: 0.1] The first line tells us the start time of each of the worker thread in milliseconds. The start times are ordered with respect to the worker thread ids – thread 0 started at 522.1ms and thread 1 started at 522.2ms from the start of the process. The second line tells the Avg, Min, Max and Diff of the start times of all of the worker threads. [Ext Root Scanning (ms): 1.6 1.5 1.6 1.9 Avg: 1.7, Min: 1.5, Max: 1.9, Diff: 0.4] This gives us the time spent by each worker thread scanning the roots (globals, registers, thread stacks and VM data structures). Here, thread 0 took 1.6ms to perform the root scanning task and thread 1 took 1.5 ms. The second line clearly shows the Avg, Min, Max and Diff of the times spent by all the worker threads. [Update RS (ms): 38.7 38.8 50.6 37.3 Avg: 41.3, Min: 37.3, Max: 50.6, Diff: 13.3] Update RS gives us the time each thread spent in updating the Remembered Sets. Remembered Sets are the data structures that keep track of the references that point into a heap region. Mutator threads keep changing the object graph and thus the references that point into a particular region. We keep track of these changes in buffers called Update Buffers. The Update RS sub-task processes the update buffers that were not able to be processed concurrently, and updates the corresponding remembered sets of all regions. [Processed Buffers : 2 2 3 2Sum: 9, Avg: 2, Min: 2, Max: 3, Diff: 1] This tells us the number of Update Buffers (mentioned above) processed by each worker thread. [Scan RS (ms): 9.9 9.7 0.0 9.7 Avg: 7.3, Min: 0.0, Max: 9.9, Diff: 9.9] These are the times each worker thread had spent in scanning the Remembered Sets. Remembered Set of a region contains cards that correspond to the references pointing into that region. This phase scans those cards looking for the references pointing into all the regions of the collection set. [Object Copy (ms): 106.7 106.8 104.6 107.9 Avg: 106.5, Min: 104.6, Max: 107.9, Diff: 3.3] These are the times spent by each worker thread copying live objects from the regions in the Collection Set to the other regions. [Termination (ms): 0.0 0.0 0.0 0.0 Avg: 0.0, Min: 0.0, Max: 0.0, Diff: 0.0] Termination time is the time spent by the worker thread offering to terminate. But before terminating, it checks the work queues of other threads and if there are still object references in other work queues, it tries to steal object references, and if it succeeds in stealing a reference, it processes that and offers to terminate again. [Termination Attempts : 1 4 4 6 Sum: 15, Avg: 3, Min: 1, Max: 6, Diff: 5] This gives the number of times each thread has offered to terminate. [GC Worker End (ms): 679.1 679.1 679.1 679.1 Avg: 679.1, Min: 679.1, Max: 679.1, Diff: 0.1] These are the times in milliseconds at which each worker thread stopped. [GC Worker (ms): 156.9 157.0 156.9 156.9 Avg: 156.9, Min: 156.9, Max: 157.0, Diff: 0.1] These are the total lifetimes of each worker thread. [GC Worker Other (ms): 0.3 0.3 0.3 0.3Avg: 0.3, Min: 0.3, Max: 0.3, Diff: 0.0] These are the times that each worker thread spent in performing some other tasks that we have not accounted above for the total Parallel Time. [Clear CT: 0.1 ms] This is the time spent in clearing the Card Table. This task is performed in serial mode. [Other: 1.5 ms] Time spent in the some other tasks listed below. The following sub-tasks (which individually may be parallelized) are performed serially. [Choose CSet: 0.0 ms] Time spent in selecting the regions for the Collection Set. [Ref Proc: 0.3 ms] Total time spent in processing Reference objects. [Ref Enq: 0.0 ms] Time spent in enqueuing references to the ReferenceQueues. [Free CSet: 0.3 ms] Time spent in freeing the collection set data structure. [Eden: 12M(12M)->0B(13M) Survivors: 0B->2048K Heap: 14M(64M)->9739K(64M)] This line gives the details on the heap size changes with the Evacuation Pause. This shows that Eden had the occupancy of 12M and its capacity was also 12M before the collection. After the collection, its occupancy got reduced to 0 since everything is evacuated/promoted from Eden during a collection, and its target size grew to 13M. The new Eden capacity of 13M is not reserved at this point. This value is the target size of the Eden. Regions are added to Eden as the demand is made and when the added regions reach to the target size, we start the next collection. Similarly, Survivors had the occupancy of 0 bytes and it grew to 2048K after the collection. The total heap occupancy and capacity was 14M and 64M receptively before the collection and it became 9739K and 64M after the collection. Apart from the evacuation pauses, G1 also performs concurrent-marking to build the live data information of regions. 1.416: [GC pause (young) (initial-mark), 0.62417980 secs] ….... 2.042: [GC concurrent-root-region-scan-start] 2.067: [GC concurrent-root-region-scan-end, 0.0251507] 2.068: [GC concurrent-mark-start] 3.198: [GC concurrent-mark-reset-for-overflow] 4.053: [GC concurrent-mark-end, 1.9849672 sec] 4.055: [GC remark 4.055: [GC ref-proc, 0.0000254 secs], 0.0030184 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 4.088: [GC cleanup 117M->106M(138M), 0.0015198 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 4.090: [GC concurrent-cleanup-start] 4.091: [GC concurrent-cleanup-end, 0.0002721] The first phase of a marking cycle is Initial Marking where all the objects directly reachable from the roots are marked and this phase is piggy-backed on a fully young Evacuation Pause. 2.042: [GC concurrent-root-region-scan-start] This marks the start of a concurrent phase that scans the set of root-regions which are directly reachable from the survivors of the initial marking phase. 2.067: [GC concurrent-root-region-scan-end, 0.0251507] End of the concurrent root region scan phase and it lasted for 0.0251507 seconds. 2.068: [GC concurrent-mark-start] Start of the concurrent marking at 2.068 secs from the start of the process. 3.198: [GC concurrent-mark-reset-for-overflow] This indicates that the global marking stack had became full and there was an overflow of the stack. Concurrent marking detected this overflow and had to reset the data structures to start the marking again. 4.053: [GC concurrent-mark-end, 1.9849672 sec] End of the concurrent marking phase and it lasted for 1.9849672 seconds. 4.055: [GC remark 4.055: [GC ref-proc, 0.0000254 secs], 0.0030184 secs] This corresponds to the remark phase which is a stop-the-world phase. It completes the left over marking work (SATB buffers processing) from the previous phase. In this case, this phase took 0.0030184 secs and out of which 0.0000254 secs were spent on Reference processing. 4.088: [GC cleanup 117M->106M(138M), 0.0015198 secs] Cleanup phase which is again a stop-the-world phase. It goes through the marking information of all the regions, computes the live data information of each region, resets the marking data structures and sorts the regions according to their gc-efficiency. In this example, the total heap size is 138M and after the live data counting it was found that the total live data size dropped down from 117M to 106M. 4.090: [GC concurrent-cleanup-start] This concurrent cleanup phase frees up the regions that were found to be empty (didn't contain any live data) during the previous stop-the-world phase. 4.091: [GC concurrent-cleanup-end, 0.0002721] Concurrent cleanup phase took 0.0002721 secs to free up the empty regions. Option -XX:G1PrintRegionLivenessInfo Now, let's look at the output generated with the flag G1PrintRegionLivenessInfo. This is a diagnostic option and gets enabled with -XX:+UnlockDiagnosticVMOptions. G1PrintRegionLivenessInfo prints the live data information of each region during the Cleanup phase of the concurrent-marking cycle. 26.896: [GC cleanup ### PHASE Post-Marking @ 26.896### HEAP committed: 0x02e00000-0x0fe00000 reserved: 0x02e00000-0x12e00000 region-size: 1048576 Cleanup phase of the concurrent-marking cycle started at 26.896 secs from the start of the process and this live data information is being printed after the marking phase. Committed G1 heap ranges from 0x02e00000 to 0x0fe00000 and the total G1 heap reserved by JVM is from 0x02e00000 to 0x12e00000. Each region in the G1 heap is of size 1048576 bytes. ### type address-range used prev-live next-live gc-eff### (bytes) (bytes) (bytes) (bytes/ms) This is the header of the output that tells us about the type of the region, address-range of the region, used space in the region, live bytes in the region with respect to the previous marking cycle, live bytes in the region with respect to the current marking cycle and the GC efficiency of that region. ### FREE 0x02e00000-0x02f00000 0 0 0 0.0 This is a Free region. ### OLD 0x02f00000-0x03000000 1048576 1038592 1038592 0.0 Old region with address-range from 0x02f00000 to 0x03000000. Total used space in the region is 1048576 bytes, live bytes as per the previous marking cycle are 1038592 and live bytes with respect to the current marking cycle are also 1038592. The GC efficiency has been computed as 0. ### EDEN 0x03400000-0x03500000 20992 20992 20992 0.0 This is an Eden region. ### HUMS 0x0ae00000-0x0af00000 1048576 1048576 1048576 0.0### HUMC 0x0af00000-0x0b000000 1048576 1048576 1048576 0.0### HUMC 0x0b000000-0x0b100000 1048576 1048576 1048576 0.0### HUMC 0x0b100000-0x0b200000 1048576 1048576 1048576 0.0### HUMC 0x0b200000-0x0b300000 1048576 1048576 1048576 0.0### HUMC 0x0b300000-0x0b400000 1048576 1048576 1048576 0.0### HUMC 0x0b400000-0x0b500000 1001480 1001480 1001480 0.0 These are the continuous set of regions called Humongous regions for storing a large object. HUMS (Humongous starts) marks the start of the set of humongous regions and HUMC (Humongous continues) tags the subsequent regions of the humongous regions set. ### SURV 0x09300000-0x09400000 16384 16384 16384 0.0 This is a Survivor region. ### SUMMARY capacity: 208.00 MB used: 150.16 MB / 72.19 % prev-live: 149.78 MB / 72.01 % next-live: 142.82 MB / 68.66 % At the end, a summary is printed listing the capacity, the used space and the change in the liveness after the completion of concurrent marking. In this case, G1 heap capacity is 208MB, total used space is 150.16MB which is 72.19% of the total heap size, live data in the previous marking was 149.78MB which was 72.01% of the total heap size and the live data as per the current marking is 142.82MB which is 68.66% of the total heap size. Option -XX:+G1PrintHeapRegions G1PrintHeapRegions option logs the regions related events when regions are committed, allocated into or are reclaimed. COMMIT/UNCOMMIT events G1HR COMMIT [0x6e900000,0x6ea00000]G1HR COMMIT [0x6ea00000,0x6eb00000] Here, the heap is being initialized or expanded and the region (with bottom: 0x6eb00000 and end: 0x6ec00000) is being freshly committed. COMMIT events are always generated in order i.e. the next COMMIT event will always be for the uncommitted region with the lowest address. G1HR UNCOMMIT [0x72700000,0x72800000]G1HR UNCOMMIT [0x72600000,0x72700000] Opposite to COMMIT. The heap got shrunk at the end of a Full GC and the regions are being uncommitted. Like COMMIT, UNCOMMIT events are also generated in order i.e. the next UNCOMMIT event will always be for the committed region with the highest address. GC Cycle events G1HR #StartGC 7G1HR CSET 0x6e900000G1HR REUSE 0x70500000G1HR ALLOC(Old) 0x6f800000G1HR RETIRE 0x6f800000 0x6f821b20G1HR #EndGC 7 This shows start and end of an Evacuation pause. This event is followed by a GC counter tracking both evacuation pauses and Full GCs. Here, this is the 7th GC since the start of the process. G1HR #StartFullGC 17G1HR UNCOMMIT [0x6ed00000,0x6ee00000]G1HR POST-COMPACTION(Old) 0x6e800000 0x6e854f58G1HR #EndFullGC 17 Shows start and end of a Full GC. This event is also followed by the same GC counter as above. This is the 17th GC since the start of the process. ALLOC events G1HR ALLOC(Eden) 0x6e800000 The region with bottom 0x6e800000 just started being used for allocation. In this case it is an Eden region and allocated into by a mutator thread. G1HR ALLOC(StartsH) 0x6ec00000 0x6ed00000G1HR ALLOC(ContinuesH) 0x6ed00000 0x6e000000 Regions being used for the allocation of Humongous object. The object spans over two regions. G1HR ALLOC(SingleH) 0x6f900000 0x6f9eb010 Single region being used for the allocation of Humongous object. G1HR COMMIT [0x6ee00000,0x6ef00000]G1HR COMMIT [0x6ef00000,0x6f000000]G1HR COMMIT [0x6f000000,0x6f100000]G1HR COMMIT [0x6f100000,0x6f200000]G1HR ALLOC(StartsH) 0x6ee00000 0x6ef00000G1HR ALLOC(ContinuesH) 0x6ef00000 0x6f000000G1HR ALLOC(ContinuesH) 0x6f000000 0x6f100000G1HR ALLOC(ContinuesH) 0x6f100000 0x6f102010 Here, Humongous object allocation request could not be satisfied by the free committed regions that existed in the heap, so the heap needed to be expanded. Thus new regions are committed and then allocated into for the Humongous object. G1HR ALLOC(Old) 0x6f800000 Old region started being used for allocation during GC. G1HR ALLOC(Survivor) 0x6fa00000 Region being used for copying old objects into during a GC. Note that Eden and Humongous ALLOC events are generated outside the GC boundaries and Old and Survivor ALLOC events are generated inside the GC boundaries. Other Events G1HR RETIRE 0x6e800000 0x6e87bd98 Retire and stop using the region having bottom 0x6e800000 and top 0x6e87bd98 for allocation. Note that most regions are full when they are retired and we omit those events to reduce the output volume. A region is retired when another region of the same type is allocated or we reach the start or end of a GC(depending on the region). So for Eden regions: For example: 1. ALLOC(Eden) Foo2. ALLOC(Eden) Bar3. StartGC At point 2, Foo has just been retired and it was full. At point 3, Bar was retired and it was full. If they were not full when they were retired, we will have a RETIRE event: 1. ALLOC(Eden) Foo2. RETIRE Foo top3. ALLOC(Eden) Bar4. StartGC G1HR CSET 0x6e900000 Region (bottom: 0x6e900000) is selected for the Collection Set. The region might have been selected for the collection set earlier (i.e. when it was allocated). However, we generate the CSET events for all regions in the CSet at the start of a GC to make sure there's no confusion about which regions are part of the CSet. G1HR POST-COMPACTION(Old) 0x6e800000 0x6e839858 POST-COMPACTION event is generated for each non-empty region in the heap after a full compaction. A full compaction moves objects around, so we don't know what the resulting shape of the heap is (which regions were written to, which were emptied, etc.). To deal with this, we generate a POST-COMPACTION event for each non-empty region with its type (old/humongous) and the heap boundaries. At this point we should only have Old and Humongous regions, as we have collapsed the young generation, so we should not have eden and survivors. POST-COMPACTION events are generated within the Full GC boundary. G1HR CLEANUP 0x6f400000G1HR CLEANUP 0x6f300000G1HR CLEANUP 0x6f200000 These regions were found empty after remark phase of Concurrent Marking and are reclaimed shortly afterwards. G1HR #StartGC 5G1HR CSET 0x6f400000G1HR CSET 0x6e900000G1HR REUSE 0x6f800000 At the end of a GC we retire the old region we are allocating into. Given that its not full, we will carry on allocating into it during the next GC. This is what REUSE means. In the above case 0x6f800000 should have been the last region with an ALLOC(Old) event during the previous GC and should have been retired before the end of the previous GC. G1HR ALLOC-FORCE(Eden) 0x6f800000 A specialization of ALLOC which indicates that we have reached the max desired number of the particular region type (in this case: Eden), but we decided to allocate one more. Currently it's only used for Eden regions when we extend the young generation because we cannot do a GC as the GC-Locker is active. G1HR EVAC-FAILURE 0x6f800000 During a GC, we have failed to evacuate an object from the given region as the heap is full and there is no space left to copy the object. This event is generated within GC boundaries and exactly once for each region from which we failed to evacuate objects. When Heap Regions are reclaimed ? It is also worth mentioning when the heap regions in the G1 heap are reclaimed. All regions that are in the CSet (the ones that appear in CSET events) are reclaimed at the end of a GC. The exception to that are regions with EVAC-FAILURE events. All regions with CLEANUP events are reclaimed. After a Full GC some regions get reclaimed (the ones from which we moved the objects out). But that is not shown explicitly, instead the non-empty regions that are left in the heap are printed out with the POST-COMPACTION events.

Read the article
Big Data – Buzz Words: What is MapReduce – Day 7 of 21

- by Pinal Dave

In yesterday’s blog post we learned what is Hadoop. In this article we will take a quick look at one of the four most important buzz words which goes around Big Data – MapReduce. What is MapReduce? MapReduce was designed by Google as a programming model for processing large data sets with a parallel, distributed algorithm on a cluster. Though, MapReduce was originally Google proprietary technology, it has been quite a generalized term in the recent time. MapReduce comprises a Map() and Reduce() procedures. Procedure Map() performance filtering and sorting operation on data where as procedure Reduce() performs a summary operation of the data. This model is based on modified concepts of the map and reduce functions commonly available in functional programing. The library where procedure Map() and Reduce() belongs is written in many different languages. The most popular free implementation of MapReduce is Apache Hadoop which we will explore tomorrow. Advantages of MapReduce Procedures The MapReduce Framework usually contains distributed servers and it runs various tasks in parallel to each other. There are various components which manages the communications between various nodes of the data and provides the high availability and fault tolerance. Programs written in MapReduce functional styles are automatically parallelized and executed on commodity machines. The MapReduce Framework takes care of the details of partitioning the data and executing the processes on distributed server on run time. During this process if there is any disaster the framework provides high availability and other available modes take care of the responsibility of the failed node. As you can clearly see more this entire MapReduce Frameworks provides much more than just Map() and Reduce() procedures; it provides scalability and fault tolerance as well. A typical implementation of the MapReduce Framework processes many petabytes of data and thousands of the processing machines. How do MapReduce Framework Works? A typical MapReduce Framework contains petabytes of the data and thousands of the nodes. Here is the basic explanation of the MapReduce Procedures which uses this massive commodity of the servers. Map() Procedure There is always a master node in this infrastructure which takes an input. Right after taking input master node divides it into smaller sub-inputs or sub-problems. These sub-problems are distributed to worker nodes. A worker node later processes them and does necessary analysis. Once the worker node completes the process with this sub-problem it returns it back to master node. Reduce() Procedure All the worker nodes return the answer to the sub-problem assigned to them to master node. The master node collects the answer and once again aggregate that in the form of the answer to the original big problem which was assigned master node. The MapReduce Framework does the above Map () and Reduce () procedure in the parallel and independent to each other. All the Map() procedures can run parallel to each other and once each worker node had completed their task they can send it back to master code to compile it with a single answer. This particular procedure can be very effective when it is implemented on a very large amount of data (Big Data). The MapReduce Framework has five different steps: Preparing Map() Input Executing User Provided Map() Code Shuffle Map Output to Reduce Processor Executing User Provided Reduce Code Producing the Final Output Here is the Dataflow of MapReduce Framework: Input Reader Map Function Partition Function Compare Function Reduce Function Output Writer In a future blog post of this 31 day series we will explore various components of MapReduce in Detail. MapReduce in a Single Statement MapReduce is equivalent to SELECT and GROUP BY of a relational database for a very large database. Tomorrow In tomorrow’s blog post we will discuss Buzz Word – HDFS. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Read the article
SQL SERVER – Partition Parallelism Support in expressor 3.6

- by pinaldave

I am very excited to learn that there is a new version of expressor’s data integration platform coming out in March of this year. It will be version 3.6, and I look forward to using it and telling everyone about it. Let me describe a little bit more about what will be so great in expressor 3.6: Greatly enhanced user interface Parallel Processing Bulk Artifact Upgrading The User Interface First let me cover the most obvious enhancements. The expressor Studio user interface (UI) has had some significant work done. Kudos to the expressor Engineering team; the entire UI is a visual masterpiece that is very responsive and intuitive. The improvements are more than just eye candy; they provide significant productivity gains when developing expressor Dataflows. Operator shape icons now include a description that identifies the function of each operator, instead of having to guess at the function by the icon. Operator shapes and highlighting depict the current function and status: Disabled, enabled, complete, incomplete, and error. Each status displays an appropriate message in the message panel with correction suggestions. Floating or docking property panels provide descriptive tool tips for each property as well as auto resize when adjusting the canvas, without having to search Help or the need to scroll around to get access to the property. Progress and status indicators let you know when an operation is working. “No limit” canvas with snap-to-grid allows automatic sizing and accurate positioning when you have numerous operators in the Dataflow. The inline tool bar offers quick access to pan, zoom, fit and overview functions. Selecting multiple artifacts with a right click context allows you to easily manage your workspace more efficiently. Partitioning and Parallel Processing Partitioning allows each operator to process multiple subsets of records in parallel as opposed to processing all records that flow through that operator in a single sequential set. This capability allows the user to configure the expressor Dataflow to run in a way that most efficiently utilizes the resources of the hardware where the Dataflow is running. Partitions can exist in most individual operators. Using partitions increases the speed of an expressor data integration application, therefore improving performance and load times. With the expressor 3.6 Enterprise Edition, expressor simplifies enabling parallel processing by adding intuitive partition settings that are easy to configure. Bulk Artifact Upgrading Bulk Artifact Upgrading sounds a bit intimidating, but it actually is not and it is a welcome addition to expressor Studio. In past releases, users were prompted to confirm that they wanted to upgrade their individual artifacts only when opened. This was a cumbersome and repetitive process. Now with bulk artifact upgrading, a user can easily select what artifact or group of artifacts to upgrade all at once. As you can see, there are many new features and upgrade options that will prove to make expressor Studio quicker and more efficient. I hope I’m not the only one who is excited about all these new upgrades, and that I you try expressor and share your experience with me. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: PostADay, SQL, SQL Authority, SQL Performance, SQL Query, SQL Server, SQL Tips and Tricks, SQLServer, T SQL, Technology

Read the article
Efficiently separating Read/Compute/Write steps for concurrent processing of entities in Entity/Component systems

- by TravisG

Setup I have an entity-component architecture where Entities can have a set of attributes (which are pure data with no behavior) and there exist systems that run the entity logic which act on that data. Essentially, in somewhat pseudo-code: Entity { id; map<id_type, Attribute> attributes; } System { update(); vector<Entity> entities; } A system that just moves along all entities at a constant rate might be MovementSystem extends System { update() { for each entity in entities position = entity.attributes["position"]; position += vec3(1,1,1); } } Essentially, I'm trying to parallelise update() as efficiently as possible. This can be done by running entire systems in parallel, or by giving each update() of one system a couple of components so different threads can execute the update of the same system, but for a different subset of entities registered with that system. Problem In reality, these systems sometimes require that entities interact(/read/write data from/to) each other, sometimes within the same system (e.g. an AI system that reads state from other entities surrounding the current processed entity), but sometimes between different systems that depend on each other (i.e. a movement system that requires data from a system that processes user input). Now, when trying to parallelize the update phases of entity/component systems, the phases in which data (components/attributes) from Entities are read and used to compute something, and the phase where the modified data is written back to entities need to be separated in order to avoid data races. Otherwise the only way (not taking into account just "critical section"ing everything) to avoid them is to serialize parts of the update process that depend on other parts. This seems ugly. To me it would seem more elegant to be able to (ideally) have all processing running in parallel, where a system may read data from all entities as it wishes, but doesn't write modifications to that data back until some later point. The fact that this is even possible is based on the assumption that modification write-backs are usually very small in complexity, and don't require much performance, whereas computations are very expensive (relatively). So the overhead added by a delayed-write phase might be evened out by more efficient updating of entities (by having threads work more % of the time instead of waiting). A concrete example of this might be a system that updates physics. The system needs to both read and write a lot of data to and from entities. Optimally, there would be a system in place where all available threads update a subset of all entities registered with the physics system. In the case of the physics system this isn't trivially possible because of race conditions. So without a workaround, we would have to find other systems to run in parallel (which don't modify the same data as the physics system), other wise the remaining threads are waiting and wasting time. However, that has disadvantages Practically, the L3 cache is pretty much always better utilized when updating a large system with multiple threads, as opposed to multiple systems at once, which all act on different sets of data. Finding and assembling other systems to run in parallel can be extremely time consuming to design well enough to optimize performance. Sometimes, it might even not be possible at all because a system just depends on data that is touched by all other systems. Solution? In my thinking, a possible solution would be a system where reading/updating and writing of data is separated, so that in one expensive phase, systems only read data and compute what they need to compute, and then in a separate, performance-wise cheap, write phase, attributes of entities that needed to be modified are finally written back to the entities. The Question How might such a system be implemented to achieve optimal performance, as well as making programmer life easier? What are the implementation details of such a system and what might have to be changed in the existing EC-architecture to accommodate this solution?

Read the article
Asynchrony in C# 5 (Part II)

- by javarg

This article is a continuation of the series of asynchronous features included in the new Async CTP preview for next versions of C# and VB. Check out Part I for more information. So, let’s continue with TPL Dataflow: Asynchronous functions TPL Dataflow Task based asynchronous Pattern Part II: TPL Dataflow Definition (by quote of Async CTP doc): “TPL Dataflow (TDF) is a new .NET library for building concurrent applications. It promotes actor/agent-oriented designs through primitives for in-process message passing, dataflow, and pipelining. TDF builds upon the APIs and scheduling infrastructure provided by the Task Parallel Library (TPL) in .NET 4, and integrates with the language support for asynchrony provided by C#, Visual Basic, and F#.” This means: data manipulation processed asynchronously. “TPL Dataflow is focused on providing building blocks for message passing and parallelizing CPU- and I/O-intensive applications”. Data manipulation is another hot area when designing asynchronous and parallel applications: how do you sync data access in a parallel environment? how do you avoid concurrency issues? how do you notify when data is available? how do you control how much data is waiting to be consumed? etc. Dataflow Blocks TDF provides data and action processing blocks. Imagine having preconfigured data processing pipelines to choose from, depending on the type of behavior you want. The most basic block is the BufferBlock<T>, which provides an storage for some kind of data (instances of <T>). So, let’s review data processing blocks available. Blocks a categorized into three groups: Buffering Blocks Executor Blocks Joining Blocks Think of them as electronic circuitry components :).. 1. BufferBlock<T>: it is a FIFO (First in First Out) queue. You can Post data to it and then Receive it synchronously or asynchronously. It synchronizes data consumption for only one receiver at a time (you can have many receivers but only one will actually process it). 2. BroadcastBlock<T>: same FIFO queue for messages (instances of <T>) but link the receiving event to all consumers (it makes the data available for consumption to N number of consumers). The developer can provide a function to make a copy of the data if necessary. 3. WriteOnceBlock<T>: it stores only one value and once it’s been set, it can never be replaced or overwritten again (immutable after being set). As with BroadcastBlock<T>, all consumers can obtain a copy of the value. 4. ActionBlock<TInput>: this executor block allows us to define an operation to be executed when posting data to the queue. Thus, we must pass in a delegate/lambda when creating the block. Posting data will result in an execution of the delegate for each data in the queue. You could also specify how many parallel executions to allow (degree of parallelism). 5. TransformBlock<TInput, TOutput>: this is an executor block designed to transform each input, that is way it defines an output parameter. It ensures messages are processed and delivered in order. 6. TransformManyBlock<TInput, TOutput>: similar to TransformBlock but produces one or more outputs from each input. 7. BatchBlock<T>: combines N single items into one batch item (it buffers and batches inputs). 8. JoinBlock<T1, T2, …>: it generates tuples from all inputs (it aggregates inputs). Inputs could be of any type you want (T1, T2, etc.). 9. BatchJoinBlock<T1, T2, …>: aggregates tuples of collections. It generates collections for each type of input and then creates a tuple to contain each collection (Tuple<IList<T1>, IList<T2>>). Next time I will show some examples of usage for each TDF block. * Images taken from Microsoft’s Async CTP documentation.

Read the article
flex using tweenmax library

- by Nishant

Hello, I am currently using flex transition effects on state change. Is there a way I can use tweenmax library for that? Update: In the code below, I have transitions from state one to state two. I would like to replace that code tweenermax library. <?xml version="1.0" encoding="utf-8"?> <s:Application xmlns:fx="http://ns.adobe.com/mxml/2009" xmlns:s="library://ns.adobe.com/flex/spark" xmlns:mx="library://ns.adobe.com/flex/mx" minWidth="955" minHeight="600"> <s:states> <s:State name="one" /> <s:State name="two" /> </s:states> <s:transitions> <s:Transition fromState="one" toState="two"> <s:Parallel targets="{one, two}"> <s:Fade /> </s:Parallel> </s:Transition> <s:Transition fromState="two" toState="one"> <s:Parallel targets="{one, two}"> <s:Fade /> </s:Parallel> </s:Transition> </s:transitions> <component:one id="one" /> <component:one id="two" /> </s:Application>

Read the article
C#/.NET Little Wonders: Interlocked CompareExchange()

- by James Michael Hare

Once again, in this series of posts I look at the parts of the .NET Framework that may seem trivial, but can help improve your code by making it easier to write and maintain. The index of all my past little wonders posts can be found here. Two posts ago, I discussed the Interlocked Add(), Increment(), and Decrement() methods (here) for adding and subtracting values in a thread-safe, lightweight manner. Then, last post I talked about the Interlocked Read() and Exchange() methods (here) for safely and efficiently reading and setting 32 or 64 bit values (or references). This week, we’ll round out the discussion by talking about the Interlocked CompareExchange() method and how it can be put to use to exchange a value if the current value is what you expected it to be. Dirty reads can lead to bad results Many of the uses of Interlocked that we’ve explored so far have centered around either reading, setting, or adding values. But what happens if you want to do something more complex such as setting a value based on the previous value in some manner? Perhaps you were creating an application that reads a current balance, applies a deposit, and then saves the new modified balance, where of course you’d want that to happen atomically. If you read the balance, then go to save the new balance and between that time the previous balance has already changed, you’ll have an issue! Think about it, if we read the current balance as $400, and we are applying a new deposit of $50.75, but meanwhile someone else deposits $200 and sets the total to $600, but then we write a total of $450.75 we’ve lost $200! Now, certainly for int and long values we can use Interlocked.Add() to handles these cases, and it works well for that. But what if we want to work with doubles, for example? Let’s say we wanted to add the numbers from 0 to 99,999 in parallel. We could do this by spawning several parallel tasks to continuously add to a total: 1: double total = 0; 2: 3: Parallel.For(0, 10000, next => 4: { 5: total += next; 6: }); Were this run on one thread using a standard for loop, we’d expect an answer of 4,999,950,000 (the sum of all numbers from 0 to 99,999). But when we run this in parallel as written above, we’ll likely get something far off. The result of one of my runs, for example, was 1,281,880,740. That is way off! If this were banking software we’d be in big trouble with our clients. So what happened? The += operator is not atomic, it will read in the current value, add the result, then store it back into the total. At any point in all of this another thread could read a “dirty” current total and accidentally “skip” our add. So, to clean this up, we could use a lock to guarantee concurrency: 1: double total = 0.0; 2: object locker = new object(); 3: 4: Parallel.For(0, count, next => 5: { 6: lock (locker) 7: { 8: total += next; 9: } 10: }); Which will give us the correct result of 4,999,950,000. One thing to note is that locking can be heavy, especially if the operation being locked over is trivial, or the life of the lock is a high percentage of the work being performed concurrently. In the case above, the lock consumes pretty much all of the time of each parallel task – and the task being locked on is relatively trivial. Now, let me put in a disclaimer here before we go further: For most uses, lock is more than sufficient for your needs, and is often the simplest solution! So, if lock is sufficient for most needs, why would we ever consider another solution? The problem with locking is that it can suspend execution of your thread while it waits for the signal that the lock is free. Moreover, if the operation being locked over is trivial, the lock can add a very high level of overhead. This is why things like Interlocked.Increment() perform so well, instead of locking just to perform an increment, we perform the increment with an atomic, lockless method. As with all things performance related, it’s important to profile before jumping to the conclusion that you should optimize everything in your path. If your profiling shows that locking is causing a high level of waiting in your application, then it’s time to consider lighter alternatives such as Interlocked. CompareExchange() – Exchange existing value if equal some value So let’s look at how we could use CompareExchange() to solve our problem above. The general syntax of CompareExchange() is: T CompareExchange<T>(ref T location, T newValue, T expectedValue) If the value in location == expectedValue, then newValue is exchanged. Either way, the value in location (before exchange) is returned. Actually, CompareExchange() is not one method, but a family of overloaded methods that can take int, long, float, double, pointers, or references. It cannot take other value types (that is, can’t CompareExchange() two DateTime instances directly). Also keep in mind that the version that takes any reference type (the generic overload) only checks for reference equality, it does not call any overridden Equals(). So how does this help us? Well, we can grab the current total, and exchange the new value if total hasn’t changed. This would look like this: 1: // grab the snapshot 2: double current = total; 3: 4: // if the total hasn’t changed since I grabbed the snapshot, then 5: // set it to the new total 6: Interlocked.CompareExchange(ref total, current + next, current); So what the code above says is: if the amount in total (1st arg) is the same as the amount in current (3rd arg), then set total to current + next (2nd arg). This check and exchange pair is atomic (and thus thread-safe). This works if total is the same as our snapshot in current, but the problem, is what happens if they aren’t the same? Well, we know that in either case we will get the previous value of total (before the exchange), back as a result. Thus, we can test this against our snapshot to see if it was the value we expected: 1: // if the value returned is != current, then our snapshot must be out of date 2: // which means we didn't (and shouldn't) apply current + next 3: if (Interlocked.CompareExchange(ref total, current + next, current) != current) 4: { 5: // ooops, total was not equal to our snapshot in current, what should we do??? 6: } So what do we do if we fail? That’s up to you and the problem you are trying to solve. It’s possible you would decide to abort the whole transaction, or perhaps do a lightweight spin and try again. Let’s try that: 1: double current = total; 2: 3: // make first attempt... 4: if (Interlocked.CompareExchange(ref total, current + i, current) != current) 5: { 6: // if we fail, go into a spin wait, spin, and try again until succeed 7: var spinner = new SpinWait(); 8: 9: do 10: { 11: spinner.SpinOnce(); 12: current = total; 13: } 14: while (Interlocked.CompareExchange(ref total, current + i, current) != current); 15: } 16: This is not trivial code, but it illustrates a possible use of CompareExchange(). What we are doing is first checking to see if we succeed on the first try, and if so great! If not, we create a SpinWait and then repeat the process of SpinOnce(), grab a fresh snapshot, and repeat until CompareExchnage() succeeds. You may wonder why not a simple do-while here, and the reason it’s more efficient to only create the SpinWait until we absolutely know we need one, for optimal efficiency. Though not as simple (or maintainable) as a simple lock, this will perform better in many situations. Comparing an unlocked (and wrong) version, a version using lock, and the Interlocked of the code, we get the following average times for multiple iterations of adding the sum of 100,000 numbers: 1: Unlocked money average time: 2.1 ms 2: Locked money average time: 5.1 ms 3: Interlocked money average time: 3 ms So the Interlocked.CompareExchange(), while heavier to code, came in lighter than the lock, offering a good compromise of safety and performance when we need to reduce contention. CompareExchange() - it’s not just for adding stuff… So that was one simple use of CompareExchange() in the context of adding double values -- which meant we couldn’t have used the simpler Interlocked.Add() -- but it has other uses as well. If you think about it, this really works anytime you want to create something new based on a current value without using a full lock. For example, you could use it to create a simple lazy instantiation implementation. In this case, we want to set the lazy instance only if the previous value was null: 1: public static class Lazy<T> where T : class, new() 2: { 3: private static T _instance; 4: 5: public static T Instance 6: { 7: get 8: { 9: // if current is null, we need to create new instance 10: if (_instance == null) 11: { 12: // attempt create, it will only set if previous was null 13: Interlocked.CompareExchange(ref _instance, new T(), (T)null); 14: } 15: 16: return _instance; 17: } 18: } 19: } So, if _instance == null, this will create a new T() and attempt to exchange it with _instance. If _instance is not null, then it does nothing and we discard the new T() we created. This is a way to create lazy instances of a type where we are more concerned about locking overhead than creating an accidental duplicate which is not used. In fact, the BCL implementation of Lazy<T> offers a similar thread-safety choice for Publication thread safety, where it will not guarantee only one instance was created, but it will guarantee that all readers get the same instance. Another possible use would be in concurrent collections. Let’s say, for example, that you are creating your own brand new super stack that uses a linked list paradigm and is “lock free”. We could use Interlocked.CompareExchange() to be able to do a lockless Push() which could be more efficient in multi-threaded applications where several threads are pushing and popping on the stack concurrently. Yes, there are already concurrent collections in the BCL (in .NET 4.0 as part of the TPL), but it’s a fun exercise! So let’s assume we have a node like this: 1: public sealed class Node<T> 2: { 3: // the data for this node 4: public T Data { get; set; } 5: 6: // the link to the next instance 7: internal Node<T> Next { get; set; } 8: } Then, perhaps, our stack’s Push() operation might look something like: 1: public sealed class SuperStack<T> 2: { 3: private volatile T _head; 4: 5: public void Push(T value) 6: { 7: var newNode = new Node<int> { Data = value, Next = _head }; 8: 9: if (Interlocked.CompareExchange(ref _head, newNode, newNode.Next) != newNode.Next) 10: { 11: var spinner = new SpinWait(); 12: 13: do 14: { 15: spinner.SpinOnce(); 16: newNode.Next = _head; 17: } 18: while (Interlocked.CompareExchange(ref _head, newNode, newNode.Next) != newNode.Next); 19: } 20: } 21: 22: // ... 23: } Notice a similar paradigm here as with adding our doubles before. What we are doing is creating the new Node with the data to push, and with a Next value being the original node referenced by _head. This will create our stack behavior (LIFO – Last In, First Out). Now, we have to set _head to now refer to the newNode, but we must first make sure it hasn’t changed! So we check to see if _head has the same value we saved in our snapshot as newNode.Next, and if so, we set _head to newNode. This is all done atomically, and the result is _head’s original value, as long as the original value was what we assumed it was with newNode.Next, then we are good and we set it without a lock! If not, we SpinWait and try again. Once again, this is much lighter than locking in highly parallelized code with lots of contention. If I compare the method above with a similar class using lock, I get the following results for pushing 100,000 items: 1: Locked SuperStack average time: 6 ms 2: Interlocked SuperStack average time: 4.5 ms So, once again, we can get more efficient than a lock, though there is the cost of added code complexity. Fortunately for you, most of the concurrent collection you’d ever need are already created for you in the System.Collections.Concurrent (here) namespace – for more information, see my Little Wonders – The Concurent Collections Part 1 (here), Part 2 (here), and Part 3 (here). Summary We’ve seen before how the Interlocked class can be used to safely and efficiently add, increment, decrement, read, and exchange values in a multi-threaded environment. In addition to these, Interlocked CompareExchange() can be used to perform more complex logic without the need of a lock when lock contention is a concern. The added efficiency, though, comes at the cost of more complex code. As such, the standard lock is often sufficient for most thread-safety needs. But if profiling indicates you spend a lot of time waiting for locks, or if you just need a lock for something simple such as an increment, decrement, read, exchange, etc., then consider using the Interlocked class’s methods to reduce wait. Technorati Tags: C#,CSharp,.NET,Little Wonders,Interlocked,CompareExchange,threading,concurrency

Read the article
Red Gate Coder interviews: Alex Davies

- by Michael Williamson

Alex Davies has been a software engineer at Red Gate since graduating from university, and is currently busy working on .NET Demon. We talked about tackling parallel programming with his actors framework, a scientific approach to debugging, and how JavaScript is going to affect the programming languages we use in years to come. So, if we start at the start, how did you get started in programming? When I was seven or eight, I was given a BBC Micro for Christmas. I had asked for a Game Boy, but my dad thought it would be better to give me a proper computer. For a year or so, I only played games on it, but then I found the user guide for writing programs in it. I gradually started doing more stuff on it and found it fun. I liked creating. As I went into senior school I continued to write stuff on there, trying to write games that weren’t very good. I got a real computer when I was fourteen and found ways to write BASIC on it. Visual Basic to start with, and then something more interesting than that. How did you learn to program? Was there someone helping you out? Absolutely not! I learnt out of a book, or by experimenting. I remember the first time I found a loop, I was like “Oh my God! I don’t have to write out the same line over and over and over again any more. It’s amazing!” When did you think this might be something that you actually wanted to do as a career? For a long time, I thought it wasn’t something that you would do as a career, because it was too much fun to be a career. I thought I’d do chemistry at university and some kind of career based on chemical engineering. And then I went to a careers fair at school when I was seventeen or eighteen, and it just didn’t interest me whatsoever. I thought “I could be a programmer, and there’s loads of money there, and I’m good at it, and it’s fun”, but also that I shouldn’t spoil my hobby. Now I don’t really program in my spare time any more, which is a bit of a shame, but I program all the rest of the time, so I can live with it. Do you think you learnt much about programming at university? Yes, definitely! I went into university knowing how to make computers do anything I wanted them to do. However, I didn’t have the language to talk about algorithms, so the algorithms course in my first year was massively important. Learning other language paradigms like functional programming was really good for breadth of understanding. Functional programming influences normal programming through design rather than actually using it all the time. I draw inspiration from it to write imperative programs which I think is actually becoming really fashionable now, but I’ve been doing it for ages. I did it first! There were also some courses on really odd programming languages, a bit of Prolog, a little bit of C. Having a little bit of each of those is something that I would have never done on my own, so it was important. And then there are knowledge-based courses which are about not programming itself but things that have been programmed like TCP. Those are really important for examples for how to approach things. Did you do any internships while you were at university? Yeah, I spent both of my summers at the same company. I thought I could code well before I went there. Looking back at the crap that I produced, it was only surpassed in its crappiness by all of the other code already in that company. I’m so much better at writing nice code now than I used to be back then. Was there just not a culture of looking after your code? There was, they just didn’t hire people for their abilities in that area. They hired people for raw IQ. The first indicator of it going wrong was that they didn’t have any computer scientists, which is a bit odd in a programming company. But even beyond that they didn’t have people who learnt architecture from anyone else. Most of them had started straight out of university, so never really had experience or mentors to learn from. There wasn’t the experience to draw from to teach each other. In the second half of my second internship, I was being given tasks like looking at new technologies and teaching people stuff. Interns shouldn’t be teaching people how to do their jobs! All interns are going to have little nuggets of things that you don’t know about, but they shouldn’t consistently be the ones who know the most. It’s not a good environment to learn. I was going to ask how you found working with people who were more experienced than you… When I reached Red Gate, I found some people who were more experienced programmers than me, and that was difficult. I’ve been coding since I was tiny. At university there were people who were cleverer than me, but there weren’t very many who were more experienced programmers than me. During my internship, I didn’t find anyone who I classed as being a noticeably more experienced programmer than me. So, it was a shock to the system to have valid criticisms rather than just formatting criticisms. However, Red Gate’s not so big on the actual code review, at least it wasn’t when I started. We did an entire product release and then somebody looked over all of the UI of that product which I’d written and say what they didn’t like. By that point, it was way too late and I’d disagree with them. Do you think the lack of code reviews was a bad thing? I think if there’s going to be any oversight of new people, then it should be continuous rather than chunky. For me I don’t mind too much, I could go out and get oversight if I wanted it, and in those situations I felt comfortable without it. If I was managing the new person, then maybe I’d be keener on oversight and then the right way to do it is continuously and in very, very small chunks. Have you had any significant projects you’ve worked on outside of a job? When I was a teenager I wrote all sorts of stuff. I used to write games, I derived how to do isomorphic projections myself once. I didn’t know what the word was so I couldn’t Google for it, so I worked it out myself. It was horrifically complicated. But it sort of tailed off when I started at university, and is now basically zero. If I do side-projects now, they tend to be work-related side projects like my actors framework, NAct, which I started in a down tools week. Could you explain a little more about NAct? It is a little C# framework for writing parallel code more easily. Parallel programming is difficult when you need to write to shared data. Sometimes parallel programming is easy because you don’t need to write to shared data. When you do need to access shared data, you could just have your threads pile in and do their work, but then you would screw up the data because the threads would trample on each other’s toes. You could lock, but locks are really dangerous if you’re using more than one of them. You get interactions like deadlocks, and that’s just nasty. Actors instead allows you to say this piece of data belongs to this thread of execution, and nobody else can read it. If you want to read it, then ask that thread of execution for a piece of it by sending a message, and it will send the data back by a message. And that avoids deadlocks as long as you follow some obvious rules about not making your actors sit around waiting for other actors to do something. There are lots of ways to write actors, NAct allows you to do it as if it was method calls on other objects, which means you get all the strong type-safety that C# programmers like. Do you think that this is suitable for the majority of parallel programming, or do you think it’s only suitable for specific cases? It’s suitable for most difficult parallel programming. If you’ve just got a hundred web requests which are all independent of each other, then I wouldn’t bother because it’s easier to just spin them up in separate threads and they can proceed independently of each other. But where you’ve got difficult parallel programming, where you’ve got multiple threads accessing multiple bits of data in multiple ways at different times, then actors is at least as good as all other ways, and is, I reckon, easier to think about. When you’re using actors, you presumably still have to write your code in a different way from you would otherwise using single-threaded code. You can’t use actors with any methods that have return types, because you’re not allowed to call into another actor and wait for it. If you want to get a piece of data out of another actor, then you’ve got to use tasks so that you can use “async” and “await” to await asynchronously for it. But other than that, you can still stick things in classes so it’s not too different really. Rather than having thousands of objects with mutable state, you can use component-orientated design, where there are only a few mutable classes which each have a small number of instances. Then there can be thousands of immutable objects. If you tend to do that anyway, then actors isn’t much of a jump. If I’ve already built my system without any parallelism, how hard is it to add actors to exploit all eight cores on my desktop? Usually pretty easy. If you can identify even one boundary where things look like messages and you have components where some objects live on one side and these other objects live on the other side, then you can have a granddaddy object on one side be an actor and it will parallelise as it goes across that boundary. Not too difficult. If we do get 1000-core desktop PCs, do you think actors will scale up? It’s hard. There are always in the order of twenty to fifty actors in my whole program because I tend to write each component as actors, and I tend to have one instance of each component. So this won’t scale to a thousand cores. What you can do is write data structures out of actors. I use dictionaries all over the place, and if you need a dictionary that is going to be accessed concurrently, then you could build one of those out of actors in no time. You can use queuing to marshal requests between different slices of the dictionary which are living on different threads. So it’s like a distributed hash table but all of the chunks of it are on the same machine. That means that each of these thousand processors has cached one small piece of the dictionary. I reckon it wouldn’t be too big a leap to start doing proper parallelism. Do you think it helps if actors get baked into the language, similarly to Erlang? Erlang is excellent in that it has thread-local garbage collection. C# doesn’t, so there’s a limit to how well C# actors can possibly scale because there’s a single garbage collected heap shared between all of them. When you do a global garbage collection, you’ve got to stop all of the actors, which is seriously expensive, whereas in Erlang garbage collections happen per-actor, so they’re insanely cheap. However, Erlang deviated from all the sensible language design that people have used recently and has just come up with crazy stuff. You can definitely retrofit thread-local garbage collection to .NET, and then it’s quite well-suited to support actors, even if it’s not baked into the language. Speaking of language design, do you have a favourite programming language? I’ll choose a language which I’ve never written before. I like the idea of Scala. It sounds like C#, only with some of the niggles gone. I enjoy writing static types. It means you don’t have to writing tests so much. When you say it doesn’t have some of the niggles? C# doesn’t allow the use of a property as a method group. It doesn’t have Scala case classes, or sum types, where you can do a switch statement and the compiler checks that you’ve checked all the cases, which is really useful in functional-style programming. Pattern-matching, in other words. That’s actually the major niggle. C# is pretty good, and I’m quite happy with C#. And what about going even further with the type system to remove the need for tests to something like Haskell? Or is that a step too far? I’m quite a pragmatist, I don’t think I could deal with trying to write big systems in languages with too few other users, especially when learning how to structure things. I just don’t know anyone who can teach me, and the Internet won’t teach me. That’s the main reason I wouldn’t use it. If I turned up at a company that writes big systems in Haskell, I would have no objection to that, but I wouldn’t instigate it. What about things in C#? For instance, there’s contracts in C#, so you can try to statically verify a bit more about your code. Do you think that’s useful, or just not worthwhile? I’ve not really tried it. My hunch is that it needs to be built into the language and be quite mathematical for it to work in real life, and that doesn’t seem to have ended up true for C# contracts. I don’t think anyone who’s tried them thinks they’re any good. I might be wrong. On a slightly different note, how do you like to debug code? I think I’m quite an odd debugger. I use guesswork extremely rarely, especially if something seems quite difficult to debug. I’ve been bitten spending hours and hours on guesswork and not being scientific about debugging in the past, so now I’m scientific to a fault. What I want is to see the bug happening in the debugger, to step through the bug happening. To watch the program going from a valid state to an invalid state. When there’s a bug and I can’t work out why it’s happening, I try to find some piece of evidence which places the bug in one section of the code. From that experiment, I binary chop on the possible causes of the bug. I suppose that means binary chopping on places in the code, or binary chopping on a stage through a processing cycle. Basically, I’m very stupid about how I debug. I won’t make any guesses, I won’t use any intuition, I will only identify the experiment that’s going to binary chop most effectively and repeat rather than trying to guess anything. I suppose it’s quite top-down. Is most of the time then spent in the debugger? Absolutely, if at all possible I will never debug using print statements or logs. I don’t really hold much stock in outputting logs. If there’s any bug which can be reproduced locally, I’d rather do it in the debugger than outputting logs. And with SmartAssembly error reporting, there’s not a lot that can’t be either observed in an error report and just fixed, or reproduced locally. And in those other situations, maybe I’ll use logs. But I hate using logs. You stare at the log, trying to guess what’s going on, and that’s exactly what I don’t like doing. You have to just look at it and see does this look right or wrong. We’ve covered how you get to grip with bugs. How do you get to grips with an entire codebase? I watch it in the debugger. I find little bugs and then try to fix them, and mostly do it by watching them in the debugger and gradually getting an understanding of how the code works using my process of binary chopping. I have to do a lot of reading and watching code to choose where my slicing-in-half experiment is going to be. The last time I did it was SmartAssembly. The old code was a complete mess, but at least it did things top to bottom. There wasn’t too much of some of the big abstractions where flow of control goes all over the place, into a base class and back again. Code’s really hard to understand when that happens. So I like to choose a little bug and try to fix it, and choose a bigger bug and try to fix it. Definitely learn by doing. I want to always have an aim so that I get a little achievement after every few hours of debugging. Once I’ve learnt the codebase I might be able to fix all the bugs in an hour, but I’d rather be using them as an aim while I’m learning the codebase. If I was a maintainer of a codebase, what should I do to make it as easy as possible for you to understand? Keep distinct concepts in different places. And name your stuff so that it’s obvious which concepts live there. You shouldn’t have some variable that gets set miles up the top of somewhere, and then is read miles down to choose some later behaviour. I’m talking from a very much SmartAssembly point of view because the old SmartAssembly codebase had tons and tons of these things, where it would read some property of the code and then deal with it later. Just thousands of variables in scope. Loads of things to think about. If you can keep concepts separate, then it aids me in my process of fixing bugs one at a time, because each bug is going to more or less be understandable in the one place where it is. And what about tests? Do you think they help at all? I’ve never had the opportunity to learn a codebase which has had tests, I don’t know what it’s like! What about when you’re actually developing? How useful do you find tests in finding bugs or regressions? Finding regressions, absolutely. Running bits of code that would be quite hard to run otherwise, definitely. It doesn’t happen very often that a test finds a bug in the first place. I don’t really buy nebulous promises like tests being a good way to think about the spec of the code. My thinking goes something like “This code works at the moment, great, ship it! Ah, there’s a way that this code doesn’t work. Okay, write a test, demonstrate that it doesn’t work, fix it, use the test to demonstrate that it’s now fixed, and keep the test for future regressions.” The most valuable tests are for bugs that have actually happened at some point, because bugs that have actually happened at some point, despite the fact that you think you’ve fixed them, are way more likely to appear again than new bugs are. Does that mean that when you write your code the first time, there are no tests? Often. The chance of there being a bug in a new feature is relatively unaffected by whether I’ve written a test for that new feature because I’m not good enough at writing tests to think of bugs that I would have written into the code. So not writing regression tests for all of your code hasn’t affected you too badly? There are different kinds of features. Some of them just always work, and are just not flaky, they just continue working whatever you throw at them. Maybe because the type-checker is particularly effective around them. Writing tests for those features which just tend to always work is a waste of time. And because it’s a waste of time I’ll tend to wait until a feature has demonstrated its flakiness by having bugs in it before I start trying to test it. You can get a feel for whether it’s going to be flaky code as you’re writing it. I try to write it to make it not flaky, but there are some things that are just inherently flaky. And very occasionally, I’ll think “this is going to be flaky” as I’m writing, and then maybe do a test, but not most of the time. How do you think your programming style has changed over time? I’ve got clearer about what the right way of doing things is. I used to flip-flop a lot between different ideas. Five years ago I came up with some really good ideas and some really terrible ideas. All of them seemed great when I thought of them, but they were quite diverse ideas, whereas now I have a smaller set of reliable ideas that are actually good for structuring code. So my code is probably more similar to itself than it used to be back in the day, when I was trying stuff out. I’ve got more disciplined about encapsulation, I think. There are operational things like I use actors more now than I used to, and that forces me to use immutability more than I used to. The first code that I wrote in Red Gate was the memory profiler UI, and that was an actor, I just didn’t know the name of it at the time. I don’t really use object-orientation. By object-orientation, I mean having n objects of the same type which are mutable. I want a constant number of objects that are mutable, and they should be different types. I stick stuff in dictionaries and then have one thing that owns the dictionary and puts stuff in and out of it. That’s definitely a pattern that I’ve seen recently. I think maybe I’m doing functional programming. Possibly. It’s plausible. If you had to summarise the essence of programming in a pithy sentence, how would you do it? Programming is the form of art that, without losing any of the beauty of architecture or fine art, allows you to produce things that people love and you make money from. So you think it’s an art rather than a science? It’s a little bit of engineering, a smidgeon of maths, but it’s not science. Like architecture, programming is on that boundary between art and engineering. If you want to do it really nicely, it’s mostly art. You can get away with doing architecture and programming entirely by having a good engineering mind, but you’re not going to produce anything nice. You’re not going to have joy doing it if you’re an engineering mind. Architects who are just engineering minds are not going to enjoy their job. I suppose engineering is the foundation on which you build the art. Exactly. How do you think programming is going to change over the next ten years? There will be an unfortunate shift towards dynamically-typed languages, because of JavaScript. JavaScript has an unfair advantage. JavaScript’s unfair advantage will cause more people to be exposed to dynamically-typed languages, which means other dynamically-typed languages crop up and the best features go into dynamically-typed languages. Then people conflate the good features with the fact that it’s dynamically-typed, and more investment goes into dynamically-typed languages. They end up better, so people use them. What about the idea of compiling other languages, possibly statically-typed, to JavaScript? It’s a reasonable idea. I would like to do it, but I don’t think enough people in the world are going to do it to make it pick up. The hordes of beginners are the lifeblood of a language community. They are what makes there be good tools and what makes there be vibrant community websites. And any particular thing which is the same as JavaScript only with extra stuff added to it, although it might be technically great, is not going to have the hordes of beginners. JavaScript is always to be quickest and easiest way for a beginner to start programming in the browser. And dynamically-typed languages are great for beginners. Compilers are pretty scary and beginners don’t write big code. And having your errors come up in the same place, whether they’re statically checkable errors or not, is quite nice for a beginner. If someone asked me to teach them some programming, I’d teach them JavaScript. If dynamically-typed languages are great for beginners, when do you think the benefits of static typing start to kick in? The value of having a statically typed program is in the tools that rely on the static types to produce a smooth IDE experience rather than actually telling me my compile errors. And only once you’re experienced enough a programmer that having a really smooth IDE experience makes a blind bit of difference, does static typing make a blind bit of difference. So it’s not really about size of codebase. If I go and write up a tiny program, I’m still going to get value out of writing it in C# using ReSharper because I’m experienced with C# and ReSharper enough to be able to write code five times faster if I have that help. Any other visions of the future? Nobody’s going to use actors. Because everyone’s going to be running on single-core VMs connected over network-ready protocols like JSON over HTTP. So, parallelism within one operating system is going to die. But until then, you should use actors. More Red Gater Coder interviews

Read the article
Python error after installing libboost-all-dev on debian [migrated]

- by Cameron Metzke

A friend of mine wanted the liboost libraries installed on our shared computer so after installing libboost-all-dev 1.49.0.1 ( A debian wheezy machine ), I get this error when using the "pydoc modules" command on the commandline. It spits out the following error -- root@debian:/usr/include/c++/4.7# pydoc modules Please wait a moment while I gather a list of all available modules... **[debian:49065] [[INVALID],INVALID] ORTE_ERROR_LOG: A system-required executable either could not be found or was not executable by this user in file ../../../../../../orte/mca/ess/singleton/ess_singleton_module.c at line 357 [debian:49065] [[INVALID],INVALID] ORTE_ERROR_LOG: A system-required executable either could not be found or was not executable by this user in file ../../../../../../orte/mca/ess/singleton/ess_singleton_module.c at line 230 [debian:49065] [[INVALID],INVALID] ORTE_ERROR_LOG: A system-required executable either could not be found or was not executable by this user in file ../../../orte/runtime/orte_init.c at line 132 -------------------------------------------------------------------------- It looks like orte_init failed for some reason; your parallel process is likely to abort. There are many reasons that a parallel process can fail during orte_init; some of which are due to configuration or environment problems. This failure appears to be an internal failure; here's some additional information (which may only be relevant to an Open MPI developer): orte_ess_set_name failed --> Returned value A system-required executable either could not be found or was not executable by this user (-127) instead of ORTE_SUCCESS -------------------------------------------------------------------------- -------------------------------------------------------------------------- It looks like MPI_INIT failed for some reason; your parallel process is likely to abort. There are many reasons that a parallel process can fail during MPI_INIT; some of which are due to configuration or environment problems. This failure appears to be an internal failure; here's some additional information (which may only be relevant to an Open MPI developer): ompi_mpi_init: orte_init failed --> Returned "A system-required executable either could not be found or was not executable by this user" (-127) instead of "Success" (0) -------------------------------------------------------------------------- *** The MPI_Init() function was called before MPI_INIT was invoked. *** This is disallowed by the MPI standard. *** Your MPI job will now abort. [debian:49065] Abort before MPI_INIT completed successfully; not able to guarantee that all other processes were killed!** root@debian:/usr/include/c++/4.7# I tried looking into the problem and ended up uninstalling the following to get it to work again. openmpi common all 1.4.5-1 libibverbs-dev amd64 1.1.6-1 libopenmpi-dev amd64 1.4.5-1 mpi-default-dev amd64 1.0.1 libboost-mpi-python1.49.0 although pydoc works again, I'm assuming the packages I removed are gunna hurt somethiong else down the track ? As you guessed im not a c/c++ programmer. So I guess my question is, will this hurt something later ? is their a way to install those packages without hurting python ?

Read the article
Cant access a remote server due mistake by setting firewall rule

- by LMIT

I need help due a my silly mistake! So for long time i have a dedicate server hosted by register.it Usually i access remotly to this server (Windows 2008 server) by Terminal Server. Today i wanted to block one site that continually send request to my server. So i was adding a new rule in the firewall (the native firewall on windows 2008 server), as i did many time, but this time, probably i was sleeping with my brain i add a general rules that stop everything! So i cant access to the server anymore, as no any users can browse the sites, nothing is working because this rule block everything. I know that is a silly mistake, no need to tell me :) so please what i can do ? The only 1 thing that my provider let me is reboot the server by his control panel, but this not help me in any way because the firewall block me again. i have administrator username and password, so what i really can do ? there are some trick some tecnique, some expert guru that can help me in this very bad situation ? UPDATE i follow the Tony suggest and i did a NMAP to check if some ports are open but look like all closed: NMAP RESULT Starting Nmap 6.00 ( http://nmap.org ) at 2012-05-29 22:32 W. Europe Daylight Time NSE: Loaded 93 scripts for scanning. NSE: Script Pre-scanning. Initiating Parallel DNS resolution of 1 host. at 22:32 Completed Parallel DNS resolution of 1 host. at 22:33, 13.00s elapsed Initiating SYN Stealth Scan at 22:33 Scanning xxx.xxx.xxx.xxx [1000 ports] SYN Stealth Scan Timing: About 29.00% done; ETC: 22:34 (0:01:16 remaining) SYN Stealth Scan Timing: About 58.00% done; ETC: 22:34 (0:00:44 remaining) Completed SYN Stealth Scan at 22:34, 104.39s elapsed (1000 total ports) Initiating Service scan at 22:34 Initiating OS detection (try #1) against xxx.xxx.xxx.xxx Retrying OS detection (try #2) against xxx.xxx.xxx.xxx Initiating Traceroute at 22:34 Completed Traceroute at 22:35, 6.27s elapsed Initiating Parallel DNS resolution of 11 hosts. at 22:35 Completed Parallel DNS resolution of 11 hosts. at 22:35, 13.00s elapsed NSE: Script scanning xxx.xxx.xxx.xxx. Initiating NSE at 22:35 Completed NSE at 22:35, 0.00s elapsed Nmap scan report for xxx.xxx.xxx.xxx Host is up. All 1000 scanned ports on xxx.xxx.xxx.xxx are filtered Too many fingerprints match this host to give specific OS details TRACEROUTE (using proto 1/icmp) HOP RTT ADDRESS 1 ... ... ... 13 ... 30 NSE: Script Post-scanning. Read data files from: D:\Program Files\Nmap OS and Service detection performed. Please report any incorrect results at http://nmap.org/submit/ . Nmap done: 1 IP address (1 host up) scanned in 145.08 seconds Raw packets sent: 2116 (96.576KB) | Rcvd: 61 (4.082KB) Question: The provider locally can access by username and password ?

Read the article
Benchmark MySQL Cluster using flexAsynch: No free node id found for mysqld(API)?

- by quanta

I am going to benchmark MySQL Cluster using flexAsynch follow this guide, details as below: mkdir /usr/local/mysqlc732/ cd /usr/local/src/mysql-cluster-gpl-7.3.2 cmake . -DCMAKE_INSTALL_PREFIX=/usr/local/mysqlc732/ -DWITH_NDB_TEST=ON make make install Everything works fine until this step: # /usr/local/mysqlc732/bin/flexAsynch -t 1 -p 80 -l 2 -o 100 -c 100 -n FLEXASYNCH - Starting normal mode Perform benchmark of insert, update and delete transactions 1 number of concurrent threads 80 number of parallel operation per thread 100 transaction(s) per round 2 iterations Load Factor is 80% 25 attributes per table 1 is the number of 32 bit words per attribute Tables are with logging Transactions are executed with hint provided No force send is used, adaptive algorithm used Key Errors are disallowed Temporary Resource Errors are allowed Insufficient Space Errors are disallowed Node Recovery Errors are allowed Overload Errors are allowed Timeout Errors are allowed Internal NDB Errors are allowed User logic reported Errors are allowed Application Errors are disallowed Using table name TAB0 NDBT_ProgramExit: 1 - Failed ndb_cluster.log: WARNING -- Failed to allocate nodeid for API at 127.0.0.1. Returned eror: 'No free node id found for mysqld(API).' I also have recompiled with -DWITH_DEBUG=1 -DWITH_NDB_DEBUG=1. How can I run flexAsynch in the debug mode? # /usr/local/mysqlc732/bin/flexAsynch -h FLEXASYNCH Perform benchmark of insert, update and delete transactions Arguments: -t Number of threads to start, default 1 -p Number of parallel transactions per thread, default 32 -o Number of transactions per loop, default 500 -l Number of loops to run, default 1, 0=infinite -load_factor Number Load factor in index in percent (40 -> 99) -a Number of attributes, default 25 -c Number of operations per transaction -s Size of each attribute, default 1 (PK is always of size 1, independent of this value) -simple Use simple read to read from database -dirty Use dirty read to read from database -write Use writeTuple in insert and update -n Use standard table names -no_table_create Don't create tables in db -temp Create table(s) without logging -no_hint Don't give hint on where to execute transaction coordinator -adaptive Use adaptive send algorithm (default) -force Force send when communicating -non_adaptive Send at a 10 millisecond interval -local 1 = each thread its own node, 2 = round robin on node per parallel trans 3 = random node per parallel trans -ndbrecord Use NDB Record -r Number of extra loops -insert Only run inserts on standard table -read Only run reads on standard table -update Only run updates on standard table -delete Only run deletes on standard table -create_table Only run Create Table of standard table -drop_table Only run Drop Table on standard table -warmup_time Warmup Time before measurement starts -execution_time Execution Time where measurement is done -cooldown_time Cooldown time after measurement completed -table Number of standard table, default 0

Read the article
Why cant I use two or more keyboards/ mouse parallely on one computer?

- by rainbow365

Hi, This is just a question of curiosity. Why cannot I use two keyboards parallely in windows? for eg. typing in notepad using 2 keyboards in real parallel mode. what prevents windows to accomdate such feature? is there any multitasking or parallel processing OS that can do it?

Read the article
ZeroDowntime deployment of configuration in Tomcat 7

- by pagid

looking at the things which can be done with the Parallel deployments in Tomcat 7, I wonder how new or changed configuration could be provided to these various versions of the application. In a nutshell - what parallel deployment offers is that pushing a new version of a war file to the webapps dir (with filenames like "App##01.war, "App##02.war") and ever user with a new session will get the newer version, all others stay with the old version. So how could one provide different or additional configuration (properties) to the various versions? Cheers.

Read the article
Why can't I use two or more keyboards/mice at the same time on one computer?

- by rainbow365

This is just a question of curiosity. Why cannot I use two keyboards at the same time in Windows? For example, typing in Notepad using 2 keyboards in real parallel mode. what prevents Windows from accommodating such a feature? Is there any multitasking or parallel processing OS that can do it?

Read the article
Enable ReadyBoost on a second internal HDD?

- by kd304

I have two SATA HDDs in my desktop PC (one for daily activity, one for storage and backup). I can finely use ReadyBoost with pendrives, but I wonder, Is there a way I could use my underutilized second HDD to participate in the cacheing mechanism (same concept as having two CPU cores crunch things in parallel: have two HDDs fetch data in parallel)? Clearly speaking: I want to enable ReadyBoost on my separate D: drive.

Read the article
Enabling "USB Printing Support" in Windows 7

- by Kevin Dente

I'm trying to use an old parallel-port based printer with a USB-to-parallel port adapter on Windows 7. When I plug it into the USB port on the computer it's listed as an unrecognized device. I know that these cables typically use the "USB Printing Support" driver with makes USB ports show up as printer ports in the printer dialog. Is there a way to manually add USB Printing Support to Windows 7, since it isn't being added automatically?

Read the article
Enabling "USB Printing Support" in Windows 7

- by Kevin Dente

I'm trying to use an old parallel-port based printer with a USB-to-parallel port adapter on Windows 7. When I plug it into the USB port on the computer it's listed as an unrecognized device. I know that these cables typically use the "USB Printing Support" driver with makes USB ports show up as printer ports in the printer dialog. Is there a way to manually add USB Printing Support to Windows 7, since it isn't being added automatically?

Read the article
Linux software RAID6: 3 drives offline - how to force online?

- by Ole Tange

This is similar to 3 drives fell out of Raid6 mdadm - rebuilding? except that it is not due to a failing cable. Instead the 3rd drive fell offline during rebuild of another drive. The drive failed with: kernel: end_request: I/O error, dev sdc, sector 293732432 kernel: md/raid:md0: read error not correctable (sector 293734224 on sdc). After rebooting both these sectors and the sectors around them are fine. This leads me to believe the error is intermittent and thus the device simply took too long to error correct the sector and remap it. I expect that no data was written to the RAID after it failed. Therefore I hope that if I can kick the last failing device online that the RAID is fine and that the xfs_filesystem is OK, maybe with a few missing recent files. Taking a backup of the disks in the RAID takes 24 hours, so I would prefer that the solution works the first time. I have therefore set up a test scenario: export PRE=3 parallel dd if=/dev/zero of=/tmp/raid${PRE}{} bs=1k count=1000k ::: 1 2 3 4 5 parallel mknod /dev/loop${PRE}{} b 7 ${PRE}{} \; losetup /dev/loop${PRE}{} /tmp/raid${PRE}{} ::: 1 2 3 4 5 mdadm --create /dev/md$PRE -c 4096 --level=6 --raid-devices=5 /dev/loop${PRE}[12345] cat /proc/mdstat mkfs.xfs -f /dev/md$PRE mkdir -p /mnt/disk2 umount -l /mnt/disk2 mount /dev/md$PRE /mnt/disk2 seq 1000 | parallel -j1 mkdir -p /mnt/disk2/{}\;cp /bin/* /mnt/disk2/{}\;sleep 0.5 & mdadm --fail /dev/md$PRE /dev/loop${PRE}3 /dev/loop${PRE}4 cat /proc/mdstat # Assume reboot so no process is using the dir kill %1; sync & kill %1; sync & # Force fail one too many mdadm --fail /dev/md$PRE /dev/loop${PRE}1 parallel --tag -k mdadm -E ::: /dev/loop${PRE}? | grep Upda # loop 2,5 are newest. loop1 almost newest => force add loop1 Next step is to add loop1 back - and this is where I am stuck. After that do a xfs-consistency check. When that works, check that the solution also works on real devices (such a 4 USB sticks).

Read the article
openmp in mex : stackoverflow error

- by Edwin

i have got the following fraction of code that getting me the stack overflow error #pragma omp parallel shared(Mo1, Mo2, sum_normalized_p_gn, Data, Mean_Out,Covar_Out,Prior_Out, det) private(i) num_threads( number_threads ) { //every thread has a new copy double* normalized_p_gn = (double*)malloc(NMIX*sizeof(double)); #pragma omp critical { int id = omp_get_thread_num(); int threads = omp_get_num_threads(); mexEvalString("drawnow"); } #pragma omp for //some parallel process..... } the variables declared in the shared are created by malloc. and they consumes with large amount of memory there are 2 questions regarding to the above code. 1) why this would generate the stack overflow error( i.e. segmentation fault) before it goes into the parallel for loop? it works fine when it runs in the sequential mode.... 2) am i right to dynamic allocate memory for each thread like "normalized_p_gn" above? Regards Edwin

Read the article
Hadoop: Iterative MapReduce Performance

- by S.N

Is it correct to say that the parallel computation with iterative MapReduce can be justified only when the training data size is too large for the non-parallel computation for the same logic? I am aware that the there is overhead for starting MapReduce jobs. This can be critical for overall execution time when a large number of iterations is required. I can imagine that the sequential computation is faster than the parallel computation with iterative MapReduce as long as the memory allows to hold a data set in many cases. Is it the only benefit to use the iterative MapReduce? If not, what are the other benefits could be?

Read the article
Scala Tuple Deconstruction

- by dbyrne

I am new to Scala, and ran across a small hiccup that has been annoying me. Initializing two vars in parallel works great: var (x,y) = (1,2) However I can't find a way to assign new values in parallel: (x,y) = (x+y,y-x) //invalid syntax I end up writing something like this: val xtmp = x+y; y = x-y; x = xtmp I realize writing functional code is one way of avoiding this, but there are certain situations where vars just make more sense. I have two questions: 1) Is there a better way of doing this? Am I missing something? 2) What is the reason for not allowing true parallel assignment?

Read the article
C#/.NET Little Wonders: The Concurrent Collections (1 of 3)

- by James Michael Hare

Once again we consider some of the lesser known classes and keywords of C#. In the next few weeks, we will discuss the concurrent collections and how they have changed the face of concurrent programming. This week’s post will begin with a general introduction and discuss the ConcurrentStack<T> and ConcurrentQueue<T>. Then in the following post we’ll discuss the ConcurrentDictionary<T> and ConcurrentBag<T>. Finally, we shall close on the third post with a discussion of the BlockingCollection<T>. For more of the "Little Wonders" posts, see the index here. A brief history of collections In the beginning was the .NET 1.0 Framework. And out of this framework emerged the System.Collections namespace, and it was good. It contained all the basic things a growing programming language needs like the ArrayList and Hashtable collections. The main problem, of course, with these original collections is that they held items of type object which means you had to be disciplined enough to use them correctly or you could end up with runtime errors if you got an object of a type you weren't expecting. Then came .NET 2.0 and generics and our world changed forever! With generics the C# language finally got an equivalent of the very powerful C++ templates. As such, the System.Collections.Generic was born and we got type-safe versions of all are favorite collections. The List<T> succeeded the ArrayList and the Dictionary<TKey,TValue> succeeded the Hashtable and so on. The new versions of the library were not only safer because they checked types at compile-time, in many cases they were more performant as well. So much so that it's Microsoft's recommendation that the System.Collections original collections only be used for backwards compatibility. So we as developers came to know and love the generic collections and took them into our hearts and embraced them. The problem is, thread safety in both the original collections and the generic collections can be problematic, for very different reasons. Now, if you are only doing single-threaded development you may not care – after all, no locking is required. Even if you do have multiple threads, if a collection is “load-once, read-many” you don’t need to do anything to protect that container from multi-threaded access, as illustrated below: 1: public static class OrderTypeTranslator 2: { 3: // because this dictionary is loaded once before it is ever accessed, we don't need to synchronize 4: // multi-threaded read access 5: private static readonly Dictionary<string, char> _translator = new Dictionary<string, char> 6: { 7: {"New", 'N'}, 8: {"Update", 'U'}, 9: {"Cancel", 'X'} 10: }; 11: 12: // the only public interface into the dictionary is for reading, so inherently thread-safe 13: public static char? Translate(string orderType) 14: { 15: char charValue; 16: if (_translator.TryGetValue(orderType, out charValue)) 17: { 18: return charValue; 19: } 20: 21: return null; 22: } 23: } Unfortunately, most of our computer science problems cannot get by with just single-threaded applications or with multi-threading in a load-once manner. Looking at today's trends, it's clear to see that computers are not so much getting faster because of faster processor speeds -- we've nearly reached the limits we can push through with today's technologies -- but more because we're adding more cores to the boxes. With this new hardware paradigm, it is even more important to use multi-threaded applications to take full advantage of parallel processing to achieve higher application speeds. So let's look at how to use collections in a thread-safe manner. Using historical collections in a concurrent fashion The early .NET collections (System.Collections) had a Synchronized() static method that could be used to wrap the early collections to make them completely thread-safe. This paradigm was dropped in the generic collections (System.Collections.Generic) because having a synchronized wrapper resulted in atomic locks for all operations, which could prove overkill in many multithreading situations. Thus the paradigm shifted to having the user of the collection specify their own locking, usually with an external object: 1: public class OrderAggregator 2: { 3: private static readonly Dictionary<string, List<Order>> _orders = new Dictionary<string, List<Order>>(); 4: private static readonly _orderLock = new object(); 5: 6: public void Add(string accountNumber, Order newOrder) 7: { 8: List<Order> ordersForAccount; 9: 10: // a complex operation like this should all be protected 11: lock (_orderLock) 12: { 13: if (!_orders.TryGetValue(accountNumber, out ordersForAccount)) 14: { 15: _orders.Add(accountNumber, ordersForAccount = new List<Order>()); 16: } 17: 18: ordersForAccount.Add(newOrder); 19: } 20: } 21: } Notice how we’re performing several operations on the dictionary under one lock. With the Synchronized() static methods of the early collections, you wouldn’t be able to specify this level of locking (a more macro-level). So in the generic collections, it was decided that if a user needed synchronization, they could implement their own locking scheme instead so that they could provide synchronization as needed. The need for better concurrent access to collections Here’s the problem: it’s relatively easy to write a collection that locks itself down completely for access, but anything more complex than that can be difficult and error-prone to write, and much less to make it perform efficiently! For example, what if you have a Dictionary that has frequent reads but in-frequent updates? Do you want to lock down the entire Dictionary for every access? This would be overkill and would prevent concurrent reads. In such cases you could use something like a ReaderWriterLockSlim which allows for multiple readers in a lock, and then once a writer grabs the lock it blocks all further readers until the writer is done (in a nutshell). This is all very complex stuff to consider. Fortunately, this is where the Concurrent Collections come in. The Parallel Computing Platform team at Microsoft went through great pains to determine how to make a set of concurrent collections that would have the best performance characteristics for general case multi-threaded use. Now, as in all things involving threading, you should always make sure you evaluate all your container options based on the particular usage scenario and the degree of parallelism you wish to acheive. This article should not be taken to understand that these collections are always supperior to the generic collections. Each fills a particular need for a particular situation. Understanding what each container is optimized for is key to the success of your application whether it be single-threaded or multi-threaded. General points to consider with the concurrent collections The MSDN points out that the concurrent collections all support the ICollection interface. However, since the collections are already synchronized, the IsSynchronized property always returns false, and SyncRoot always returns null. Thus you should not attempt to use these properties for synchronization purposes. Note that since the concurrent collections also may have different operations than the traditional data structures you may be used to. Now you may ask why they did this, but it was done out of necessity to keep operations safe and atomic. For example, in order to do a Pop() on a stack you have to know the stack is non-empty, but between the time you check the stack’s IsEmpty property and then do the Pop() another thread may have come in and made the stack empty! This is why some of the traditional operations have been changed to make them safe for concurrent use. In addition, some properties and methods in the concurrent collections achieve concurrency by creating a snapshot of the collection, which means that some operations that were traditionally O(1) may now be O(n) in the concurrent models. I’ll try to point these out as we talk about each collection so you can be aware of any potential performance impacts. Finally, all the concurrent containers are safe for enumeration even while being modified, but some of the containers support this in different ways (snapshot vs. dirty iteration). Once again I’ll highlight how thread-safe enumeration works for each collection. ConcurrentStack<T>: The thread-safe LIFO container The ConcurrentStack<T> is the thread-safe counterpart to the System.Collections.Generic.Stack<T>, which as you may remember is your standard last-in-first-out container. If you think of algorithms that favor stack usage (for example, depth-first searches of graphs and trees) then you can see how using a thread-safe stack would be of benefit. The ConcurrentStack<T> achieves thread-safe access by using System.Threading.Interlocked operations. This means that the multi-threaded access to the stack requires no traditional locking and is very, very fast! For the most part, the ConcurrentStack<T> behaves like it’s Stack<T> counterpart with a few differences: Pop() was removed in favor of TryPop() Returns true if an item existed and was popped and false if empty. PushRange() and TryPopRange() were added Allows you to push multiple items and pop multiple items atomically. Count takes a snapshot of the stack and then counts the items. This means it is a O(n) operation, if you just want to check for an empty stack, call IsEmpty instead which is O(1). ToArray() and GetEnumerator() both also take snapshots. This means that iteration over a stack will give you a static view at the time of the call and will not reflect updates. Pushing on a ConcurrentStack<T> works just like you’d expect except for the aforementioned PushRange() method that was added to allow you to push a range of items concurrently. 1: var stack = new ConcurrentStack<string>(); 2: 3: // adding to stack is much the same as before 4: stack.Push("First"); 5: 6: // but you can also push multiple items in one atomic operation (no interleaves) 7: stack.PushRange(new [] { "Second", "Third", "Fourth" }); For looking at the top item of the stack (without removing it) the Peek() method has been removed in favor of a TryPeek(). This is because in order to do a peek the stack must be non-empty, but between the time you check for empty and the time you execute the peek the stack contents may have changed. Thus the TryPeek() was created to be an atomic check for empty, and then peek if not empty: 1: // to look at top item of stack without removing it, can use TryPeek. 2: // Note that there is no Peek(), this is because you need to check for empty first. TryPeek does. 3: string item; 4: if (stack.TryPeek(out item)) 5: { 6: Console.WriteLine("Top item was " + item); 7: } 8: else 9: { 10: Console.WriteLine("Stack was empty."); 11: } Finally, to remove items from the stack, we have the TryPop() for single, and TryPopRange() for multiple items. Just like the TryPeek(), these operations replace Pop() since we need to ensure atomically that the stack is non-empty before we pop from it: 1: // to remove items, use TryPop or TryPopRange to get multiple items atomically (no interleaves) 2: if (stack.TryPop(out item)) 3: { 4: Console.WriteLine("Popped " + item); 5: } 6: 7: // TryPopRange will only pop up to the number of spaces in the array, the actual number popped is returned. 8: var poppedItems = new string[2]; 9: int numPopped = stack.TryPopRange(poppedItems); 10: 11: foreach (var theItem in poppedItems.Take(numPopped)) 12: { 13: Console.WriteLine("Popped " + theItem); 14: } Finally, note that as stated before, GetEnumerator() and ToArray() gets a snapshot of the data at the time of the call. That means if you are enumerating the stack you will get a snapshot of the stack at the time of the call. This is illustrated below: 1: var stack = new ConcurrentStack<string>(); 2: 3: // adding to stack is much the same as before 4: stack.Push("First"); 5: 6: var results = stack.GetEnumerator(); 7: 8: // but you can also push multiple items in one atomic operation (no interleaves) 9: stack.PushRange(new [] { "Second", "Third", "Fourth" }); 10: 11: while(results.MoveNext()) 12: { 13: Console.WriteLine("Stack only has: " + results.Current); 14: } The only item that will be printed out in the above code is "First" because the snapshot was taken before the other items were added. This may sound like an issue, but it’s really for safety and is more correct. You don’t want to enumerate a stack and have half a view of the stack before an update and half a view of the stack after an update, after all. In addition, note that this is still thread-safe, whereas iterating through a non-concurrent collection while updating it in the old collections would cause an exception. ConcurrentQueue<T>: The thread-safe FIFO container The ConcurrentQueue<T> is the thread-safe counterpart of the System.Collections.Generic.Queue<T> class. The concurrent queue uses an underlying list of small arrays and lock-free System.Threading.Interlocked operations on the head and tail arrays. Once again, this allows us to do thread-safe operations without the need for heavy locks! The ConcurrentQueue<T> (like the ConcurrentStack<T>) has some departures from the non-concurrent counterpart. Most notably: Dequeue() was removed in favor of TryDequeue(). Returns true if an item existed and was dequeued and false if empty. Count does not take a snapshot It subtracts the head and tail index to get the count. This results overall in a O(1) complexity which is quite good. It’s still recommended, however, that for empty checks you call IsEmpty instead of comparing Count to zero. ToArray() and GetEnumerator() both take snapshots. This means that iteration over a queue will give you a static view at the time of the call and will not reflect updates. The Enqueue() method on the ConcurrentQueue<T> works much the same as the generic Queue<T>: 1: var queue = new ConcurrentQueue<string>(); 2: 3: // adding to queue is much the same as before 4: queue.Enqueue("First"); 5: queue.Enqueue("Second"); 6: queue.Enqueue("Third"); For front item access, the TryPeek() method must be used to attempt to see the first item if the queue. There is no Peek() method since, as you’ll remember, we can only peek on a non-empty queue, so we must have an atomic TryPeek() that checks for empty and then returns the first item if the queue is non-empty. 1: // to look at first item in queue without removing it, can use TryPeek. 2: // Note that there is no Peek(), this is because you need to check for empty first. TryPeek does. 3: string item; 4: if (queue.TryPeek(out item)) 5: { 6: Console.WriteLine("First item was " + item); 7: } 8: else 9: { 10: Console.WriteLine("Queue was empty."); 11: } Then, to remove items you use TryDequeue(). Once again this is for the same reason we have TryPeek() and not Peek(): 1: // to remove items, use TryDequeue. If queue is empty returns false. 2: if (queue.TryDequeue(out item)) 3: { 4: Console.WriteLine("Dequeued first item " + item); 5: } Just like the concurrent stack, the ConcurrentQueue<T> takes a snapshot when you call ToArray() or GetEnumerator() which means that subsequent updates to the queue will not be seen when you iterate over the results. Thus once again the code below will only show the first item, since the other items were added after the snapshot. 1: var queue = new ConcurrentQueue<string>(); 2: 3: // adding to queue is much the same as before 4: queue.Enqueue("First"); 5: 6: var iterator = queue.GetEnumerator(); 7: 8: queue.Enqueue("Second"); 9: queue.Enqueue("Third"); 10: 11: // only shows First 12: while (iterator.MoveNext()) 13: { 14: Console.WriteLine("Dequeued item " + iterator.Current); 15: } Using collections concurrently You’ll notice in the examples above I stuck to using single-threaded examples so as to make them deterministic and the results obvious. Of course, if we used these collections in a truly multi-threaded way the results would be less deterministic, but would still be thread-safe and with no locking on your part required! For example, say you have an order processor that takes an IEnumerable<Order> and handles each other in a multi-threaded fashion, then groups the responses together in a concurrent collection for aggregation. This can be done easily with the TPL’s Parallel.ForEach(): 1: public static IEnumerable<OrderResult> ProcessOrders(IEnumerable<Order> orderList) 2: { 3: var proxy = new OrderProxy(); 4: var results = new ConcurrentQueue<OrderResult>(); 5: 6: // notice that we can process all these in parallel and put the results 7: // into our concurrent collection without needing any external locking! 8: Parallel.ForEach(orderList, 9: order => 10: { 11: var result = proxy.PlaceOrder(order); 12: 13: results.Enqueue(result); 14: }); 15: 16: return results; 17: } Summary Obviously, if you do not need multi-threaded safety, you don’t need to use these collections, but when you do need multi-threaded collections these are just the ticket! The plethora of features (I always think of the movie The Three Amigos when I say plethora) built into these containers and the amazing way they acheive thread-safe access in an efficient manner is wonderful to behold. Stay tuned next week where we’ll continue our discussion with the ConcurrentBag<T> and the ConcurrentDictionary<TKey,TValue>. For some excellent information on the performance of the concurrent collections and how they perform compared to a traditional brute-force locking strategy, see this wonderful whitepaper by the Microsoft Parallel Computing Platform team here. Tweet Technorati Tags: C#,.NET,Concurrent Collections,Collections,Multi-Threading,Little Wonders,BlackRabbitCoder,James Michael Hare

Read the article
SQL SERVER – CXPACKET – Parallelism – Usual Solution – Wait Type – Day 6 of 28

- by pinaldave

CXPACKET has to be most popular one of all wait stats. I have commonly seen this wait stat as one of the top 5 wait stats in most of the systems with more than one CPU. Books On-Line: Occurs when trying to synchronize the query processor exchange iterator. You may consider lowering the degree of parallelism if contention on this wait type becomes a problem. CXPACKET Explanation: When a parallel operation is created for SQL Query, there are multiple threads for a single query. Each query deals with a different set of the data (or rows). Due to some reasons, one or more of the threads lag behind, creating the CXPACKET Wait Stat. There is an organizer/coordinator thread (thread 0), which takes waits for all the threads to complete and gathers result together to present on the client’s side. The organizer thread has to wait for the all the threads to finish before it can move ahead. The Wait by this organizer thread for slow threads to complete is called CXPACKET wait. Note that not all the CXPACKET wait types are bad. You might experience a case when it totally makes sense. There might also be cases when this is unavoidable. If you remove this particular wait type for any query, then that query may run slower because the parallel operations are disabled for the query. Reducing CXPACKET wait: We cannot discuss about reducing the CXPACKET wait without talking about the server workload type. OLTP: On Pure OLTP system, where the transactions are smaller and queries are not long but very quick usually, set the “Maximum Degree of Parallelism” to 1 (one). This way it makes sure that the query never goes for parallelism and does not incur more engine overhead. EXEC sys.sp_configure N'cost threshold for parallelism', N'1' GO RECONFIGURE WITH OVERRIDE GO Data-warehousing / Reporting server: As queries will be running for long time, it is advised to set the “Maximum Degree of Parallelism” to 0 (zero). This way most of the queries will utilize the parallel processor, and long running queries get a boost in their performance due to multiple processors. EXEC sys.sp_configure N'cost threshold for parallelism', N'0' GO RECONFIGURE WITH OVERRIDE GO Mixed System (OLTP & OLAP): Here is the challenge. The right balance has to be found. I have taken a very simple approach. I set the “Maximum Degree of Parallelism” to 2, which means the query still uses parallelism but only on 2 CPUs. However, I keep the “Cost Threshold for Parallelism” very high. This way, not all the queries will qualify for parallelism but only the query with higher cost will go for parallelism. I have found this to work best for a system that has OLTP queries and also where the reporting server is set up. Here, I am setting ‘Cost Threshold for Parallelism’ to 25 values (which is just for illustration); you can choose any value, and you can find it out by experimenting with the system only. In the following script, I am setting the ‘Max Degree of Parallelism’ to 2, which indicates that the query that will have a higher cost (here, more than 25) will qualify for parallel query to run on 2 CPUs. This implies that regardless of the number of CPUs, the query will select any two CPUs to execute itself. EXEC sys.sp_configure N'cost threshold for parallelism', N'25' GO EXEC sys.sp_configure N'max degree of parallelism', N'2' GO RECONFIGURE WITH OVERRIDE GO Read all the post in the Wait Types and Queue series. Additionally a must read comment of Jonathan Kehayias. Note: The information presented here is from my experience and I no way claim it to be accurate. I suggest you all to read the online book for further clarification. All the discussion of Wait Stats over here is generic and it varies from system to system. It is recommended that you test this on the development server before implementing on the production server. Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: DMV, Pinal Dave, PostADay, SQL, SQL Authority, SQL Query, SQL Scripts, SQL Server, SQL Tips and Tricks, SQL Wait Stats, SQL Wait Types, T SQL, Technology

Read the article

Search Results

Search found 1774 results on 71 pages for 'parallel'.

Page 24/71 | < Previous Page | 20 21 22 23 24 25 26 27 28 29 30 31 | Next Page >

- by Ofer

- by user141146

- by poonam

- by Pinal Dave

- by pinaldave

- by TravisG

- by javarg

- by Nishant

- by James Michael Hare

- by Michael Williamson

- by Cameron Metzke

- by LMIT

- by quanta

- by rainbow365

- by pagid

- by rainbow365

- by kd304

- by Kevin Dente

- by Kevin Dente

- by Ole Tange

- by Edwin

- by S.N

- by dbyrne

- by James Michael Hare

- by pinaldave

< Previous Page | 20 21 22 23 24 25 26 27 28 29 30 31 | Next Page >