Search Results

Search found 1794 results on 72 pages for 'embarrassingly parallel'.

Page 13/72 | < Previous Page | 9 10 11 12 13 14 15 16 17 18 19 20  | Next Page >

  • SQL University: Parallelism Week - Introduction

    - by Adam Machanic
    Welcome to Parallelism Week at SQL University . My name is Adam Machanic, and I'm your professor. Imagine having 8 brains, or 16, or 32. Imagine being able to break up complex thoughts and distribute them across your many brains, so that you could solve problems faster. Now quit imagining that, because you're human and you're stuck with only one brain, and you only get access to the entire thing if you're lucky enough to have avoided abusing too many recreational drugs. For your database server,...(read more)

    Read the article

  • Query Tuning Mastery at PASS Summit 2012: The Video

    - by Adam Machanic
    An especially clever community member was kind enough to reverse-engineer the video stream for me, and came up with a direct link to the PASS TV video stream for my Query Tuning Mastery: The Art and Science of Manhandling Parallelism talk, delivered at the PASS Summit last Thursday. I'm not sure how long this link will work , but I'd like to share it for my readers who were unable to see it in person or live on the stream. Start here. Skip past the keynote, to the 149 minute mark. Enjoy!...(read more)

    Read the article

  • Query Tuning Mastery at PASS Summit 2012: The Video

    - by Adam Machanic
    An especially clever community member was kind enough to reverse-engineer the video stream for me, and came up with a direct link to the PASS TV video stream for my Query Tuning Mastery: The Art and Science of Manhandling Parallelism talk, delivered at the PASS Summit last Thursday. I'm not sure how long this link will work , but I'd like to share it for my readers who were unable to see it in person or live on the stream. Start here. Skip past the keynote, to the 149 minute mark. Enjoy!...(read more)

    Read the article

  • Query Tuning Mastery at PASS Summit 2012: The Demos

    - by Adam Machanic
    For the second year in a row, I was asked to deliver a 500-level "Query Tuning Mastery" talk in room 6E of the Washington State Convention Center, for the PASS Summit. ( Here's some information about last year's talk, on workspace memory. ) And for the second year in a row, I had to deliver said talk at 10:15 in the morning, in a room used as overflow for the keynote, following a keynote speaker that didn't stop speaking on time. Frustrating! Last Thursday, after very, very quickly setting up and...(read more)

    Read the article

  • SQL University: Parallelism Week - Introduction

    - by Adam Machanic
    Welcome to Parallelism Week at SQL University . My name is Adam Machanic, and I'm your professor. Imagine having 8 brains, or 16, or 32. Imagine being able to break up complex thoughts and distribute them across your many brains, so that you could solve problems faster. Now quit imagining that, because you're human and you're stuck with only one brain, and you only get access to the entire thing if you're lucky enough to have avoided abusing too many recreational drugs. For your database server,...(read more)

    Read the article

  • Any frameworks or library allow me to run large amount of concurrent jobs schedully?

    - by Yoga
    Are there any high level programming frameworks that allow me to run large amount of concurrent jobs schedully? e.g. I have 100K of urls need to check their uptime every 5 minutes Definitely I can write a program to handle this, but then I need to handle concurrency, queuing, error handling, system throttling, job distribution etc. Will there be a framework that I only focus on a particular job (i.e. the ping task) and the system will take care of the scaling and error handling for me? I am open to any language.

    Read the article

  • Asynchronously returning a hierarchal data using .NET TPL... what should my return object "look" like?

    - by makerofthings7
    I want to use the .NET TPL to asynchronously do a DIR /S and search each subdirectory on a hard drive, and want to search for a word in each file... what should my API look like? In this scenario I know that each sub directory will have 0..10000 files or 0...10000 directories. I know the tree is unbalanced and want to return data (in relation to its position in the hierarchy) as soon as it's available. I am interested in getting data as quickly as possible, but also want to update that result if "better" data is found (better means closer to the root of c:) I may also be interested in finding all matches in relation to its position in the hierarchy. (akin to a report) Question: How should I return data to my caller? My first guess is that I think I need a shared object that will maintain the current "status" of the traversal (started | notstarted | complete ) , and might base it on the System.Collections.Concurrent. Another idea that I'm considering is the consumer/producer pattern (which ConcurrentCollections can handle) however I'm not sure what the objects "look" like. Optional Logical Constraint: The API doesn't have to address this, but in my "real world" design, if a directory has files, then only one file will ever contain the word I'm looking for.  If someone were to literally do a DIR /S as described above then they would need to account for more than one matching file per subdirectory. More information : I'm using Azure Tables to store a hierarchy of data using these TPL extension methods. A "node" is a table. Not only does each node in the hierarchy have a relation to any number of nodes, but it's possible for each node to have a reciprocal link back to any other node. This may have issues with recursion but I'm addressing that with a shared object in my recursion loop. Note that each "node" also has the ability to store local data unique to that node. It is this information that I'm searching for. In other words, I'm searching for a specific fixed RowKey in a hierarchy of nodes. When I search for the fixed RowKey in the hierarchy I'm interested in getting the results FAST (first node found) but prefer data that is "closer" to the starting point of the hierarchy. Since many nodes may have the particular RowKey I'm interested in, sometimes I may want to get a report of ALL the nodes that contain this RowKey.

    Read the article

  • Parallelize code using CUDA [migrated]

    - by user878944
    If I have a code which takes struct variable as input and manipulate it's elements, how can I parallelize this using CUDA? void BackpropagateLayer(NET* Net, LAYER* Upper, LAYER* Lower) { INT i,j; REAL Out, Err; for (i=1; i<=Lower->Units; i++) { Out = Lower->Output[i]; Err = 0; for (j=1; j<=Upper->Units; j++) { Err += Upper->Weight[j][i] * Upper->Error[j]; } Lower->Error[i] = Net->Gain * Out * (1-Out) * Err; } } Where NET and LAYER are structs defined as: typedef struct { /* A LAYER OF A NET: */ INT Units; /* - number of units in this layer */ REAL* Output; /* - output of ith unit */ REAL* Error; /* - error term of ith unit */ REAL** Weight; /* - connection weights to ith unit */ REAL** WeightSave; /* - saved weights for stopped training */ REAL** dWeight; /* - last weight deltas for momentum */ } LAYER; typedef struct { /* A NET: */ LAYER** Layer; /* - layers of this net */ LAYER* InputLayer; /* - input layer */ LAYER* OutputLayer; /* - output layer */ REAL Alpha; /* - momentum factor */ REAL Eta; /* - learning rate */ REAL Gain; /* - gain of sigmoid function */ REAL Error; /* - total net error */ } NET; What I could think of is to first convert the 2d Weight into 1d. And then send it to kernel to take the product or just use the CUBLAS library. Any suggestions?

    Read the article

  • Dividing sections inside an omp parallel for : OpenMP

    - by Sayan Ghosh
    Hi, I have a situation like: #pragma omp parallel for private(i, j, k, val, p, l) for (i = 0; i < num1; i++) { for (j = 0; j < num2; j++) { for (k = 0; k < num3; k++) { val = m[i + j*somenum + k*2] if (val != 0) for (l = start; l <= end; l++) { someFunctionThatWritesIntoGlobalArray((i + l), j, k, (someFunctionThatGetsValueFromAnotherArray((i + l), j, k) * val)); } } } for (p = 0; p < num4; p++) { m[p] = 0; } } Thanks for reading, phew! Well I am noticing a very minor difference in the results (0.999967[omp] against 1[serial]), when I use the above (which is 3 times faster) against the serial implementation. Now I know I am doing a mistake here...especially the connection between loops is evident. Is it possible to parallelize this using omp sections? I tried some options like making shared(p) {doing this, I got correct values, as in the serial form}, but there was no speedup then. Any general advice on handling openmp pragmas over a slew of for loops would also be great for me!

    Read the article

  • Parallel scroll textarea and webpage with jquery

    - by Roger Rogers
    This is both a conceptual and how-to question: In wiki formatting, or non WYSIWYG editor scenarios, you typically have a textarea for content entry and then an ancillary preview pane to show results, just like StackOverflow. This works fairly well, except with larger amounts of text, such as full page wikis, etc. I have a concept that I'd like critical feedback/advice on: Envision a two pane layout, with the preview content on the left side, taking up ~ 2/3 of the page, and the textarea on the right side, taking up ~ 1/3 of the page. The textarea would float, to remain in view, even if the user scrolls the browser window. Furthermore, if the user scrolls the textarea content, supposing it has exceeded the textarea's frame size, the page would scroll so that the content presently showing in the textarea syncs/is parallel with the content showing in the browser window. I'm imagining a wiki scenario, where going back and forth between markup and preview is frustrating. I'm curious what others think; is there anything out there like this? Any suggestions on how to attack this functionality (ideally using jquery)? Thanks

    Read the article

  • php mysql parallel array checkboxes

    - by gramware
    I have an array of checkboxes that I edit at once to set up a 'tinyint' field. the problem comes in when i uncheck the checkbox and post the vales to mysql. since it posts an array of checkboxes and another parallel array of values to edit, unchecking a checkbox results in the 0 value been ignored by PHP_POST and hence the checkbox array will be less by the number of unchecked values in the form while the array to be edited will have all the records in the form. here is the submit code while($row=mysql_fetch_array($result)) { $checked = ($row[active]==1) ? 'checked="checked"' : ''; ... echo "<input type='hidden' name='TrID[]' value='$TrID'>"; echo "<input type='checkbox' name='active1[]' value='$row[active]''$checked' >"; ... and the processing php script $userid = ($_POST['TrID']); $checked= ($_POST['active']); $i=0; foreach ($userid as $usid) { if ($checked[$i]==1){ $check = 1; } else{ $check = 0; } $qry1 ="UPDATE `epapers`.`clientelle` SET `active` = '$check' WHERE `clientelle`.`user_id` = '$usid' "; $result = mysql_query($qry1); $i++; }

    Read the article

  • using AsyncTask class for parallel operationand displaying a progress bar

    - by Kumar
    I am displaying a progress bar using Async task class and simulatneously in parallel operation , i want to retrieve a string array from a function of another class that takes some time to return the string array. The problem is that when i place the function call in doing backgroung function of AsyncTask class , it gives an error in Doing Background and gives the message as cant change the UI in doing Background .. Therefore , i placed the function call in post Execute method of Asynctask class . It doesnot give an error but after the progress bar has reached 100% , then the screen goes black and takes some time to start the new activity. How can i display the progress bar and make the function call simultaneously.??plz help , m in distress here is the code package com.integrated.mpr; import android.app.Activity; import android.app.ProgressDialog; import android.content.Intent; import android.os.AsyncTask; import android.os.Bundle; import android.os.Handler; import android.view.View; import android.view.View.OnClickListener; import android.widget.Button; public class Progess extends Activity implements OnClickListener{ static String[] display = new String[Choose.n]; Button bprogress; @Override protected void onCreate(Bundle savedInstanceState) { // TODO Auto-generated method stub super.onCreate(savedInstanceState); setContentView(R.layout.progress); bprogress = (Button) findViewById(R.id.bProgress); bprogress.setOnClickListener(this); } @Override public void onClick(View v) { // TODO Auto-generated method stub switch(v.getId()){ case R.id.bProgress: String x ="abc"; new loadSomeStuff().execute(x); break; } } public class loadSomeStuff extends AsyncTask<String , Integer , String>{ ProgressDialog dialog; protected void onPreExecute(){ dialog = new ProgressDialog(Progess.this); dialog.setProgressStyle(ProgressDialog.STYLE_HORIZONTAL); dialog.setMax(100); dialog.show(); } @Override protected String doInBackground(String... arg0) { // TODO Auto-generated method stub for(int i = 0 ;i<40;i++){ publishProgress(5); try { Thread.sleep(1000); } catch (InterruptedException e) { // TODO Auto-generated catch block e.printStackTrace(); } } dialog.dismiss(); String y ="abc"; return y; } protected void onProgressUpdate(Integer...progress){ dialog.incrementProgressBy(progress[0]); } protected void onPostExecute(String result){ display = new Logic().finaldata(); Intent openList = new Intent("com.integrated.mpr.SENSITIVELIST"); startActivity(openList); } } }

    Read the article

  • Trying to run multiple HTTP requests in parallel, but being limited by Windows (registry)

    - by Nailuj
    I'm developing an application (winforms C# .NET 4.0) where I access a lookup functionality from a 3rd party through a simple HTTP request. I call an url with a parameter, and in return I get a small string with the result of the lookup. Simple enough. The challenge is however, that I have to do lots of these lookups (a couple of thousands), and I would like to limit the time needed. Therefore I would like to run requests in parallel (say 10-20). I use a ThreadPool to do this, and the short version of my code looks like this: public void startAsyncLookup(Action<LookupResult> returnLookupResult) { this.returnLookupResult = returnLookupResult; foreach (string number in numbersToLookup) { ThreadPool.QueueUserWorkItem(lookupNumber, number); } } public void lookupNumber(Object threadContext) { string numberToLookup = (string)threadContext; string url = @"http://some.url.com/?number=" + numberToLookup; WebClient webClient = new WebClient(); Stream responseData = webClient.OpenRead(url); LookupResult lookupResult = parseLookupResult(responseData); returnLookupResult(lookupResult); } I fill up numbersToLookup (a List<String>) from another place, call startAsyncLookup and provide it with a call-back function returnLookupResult to return each result. This works, but I found that I'm not getting the throughput I want. Initially I thought it might be the 3rd party having a poor system on their end, but I excluded this by trying to run the same code from two different machines at the same time. Each of the two took as long as one did alone, so I could rule out that one. A colleague then tipped me that this might be a limitation in Windows. I googled a bit, and found amongst others this post saying that by default Windows limits the number of simultaneous request to the same web server to 4 for HTTP 1.0 and to 2 for HTTP 1.1 (for HTTP 1.1 this is actually according to the specification (RFC2068)). The same post referred to above also provided a way to increase these limits. By adding two registry values to [HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Internet Settings] (MaxConnectionsPerServer and MaxConnectionsPer1_0Server), I could control this myself. So, I tried this (sat both to 20), restarted my computer, and tried to run my program again. Sadly though, it didn't seem to help any. I also kept an eye on the Resource Monitor (see screen shot) while running my batch lookup, and I noticed that my application (the one with the title blacked out) still only was using two TCP connections. So, the question is, why isn't this working? Is the post I linked to using the wrong registry values? Is this perhaps not possible to "hack" in Windows any longer (I'm on Windows 7)? Any ideas would be highly appreciated :) And just in case anyone should wonder, I have also tried with different settings for MaxThreads on ThreadPool (everyting from 10 to 100), and this didn't seem to affect my throughput at all, so the problem shouldn't be there either.

    Read the article

  • Ikoula lance un nouveau serveur dédié, le Green GPU propose « 192 CUDA Parallel Processor Cores » aux professionnels de la création graphique

    Ikoula lance un nouveau serveur dédié le Green GPU propose « 192 CUDA Parallel Processor Cores » aux professionnels de la création graphiqueL'hébergeur français Ikoula propose à la location un nouveau serveur dédié qui intègre une carte graphique professionnelle ou GPU. La Nvidia Quadro 2000D est la carte retenue pour le lancement de cette nouvelle offre de serveur dédié. La Quadro 2000D bénéficie du coeur de la technologie Fermi de Nvidia et propose 192 CUDA Parallel Processor Cores, le tout accompagné...

    Read the article

  • What about parallelism across network using multiple PCs?

    - by MainMa
    Parallel computing is used more and more, and new framework features and shortcuts make it easier to use (for example Parallel extensions which are directly available in .NET 4). Now what about the parallelism across network? I mean, an abstraction of everything related to communications, creation of processes on remote machines, etc. Something like, in C#: NetworkParallel.ForEach(myEnumerable, () => { // Computing and/or access to web ressource or local network database here }); I understand that it is very different from the multi-core parallelism. The two most obvious differences would probably be: The fact that such parallel task will be limited to computing, without being able for example to use files stored locally (but why not a database?), or even to use local variables, because it would be rather two distinct applications than two threads of the same application, The very specific implementation, requiring not just a separate thread (which is quite easy), but spanning a process on different machines, then communicating with them over local network. Despite those differences, such parallelism is quite possible, even without speaking about distributed architecture. Do you think it will be implemented in a few years? Do you agree that it enables developers to easily develop extremely powerfull stuff with much less pain? Example: Think about a business application which extracts data from the database, transforms it, and displays statistics. Let's say this application takes ten seconds to load data, twenty seconds to transform data and ten seconds to build charts on a single machine in a company, using all the CPU, whereas ten other machines are used at 5% of CPU most of the time. In a such case, every action may be done in parallel, resulting in probably six to ten seconds for overall process instead of forty.

    Read the article

  • Parallelism in .NET – Part 9, Configuration in PLINQ and TPL

    - by Reed
    Parallel LINQ and the Task Parallel Library contain many options for configuration.  Although the default configuration options are often ideal, there are times when customizing the behavior is desirable.  Both frameworks provide full configuration support. When working with Data Parallelism, there is one primary configuration option we often need to control – the number of threads we want the system to use when parallelizing our routine.  By default, PLINQ and the TPL both use the ThreadPool to schedule tasks.  Given the major improvements in the ThreadPool in CLR 4, this default behavior is often ideal.  However, there are times that the default behavior is not appropriate.  For example, if you are working on multiple threads simultaneously, and want to schedule parallel operations from within both threads, you might want to consider restricting each parallel operation to using a subset of the processing cores of the system.  Not doing this might over-parallelize your routine, which leads to inefficiencies from having too many context switches. In the Task Parallel Library, configuration is handled via the ParallelOptions class.  All of the methods of the Parallel class have an overload which accepts a ParallelOptions argument. We configure the Parallel class by setting the ParallelOptions.MaxDegreeOfParallelism property.  For example, let’s revisit one of the simple data parallel examples from Part 2: Parallel.For(0, pixelData.GetUpperBound(0), row => { for (int col=0; col < pixelData.GetUpperBound(1); ++col) { pixelData[row, col] = AdjustContrast(pixelData[row, col], minPixel, maxPixel); } }); .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } Here, we’re looping through an image, and calling a method on each pixel in the image.  If this was being done on a separate thread, and we knew another thread within our system was going to be doing a similar operation, we likely would want to restrict this to using half of the cores on the system.  This could be accomplished easily by doing: var options = new ParallelOptions(); options.MaxDegreeOfParallelism = Math.Max(Environment.ProcessorCount / 2, 1); Parallel.For(0, pixelData.GetUpperBound(0), options, row => { for (int col=0; col < pixelData.GetUpperBound(1); ++col) { pixelData[row, col] = AdjustContrast(pixelData[row, col], minPixel, maxPixel); } }); Now, we’re restricting this routine to using no more than half the cores in our system.  Note that I included a check to prevent a single core system from supplying zero; without this check, we’d potentially cause an exception.  I also did not hard code a specific value for the MaxDegreeOfParallelism property.  One of our goals when parallelizing a routine is allowing it to scale on better hardware.  Specifying a hard-coded value would contradict that goal. Parallel LINQ also supports configuration, and in fact, has quite a few more options for configuring the system.  The main configuration option we most often need is the same as our TPL option: we need to supply the maximum number of processing threads.  In PLINQ, this is done via a new extension method on ParallelQuery<T>: ParallelEnumerable.WithDegreeOfParallelism. Let’s revisit our declarative data parallelism sample from Part 6: double min = collection.AsParallel().Min(item => item.PerformComputation()); Here, we’re performing a computation on each element in the collection, and saving the minimum value of this operation.  If we wanted to restrict this to a limited number of threads, we would add our new extension method: int maxThreads = Math.Max(Environment.ProcessorCount / 2, 1); double min = collection .AsParallel() .WithDegreeOfParallelism(maxThreads) .Min(item => item.PerformComputation()); This automatically restricts the PLINQ query to half of the threads on the system. PLINQ provides some additional configuration options.  By default, PLINQ will occasionally revert to processing a query in parallel.  This occurs because many queries, if parallelized, typically actually cause an overall slowdown compared to a serial processing equivalent.  By analyzing the “shape” of the query, PLINQ often decides to run a query serially instead of in parallel.  This can occur for (taken from MSDN): Queries that contain a Select, indexed Where, indexed SelectMany, or ElementAt clause after an ordering or filtering operator that has removed or rearranged original indices. Queries that contain a Take, TakeWhile, Skip, SkipWhile operator and where indices in the source sequence are not in the original order. Queries that contain Zip or SequenceEquals, unless one of the data sources has an originally ordered index and the other data source is indexable (i.e. an array or IList(T)). Queries that contain Concat, unless it is applied to indexable data sources. Queries that contain Reverse, unless applied to an indexable data source. If the specific query follows these rules, PLINQ will run the query on a single thread.  However, none of these rules look at the specific work being done in the delegates, only at the “shape” of the query.  There are cases where running in parallel may still be beneficial, even if the shape is one where it typically parallelizes poorly.  In these cases, you can override the default behavior by using the WithExecutionMode extension method.  This would be done like so: var reversed = collection .AsParallel() .WithExecutionMode(ParallelExecutionMode.ForceParallelism) .Select(i => i.PerformComputation()) .Reverse(); Here, the default behavior would be to not parallelize the query unless collection implemented IList<T>.  We can force this to run in parallel by adding the WithExecutionMode extension method in the method chain. Finally, PLINQ has the ability to configure how results are returned.  When a query is filtering or selecting an input collection, the results will need to be streamed back into a single IEnumerable<T> result.  For example, the method above returns a new, reversed collection.  In this case, the processing of the collection will be done in parallel, but the results need to be streamed back to the caller serially, so they can be enumerated on a single thread. This streaming introduces overhead.  IEnumerable<T> isn’t designed with thread safety in mind, so the system needs to handle merging the parallel processes back into a single stream, which introduces synchronization issues.  There are two extremes of how this could be accomplished, but both extremes have disadvantages. The system could watch each thread, and whenever a thread produces a result, take that result and send it back to the caller.  This would mean that the calling thread would have access to the data as soon as data is available, which is the benefit of this approach.  However, it also means that every item is introducing synchronization overhead, since each item needs to be merged individually. On the other extreme, the system could wait until all of the results from all of the threads were ready, then push all of the results back to the calling thread in one shot.  The advantage here is that the least amount of synchronization is added to the system, which means the query will, on a whole, run the fastest.  However, the calling thread will have to wait for all elements to be processed, so this could introduce a long delay between when a parallel query begins and when results are returned. The default behavior in PLINQ is actually between these two extremes.  By default, PLINQ maintains an internal buffer, and chooses an optimal buffer size to maintain.  Query results are accumulated into the buffer, then returned in the IEnumerable<T> result in chunks.  This provides reasonably fast access to the results, as well as good overall throughput, in most scenarios. However, if we know the nature of our algorithm, we may decide we would prefer one of the other extremes.  This can be done by using the WithMergeOptions extension method.  For example, if we know that our PerformComputation() routine is very slow, but also variable in runtime, we may want to retrieve results as they are available, with no bufferring.  This can be done by changing our above routine to: var reversed = collection .AsParallel() .WithExecutionMode(ParallelExecutionMode.ForceParallelism) .WithMergeOptions(ParallelMergeOptions.NotBuffered) .Select(i => i.PerformComputation()) .Reverse(); On the other hand, if are already on a background thread, and we want to allow the system to maximize its speed, we might want to allow the system to fully buffer the results: var reversed = collection .AsParallel() .WithExecutionMode(ParallelExecutionMode.ForceParallelism) .WithMergeOptions(ParallelMergeOptions.FullyBuffered) .Select(i => i.PerformComputation()) .Reverse(); Notice, also, that you can specify multiple configuration options in a parallel query.  By chaining these extension methods together, we generate a query that will always run in parallel, and will always complete before making the results available in our IEnumerable<T>.

    Read the article

  • Parallelism in .NET – Part 2, Simple Imperative Data Parallelism

    - by Reed
    In my discussion of Decomposition of the problem space, I mentioned that Data Decomposition is often the simplest abstraction to use when trying to parallelize a routine.  If a problem can be decomposed based off the data, we will often want to use what MSDN refers to as Data Parallelism as our strategy for implementing our routine.  The Task Parallel Library in .NET 4 makes implementing Data Parallelism, for most cases, very simple. Data Parallelism is the main technique we use to parallelize a routine which can be decomposed based off data.  Data Parallelism refers to taking a single collection of data, and having a single operation be performed concurrently on elements in the collection.  One side note here: Data Parallelism is also sometimes referred to as the Loop Parallelism Pattern or Loop-level Parallelism.  In general, for this series, I will try to use the terminology used in the MSDN Documentation for the Task Parallel Library.  This should make it easier to investigate these topics in more detail. Once we’ve determined we have a problem that, potentially, can be decomposed based on data, implementation using Data Parallelism in the TPL is quite simple.  Let’s take our example from the Data Decomposition discussion – a simple contrast stretching filter.  Here, we have a collection of data (pixels), and we need to run a simple operation on each element of the pixel.  Once we know the minimum and maximum values, we most likely would have some simple code like the following: for (int row=0; row < pixelData.GetUpperBound(0); ++row) { for (int col=0; col < pixelData.GetUpperBound(1); ++col) { pixelData[row, col] = AdjustContrast(pixelData[row, col], minPixel, maxPixel); } } .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } This simple routine loops through a two dimensional array of pixelData, and calls the AdjustContrast routine on each pixel. As I mentioned, when you’re decomposing a problem space, most iteration statements are potentially candidates for data decomposition.  Here, we’re using two for loops – one looping through rows in the image, and a second nested loop iterating through the columns.  We then perform one, independent operation on each element based on those loop positions. This is a prime candidate – we have no shared data, no dependencies on anything but the pixel which we want to change.  Since we’re using a for loop, we can easily parallelize this using the Parallel.For method in the TPL: Parallel.For(0, pixelData.GetUpperBound(0), row => { for (int col=0; col < pixelData.GetUpperBound(1); ++col) { pixelData[row, col] = AdjustContrast(pixelData[row, col], minPixel, maxPixel); } }); Here, by simply changing our first for loop to a call to Parallel.For, we can parallelize this portion of our routine.  Parallel.For works, as do many methods in the TPL, by creating a delegate and using it as an argument to a method.  In this case, our for loop iteration block becomes a delegate creating via a lambda expression.  This lets you write code that, superficially, looks similar to the familiar for loop, but functions quite differently at runtime. We could easily do this to our second for loop as well, but that may not be a good idea.  There is a balance to be struck when writing parallel code.  We want to have enough work items to keep all of our processors busy, but the more we partition our data, the more overhead we introduce.  In this case, we have an image of data – most likely hundreds of pixels in both dimensions.  By just parallelizing our first loop, each row of pixels can be run as a single task.  With hundreds of rows of data, we are providing fine enough granularity to keep all of our processors busy. If we parallelize both loops, we’re potentially creating millions of independent tasks.  This introduces extra overhead with no extra gain, and will actually reduce our overall performance.  This leads to my first guideline when writing parallel code: Partition your problem into enough tasks to keep each processor busy throughout the operation, but not more than necessary to keep each processor busy. Also note that I parallelized the outer loop.  I could have just as easily partitioned the inner loop.  However, partitioning the inner loop would have led to many more discrete work items, each with a smaller amount of work (operate on one pixel instead of one row of pixels).  My second guideline when writing parallel code reflects this: Partition your problem in a way to place the most work possible into each task. This typically means, in practice, that you will want to parallelize the routine at the “highest” point possible in the routine, typically the outermost loop.  If you’re looking at parallelizing methods which call other methods, you’ll want to try to partition your work high up in the stack – as you get into lower level methods, the performance impact of parallelizing your routines may not overcome the overhead introduced. Parallel.For works great for situations where we know the number of elements we’re going to process in advance.  If we’re iterating through an IList<T> or an array, this is a typical approach.  However, there are other iteration statements common in C#.  In many situations, we’ll use foreach instead of a for loop.  This can be more understandable and easier to read, but also has the advantage of working with collections which only implement IEnumerable<T>, where we do not know the number of elements involved in advance. As an example, lets take the following situation.  Say we have a collection of Customers, and we want to iterate through each customer, check some information about the customer, and if a certain case is met, send an email to the customer and update our instance to reflect this change.  Normally, this might look something like: foreach(var customer in customers) { // Run some process that takes some time... DateTime lastContact = theStore.GetLastContact(customer); TimeSpan timeSinceContact = DateTime.Now - lastContact; // If it's been more than two weeks, send an email, and update... if (timeSinceContact.Days > 14) { theStore.EmailCustomer(customer); customer.LastEmailContact = DateTime.Now; } } Here, we’re doing a fair amount of work for each customer in our collection, but we don’t know how many customers exist.  If we assume that theStore.GetLastContact(customer) and theStore.EmailCustomer(customer) are both side-effect free, thread safe operations, we could parallelize this using Parallel.ForEach: Parallel.ForEach(customers, customer => { // Run some process that takes some time... DateTime lastContact = theStore.GetLastContact(customer); TimeSpan timeSinceContact = DateTime.Now - lastContact; // If it's been more than two weeks, send an email, and update... if (timeSinceContact.Days > 14) { theStore.EmailCustomer(customer); customer.LastEmailContact = DateTime.Now; } }); Just like Parallel.For, we rework our loop into a method call accepting a delegate created via a lambda expression.  This keeps our new code very similar to our original iteration statement, however, this will now execute in parallel.  The same guidelines apply with Parallel.ForEach as with Parallel.For. The other iteration statements, do and while, do not have direct equivalents in the Task Parallel Library.  These, however, are very easy to implement using Parallel.ForEach and the yield keyword. Most applications can benefit from implementing some form of Data Parallelism.  Iterating through collections and performing “work” is a very common pattern in nearly every application.  When the problem can be decomposed by data, we often can parallelize the workload by merely changing foreach statements to Parallel.ForEach method calls, and for loops to Parallel.For method calls.  Any time your program operates on a collection, and does a set of work on each item in the collection where that work is not dependent on other information, you very likely have an opportunity to parallelize your routine.

    Read the article

  • Parallelism in .NET – Part 4, Imperative Data Parallelism: Aggregation

    - by Reed
    In the article on simple data parallelism, I described how to perform an operation on an entire collection of elements in parallel.  Often, this is not adequate, as the parallel operation is going to be performing some form of aggregation. Simple examples of this might include taking the sum of the results of processing a function on each element in the collection, or finding the minimum of the collection given some criteria.  This can be done using the techniques described in simple data parallelism, however, special care needs to be taken into account to synchronize the shared data appropriately.  The Task Parallel Library has tools to assist in this synchronization. The main issue with aggregation when parallelizing a routine is that you need to handle synchronization of data.  Since multiple threads will need to write to a shared portion of data.  Suppose, for example, that we wanted to parallelize a simple loop that looked for the minimum value within a dataset: double min = double.MaxValue; foreach(var item in collection) { double value = item.PerformComputation(); min = System.Math.Min(min, value); } .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } This seems like a good candidate for parallelization, but there is a problem here.  If we just wrap this into a call to Parallel.ForEach, we’ll introduce a critical race condition, and get the wrong answer.  Let’s look at what happens here: // Buggy code! Do not use! double min = double.MaxValue; Parallel.ForEach(collection, item => { double value = item.PerformComputation(); min = System.Math.Min(min, value); }); This code has a fatal flaw: min will be checked, then set, by multiple threads simultaneously.  Two threads may perform the check at the same time, and set the wrong value for min.  Say we get a value of 1 in thread 1, and a value of 2 in thread 2, and these two elements are the first two to run.  If both hit the min check line at the same time, both will determine that min should change, to 1 and 2 respectively.  If element 1 happens to set the variable first, then element 2 sets the min variable, we’ll detect a min value of 2 instead of 1.  This can lead to wrong answers. Unfortunately, fixing this, with the Parallel.ForEach call we’re using, would require adding locking.  We would need to rewrite this like: // Safe, but slow double min = double.MaxValue; // Make a "lock" object object syncObject = new object(); Parallel.ForEach(collection, item => { double value = item.PerformComputation(); lock(syncObject) min = System.Math.Min(min, value); }); This will potentially add a huge amount of overhead to our calculation.  Since we can potentially block while waiting on the lock for every single iteration, we will most likely slow this down to where it is actually quite a bit slower than our serial implementation.  The problem is the lock statement – any time you use lock(object), you’re almost assuring reduced performance in a parallel situation.  This leads to two observations I’ll make: When parallelizing a routine, try to avoid locks. That being said: Always add any and all required synchronization to avoid race conditions. These two observations tend to be opposing forces – we often need to synchronize our algorithms, but we also want to avoid the synchronization when possible.  Looking at our routine, there is no way to directly avoid this lock, since each element is potentially being run on a separate thread, and this lock is necessary in order for our routine to function correctly every time. However, this isn’t the only way to design this routine to implement this algorithm.  Realize that, although our collection may have thousands or even millions of elements, we have a limited number of Processing Elements (PE).  Processing Element is the standard term for a hardware element which can process and execute instructions.  This typically is a core in your processor, but many modern systems have multiple hardware execution threads per core.  The Task Parallel Library will not execute the work for each item in the collection as a separate work item. Instead, when Parallel.ForEach executes, it will partition the collection into larger “chunks” which get processed on different threads via the ThreadPool.  This helps reduce the threading overhead, and help the overall speed.  In general, the Parallel class will only use one thread per PE in the system. Given the fact that there are typically fewer threads than work items, we can rethink our algorithm design.  We can parallelize our algorithm more effectively by approaching it differently.  Because the basic aggregation we are doing here (Min) is communitive, we do not need to perform this in a given order.  We knew this to be true already – otherwise, we wouldn’t have been able to parallelize this routine in the first place.  With this in mind, we can treat each thread’s work independently, allowing each thread to serially process many elements with no locking, then, after all the threads are complete, “merge” together the results. This can be accomplished via a different set of overloads in the Parallel class: Parallel.ForEach<TSource,TLocal>.  The idea behind these overloads is to allow each thread to begin by initializing some local state (TLocal).  The thread will then process an entire set of items in the source collection, providing that state to the delegate which processes an individual item.  Finally, at the end, a separate delegate is run which allows you to handle merging that local state into your final results. To rewriting our routine using Parallel.ForEach<TSource,TLocal>, we need to provide three delegates instead of one.  The most basic version of this function is declared as: public static ParallelLoopResult ForEach<TSource, TLocal>( IEnumerable<TSource> source, Func<TLocal> localInit, Func<TSource, ParallelLoopState, TLocal, TLocal> body, Action<TLocal> localFinally ) The first delegate (the localInit argument) is defined as Func<TLocal>.  This delegate initializes our local state.  It should return some object we can use to track the results of a single thread’s operations. The second delegate (the body argument) is where our main processing occurs, although now, instead of being an Action<T>, we actually provide a Func<TSource, ParallelLoopState, TLocal, TLocal> delegate.  This delegate will receive three arguments: our original element from the collection (TSource), a ParallelLoopState which we can use for early termination, and the instance of our local state we created (TLocal).  It should do whatever processing you wish to occur per element, then return the value of the local state after processing is completed. The third delegate (the localFinally argument) is defined as Action<TLocal>.  This delegate is passed our local state after it’s been processed by all of the elements this thread will handle.  This is where you can merge your final results together.  This may require synchronization, but now, instead of synchronizing once per element (potentially millions of times), you’ll only have to synchronize once per thread, which is an ideal situation. Now that I’ve explained how this works, lets look at the code: // Safe, and fast! double min = double.MaxValue; // Make a "lock" object object syncObject = new object(); Parallel.ForEach( collection, // First, we provide a local state initialization delegate. () => double.MaxValue, // Next, we supply the body, which takes the original item, loop state, // and local state, and returns a new local state (item, loopState, localState) => { double value = item.PerformComputation(); return System.Math.Min(localState, value); }, // Finally, we provide an Action<TLocal>, to "merge" results together localState => { // This requires locking, but it's only once per used thread lock(syncObj) min = System.Math.Min(min, localState); } ); Although this is a bit more complicated than the previous version, it is now both thread-safe, and has minimal locking.  This same approach can be used by Parallel.For, although now, it’s Parallel.For<TLocal>.  When working with Parallel.For<TLocal>, you use the same triplet of delegates, with the same purpose and results. Also, many times, you can completely avoid locking by using a method of the Interlocked class to perform the final aggregation in an atomic operation.  The MSDN example demonstrating this same technique using Parallel.For uses the Interlocked class instead of a lock, since they are doing a sum operation on a long variable, which is possible via Interlocked.Add. By taking advantage of local state, we can use the Parallel class methods to parallelize algorithms such as aggregation, which, at first, may seem like poor candidates for parallelization.  Doing so requires careful consideration, and often requires a slight redesign of the algorithm, but the performance gains can be significant if handled in a way to avoid excessive synchronization.

    Read the article

  • Using DB_PARAMS to Tune the EP_LOAD_SALES Performance

    - by user702295
    The DB_PARAMS table can be used to tune the EP_LOAD_SALES performance.  The AWR report supplied shows 16 CPUs so I imaging that you can run with 8 or more parallel threads.  This can be done by setting the following DB_PARAMS parameters.  Note that most of parameter changes are just changing a 2 or 4 into an 8: DBHintEp_Load_SalesUseParallel = TRUE DBHintEp_Load_SalesUseParallelDML = TRUE DBHintEp_Load_SalesInsertErr = + parallel(@T_SRC_SALES@ 8) full(@T_SRC_SALES@) DBHintEp_Load_SalesInsertLd  = + parallel(@T_SRC_SALES@ 8) DBHintEp_Load_SalesMergeSALES_DATA = + parallel(@T_SRC_SALES_LD@ 8) full(@T_SRC_SALES_LD@) DBHintMdp_AddUpdateIs_Fictive0SD = + parallel(s 8 ) DBHintMdp_AddUpdateIs_Fictive2SD = + parallel(s 8 )

    Read the article

  • Partition Table and Exadata Hybrid Columnar Compression (EHCC)

    - by Bandari Huang
    Create EHCC table CREATE TABLE ... COMPRESS FOR [QUERY LOW|QUERY HIGH|ARCHIVE LOW|ARCHIVE HIGH]; select owner,table_name,compress_for DBA_TAB_SUBPARTITIONS where compression = ‘ENABLED'; Convert Table/Partition/Subpartition to EHCC Compress Table&Partition&Subpartition to EHCC: ALTER TABLE table_name MOVE COMPRESS FOR [QUERY LOW|QUERY HIGH|ARCHIVE LOW|ARCHIVE HIGH] [PARALLEL <dop>]; ALTER TABLE table_name MOVE PARATITION partition_name COMPRESS FOR [QUERY LOW|QUERY HIGH|ARCHIVE LOW|ARCHIVE HIGH] [PARALLEL <dop>]; ALTER TABLE table_name MOVE SUBPARATITION subpartition_name COMPRESS FOR [QUERY LOW|QUERY HIGH|ARCHIVE LOW|ARCHIVE HIGH] [PARALLEL <dop>]; select owner,table_name,compress_for DBA_TAB_SUBPARTITIONS where compression = ‘ENABLED'; select table_owner,table_name,partition_name,compress_for DBA_TAB_PARTITIONS where compression = ‘ENABLED’; select table_owner,table_name,subpartition_name,compress_for DBA_TAB_SUBPARTITIONS where compression = ‘ENABLED’; Rebuild Unusable Index: select index_name from dba_index where status = 'UNUSABLE'; select index_name,partition_name from dba_ind_partition where status = 'UNUSABLE'; select index_name,subpartition_name from dba_ind_partition where status = 'UNUSABLE'; ALTER INDEX index_name REBUILD [PARALLEL <dop>]; ALTER INDEX index_name REBUILD PARTITION partition_name [PARALLEL <dop>]; ALTER INDEX index_name REBUILD SUBPARTITION subpartition_name [PARALLEL <dop>]; Convert Table/Partition/Subpartition from EHCC to OLTP compression or uncompressed format: Uncompress EHCC Table&Partition&Subpartition: ALTER TABLE table_name MOVE [NOCOMPRESS|COMPRESS for OLTP] [PARALLEL <dop>]; ALTER TABLE table_name MOVE PARTITION partition_name [NOCOMPRESS|COMPRESS for OLTP] [PARALLEL <dop>]; ALTER TABLE table_name MOVE SUBPARTITION subpartition_name [NOCOMPRESS|COMPRESS for OLTP] [PARALLEL <dop>]; select owner,table_name,compress_for DBA_TAB_SUBPARTITIONS where compression = ''; select table_owner,table_name,partition_name,compress_for DBA_TAB_PARTITIONS where compression = ''; select table_owner,table_name,subpartition_name,compress_for DBA_TAB_SUBPARTITIONS where compression = ''; Rebuild Unusable Index: select index_name from dba_index where status = 'UNUSABLE'; select index_name,partition_name from dba_ind_partition where status = 'UNUSABLE'; select index_name,subpartition_name from dba_ind_partition where status = 'UNUSABLE'; ALTER INDEX index_name REBUILD [PARALLEL <dop>]; ALTER INDEX index_name REBUILD PARTITION partition_name [PARALLEL <dop>]; ALTER INDEX index_name REBUILD SUBPARTITION subpartition_name [PARALLEL <dop>];

    Read the article

  • No LPT port in Windows 7 virtual machines

    - by KeyboardMonkey
    Windows 7 has MS virtual PC integrated, the VM settings don't give a parallel LPT port mapping to the physical machine. Where did it go? Has anyone else noticed this, and found a solution? Update: After much digging, I found the one and only reference to this issue, on the VPC Blog: "Parallel port devices are not supported, as they are relatively rare today." -More details- It's a XP VM I've been using since VPC 2007 days, which did have this functionality. This is to configure barcode printers via the LPT port. Since the (new) MS VM can't map to my physical LPT port, I'm having a hard time configuring printers. My physical ports are enabled in the BIOS. It has worked the past 3 years, before switching to Win 7. Any help is appreciated. This screen shot of the VM settings shows COM ports, but LPT is no more In contrast, here is a screen shot of VPC 2007 (before it got integrated into Win 7). Notice how it has LPT support

    Read the article

  • Parallel Class/Interface Hierarchy with the Facade Design Pattern?

    - by Mike G
    About a third of my code is wrapped inside a Facade class. Note that this isn't a "God" class, but actually represents a single thing (called a Line). Naturally, it delegates responsibilities to the subsystem behind it. What ends up happening is that two of the subsystem classes (Output and Timeline) have all of their methods duplicated in the Line class, which effectively makes Line both an Output and a Timeline. It seems to make sense to make Output and Timeline interfaces, so that the Line class can implement them both. At the same time, I'm worried about creating parallel class and interface structures. You see, there are different types of lines AudioLine, VideoLine, which all use the same type of Timeline, but different types of Output (AudioOutput and VideoOutput, respectively). So that would mean that I'd have to create an AudioOutputInterface and VideoOutputInterface as well. So not only would I have to have parallel class hierarchy, but there would be a parallel interface hierarchy as well. Is there any solution to this design flaw? Here's an image of the basic structure (minus the Timeline class, though know that each Line has-a Timeline): NOTE: I just realized that the word 'line' in Timeline might make is sound like is does a similar function as the Line class. They don't, just to clarify.

    Read the article

  • C++/g++: Concurrent programm

    - by phimuemue
    Hi, I got a C++ program (source) that is said to work in parallel. However, if I compile it (I am using Ubuntu 10.04 and g++ 4.4.3) with g++ and run it, one of my two CPU cores gets full load while the other is doing "nothing". So I spoke to the one who gave me the program. I was told that I had to set specific flags for g++ in order to get the program compiled for 2 CPU cores. However, if I look at the code I'm not able to find any lines that point to parallelism. So I have two questions: Are there any C++-intrinsics for multithreaded applications, i.e. is it possible to write parallel code without any extra libraries (because I did not find any non-standard libraries included)? Is it true that there are indeed flags for g++ that tell the compiler to compile the program for 2 CPU cores and to compile it so it runs in parallel (and if: what are they)?

    Read the article

< Previous Page | 9 10 11 12 13 14 15 16 17 18 19 20  | Next Page >