Search Results

Search found 3521 results on 141 pages for 'parallel computing'.

Page 27/141 | < Previous Page | 23 24 25 26 27 28 29 30 31 32 33 34 | Next Page >

Ouverture de la rubrique Cloud Computing, pour trouver les ressources nécessaires à la compréhension et à l'utilisation du "Cloud"

Bonjour à tous, La rubrique Cloud Computing vient de voir le jour à l'adresse http://cloud-computing.developpez.com. Cette rubrique contiendra des news et toutes les ressources nécessaires à la compréhension, à l'utilisation et au développement pour et avec le "Cloud". Si vous avez des idées de tutoriels, d'articles, de sources ou encore de Q/R pour de prochaines FAQ, n'hésitez pas à nous en faire part. Très cordialement, Gordon...

Read the article
Ouverture de la rubrique Cloud Computing, pour trouver les ressources nécessaires à la compréhension et à l'utilisation du "Cloud"

Bonjour à tous, La rubrique Cloud Computing vient de voir le jour à l'adresse http://cloud-computing.developpez.com. Cette rubrique contiendra des news et toutes les ressources nécessaires à la compréhension, à l'utilisation et au développement pour et avec le "Cloud" (de Windows Azure aux Google Apps en passant par Salesforce et les serveurs HPC). Si vous avez des idées de tutoriels, d'articles, de sources ou encore de Q/R pour de prochaines FAQ, n'hésitez pas à nous en faire part. Très cordialement, Gordon...

Read the article
What about parallelism across network using multiple PCs?

- by MainMa

Parallel computing is used more and more, and new framework features and shortcuts make it easier to use (for example Parallel extensions which are directly available in .NET 4). Now what about the parallelism across network? I mean, an abstraction of everything related to communications, creation of processes on remote machines, etc. Something like, in C#: NetworkParallel.ForEach(myEnumerable, () => { // Computing and/or access to web ressource or local network database here }); I understand that it is very different from the multi-core parallelism. The two most obvious differences would probably be: The fact that such parallel task will be limited to computing, without being able for example to use files stored locally (but why not a database?), or even to use local variables, because it would be rather two distinct applications than two threads of the same application, The very specific implementation, requiring not just a separate thread (which is quite easy), but spanning a process on different machines, then communicating with them over local network. Despite those differences, such parallelism is quite possible, even without speaking about distributed architecture. Do you think it will be implemented in a few years? Do you agree that it enables developers to easily develop extremely powerfull stuff with much less pain? Example: Think about a business application which extracts data from the database, transforms it, and displays statistics. Let's say this application takes ten seconds to load data, twenty seconds to transform data and ten seconds to build charts on a single machine in a company, using all the CPU, whereas ten other machines are used at 5% of CPU most of the time. In a such case, every action may be done in parallel, resulting in probably six to ten seconds for overall process instead of forty.

Read the article
Parallelism in .NET – Part 9, Configuration in PLINQ and TPL

- by Reed

Parallel LINQ and the Task Parallel Library contain many options for configuration. Although the default configuration options are often ideal, there are times when customizing the behavior is desirable. Both frameworks provide full configuration support. When working with Data Parallelism, there is one primary configuration option we often need to control – the number of threads we want the system to use when parallelizing our routine. By default, PLINQ and the TPL both use the ThreadPool to schedule tasks. Given the major improvements in the ThreadPool in CLR 4, this default behavior is often ideal. However, there are times that the default behavior is not appropriate. For example, if you are working on multiple threads simultaneously, and want to schedule parallel operations from within both threads, you might want to consider restricting each parallel operation to using a subset of the processing cores of the system. Not doing this might over-parallelize your routine, which leads to inefficiencies from having too many context switches. In the Task Parallel Library, configuration is handled via the ParallelOptions class. All of the methods of the Parallel class have an overload which accepts a ParallelOptions argument. We configure the Parallel class by setting the ParallelOptions.MaxDegreeOfParallelism property. For example, let’s revisit one of the simple data parallel examples from Part 2: Parallel.For(0, pixelData.GetUpperBound(0), row => { for (int col=0; col < pixelData.GetUpperBound(1); ++col) { pixelData[row, col] = AdjustContrast(pixelData[row, col], minPixel, maxPixel); } }); .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } Here, we’re looping through an image, and calling a method on each pixel in the image. If this was being done on a separate thread, and we knew another thread within our system was going to be doing a similar operation, we likely would want to restrict this to using half of the cores on the system. This could be accomplished easily by doing: var options = new ParallelOptions(); options.MaxDegreeOfParallelism = Math.Max(Environment.ProcessorCount / 2, 1); Parallel.For(0, pixelData.GetUpperBound(0), options, row => { for (int col=0; col < pixelData.GetUpperBound(1); ++col) { pixelData[row, col] = AdjustContrast(pixelData[row, col], minPixel, maxPixel); } }); Now, we’re restricting this routine to using no more than half the cores in our system. Note that I included a check to prevent a single core system from supplying zero; without this check, we’d potentially cause an exception. I also did not hard code a specific value for the MaxDegreeOfParallelism property. One of our goals when parallelizing a routine is allowing it to scale on better hardware. Specifying a hard-coded value would contradict that goal. Parallel LINQ also supports configuration, and in fact, has quite a few more options for configuring the system. The main configuration option we most often need is the same as our TPL option: we need to supply the maximum number of processing threads. In PLINQ, this is done via a new extension method on ParallelQuery<T>: ParallelEnumerable.WithDegreeOfParallelism. Let’s revisit our declarative data parallelism sample from Part 6: double min = collection.AsParallel().Min(item => item.PerformComputation()); Here, we’re performing a computation on each element in the collection, and saving the minimum value of this operation. If we wanted to restrict this to a limited number of threads, we would add our new extension method: int maxThreads = Math.Max(Environment.ProcessorCount / 2, 1); double min = collection .AsParallel() .WithDegreeOfParallelism(maxThreads) .Min(item => item.PerformComputation()); This automatically restricts the PLINQ query to half of the threads on the system. PLINQ provides some additional configuration options. By default, PLINQ will occasionally revert to processing a query in parallel. This occurs because many queries, if parallelized, typically actually cause an overall slowdown compared to a serial processing equivalent. By analyzing the “shape” of the query, PLINQ often decides to run a query serially instead of in parallel. This can occur for (taken from MSDN): Queries that contain a Select, indexed Where, indexed SelectMany, or ElementAt clause after an ordering or filtering operator that has removed or rearranged original indices. Queries that contain a Take, TakeWhile, Skip, SkipWhile operator and where indices in the source sequence are not in the original order. Queries that contain Zip or SequenceEquals, unless one of the data sources has an originally ordered index and the other data source is indexable (i.e. an array or IList(T)). Queries that contain Concat, unless it is applied to indexable data sources. Queries that contain Reverse, unless applied to an indexable data source. If the specific query follows these rules, PLINQ will run the query on a single thread. However, none of these rules look at the specific work being done in the delegates, only at the “shape” of the query. There are cases where running in parallel may still be beneficial, even if the shape is one where it typically parallelizes poorly. In these cases, you can override the default behavior by using the WithExecutionMode extension method. This would be done like so: var reversed = collection .AsParallel() .WithExecutionMode(ParallelExecutionMode.ForceParallelism) .Select(i => i.PerformComputation()) .Reverse(); Here, the default behavior would be to not parallelize the query unless collection implemented IList<T>. We can force this to run in parallel by adding the WithExecutionMode extension method in the method chain. Finally, PLINQ has the ability to configure how results are returned. When a query is filtering or selecting an input collection, the results will need to be streamed back into a single IEnumerable<T> result. For example, the method above returns a new, reversed collection. In this case, the processing of the collection will be done in parallel, but the results need to be streamed back to the caller serially, so they can be enumerated on a single thread. This streaming introduces overhead. IEnumerable<T> isn’t designed with thread safety in mind, so the system needs to handle merging the parallel processes back into a single stream, which introduces synchronization issues. There are two extremes of how this could be accomplished, but both extremes have disadvantages. The system could watch each thread, and whenever a thread produces a result, take that result and send it back to the caller. This would mean that the calling thread would have access to the data as soon as data is available, which is the benefit of this approach. However, it also means that every item is introducing synchronization overhead, since each item needs to be merged individually. On the other extreme, the system could wait until all of the results from all of the threads were ready, then push all of the results back to the calling thread in one shot. The advantage here is that the least amount of synchronization is added to the system, which means the query will, on a whole, run the fastest. However, the calling thread will have to wait for all elements to be processed, so this could introduce a long delay between when a parallel query begins and when results are returned. The default behavior in PLINQ is actually between these two extremes. By default, PLINQ maintains an internal buffer, and chooses an optimal buffer size to maintain. Query results are accumulated into the buffer, then returned in the IEnumerable<T> result in chunks. This provides reasonably fast access to the results, as well as good overall throughput, in most scenarios. However, if we know the nature of our algorithm, we may decide we would prefer one of the other extremes. This can be done by using the WithMergeOptions extension method. For example, if we know that our PerformComputation() routine is very slow, but also variable in runtime, we may want to retrieve results as they are available, with no bufferring. This can be done by changing our above routine to: var reversed = collection .AsParallel() .WithExecutionMode(ParallelExecutionMode.ForceParallelism) .WithMergeOptions(ParallelMergeOptions.NotBuffered) .Select(i => i.PerformComputation()) .Reverse(); On the other hand, if are already on a background thread, and we want to allow the system to maximize its speed, we might want to allow the system to fully buffer the results: var reversed = collection .AsParallel() .WithExecutionMode(ParallelExecutionMode.ForceParallelism) .WithMergeOptions(ParallelMergeOptions.FullyBuffered) .Select(i => i.PerformComputation()) .Reverse(); Notice, also, that you can specify multiple configuration options in a parallel query. By chaining these extension methods together, we generate a query that will always run in parallel, and will always complete before making the results available in our IEnumerable<T>.

Read the article
Parallelism in .NET – Part 2, Simple Imperative Data Parallelism

- by Reed

In my discussion of Decomposition of the problem space, I mentioned that Data Decomposition is often the simplest abstraction to use when trying to parallelize a routine. If a problem can be decomposed based off the data, we will often want to use what MSDN refers to as Data Parallelism as our strategy for implementing our routine. The Task Parallel Library in .NET 4 makes implementing Data Parallelism, for most cases, very simple. Data Parallelism is the main technique we use to parallelize a routine which can be decomposed based off data. Data Parallelism refers to taking a single collection of data, and having a single operation be performed concurrently on elements in the collection. One side note here: Data Parallelism is also sometimes referred to as the Loop Parallelism Pattern or Loop-level Parallelism. In general, for this series, I will try to use the terminology used in the MSDN Documentation for the Task Parallel Library. This should make it easier to investigate these topics in more detail. Once we’ve determined we have a problem that, potentially, can be decomposed based on data, implementation using Data Parallelism in the TPL is quite simple. Let’s take our example from the Data Decomposition discussion – a simple contrast stretching filter. Here, we have a collection of data (pixels), and we need to run a simple operation on each element of the pixel. Once we know the minimum and maximum values, we most likely would have some simple code like the following: for (int row=0; row < pixelData.GetUpperBound(0); ++row) { for (int col=0; col < pixelData.GetUpperBound(1); ++col) { pixelData[row, col] = AdjustContrast(pixelData[row, col], minPixel, maxPixel); } } .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } This simple routine loops through a two dimensional array of pixelData, and calls the AdjustContrast routine on each pixel. As I mentioned, when you’re decomposing a problem space, most iteration statements are potentially candidates for data decomposition. Here, we’re using two for loops – one looping through rows in the image, and a second nested loop iterating through the columns. We then perform one, independent operation on each element based on those loop positions. This is a prime candidate – we have no shared data, no dependencies on anything but the pixel which we want to change. Since we’re using a for loop, we can easily parallelize this using the Parallel.For method in the TPL: Parallel.For(0, pixelData.GetUpperBound(0), row => { for (int col=0; col < pixelData.GetUpperBound(1); ++col) { pixelData[row, col] = AdjustContrast(pixelData[row, col], minPixel, maxPixel); } }); Here, by simply changing our first for loop to a call to Parallel.For, we can parallelize this portion of our routine. Parallel.For works, as do many methods in the TPL, by creating a delegate and using it as an argument to a method. In this case, our for loop iteration block becomes a delegate creating via a lambda expression. This lets you write code that, superficially, looks similar to the familiar for loop, but functions quite differently at runtime. We could easily do this to our second for loop as well, but that may not be a good idea. There is a balance to be struck when writing parallel code. We want to have enough work items to keep all of our processors busy, but the more we partition our data, the more overhead we introduce. In this case, we have an image of data – most likely hundreds of pixels in both dimensions. By just parallelizing our first loop, each row of pixels can be run as a single task. With hundreds of rows of data, we are providing fine enough granularity to keep all of our processors busy. If we parallelize both loops, we’re potentially creating millions of independent tasks. This introduces extra overhead with no extra gain, and will actually reduce our overall performance. This leads to my first guideline when writing parallel code: Partition your problem into enough tasks to keep each processor busy throughout the operation, but not more than necessary to keep each processor busy. Also note that I parallelized the outer loop. I could have just as easily partitioned the inner loop. However, partitioning the inner loop would have led to many more discrete work items, each with a smaller amount of work (operate on one pixel instead of one row of pixels). My second guideline when writing parallel code reflects this: Partition your problem in a way to place the most work possible into each task. This typically means, in practice, that you will want to parallelize the routine at the “highest” point possible in the routine, typically the outermost loop. If you’re looking at parallelizing methods which call other methods, you’ll want to try to partition your work high up in the stack – as you get into lower level methods, the performance impact of parallelizing your routines may not overcome the overhead introduced. Parallel.For works great for situations where we know the number of elements we’re going to process in advance. If we’re iterating through an IList<T> or an array, this is a typical approach. However, there are other iteration statements common in C#. In many situations, we’ll use foreach instead of a for loop. This can be more understandable and easier to read, but also has the advantage of working with collections which only implement IEnumerable<T>, where we do not know the number of elements involved in advance. As an example, lets take the following situation. Say we have a collection of Customers, and we want to iterate through each customer, check some information about the customer, and if a certain case is met, send an email to the customer and update our instance to reflect this change. Normally, this might look something like: foreach(var customer in customers) { // Run some process that takes some time... DateTime lastContact = theStore.GetLastContact(customer); TimeSpan timeSinceContact = DateTime.Now - lastContact; // If it's been more than two weeks, send an email, and update... if (timeSinceContact.Days > 14) { theStore.EmailCustomer(customer); customer.LastEmailContact = DateTime.Now; } } Here, we’re doing a fair amount of work for each customer in our collection, but we don’t know how many customers exist. If we assume that theStore.GetLastContact(customer) and theStore.EmailCustomer(customer) are both side-effect free, thread safe operations, we could parallelize this using Parallel.ForEach: Parallel.ForEach(customers, customer => { // Run some process that takes some time... DateTime lastContact = theStore.GetLastContact(customer); TimeSpan timeSinceContact = DateTime.Now - lastContact; // If it's been more than two weeks, send an email, and update... if (timeSinceContact.Days > 14) { theStore.EmailCustomer(customer); customer.LastEmailContact = DateTime.Now; } }); Just like Parallel.For, we rework our loop into a method call accepting a delegate created via a lambda expression. This keeps our new code very similar to our original iteration statement, however, this will now execute in parallel. The same guidelines apply with Parallel.ForEach as with Parallel.For. The other iteration statements, do and while, do not have direct equivalents in the Task Parallel Library. These, however, are very easy to implement using Parallel.ForEach and the yield keyword. Most applications can benefit from implementing some form of Data Parallelism. Iterating through collections and performing “work” is a very common pattern in nearly every application. When the problem can be decomposed by data, we often can parallelize the workload by merely changing foreach statements to Parallel.ForEach method calls, and for loops to Parallel.For method calls. Any time your program operates on a collection, and does a set of work on each item in the collection where that work is not dependent on other information, you very likely have an opportunity to parallelize your routine.

Read the article
Parallelism in .NET – Part 4, Imperative Data Parallelism: Aggregation

- by Reed

In the article on simple data parallelism, I described how to perform an operation on an entire collection of elements in parallel. Often, this is not adequate, as the parallel operation is going to be performing some form of aggregation. Simple examples of this might include taking the sum of the results of processing a function on each element in the collection, or finding the minimum of the collection given some criteria. This can be done using the techniques described in simple data parallelism, however, special care needs to be taken into account to synchronize the shared data appropriately. The Task Parallel Library has tools to assist in this synchronization. The main issue with aggregation when parallelizing a routine is that you need to handle synchronization of data. Since multiple threads will need to write to a shared portion of data. Suppose, for example, that we wanted to parallelize a simple loop that looked for the minimum value within a dataset: double min = double.MaxValue; foreach(var item in collection) { double value = item.PerformComputation(); min = System.Math.Min(min, value); } .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } This seems like a good candidate for parallelization, but there is a problem here. If we just wrap this into a call to Parallel.ForEach, we’ll introduce a critical race condition, and get the wrong answer. Let’s look at what happens here: // Buggy code! Do not use! double min = double.MaxValue; Parallel.ForEach(collection, item => { double value = item.PerformComputation(); min = System.Math.Min(min, value); }); This code has a fatal flaw: min will be checked, then set, by multiple threads simultaneously. Two threads may perform the check at the same time, and set the wrong value for min. Say we get a value of 1 in thread 1, and a value of 2 in thread 2, and these two elements are the first two to run. If both hit the min check line at the same time, both will determine that min should change, to 1 and 2 respectively. If element 1 happens to set the variable first, then element 2 sets the min variable, we’ll detect a min value of 2 instead of 1. This can lead to wrong answers. Unfortunately, fixing this, with the Parallel.ForEach call we’re using, would require adding locking. We would need to rewrite this like: // Safe, but slow double min = double.MaxValue; // Make a "lock" object object syncObject = new object(); Parallel.ForEach(collection, item => { double value = item.PerformComputation(); lock(syncObject) min = System.Math.Min(min, value); }); This will potentially add a huge amount of overhead to our calculation. Since we can potentially block while waiting on the lock for every single iteration, we will most likely slow this down to where it is actually quite a bit slower than our serial implementation. The problem is the lock statement – any time you use lock(object), you’re almost assuring reduced performance in a parallel situation. This leads to two observations I’ll make: When parallelizing a routine, try to avoid locks. That being said: Always add any and all required synchronization to avoid race conditions. These two observations tend to be opposing forces – we often need to synchronize our algorithms, but we also want to avoid the synchronization when possible. Looking at our routine, there is no way to directly avoid this lock, since each element is potentially being run on a separate thread, and this lock is necessary in order for our routine to function correctly every time. However, this isn’t the only way to design this routine to implement this algorithm. Realize that, although our collection may have thousands or even millions of elements, we have a limited number of Processing Elements (PE). Processing Element is the standard term for a hardware element which can process and execute instructions. This typically is a core in your processor, but many modern systems have multiple hardware execution threads per core. The Task Parallel Library will not execute the work for each item in the collection as a separate work item. Instead, when Parallel.ForEach executes, it will partition the collection into larger “chunks” which get processed on different threads via the ThreadPool. This helps reduce the threading overhead, and help the overall speed. In general, the Parallel class will only use one thread per PE in the system. Given the fact that there are typically fewer threads than work items, we can rethink our algorithm design. We can parallelize our algorithm more effectively by approaching it differently. Because the basic aggregation we are doing here (Min) is communitive, we do not need to perform this in a given order. We knew this to be true already – otherwise, we wouldn’t have been able to parallelize this routine in the first place. With this in mind, we can treat each thread’s work independently, allowing each thread to serially process many elements with no locking, then, after all the threads are complete, “merge” together the results. This can be accomplished via a different set of overloads in the Parallel class: Parallel.ForEach<TSource,TLocal>. The idea behind these overloads is to allow each thread to begin by initializing some local state (TLocal). The thread will then process an entire set of items in the source collection, providing that state to the delegate which processes an individual item. Finally, at the end, a separate delegate is run which allows you to handle merging that local state into your final results. To rewriting our routine using Parallel.ForEach<TSource,TLocal>, we need to provide three delegates instead of one. The most basic version of this function is declared as: public static ParallelLoopResult ForEach<TSource, TLocal>( IEnumerable<TSource> source, Func<TLocal> localInit, Func<TSource, ParallelLoopState, TLocal, TLocal> body, Action<TLocal> localFinally ) The first delegate (the localInit argument) is defined as Func<TLocal>. This delegate initializes our local state. It should return some object we can use to track the results of a single thread’s operations. The second delegate (the body argument) is where our main processing occurs, although now, instead of being an Action<T>, we actually provide a Func<TSource, ParallelLoopState, TLocal, TLocal> delegate. This delegate will receive three arguments: our original element from the collection (TSource), a ParallelLoopState which we can use for early termination, and the instance of our local state we created (TLocal). It should do whatever processing you wish to occur per element, then return the value of the local state after processing is completed. The third delegate (the localFinally argument) is defined as Action<TLocal>. This delegate is passed our local state after it’s been processed by all of the elements this thread will handle. This is where you can merge your final results together. This may require synchronization, but now, instead of synchronizing once per element (potentially millions of times), you’ll only have to synchronize once per thread, which is an ideal situation. Now that I’ve explained how this works, lets look at the code: // Safe, and fast! double min = double.MaxValue; // Make a "lock" object object syncObject = new object(); Parallel.ForEach( collection, // First, we provide a local state initialization delegate. () => double.MaxValue, // Next, we supply the body, which takes the original item, loop state, // and local state, and returns a new local state (item, loopState, localState) => { double value = item.PerformComputation(); return System.Math.Min(localState, value); }, // Finally, we provide an Action<TLocal>, to "merge" results together localState => { // This requires locking, but it's only once per used thread lock(syncObj) min = System.Math.Min(min, localState); } ); Although this is a bit more complicated than the previous version, it is now both thread-safe, and has minimal locking. This same approach can be used by Parallel.For, although now, it’s Parallel.For<TLocal>. When working with Parallel.For<TLocal>, you use the same triplet of delegates, with the same purpose and results. Also, many times, you can completely avoid locking by using a method of the Interlocked class to perform the final aggregation in an atomic operation. The MSDN example demonstrating this same technique using Parallel.For uses the Interlocked class instead of a lock, since they are doing a sum operation on a long variable, which is possible via Interlocked.Add. By taking advantage of local state, we can use the Parallel class methods to parallelize algorithms such as aggregation, which, at first, may seem like poor candidates for parallelization. Doing so requires careful consideration, and often requires a slight redesign of the algorithm, but the performance gains can be significant if handled in a way to avoid excessive synchronization.

Read the article
Using DB_PARAMS to Tune the EP_LOAD_SALES Performance

- by user702295

The DB_PARAMS table can be used to tune the EP_LOAD_SALES performance. The AWR report supplied shows 16 CPUs so I imaging that you can run with 8 or more parallel threads. This can be done by setting the following DB_PARAMS parameters. Note that most of parameter changes are just changing a 2 or 4 into an 8: DBHintEp_Load_SalesUseParallel = TRUE DBHintEp_Load_SalesUseParallelDML = TRUE DBHintEp_Load_SalesInsertErr = + parallel(@T_SRC_SALES@ 8) full(@T_SRC_SALES@) DBHintEp_Load_SalesInsertLd = + parallel(@T_SRC_SALES@ 8) DBHintEp_Load_SalesMergeSALES_DATA = + parallel(@T_SRC_SALES_LD@ 8) full(@T_SRC_SALES_LD@) DBHintMdp_AddUpdateIs_Fictive0SD = + parallel(s 8 ) DBHintMdp_AddUpdateIs_Fictive2SD = + parallel(s 8 )

Read the article
Partition Table and Exadata Hybrid Columnar Compression (EHCC)

- by Bandari Huang

Create EHCC table CREATE TABLE ... COMPRESS FOR [QUERY LOW|QUERY HIGH|ARCHIVE LOW|ARCHIVE HIGH]; select owner,table_name,compress_for DBA_TAB_SUBPARTITIONS where compression = ‘ENABLED'; Convert Table/Partition/Subpartition to EHCC Compress Table&Partition&Subpartition to EHCC: ALTER TABLE table_name MOVE COMPRESS FOR [QUERY LOW|QUERY HIGH|ARCHIVE LOW|ARCHIVE HIGH] [PARALLEL <dop>]; ALTER TABLE table_name MOVE PARATITION partition_name COMPRESS FOR [QUERY LOW|QUERY HIGH|ARCHIVE LOW|ARCHIVE HIGH] [PARALLEL <dop>]; ALTER TABLE table_name MOVE SUBPARATITION subpartition_name COMPRESS FOR [QUERY LOW|QUERY HIGH|ARCHIVE LOW|ARCHIVE HIGH] [PARALLEL <dop>]; select owner,table_name,compress_for DBA_TAB_SUBPARTITIONS where compression = ‘ENABLED'; select table_owner,table_name,partition_name,compress_for DBA_TAB_PARTITIONS where compression = ‘ENABLED’; select table_owner,table_name,subpartition_name,compress_for DBA_TAB_SUBPARTITIONS where compression = ‘ENABLED’; Rebuild Unusable Index: select index_name from dba_index where status = 'UNUSABLE'; select index_name,partition_name from dba_ind_partition where status = 'UNUSABLE'; select index_name,subpartition_name from dba_ind_partition where status = 'UNUSABLE'; ALTER INDEX index_name REBUILD [PARALLEL <dop>]; ALTER INDEX index_name REBUILD PARTITION partition_name [PARALLEL <dop>]; ALTER INDEX index_name REBUILD SUBPARTITION subpartition_name [PARALLEL <dop>]; Convert Table/Partition/Subpartition from EHCC to OLTP compression or uncompressed format: Uncompress EHCC Table&Partition&Subpartition: ALTER TABLE table_name MOVE [NOCOMPRESS|COMPRESS for OLTP] [PARALLEL <dop>]; ALTER TABLE table_name MOVE PARTITION partition_name [NOCOMPRESS|COMPRESS for OLTP] [PARALLEL <dop>]; ALTER TABLE table_name MOVE SUBPARTITION subpartition_name [NOCOMPRESS|COMPRESS for OLTP] [PARALLEL <dop>]; select owner,table_name,compress_for DBA_TAB_SUBPARTITIONS where compression = ''; select table_owner,table_name,partition_name,compress_for DBA_TAB_PARTITIONS where compression = ''; select table_owner,table_name,subpartition_name,compress_for DBA_TAB_SUBPARTITIONS where compression = ''; Rebuild Unusable Index: select index_name from dba_index where status = 'UNUSABLE'; select index_name,partition_name from dba_ind_partition where status = 'UNUSABLE'; select index_name,subpartition_name from dba_ind_partition where status = 'UNUSABLE'; ALTER INDEX index_name REBUILD [PARALLEL <dop>]; ALTER INDEX index_name REBUILD PARTITION partition_name [PARALLEL <dop>]; ALTER INDEX index_name REBUILD SUBPARTITION subpartition_name [PARALLEL <dop>];

Read the article
This Week's Multicore Reading List

Parallel programming tips and texts Parallel computing - Programming - Languages - HPC - Multi-core processor

Read the article
Parallelism Patterns in Seismic Duck: Part 1

Applying parallel patterns to real programs Parallel computing - Programming - High-performance computing - Microsoft Visual Studio - OpenMP

Read the article
No LPT port in Windows 7 virtual machines

- by KeyboardMonkey

Windows 7 has MS virtual PC integrated, the VM settings don't give a parallel LPT port mapping to the physical machine. Where did it go? Has anyone else noticed this, and found a solution? Update: After much digging, I found the one and only reference to this issue, on the VPC Blog: "Parallel port devices are not supported, as they are relatively rare today." -More details- It's a XP VM I've been using since VPC 2007 days, which did have this functionality. This is to configure barcode printers via the LPT port. Since the (new) MS VM can't map to my physical LPT port, I'm having a hard time configuring printers. My physical ports are enabled in the BIOS. It has worked the past 3 years, before switching to Win 7. Any help is appreciated. This screen shot of the VM settings shows COM ports, but LPT is no more In contrast, here is a screen shot of VPC 2007 (before it got integrated into Win 7). Notice how it has LPT support

Read the article
Parallel Class/Interface Hierarchy with the Facade Design Pattern?

- by Mike G

About a third of my code is wrapped inside a Facade class. Note that this isn't a "God" class, but actually represents a single thing (called a Line). Naturally, it delegates responsibilities to the subsystem behind it. What ends up happening is that two of the subsystem classes (Output and Timeline) have all of their methods duplicated in the Line class, which effectively makes Line both an Output and a Timeline. It seems to make sense to make Output and Timeline interfaces, so that the Line class can implement them both. At the same time, I'm worried about creating parallel class and interface structures. You see, there are different types of lines AudioLine, VideoLine, which all use the same type of Timeline, but different types of Output (AudioOutput and VideoOutput, respectively). So that would mean that I'd have to create an AudioOutputInterface and VideoOutputInterface as well. So not only would I have to have parallel class hierarchy, but there would be a parallel interface hierarchy as well. Is there any solution to this design flaw? Here's an image of the basic structure (minus the Timeline class, though know that each Line has-a Timeline): NOTE: I just realized that the word 'line' in Timeline might make is sound like is does a similar function as the Line class. They don't, just to clarify.

Read the article
What are the memory-management capabilities of MySQL + JDBC (in light of autonomic computing)?

- by Adel

I'm interested in implementing some kind of autonomic-computing functionality using MySQL. By autonomic-computing I mean roughly some failsafe abilities, whereby the application appears to be at least slightly "intelligent" For reference, the main parts of autonomic computing we'd like are the "self-configuring" and "self-healing" features (the other two - "self-optimizing" and "self-protecting", are too abstract/futuristic for us, at this time). Sofor example, if we have a sample Java application that utilizes a MySQL database, we might want to automatically restart the MySQL database if we take up too much memory. Or maybe we want to have the ability to dynamiccally adjust the database memory as needed. So for example, when we start the application the database begins with a 56 Megabyte buffer; but then as we insert so many rows we want to have it automatically jump up to 512 MB, then to 1024, until a max of 4096 MB. Does all of the above suggest that MySQL is too "weak" for the task? Do you suggest using Oracle database? My professor believes that by using Java we can basically make up for any memory-management deficiencies that MySQL has in relation to Oracle DB. I'm new to MySQL , but have experience with Oracle. If all of the above sounds wishy-washy, it is because I'm still fleshing it out. thanks

Read the article
C++/g++: Concurrent programm

- by phimuemue

Hi, I got a C++ program (source) that is said to work in parallel. However, if I compile it (I am using Ubuntu 10.04 and g++ 4.4.3) with g++ and run it, one of my two CPU cores gets full load while the other is doing "nothing". So I spoke to the one who gave me the program. I was told that I had to set specific flags for g++ in order to get the program compiled for 2 CPU cores. However, if I look at the code I'm not able to find any lines that point to parallelism. So I have two questions: Are there any C++-intrinsics for multithreaded applications, i.e. is it possible to write parallel code without any extra libraries (because I did not find any non-standard libraries included)? Is it true that there are indeed flags for g++ that tell the compiler to compile the program for 2 CPU cores and to compile it so it runs in parallel (and if: what are they)?

Read the article
Russian-to-English Parallel Word Corpus?

- by Cygorger

Hi: I am looking for a simple Russian to English word corpus. It can be as simple as a csv that lists a russian word in the first column and the equivalent English word in the second. Any ideas where I can find such a thing? Does the NLTK toolkit have something like this? Thanks

Read the article
Computing MD5SUM of large files in C#

- by spkhaira

I am using following code to compute MD5SUM of a file - byte[] b = System.IO.File.ReadAllBytes(file); string sum = BitConverter.ToString(new MD5CryptoServiceProvider().ComputeHash(b)); This works fine normally, but if I encounter a large file (~1GB) - e.g. an iso image or a DVD VOB file - I get an Out of Memory exception. Though, I am able to compute the MD5SUM in cygwin for the same file in about 10secs. Please suggest how can I get this to work for big files in my program. Thanks

Read the article
Multithreading/Parallel Processing in PHP

- by manyxcxi

I have a PHP script that will generate a report using PHPExcel from data queried from a MySQL DB. Currently, it is linear in processing in that it gets the data back from MySQL, reads in the Excel template, writes the data to the template, then outputs it. I have optimized the code to the point that the data is only iterated over once, and there is very little processing done on the PHP side. The query returns hundreds of lines in less than .001 seconds, so it is running fast enough. After some timing I have found my bottlenecks to be (surprise, surprise) reading the template and writing the output. I would like to do this: Spawn a thread/process to read the template Spawn a thread/process to fetch the data Return back to parent thread - Parent thread will wait until both are complete Proceed on as normal My main questions are is this possible, is it worth it? If yes to both, how would you tackle it? Also, it is PHP 5 on CentOS

Read the article
Running multiple image manipulations in parallel causing OutOfMemory exception

- by Tom

I am working on a site where I need to be able to split and image around 4000x6000 into 4 parts (amongst many other tasks) and I need this to be as quick as possible for multiple users. My current code for doing this is var bitmaps = new RenderTargetBitmap[elements.Length]; using (var stream = blobService.Stream(key)) { BitmapImage bi = new BitmapImage(); bi.BeginInit(); bi.StreamSource = stream; bi.EndInit(); for (var i = 0; i < elements.Length; i++) { var element = elements[i]; TransformGroup transformGroup = new TransformGroup(); TranslateTransform translateTransform = new TranslateTransform(); translateTransform.X = -element.Left; translateTransform.Y = -element.Top; transformGroup.Children.Add(translateTransform); DrawingVisual vis = new DrawingVisual(); DrawingContext cont = vis.RenderOpen(); cont.PushTransform(transformGroup); cont.DrawImage(bi, new Rect(new Size(bi.PixelWidth, bi.PixelHeight))); cont.Close(); RenderTargetBitmap rtb = new RenderTargetBitmap(element.Width, element.Height, 96d, 96d, PixelFormats.Default); rtb.Render(vis); bitmaps[i] = rtb; } } for (var i = 0; i < bitmaps.Length; i++) { using (MemoryStream ms = new MemoryStream()) { PngBitmapEncoder encoder = new PngBitmapEncoder(); encoder.Frames.Add(BitmapFrame.Create(bitmaps[i])); encoder.Save(ms); var regionKey = WebPath.Variant(key, elements[i].Id); saveBlobService.Save("image/png", regionKey, ms); } } I am running multiple threads which take jobs off a queue. I am finding that if this part of code is hit by 4 threads at once I get an OutOfMemory exception. I can stop this happening by wrapping all the code above in a lock(obj) but this isn't ideal. I have tried wrapping just the first using block (where the file is read from disk and split) but I still get the out of memory exceptions (this part of the code executes quite quickly). I this normal considering the amount of memory this should be taking up? Are there any optimisations I could make? Can I increase the memory available? UPDATE: My new code as per Moozhe's help public static void GenerateRegions(this IBlobService blobService, string key, Element[] elements) { using (var stream = blobService.Stream(key)) { foreach (var element in elements) { stream.Position = 0; BitmapImage bi = new BitmapImage(); bi.BeginInit(); bi.SourceRect = new Int32Rect(element.Left, element.Top, element.Width, element.Height); bi.StreamSource = stream; bi.EndInit(); DrawingVisual vis = new DrawingVisual(); DrawingContext cont = vis.RenderOpen(); cont.DrawImage(bi, new Rect(new Size(element.Width, element.Height))); cont.Close(); RenderTargetBitmap rtb = new RenderTargetBitmap(element.Width, element.Height, 96d, 96d, PixelFormats.Default); rtb.Render(vis); using (MemoryStream ms = new MemoryStream()) { PngBitmapEncoder encoder = new PngBitmapEncoder(); encoder.Frames.Add(BitmapFrame.Create(rtb)); encoder.Save(ms); var regionKey = WebPath.Variant(key, element.Id); blobService.Save("image/png", regionKey, ms); } } } }

Read the article
running a python script where dependencies are not avail: distributed computing

- by sadhu_

Hi, I have access to a grid (running condor) that would (potentially) allow to very substantially reduce how long by nltk based nlp tasks take. unfortunately, i dont have root access on the cluster so cannot install new packages, only run whatever is available on the linux boxes. python is of course available, but nltk isnt - i was wondering however, if there might be a way around this somehow ? is there a way i can somehow still distribute the task in a self-contained 'package' of some sort? Thanks for your hel

Read the article
How to implement reject in parallel approval workflow?

- by Dmitry Martynov

I develop a SharePoint workflow with a Replicator activity to replicate a custom activity for every approver. The custom activity implements an approval branch for a particular user. It has classic form with CreateTask, While, OnTaskChanged and CompleteTask activities. I setup UntilCondition on the replicator to cancel execution after one approver chooses to reject the approval and then workflow finishes. The problem happens with other uncompleted tasks which "hang" in their current state. User does not see this state when open the task. I put UpdateAllTasks after the replacator to set the task status to Cancelled. But since there is no event activities between CompleteTask (for the rejected task) and UpdateAllTasks, the UpdateAllTask activity set Cancelled for the rejected task also. The question, what can I do to flush the pending change made by CompleteTask before UpdateAllTasks? Or perhaps, there is another way to implement such workflow. I was thinking about the way to implement Cancel handler for the custom activity with UpdateTask. But I do not know how to implement it and tell to the cancel handler that it executes in the case of the rejection.

Read the article
Computing orientation of a square and displaying an object with the same orientation

- by Robin

Hi, I wrote an application which detects a square within an image. To give you a good understanding of how such an image containing such a square, in this case a marker, could look like: What I get, after the detection, are the coordinates of the four corners of my marker. Now I don't know how to display an object on my marker. The object should have the same rotation/angle/direction as the marker. Are there any papers on how to achieve that, any algorithms that I can use that proofed to be pretty solid/working? It doesn't need to be a working solution, it could be a simple description on how to achieve that or something similar. If you point me at a library or something, it should work under linux, windows is not needed but would be great in case I need to port the application at some point. I already looked at the ARToolkit but they you camera parameter files and more complex matrices while I only got the four corner points and a single image instead of a whole video / camera stream.

Read the article
Quartz Thread Execution Parallel or Sequential?

- by vikas

We have a quartz based scheduler application which runs about 1000 jobs per minute which are evenly distributed across seconds of each minute i.e. about 16-17 jobs per second. Ideally, these 16-17 jobs should fire at same time, however our first statement, which simply logs the time of execution, of execute method of the job is being called very late. e.g. let us assume we have 1000 jobs scheduled per minute from 05:00 to 05:04. So, ideally the job which is scheduled at 05:03:50 should have logged the first statement of the execute method at 05:03:50, however, it is doing it at about 05:06:38. I have tracked down the time taken by the scheduled job which comes around 15-20 milliseconds. This scheduled job is fast enough because we just send a message on an ActiveMQ queue. We have specified the number of threads of quartz to be 100 and even tried with increasing it to 200 and more, but no gain. One more thing we noticed is that logs from scheduler are coming sequential after first 1 minute i.e. [Quartz_Worker_28] <Some log statement> .. .. [Quartz_Worker_29] <Some log statement> .. .. [Quartz_Worker_30] <Some log statement> .. .. So it suggesting that after some time quartz is running threads almost sequential. May be this is happening due to the time taken in notifying the job completion to persistence store (which is a separate postgres database in this case) and/or context switching. What can be the reason behind this strange behavior? EDIT: More detailed Log [06/07/12 10:08:37:192][QuartzScheduler_Worker-34][INFO] org.quartz.plugins.history.LoggingTriggerHistoryPlugin - Trigger [<trigger_name>] fired job [<job_name>] scheduled at: 06-07-2012 10:08:33.458, next scheduled at: 06-07-2012 10:34:53.000 [06/07/12 10:08:37:192][QuartzScheduler_Worker-34][INFO] <my_package>.scheduler.quartz.ScheduledLocateJob - execute begin--------- ScheduledLocateJob with key: <job_name> started at Fri Jul 06 10:08:37 EDT 2012 [06/07/12 10:08:37:192][QuartzScheduler_Worker-34][INFO] <my_package>.scheduler.quartz.ScheduledLocateJob <some log statement> [06/07/12 10:08:37:192][QuartzScheduler_Worker-34][INFO] <my_package>.scheduler.quartz.ScheduledLocateJob <some log statement> [06/07/12 10:08:37:192][QuartzScheduler_Worker-34][INFO] <my_package>.scheduler.quartz.ScheduledLocateJob <some log statement> [06/07/12 10:08:37:220][QuartzScheduler_Worker-34][INFO] <my_package>.scheduler.quartz.ScheduledLocateJob - execute end--------- ScheduledLocateJob with key: <job_name> ended at Fri Jul 06 10:08:37 EDT 2012 [06/07/12 10:08:37:220][QuartzScheduler_Worker-34][INFO] org.quartz.plugins.history.LoggingTriggerHistoryPlugin - Trigger [<trigger_name>] completed firing job [<job_name>] with resulting trigger instruction code: DO NOTHING. Next scheduled at: 06-07-2012 10:34:53.000 I am doubting on this section of the above log scheduled at: 06-07-2012 10:08:33.458, next scheduled at: 06-07-2012 10:34:53.000 because this job was scheduled for 10:04:53, but it fired at 10:08:33 and still quartz didn't consider it as misfire. Shouldn't it be a misfire?

Read the article
computing "node closure" of graph with removal

- by Fakrudeen

Given a directed graph, the goal is to combine the node with the nodes it is pointing to and come up with minimum number of these [lets give the name] super nodes. The catch is once you combine the nodes you can't use those nodes again. [first node as well as all the combined nodes - that is all the members of one super node] The greedy approach would be to pick the node with maximum out degree and combine that node with nodes it is pointing to and remove all of them. Do this every time with the nodes which are not removed yet from graph. The greedy is O(V), but this won't necessarily output minimum number super nodes. So what is the best algorithm to do this?

Read the article
QProcess, QEventLoop - of any use for parallel-processing

- by dlib

I wonder whether I could use QEventLoop (QProcess?) to parallelize multiple calls to same function with Qt. What is precisely the difference with QtConcurrent or QThread? What is a process and an event loop more precisely? I read that QCoreApplication must exec() as early as possible in main() method, so that I wonder why it is different from main Thread. could you point as some efficient reference to processes and thread with Qt? I came through the official doc and those things remain unclear. Thanks and regards.

Read the article
Recommended book about parallel programming - theory & best practice?

- by Klesk

I'm looking for a book or books about multicore, multithreaded programming. The perfect book should focus on best practices and maybe include a bit of theory background. I'm not interested in a book which only describes a single library and focuses on its API. OS actually doesn't matter.

Read the article

Search Results

Search found 3521 results on 141 pages for 'parallel computing'.

Page 27/141 | < Previous Page | 23 24 25 26 27 28 29 30 31 32 33 34 | Next Page >

- by MainMa

- by Reed

- by Reed

- by Reed

- by user702295

- by Bandari Huang

- by KeyboardMonkey

- by Mike G

- by Adel

- by phimuemue

- by Cygorger

- by spkhaira

- by manyxcxi

- by Tom

- by sadhu_

- by Dmitry Martynov

- by Robin

- by vikas

- by Fakrudeen

- by dlib

- by Klesk

< Previous Page | 23 24 25 26 27 28 29 30 31 32 33 34 | Next Page >