embarrassingly parallel - Page 5

Perl Parallel::ForkManager wait_all_children() takes excessively long time

- by zhang18

I have a script that uses Parallel::ForkManager. However, the wait_all_children() process takes incredibly long time even after all child-processes are completed. The way I know is by printing out some timestamps (see below). Does anyone have any idea what might be causing this (I have 16 CPU cores on my machine)? my $pm = Parallel::ForkManager->new(16) for my $i (1..16) { $pm->start($i) and next; ... do something within the child-process ... print (scalar localtime), " Process $i completed.\n"; $pm->finish(); } print (scalar localtime), " Waiting for some child process to finish.\n"; $pm->wait_all_children(); print (scalar localtime), " All processes finished.\n"; Clearly, I'll get the Waiting for some child process to finish message first, with a timestamp of, say, 7:08:35. Then I'll get a list of Process i completed messages, with the last one at 7:10:30. However, I do not receive the message All Processes finished until 7:16:33(!). Why is that 6-minute delay between 7:10:30 and 7:16:33? Thx!

Read the article

Parallel version of loop not faster than serial version

- by Il-Bhima

I'm writing a program in C++ to perform a simulation of particular system. For each timestep, the biggest part of the execution is taking up by a single loop. Fortunately this is embarassingly parallel, so I decided to use Boost Threads to parallelize it (I'm running on a 2 core machine). I would expect at speedup close to 2 times the serial version, since there is no locking. However I am finding that there is no speedup at all. I implemented the parallel version of the loop as follows: Wake up the two threads (they are blocked on a barrier). Each thread then performs the following: Atomically fetch and increment a global counter. Retrieve the particle with that index. Perform the computation on that particle, storing the result in a separate array Wait on a job finished barrier The main thread waits on the job finished barrier. I used this approach since it should provide good load balancing (since each computation may take differing amounts of time). I am really curious as to what could possibly cause this slowdown. I always read that atomic variables are fast, but now I'm starting to wonder whether they have their performance costs. If anybody has some ideas what to look for or any hints I would really appreciate it. I've been bashing my head on it for a week, and profiling has not revealed much.

Read the article

Parallel.For Batching

- by chibacity

Is there built-in support in the TPL for batching operations? I was recently playing with a routine to carry out character replacement on a character array which required a lookup table i.e. transliteration: for (int i = 0; i < chars.Length; i++) { char replaceChar; if (lookup.TryGetValue(chars[i], out replaceChar)) { chars[i] = replaceChar; } } I could see that this could be trivially parallelized, so jumped in with a first stab which I knew would perform worse as the tasks were too fine-grained: Parallel.For(0, chars.Length, i => { char replaceChar; if (lookup.TryGetValue(chars[i], out replaceChar)) { chars[i] = replaceChar; } }); I then reworked the algorithm to use batching so that the work could be chunked onto different threads in less fine-grained batches. This made use of threads as expected and I got some near linear speed up. I'm sure that there must be built-in support for batching in the TPL. What is the syntax, and how do I use it? const int CharBatch = 100; int charLen = chars.Length; Parallel.For(0, ((charLen / CharBatch) + 1), i => { int batchUpper = ((i + 1) * CharBatch); for (int j = i * CharBatch; j < batchUpper && j < charLen; j++) { char replaceChar; if (lookup.TryGetValue(chars[j], out replaceChar)) { chars[j] = replaceChar; } } });

Read the article

Build OpenGL model in parallel?

- by Brendan Long

I have a program which draws some terrain and simulates water flowing over it (in a cheap and easy way). Updating the water was easy to parallelize using OpenMP, so I can do ~50 updates per second. The problem is that even with a small amounts of water, my draws per second are very very low (starts at 5 and drops to around 2 once there's a significant amount of water). It's not a problem with the video card because the terrain is more complicated and gets drawn so quickly that boost::timer tells me that I get infinity draws per second if I turn the water off. It may be related to memory bandwidth though (since I assume the model stays on the card and doesn't have to be transfered every time). What I'm concerned about is that on every draw, I'm calling glVertex3f() about a million times (max size is 450*600, 4 vertices each), and it's done entirely sequentially because Glut won't let me call anything in parallel. So.. is if there's some way of building the list in parallel and then passing it to OpenGL all at once? Or some other way of making it draw this faster? Am I using the wrong method (besides the obvious "use less vertices")?

Read the article

Parallel processing in R 2.11 Windows 64-bit using SNOW not quite working

- by Abhijit

I'm running R 2.11 64-bit on a WinXP64 machine with 8 processors. With R 2.10.1 the following code spawned 6 R processes for parallel processing: require(foreach) require(doSNOW) cl = makeCluster(6, type='SOCK') registerDoSNOW(cl) bl2 = foreach(i=icount(length(unqmrno))) %dopar% { (Some code here) } stopCluster(cl) When I run the same code in R 2.11 Win64, the 6 R processes are not spawning, and the code hangs. I'm wondering if this is a problem with the port of SNOW to 2.11-64bit, or if any additional code is required on my part. Thanks

Read the article

running a parallel port controlling program through php.

- by prateek

I have a program that is interacting with hardware via parallel port programming. i had compiled it and using its object file to interact with the hardware (a simple led). when i execute it directly on the shell it serves the purpose of glowing the LED but when i execute it using shell_exec() in php the command is executed but unable to interact with the hardware. i am totally confused.. .

Read the article

Parallel computing for integrals

- by Iman

I want to reduce the calculation time for a time-consuming integral by splitting the integration range. I'm using C++, Windows, and a quad-core Intel i7 CPU. How can I split it into 4 parallel computations?

Read the article

Parallel.Invoke the same method for a list of objects

- by iulianchira

I have a class MyClass with a method MyMethod. For every MyClass instance in a list of MyClass instances i want to invoke MyMethod and have them run in a separate thread. I am using .NET 4.0 and the Parallel extensions.

Read the article

Implementing parallel coordinates for multi dimensional data in Java

- by Algorist

Hi, I am planning to implement parallel coordinates in java using JOGL library. Do you have any other suggestions? Thank you Bala

Read the article

Parallel programming in C#

- by Alxandr

I'm interested in learning about parallel programming in C#.NET (not like everything there is to know, but the basics and maybe some good-practices), therefore I've decided to reprogram an old program of mine which is called ImageSyncer. ImageSyncer is a really simple program, all it does is to scan trough a folder and find all files ending with .jpg, then it calculates the new position of the files based on the date they were taken (parsing of xif-data, or whatever it's called). After a location has been generated the program checks for any existing files at that location, and if one exist it looks at the last write-time of both the file to copy, and the file "in its way". If those are equal the file is skipped. If not a md5 checksum of both files is created and matched. If there is no match the file to be copied is given a new location to be copied to (for instance, if it was to be copied to "C:\test.jpg" it's copied to "C:\test(1).jpg" instead). The result of this operation is populated into a queue of a struct-type that contains two strings, the original file and the position to copy it to. Then that queue is iterated over untill it is empty and the files are copied. In other words there are 4 operations: 1. Scan directory for jpegs 2. Parse files for xif and generate copy-location 3. Check for file existence and if needed generate new path 4. Copy files And so I want to rewrite this program to make it paralell and be able to perform several of the operations at the same time, and I was wondering what the best way to achieve that would be. I've came up with two different models I can think of, but neither one of them might be any good at all. The first one is to parallelize the 4 steps of the old program, so that when step one is to be executed it's done on several threads, and when the entire of step 1 is finished step 2 is began. The other one (which I find more interesting because I have no idea of how to do that) is to create a sort of worker and consumer model, so when a thread is finished with step 1 another one takes over and performs step 2 at that object (or something like that). But as said, I don't know if any of these are any good solutions. Also, I don't know much about parallel programming at all. I know how to make a thread, and how to make it perform a function taking in an object as its only parameter, and I've also used the BackgroundWorker-class on one occasion, but I'm not that familiar with any of them. Any input would be appreciated.

Read the article

JRuby-friendly method for parallel-testing Rails app

- by Toby Hede

I am looking for a system to parallelise a large suite of tests in a Ruby on Rails app (using rspec, cucumber) that works using JRuby. Cucumber is actually not too bad, but the full rSpec suite currently takes nearly 20 minutes to run. The systems I can find (hydra, parallel-test) look like they use forking, which isn't the ideal solution for the JRuby environment.

Read the article

Parallel Web Service Calls in Java

- by java_pill

Is there anyway to make web service calls from a Java client app. (Apache Axis based) in parallel? Does async style Web Service client (Apache Axis based) help?

Read the article

JRuby-friendly method for parallel-testing Ruby on Rails app

- by Toby Hede

I am looking for a system to parallelise my tests in a Ruby on Rails app (using rspec, cucumber) that works using JRuby. The systems I can find (hydra, parallel-test) look like they use forking, which is problematic in a JRuby environment.

Read the article

Unified Parallel C - examples and list of extensions

- by osgx

Hello Where can I find examples of code, written in "Unified Parallel C"? I also interested in normative documents about this language (standards, reference manuals, online-accessible books and courses). What extensions were added to C to get UPC? Is this dialect alive or dead?

Read the article

Send raw data to USB parallel port after upgrading to 11.10

- by zaphod

I have a laser cutter connected via a generic USB to parallel adapter. The laser cutter speaks HPGL, as it happens, but since this is a laser cutter and not a plotter, I usually want to generate the HPGL myself, since I care about the ordering, speed, and direction of cuts and so on. In previous versions of Ubuntu, I was able to print to the cutter by copying an HPGL file directly to the corresponding USB "lp" device. For example: cp foo.plt /dev/usblp1 Well, I just upgraded to Ubuntu 11.10 oneiric, and I can't find any "lp" devices in /dev anymore. D'oh! What's the preferred way to send raw data to a parallel port in Ubuntu? I've tried System Settings Printing + Add, hoping that I might be able to associate my device with some kind of "raw printer" driver and print to it with a command like lp -d LaserCutter foo.plt But my USB to parallel adapter doesn't seem to show up in the list. What I do see are my HP Color LaserJet, two USB-to-serial adapters, "Enter URI", and "Network Printer". Meanwhile, over in /dev, I do see /dev/ttyUSB0 and /dev/ttyUSB1 devices for the 2 USB-to-serial adapters. I don't see anything obvious corresponding to the HP printer (which was /dev/usblp0 prior to the upgrade), except for generic USB stuff. For example, sudo find /dev | grep lp produces no output. I do seem to be able to print to the HP printer just fine, though. The printer setup GUI gives it a device URI starting with "hp:" which isn't much help for the parallel adapter. The CUPS administrator's guide makes it sound like I might need to feed it a device URI of the form parallel:/dev/SOMETHING, but of course if I had a /dev/SOMETHING I'd probably just go on writing to it directly. Here's what dmesg says after I disconnect and reconnect the device from the USB port: [ 924.722906] usb 1-1.1.4: USB disconnect, device number 7 [ 959.993002] usb 1-1.1.4: new full speed USB device number 8 using ehci_hcd And here's how it shows up in lsusb -v: Bus 001 Device 008: ID 1a86:7584 QinHeng Electronics CH340S Device Descriptor: bLength 18 bDescriptorType 1 bcdUSB 1.10 bDeviceClass 0 (Defined at Interface level) bDeviceSubClass 0 bDeviceProtocol 0 bMaxPacketSize0 8 idVendor 0x1a86 QinHeng Electronics idProduct 0x7584 CH340S bcdDevice 2.52 iManufacturer 0 iProduct 2 USB2.0-Print iSerial 0 bNumConfigurations 1 Configuration Descriptor: bLength 9 bDescriptorType 2 wTotalLength 32 bNumInterfaces 1 bConfigurationValue 1 iConfiguration 0 bmAttributes 0x80 (Bus Powered) MaxPower 96mA Interface Descriptor: bLength 9 bDescriptorType 4 bInterfaceNumber 0 bAlternateSetting 0 bNumEndpoints 2 bInterfaceClass 7 Printer bInterfaceSubClass 1 Printer bInterfaceProtocol 2 Bidirectional iInterface 0 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x82 EP 2 IN bmAttributes 2 Transfer Type Bulk Synch Type None Usage Type Data wMaxPacketSize 0x0020 1x 32 bytes bInterval 0 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x02 EP 2 OUT bmAttributes 2 Transfer Type Bulk Synch Type None Usage Type Data wMaxPacketSize 0x0020 1x 32 bytes bInterval 0 Device Status: 0x0000 (Bus Powered)

Read the article

Send raw data to USB parallel port after upgrading to 11.10 oneiric

- by zaphod

I have a laser cutter connected via a generic USB to parallel adapter. The laser cutter speaks HPGL, as it happens, but since this is a laser cutter and not a plotter, I usually want to generate the HPGL myself, since I care about the ordering, speed, and direction of cuts and so on. In previous versions of Ubuntu, I was able to print to the cutter by copying an HPGL file directly to the corresponding USB "lp" device. For example: cp foo.plt /dev/usblp1 Well, I just upgraded to Ubuntu 11.10 oneiric, and I can't find any "lp" devices in /dev anymore. D'oh! What's the preferred way to send raw data to a parallel port in Ubuntu? I've tried System Settings Printing + Add, hoping that I might be able to associate my device with some kind of "raw printer" driver and print to it with a command like lp -d LaserCutter foo.plt But my USB to parallel adapter doesn't seem to show up in the list. What I do see are my HP Color LaserJet, two USB-to-serial adapters, "Enter URI", and "Network Printer". Meanwhile, over in /dev, I do see /dev/ttyUSB0 and /dev/ttyUSB1 devices for the 2 USB-to-serial adapters. I don't see anything obvious corresponding to the HP printer (which was /dev/usblp0 prior to the upgrade), except for generic USB stuff. For example, sudo find /dev | grep lp produces no output. I do seem to be able to print to the HP printer just fine, though. The printer setup GUI gives it a device URI starting with "hp:" which isn't much help for the parallel adapter. The CUPS administrator's guide makes it sound like I might need to feed it a device URI of the form parallel:/dev/SOMETHING, but of course if I had a /dev/SOMETHING I'd probably just go on writing to it directly. Here's what dmesg says after I disconnect and reconnect the device from the USB port: [ 924.722906] usb 1-1.1.4: USB disconnect, device number 7 [ 959.993002] usb 1-1.1.4: new full speed USB device number 8 using ehci_hcd And here's how it shows up in lsusb -v: Bus 001 Device 008: ID 1a86:7584 QinHeng Electronics CH340S Device Descriptor: bLength 18 bDescriptorType 1 bcdUSB 1.10 bDeviceClass 0 (Defined at Interface level) bDeviceSubClass 0 bDeviceProtocol 0 bMaxPacketSize0 8 idVendor 0x1a86 QinHeng Electronics idProduct 0x7584 CH340S bcdDevice 2.52 iManufacturer 0 iProduct 2 USB2.0-Print iSerial 0 bNumConfigurations 1 Configuration Descriptor: bLength 9 bDescriptorType 2 wTotalLength 32 bNumInterfaces 1 bConfigurationValue 1 iConfiguration 0 bmAttributes 0x80 (Bus Powered) MaxPower 96mA Interface Descriptor: bLength 9 bDescriptorType 4 bInterfaceNumber 0 bAlternateSetting 0 bNumEndpoints 2 bInterfaceClass 7 Printer bInterfaceSubClass 1 Printer bInterfaceProtocol 2 Bidirectional iInterface 0 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x82 EP 2 IN bmAttributes 2 Transfer Type Bulk Synch Type None Usage Type Data wMaxPacketSize 0x0020 1x 32 bytes bInterval 0 Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x02 EP 2 OUT bmAttributes 2 Transfer Type Bulk Synch Type None Usage Type Data wMaxPacketSize 0x0020 1x 32 bytes bInterval 0 Device Status: 0x0000 (Bus Powered)

Read the article

Using TPL and PLINQ to raise performance of feed aggregator

- by DigiMortal

In this posting I will show you how to use Task Parallel Library (TPL) and PLINQ features to boost performance of simple RSS-feed aggregator. I will use here only very basic .NET classes that almost every developer starts from when learning parallel programming. Of course, we will also measure how every optimization affects performance of feed aggregator. Feed aggregator Our feed aggregator works as follows: Load list of blogs Download RSS-feed Parse feed XML Add new posts to database Our feed aggregator is run by task scheduler after every 15 minutes by example. We will start our journey with serial implementation of feed aggregator. Second step is to use task parallelism and parallelize feeds downloading and parsing. And our last step is to use data parallelism to parallelize database operations. We will use Stopwatch class to measure how much time it takes for aggregator to download and insert all posts from all registered blogs. After every run we empty posts table in database. Serial aggregation Before doing parallel stuff let’s take a look at serial implementation of feed aggregator. All tasks happen one after other. internal class FeedClient { private readonly INewsService _newsService; private const int FeedItemContentMaxLength = 255; public FeedClient() { ObjectFactory.Initialize(container => { container.PullConfigurationFromAppConfig = true; }); _newsService = ObjectFactory.GetInstance<INewsService>(); } public void Execute() { var blogs = _newsService.ListPublishedBlogs(); for (var index = 0; index <blogs.Count; index++) { ImportFeed(blogs[index]); } } private void ImportFeed(BlogDto blog) { if(blog == null) return; if (string.IsNullOrEmpty(blog.RssUrl)) return; var uri = new Uri(blog.RssUrl); SyndicationContentFormat feedFormat; feedFormat = SyndicationDiscoveryUtility.SyndicationContentFormatGet(uri); if (feedFormat == SyndicationContentFormat.Rss) ImportRssFeed(blog); if (feedFormat == SyndicationContentFormat.Atom) ImportAtomFeed(blog); } private void ImportRssFeed(BlogDto blog) { var uri = new Uri(blog.RssUrl); var feed = RssFeed.Create(uri); foreach (var item in feed.Channel.Items) { SaveRssFeedItem(item, blog.Id, blog.CreatedById); } } private void ImportAtomFeed(BlogDto blog) { var uri = new Uri(blog.RssUrl); var feed = AtomFeed.Create(uri); foreach (var item in feed.Entries) { SaveAtomFeedEntry(item, blog.Id, blog.CreatedById); } } } Serial implementation of feed aggregator downloads and inserts all posts with 25.46 seconds. Task parallelism Task parallelism means that separate tasks are run in parallel. You can find out more about task parallelism from MSDN page Task Parallelism (Task Parallel Library) and Wikipedia page Task parallelism. Although finding parts of code that can run safely in parallel without synchronization issues is not easy task we are lucky this time. Feeds import and parsing is perfect candidate for parallel tasks. We can safely parallelize feeds import because importing tasks doesn’t share any resources and therefore they don’t also need any synchronization. After getting the list of blogs we iterate through the collection and start new TPL task for each blog feed aggregation. internal class FeedClient { private readonly INewsService _newsService; private const int FeedItemContentMaxLength = 255; public FeedClient() { ObjectFactory.Initialize(container => { container.PullConfigurationFromAppConfig = true; }); _newsService = ObjectFactory.GetInstance<INewsService>(); } public void Execute() { var blogs = _newsService.ListPublishedBlogs(); var tasks = new Task[blogs.Count]; for (var index = 0; index <blogs.Count; index++) { tasks[index] = new Task(ImportFeed, blogs[index]); tasks[index].Start(); } Task.WaitAll(tasks); } private void ImportFeed(object blogObject) { if(blogObject == null) return; var blog = (BlogDto)blogObject; if (string.IsNullOrEmpty(blog.RssUrl)) return; var uri = new Uri(blog.RssUrl); SyndicationContentFormat feedFormat; feedFormat = SyndicationDiscoveryUtility.SyndicationContentFormatGet(uri); if (feedFormat == SyndicationContentFormat.Rss) ImportRssFeed(blog); if (feedFormat == SyndicationContentFormat.Atom) ImportAtomFeed(blog); } private void ImportRssFeed(BlogDto blog) { var uri = new Uri(blog.RssUrl); var feed = RssFeed.Create(uri); foreach (var item in feed.Channel.Items) { SaveRssFeedItem(item, blog.Id, blog.CreatedById); } } private void ImportAtomFeed(BlogDto blog) { var uri = new Uri(blog.RssUrl); var feed = AtomFeed.Create(uri); foreach (var item in feed.Entries) { SaveAtomFeedEntry(item, blog.Id, blog.CreatedById); } } } You should notice first signs of the power of TPL. We made only minor changes to our code to parallelize blog feeds aggregating. On my machine this modification gives some performance boost – time is now 17.57 seconds. Data parallelism There is one more way how to parallelize activities. Previous section introduced task or operation based parallelism, this section introduces data based parallelism. By MSDN page Data Parallelism (Task Parallel Library) data parallelism refers to scenario in which the same operation is performed concurrently on elements in a source collection or array. In our code we have independent collections we can process in parallel – imported feed entries. As checking for feed entry existence and inserting it if it is missing from database doesn’t affect other entries the imported feed entries collection is ideal candidate for parallelization. internal class FeedClient { private readonly INewsService _newsService; private const int FeedItemContentMaxLength = 255; public FeedClient() { ObjectFactory.Initialize(container => { container.PullConfigurationFromAppConfig = true; }); _newsService = ObjectFactory.GetInstance<INewsService>(); } public void Execute() { var blogs = _newsService.ListPublishedBlogs(); var tasks = new Task[blogs.Count]; for (var index = 0; index <blogs.Count; index++) { tasks[index] = new Task(ImportFeed, blogs[index]); tasks[index].Start(); } Task.WaitAll(tasks); } private void ImportFeed(object blogObject) { if(blogObject == null) return; var blog = (BlogDto)blogObject; if (string.IsNullOrEmpty(blog.RssUrl)) return; var uri = new Uri(blog.RssUrl); SyndicationContentFormat feedFormat; feedFormat = SyndicationDiscoveryUtility.SyndicationContentFormatGet(uri); if (feedFormat == SyndicationContentFormat.Rss) ImportRssFeed(blog); if (feedFormat == SyndicationContentFormat.Atom) ImportAtomFeed(blog); } private void ImportRssFeed(BlogDto blog) { var uri = new Uri(blog.RssUrl); var feed = RssFeed.Create(uri); feed.Channel.Items.AsParallel().ForAll(a => { SaveRssFeedItem(a, blog.Id, blog.CreatedById); }); } private void ImportAtomFeed(BlogDto blog) { var uri = new Uri(blog.RssUrl); var feed = AtomFeed.Create(uri); feed.Entries.AsParallel().ForAll(a => { SaveAtomFeedEntry(a, blog.Id, blog.CreatedById); }); } } We did small change again and as the result we parallelized checking and saving of feed items. This change was data centric as we applied same operation to all elements in collection. On my machine I got better performance again. Time is now 11.22 seconds. Results Let’s visualize our measurement results (numbers are given in seconds). As we can see then with task parallelism feed aggregation takes about 25% less time than in original case. When adding data parallelism to task parallelism our aggregation takes about 2.3 times less time than in original case. More about TPL and PLINQ Adding parallelism to your application can be very challenging task. You have to carefully find out parts of your code where you can safely go to parallel processing and even then you have to measure the effects of parallel processing to find out if parallel code performs better. If you are not careful then troubles you will face later are worse than ones you have seen before (imagine error that occurs by average only once per 10000 code runs). Parallel programming is something that is hard to ignore. Effective programs are able to use multiple cores of processors. Using TPL you can also set degree of parallelism so your application doesn’t use all computing cores and leaves one or more of them free for host system and other processes. And there are many more things in TPL that make it easier for you to start and go on with parallel programming. In next major version all .NET languages will have built-in support for parallel programming. There will be also new language constructs that support parallel programming. Currently you can download Visual Studio Async to get some idea about what is coming. Conclusion Parallel programming is very challenging but good tools offered by Visual Studio and .NET Framework make it way easier for us. In this posting we started with feed aggregator that imports feed items on serial mode. With two steps we parallelized feed importing and entries inserting gaining 2.3 times raise in performance. Although this number is specific to my test environment it shows clearly that parallel programming may raise the performance of your application significantly.

Read the article

Explaining Explain Plan Notes for Auto DOP

- by jean-pierre.dijcks

I've recently gotten some questions around "why do I not see a parallel plan" while Auto DOP is on (I think)...? It is probably worthwhile to quickly go over some of the ways to find out what Auto DOP was thinking. In general, there is no need to go tracing sessions and look under the hood. The thing to start with is to do an explain plan on your statement and to look at the parameter settings on the system. Parameter Settings to Look At First and foremost, make sure that parallel_degree_policy = AUTO. If you have that parameter set to LIMITED you will not have queuing and we will only do the auto magic if your objects are set to default parallel (so no degree specified). Next you want to look at the value of parallel_degree_limit. It is typically set to CPU, which in default settings equates to the Default DOP of the system. If you are testing Auto DOP itself and the impact it has on performance you may want to leave it at this CPU setting. If you are running concurrent statements you may want to give this some more thoughts. See here for more information. In general, do stick with either CPU or with a specific number. For now avoid the IO setting as I've seen some mixed results with that... In 11.2.0.2 you should also check that IO Calibrate has been run. Best to simply do a: SQL> select * from V$IO_CALIBRATION_STATUS; STATUS CALIBRATION_TIME ------------- ---------------------------------------------------------------- READY 04-JAN-11 10.04.13.104 AM You should see that your IO Calibrate is READY and therefore Auto DOP is ready. In any case, if you did not run the IO Calibrate step you will get the following note in the explain plan: Note ----- - automatic DOP: skipped because of IO calibrate statistics are missing One more note on calibrate_io, if you do not have asynchronous IO enabled you will see: ERROR at line 1: ORA-56708: Could not find any datafiles with asynchronous i/o capability ORA-06512: at "SYS.DBMS_RMIN", line 463 ORA-06512: at "SYS.DBMS_RESOURCE_MANAGER", line 1296 ORA-06512: at line 7 While this is changed in some fixes to the calibrate procedure, you should really consider switching asynchronous IO on for your data warehouse. Explain Plan Explanation To see the notes that are shown and explained here (and the above little snippet ) you can use a simple explain plan mechanism. There should be no need to add +parallel etc. explain plan for <statement> SELECT PLAN_TABLE_OUTPUT FROM TABLE(DBMS_XPLAN.DISPLAY()); Auto DOP The note structure displaying why Auto DOP did not work (with the exception noted above on IO Calibrate) is like this: Automatic degree of parallelism is disabled: <reason> These are the reason codes: Parameter - parallel_degree_policy = manual which will not allow Auto DOP to kick in Hint - One of the following hints are used NOPARALLEL, PARALLEL(1), PARALLEL(MANUAL) Outline - A SQL outline of an older version (before 11.2) is used SQL property restriction - The statement type does not allow for parallel processing Rule-based mode - Instead of the Cost Based Optimizer the system is using the RBO Recursive SQL statement - The statement type does not allow for parallel processing pq disabled/pdml disabled/pddl disabled - For some reason (alter session?) parallelism is disabled Limited mode but no parallel objects referenced - your parallel_degree_policy = LIMITED and no objects in the statement are decorated with the default PARALLEL degree. In most cases all objects have a specific degree in which case Auto DOP will honor that degree. Parallel Degree Limited When Auto DOP does it works you may see the cap you imposed with parallel_degree_limit showing up in the note section of the explain plan: Note ----- - automatic DOP: Computed Degree of Parallelism is 16 because of degree limit This is an obvious indication that your are being capped for this statement. There is one quite interesting one that happens when you are being capped at DOP = 1. First of you get a serial plan and the note changes slightly in that it does not indicate it is being capped (we hope to update the note at some point in time to be more specific). It right now looks like this: Note ----- - automatic DOP: Computed Degree of Parallelism is 1 Dynamic Sampling With 11.2.0.2 you will start seeing another interesting change in parallel plans, and since we are talking about the note section here, I figured we throw this in for good measure. If we deem the parallel (!) statement complex enough, we will enact dynamic sampling on your query. This happens as long as you did not change the default for dynamic sampling on the system. The note looks like this: Note ----- - dynamic sampling used for this statement (level=5)

Read the article

Software to create a virtual parallel port in Windows XP?

- by drknexus

I am writing a program that will eventually be used on a computer with a physical parallel port and will need to set certain pins high or low in order to signal to an external device. However, the development laptop I am using does not have any physical parallel ports and is too low powered to run a virtual machine. Is there any option available that will create a virtual parallel port within Windows XP? Ideally it would include a debug mode that would allow me to see what values have been pushed out on the parallel port.

Read the article

Parallel Computing Platform Developer Lab

- by Daniel Moth

This is an exciting announcement that I must share: "Microsoft Developer & Platform Evangelism, in collaboration with the Microsoft Parallel Computing Platform product team, is hosting a developer lab at the Platform Adoption Center on April 12-15, 2010. This event is for Microsoft Partners and Customers seeking to incorporate either .NET Framework 4 or Visual C++ 2010 parallelism features into their new or existing applications, and to gain expertise with new Visual Studio 2010 tools including the Parallel Tasks and Parallel Stacks debugger toolwindows, and the Concurrency Visualizer in the profiler. Opportunities for attendees include: Gain expert design assistance with your Parallel Computing Platform based solution. Develop a solution prototype in collaboration with Microsoft Software Engineers. Attend topical presentations and “chalk-talk” sessions. Your team will be assigned private, secure offices for confidential collaboration activities. The event has limited capacity, thus enrollment is based on an application process. Please download and complete the application form then return it to the event management team per instructions included within the form. Applications will be evaluated based upon the technical solution scenario along with indicated project readiness timelines. Microsoft event management team members may contact you directly for additional clarification and discussion of your project scenario during the nomination process." Comments about this post welcome at the original blog.

Read the article

Unity: parallel vectors and cross product, how to compare vectors

- by Heisenbug

I read this post explaining a method to understand if the angle between 2 given vectors and the normal to the plane described by them, is clockwise or anticlockwise: public static AngleDir GetAngleDirection(Vector3 beginDir, Vector3 endDir, Vector3 upDir) { Vector3 cross = Vector3.Cross(beginDir, endDir); float dot = Vector3.Dot(cross, upDir); if (dot > 0.0f) return AngleDir.CLOCK; else if (dot < 0.0f) return AngleDir.ANTICLOCK; return AngleDir.PARALLEL; } After having used it a little bit, I think it's wrong. If I supply the same vector as input (beginDir equal to endDir), the cross product is zero, but the dot product is a little bit more than zero. I think that to fix that I can simply check if the cross product is zero, means that the 2 vectors are parallel, but my code doesn't work. I tried the following solution: Vector3 cross = Vector3.Cross(beginDir, endDir); if (cross == Vector.zero) return AngleDir.PARALLEL; And it doesn't work because comparison between Vector.zero and cross is always different from zero (even if cross is actually [0.0f, 0.0f, 0.0f]). I tried also this: Vector3 cross = Vector3.Cross(beginDir, endDir); if (cross.magnitude == 0.0f) return AngleDir.PARALLEL; it also fails because magnitude is slightly more than zero. So my question is: given 2 Vector3 in Unity, how to compare them? I need the elegant equivalent version of this: if (beginDir.x == endDir.x && beginDir.y == endDir.y && beginDir.z == endDir.z) return true;

Read the article

Parallel EntityFramework

- by mehanik

Is it possible to make some work in parallel with entity framework for following example? using (var dbContext = new DB()) { var res = (from c in dbContext.Customers orderby c.Name select new { c.Id, c.Name, c.Role } ).ToDictionary(c => c.Id, c => new Dictionary<string, object> { { "Name",c.Name }, { "Role", c.Role } }); } For exampe what will be changed if I add AsParrallel? using (var dbContext = new DB()) { var res = (from c in dbContext.Customers orderby c.Name select new { c.Id, c.Name, c.Role } ).AsParallel().ToDictionary(c => c.Id, c => new Dictionary<string, object> { { "Name",c.Name }, { "Role", c.Role } }); }

Read the article

What parallel programming model do you recommend today to take advantage of the manycore processors

- by Doctor J

If you were writing a new application from scratch today, and wanted it to scale to all the cores you could throw at it tomorrow, what parallel programming model/system/language/library would you choose? Why? I am particularly interested in answers along these axes: Programmer productivity / ease of use (can mortals successfully use it?) Target application domain (what problems is it (not) good at?) Concurrency style (does it support tasks, pipelines, data parallelism, messages...?) Maintainability / future-proofing (will anybody still be using it in 20 years?) Performance (how does it scale on what kinds of hardware?) I am being deliberately vauge on the nature of the application in anticipation of getting good general answers useful for a variety of applications.

Read the article

Can Parallel.ForEach be used safely with CloudTableQuery

- by knightpfhor

I have a reasonable number of records in an Azure Table that I'm attempting to do some one time data encryption on. I thought that I could speed things up by using a Parallel.ForEach. Also because there are more than 1K records and I don't want to mess around with continuation tokens myself I'm using a CloudTableQuery to get my enumerator. My problem is that some of my records have been double encrypted and I realised that I'm not sure how thread safe the enumerator returned by CloudTableQuery.Execute() is. Has anyone else out there had any experience with this combination?

Read the article

Parallel WCF calls to multiple servers

- by gregmac

I have a WCF service (the same one) running on multiple servers, and I'd like to call all instances in parallel from a single client. I'm using ChannelFactory and the interface (contract) to call the service. Each service has a local <endpoint> client defined in the .config file. What I'm trying to do is build some kind of generic framework to avoid code duplication. For example a synchronous call in a single thread looks something like this: Dim remoteName As String = "endpointName1" Dim svcProxy As ChannelFactory(Of IMyService) = New ChannelFactory(Of IMyService)(remoteName) Try svcProxy.Open() Dim svc As IMyService = svcProxy.CreateChannel() nodeResult = svc.TestRemote("foo") Finally svcProxy.Close() End Try The part I'm having difficulty with is how to specify and actually invoke the actual remote method (eg "TestRemote") without having to duplicate the above code, and all the thread-related stuff that invokes that, for each method. In the end, I'd like to be able to write code along the lines of (consider this psuedo code): Dim results as Dictionary(Of Node, ExpectedReturnType) results = ParallelInvoke(IMyService.SomeMethod, parameter1, parameter2) where ParallelInvoke() will take the method as an argument, as well as the parameters (paramArray or object() .. whatever) and then go run the request on each remote node, block until they all return an answer or timeout, and then return the results into a Dictionary with the key as the node, and the value as whatever value it returned. I can then (depending on the method) pick out the single value I need, or aggregate all the values from each server together, etc. I'm pretty sure I can do this using reflection and InvokeMember(), but that requires passing the method as a string (which can lead to errors like calling a non-existing method that can't be caught at compile time), so I'd like to see if there is a cleaner way to do this. Thanks

Search Results

Search found 1794 results on 72 pages for 'embarrassingly parallel'.

Page 5/72 | < Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12 | Next Page >

- by zhang18

- by Il-Bhima

- by chibacity

- by Brendan Long

- by Abhijit

- by prateek

- by Iman

- by iulianchira

- by Algorist

- by Alxandr

- by Toby Hede

- by java_pill

- by Toby Hede

- by osgx

- by zaphod

- by zaphod

- by DigiMortal

- by jean-pierre.dijcks

- by drknexus

- by Daniel Moth

- by Heisenbug

- by mehanik

- by Doctor J

- by knightpfhor

- by gregmac

< Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12 | Next Page >