Search Results

Search found 3521 results on 141 pages for 'parallel computing'.

Page 19/141 | < Previous Page | 15 16 17 18 19 20 21 22 23 24 25 26  | Next Page >

  • DNA and Quantum computing

    - by Jacques
    I recently(A couple of weeks ago) read an article about the future of processing and how quantum-processors and DNA-processors(DNA-computing) are the future competitors of computing since both will completely outperform the computers of this era. In terms of processing speeds, what do we expect from these two different processing techniques ? Personally I believe that DNA-processing will be a major step towards AI. For labs and office work I think quantum-processing which will be more logical. I'm quite excited that i'm still so young - to see what the future of technology holds! Then again my parents will soon find out what the after-life holds... just as bloody exciting, if not more..

    Read the article

  • Using Parallel Extensions with ThreadStatic attribute. Could it leak memory?

    - by the-locster
    I'm using Parallel Extensions fairly heavily and I've just now encountered a case where using thread locla storrage might be sensible to allow re-use of objects by worker threads. As such I was lookign at the ThreadStatic attribute which marks a static field/variable as having a unique value per thread. It seems to me that it would be unwise to use PE with the ThreadStatic attribute without any guarantee of thread re-use by PE. That is, if threads are created and destroyed to some degree would the variables (and thus objects they point to) remain in thread local storage for some indeterminate amount of time, thus causing a memory leak? Or perhaps the thread storage is tied to the threads and disposed of when the threads are disposed? But then you still potentially have threads in a pool that are longed lived and that accumulate thread local storage from various pieces of code the threads are used for. Is there a better approach to obtaining thread local storage with PE? Thankyou.

    Read the article

  • How does Task Parallel Library scale on a terminal server or in a web application?

    - by Lasse V. Karlsen
    I understand that the TPL uses work-stealing queues for its tasks when I execute things like Parallel.For and similar constructs. If I understand this correctly, the construct will spin up a number of tasks, where each will start processing items. If one of the tasks complete their allotted items, it will start stealing items from the other tasks which hasn't yet completed theirs. This solves the problem where items 1-100 are cheap to process and items 101-200 are costly, and one of the two tasks would just sit idle until the other completed. (I know this is a simplified exaplanation.) However, how will this scale on a terminal server or in a web application (assuming we use TPL in code that would run in the web app)? Can we risk saturating the CPUs with tasks just because there are N instances of our application running side by side? Is there any information on this topic that I should read? I've yet to find anything in particular, but that doesn't mean there is none.

    Read the article

  • how to generate uncorrelated random numbers in repeated calls in parallel?

    - by user1446948
    I want to write a function which will be repeatedly called by other functions many times. Inside this function it is supposed to generate a lot of random numbers and this part will be treated in parallel. If only for one run, the seed can be chosen differently for each thread, so that the random numbers will be uncorrelated. However, if this function will be called the 2nd time, it seems that the random numbers will repeat unless the seed will be again changed during the later calls. So my question is, is there a good way to generate the random numbers or reset the seed so that the random numbers generated by repeated calls to this function and also by different threads are really random? Thank you.

    Read the article

  • Limiting DOPs &ndash; Who rules over whom?

    - by jean-pierre.dijcks
    I've gotten a couple of questions from Dan Morgan and figured I start to answer them in this way. While Dan is running on a big system he is running with Database Resource Manager and he is trying to make sure the system doesn't go crazy (remember end user are never, ever crazy!) on very high DOPs. Q: How do I control statements with very high DOPs driven from user hints in queries? A: The best way to do this is to work with DBRM and impose limits on consumer groups. The Max DOP setting you can set in DBRM allows you to overwrite the hint. Now let's go into some more detail here. Assume my object (and for simplicity we assume there is a single object - and do remember that we always pick the highest DOP when in doubt and when conflicting DOPs are available in a query) has PARALLEL 64 as its setting. Assume that the query that selects something cool from that table lives in a consumer group with a max DOP of 32. Assume no goofy things (like running out of parallel_max_servers) are happening. A query selecting from this table will run at DOP 32 because DBRM caps the DOP. As of 11.2.0.1 we also use the DBRM cap to create the original plan (at compile time) and not just enforce the cap at runtime. Now, my user is smart and writes a query with a parallel hint requesting DOP 128. This query is still capped by DBRM and DBRM overrules the hint in the statement. The statement, despite the hint, runs at DOP 32. Note that in the hinted scenario we do compile the statement with DOP 128 (the optimizer obeys the hint). This is another reason to use table decoration rather than hints. Q: What happens if I set parallel_max_servers higher than processes (e.g. the max number of processes allowed to run on my machine)? A: Processes rules. It is important to understand that processes are fixed at startup time. If you increase parallel_max_servers above the number of processes in the processes parameter you should get a warning in the alert log stating it can not take effect. As a follow up, a hinted query requesting more parallel processes than either parallel_max_servers or processes will not be able to acquire the requested number. Parallel_max_processes will prevent this. And since parallel_max_servers should be lower than max processes you can never go over either...

    Read the article

  • Do you know what is a DevOps Project?

    - by Gopinath
    Yesterday I wrote about OpenStack project, an open source cloud computing stack that lets you build Cloud Computing environments. While reading more on this topic I stumbled about a new type of projects called DevOps projects.  OpenStack is all set to become the first DevOps project, reports Forbes …the way OpenStack is applying the open source model to creating cloud infrastructure, the open source model is on the verge of being extended so that the collaboration and design process will include software, hardware, and networking in the data center as well as operational processes. In modern development, the idea of designing software, data center, and operations using one integrated team is called DevOps.

    Read the article

  • Developing a Support Plan for Cloud Applications

    - by BuckWoody
    Last week I blogged about developing a High-Availability plan. The specifics of a given plan aren't as simple as "Step 1, then Step 2" because in a hybrid environment (which most of us have) the situation changes the requirements. There are those that look for simple "template" solutions, but unless you settle on a single vendor and a single way of doing things, that's not really viable. The same holds true for support. As I've mentioned before, I'm not fond of the term "cloud", and would rather use the tem "Distributed Computing". That being said, more people understand the former, so I'll just use that for now. What I mean by Distributed Computing is leveraging another system or setup to perform all or some of a computing function. If this definition holds true, then you're essentially creating a partnership with a vendor to run some of your IT - whether that be IaaS, PaaS or SaaS, or more often, a mix. In your on-premises systems, you're the first and sometimes only line of support. That changes when you bring in a Cloud vendor. For Windows Azure, we have plans for support that you can pay for if you like. http://www.windowsazure.com/en-us/support/plans/ You're not off the hook entirely, however. You still need to create a plan to support your users in their applications, especially for the parts you control. The last thing they want to hear is "That's vendor X's problem - you'll have to call them." I find that this is often the last thing the architects think about in a solution. It's fine to put off the support question prior to deployment, but I would hold off on calling it "production" until you have that plan in place. There are lots of examples, like this one: http://www.va-interactive.com/inbusiness/editorial/sales/ibt/customer.html some of which are technology-specific. Once again, this is an "it depends" kind of approach. While it would be nice if there was just something in a box we could buy, it just doesn't work that way in a hybrid system. You have to know your options and apply them appropriately.

    Read the article

  • Developing a Support Plan for Cloud Applications

    - by BuckWoody
    Last week I blogged about developing a High-Availability plan. The specifics of a given plan aren't as simple as "Step 1, then Step 2" because in a hybrid environment (which most of us have) the situation changes the requirements. There are those that look for simple "template" solutions, but unless you settle on a single vendor and a single way of doing things, that's not really viable. The same holds true for support. As I've mentioned before, I'm not fond of the term "cloud", and would rather use the tem "Distributed Computing". That being said, more people understand the former, so I'll just use that for now. What I mean by Distributed Computing is leveraging another system or setup to perform all or some of a computing function. If this definition holds true, then you're essentially creating a partnership with a vendor to run some of your IT - whether that be IaaS, PaaS or SaaS, or more often, a mix. In your on-premises systems, you're the first and sometimes only line of support. That changes when you bring in a Cloud vendor. For Windows Azure, we have plans for support that you can pay for if you like. http://www.windowsazure.com/en-us/support/plans/ You're not off the hook entirely, however. You still need to create a plan to support your users in their applications, especially for the parts you control. The last thing they want to hear is "That's vendor X's problem - you'll have to call them." I find that this is often the last thing the architects think about in a solution. It's fine to put off the support question prior to deployment, but I would hold off on calling it "production" until you have that plan in place. There are lots of examples, like this one: http://www.va-interactive.com/inbusiness/editorial/sales/ibt/customer.html some of which are technology-specific. Once again, this is an "it depends" kind of approach. While it would be nice if there was just something in a box we could buy, it just doesn't work that way in a hybrid system. You have to know your options and apply them appropriately.

    Read the article

  • Why is PLINQ slower than LINQ for this code?

    - by Rob Packwood
    First off, I am running this on a dual core 2.66Ghz processor machine. I am not sure if I have the .AsParallel() call in the correct spot. I tried it directly on the range variable too and that was still slower. I don't understand why... Here are my results: Process non-parallel 1000 took 146 milliseconds Process parallel 1000 took 156 milliseconds Process non-parallel 5000 took 5187 milliseconds Process parallel 5000 took 5300 milliseconds using System; using System.Collections.Generic; using System.Diagnostics; using System.Linq; namespace DemoConsoleApp { internal class Program { private static void Main() { ReportOnTimedProcess( () => GetIntegerCombinations(), "non-parallel 1000"); ReportOnTimedProcess( () => GetIntegerCombinations(runAsParallel: true), "parallel 1000"); ReportOnTimedProcess( () => GetIntegerCombinations(5000), "non-parallel 5000"); ReportOnTimedProcess( () => GetIntegerCombinations(5000, true), "parallel 5000"); Console.Read(); } private static List<Tuple<int, int>> GetIntegerCombinations( int iterationCount = 1000, bool runAsParallel = false) { IEnumerable<int> range = Enumerable.Range(1, iterationCount); IEnumerable<Tuple<int, int>> integerCombinations = from x in range from y in range select new Tuple<int, int>(x, y); return runAsParallel ? integerCombinations.AsParallel().ToList() : integerCombinations.ToList(); } private static void ReportOnTimedProcess( Action process, string processName) { var stopwatch = new Stopwatch(); stopwatch.Start(); process(); stopwatch.Stop(); Console.WriteLine("Process {0} took {1} milliseconds", processName, stopwatch.ElapsedMilliseconds); } } }

    Read the article

  • Is there the equivalent of cloud computing for modems?

    - by morpheous
    I asked this question on SF, and someone recommended that I ask it here - (I don't think I have enough points to move a question from SF to SO - and in any case, I don't know how to do it - so here is the question again): I am interested in the concept of PAAS (platform as a service). However, all talk about SAAS/PAAS seems to focus on only the computer itself - not its peripherals. Is it possible to 'outsource' modems as a resource - so that an app running remotely can pump data to a modem in the cloud? As a bit of background to the question, a group of us are thinking of starting a company that offers similar services to companies like twilio etc - but I want to 'outsource' both the computing hardware (thats PAAS - the easy bit) and the modems (thats what I cant seem to find any info on). Does anyone know if modems can be bundled as part of a PAAS service? - alternatively, is there a way that an application running on one computer can communicate (i.e. pump data) to a remote modem residing on another machine?. I assume I can come up with some protocol over UDP or TCP - but there is no point reinventing the wheel - if such a protocol like that already exists (or if it some open source software allows one to do this). Any suggestions on how to solve this problem?

    Read the article

  • How to perform a Depth First Search iteratively using async/parallel processing?

    - by Prabhu
    Here is a method that does a DFS search and returns a list of all items given a top level item id. How could I modify this to take advantage of parallel processing? Currently, the call to get the sub items is made one by one for each item in the stack. It would be nice if I could get the sub items for multiple items in the stack at the same time, and populate my return list faster. How could I do this (either using async/await or TPL, or anything else) in a thread safe manner? private async Task<IList<Item>> GetItemsAsync(string topItemId) { var items = new List<Item>(); var topItem = await GetItemAsync(topItemId); Stack<Item> stack = new Stack<Item>(); stack.Push(topItem); while (stack.Count > 0) { var item = stack.Pop(); items.Add(item); var subItems = await GetSubItemsAsync(item.SubId); foreach (var subItem in subItems) { stack.Push(subItem); } } return items; } EDIT: I was thinking of something along these lines, but it's not coming together: var tasks = stack.Select(async item => { items.Add(item); var subItems = await GetSubItemsAsync(item.SubId); foreach (var subItem in subItems) { stack.Push(subItem); } }).ToList(); if (tasks.Any()) await Task.WhenAll(tasks); UPDATE: If I wanted to chunk the tasks, would something like this work? foreach (var batch in items.BatchesOf(100)) { var tasks = batch.Select(async item => { await DoSomething(item); }).ToList(); if (tasks.Any()) { await Task.WhenAll(tasks); } } The language I'm using is C#.

    Read the article

  • Setting up MongoDB in High Performance Computing LSF linux cluster

    - by Dnaiel
    I am trying to run mongo in a LSF cluster computing environment where I have no admin control. Our sysadmin installed mongodb, but it is not running. Any ideas on what should I ask the server admin to do for it to run? Or if I could run it locally? [node1382]allelix> mongod --dbpath /users/dnaiel/ma/mongodb/ Tue Oct 2 21:33:48 [initandlisten] MongoDB starting : pid=22436 port=27017 dbpath=/seq/epigenome01/allelix/ma/mongodb/ 64-bit host=node1382 Tue Oct 2 21:33:48 [initandlisten] Tue Oct 2 21:33:48 [initandlisten] ** WARNING: You are running on a NUMA machine. Tue Oct 2 21:33:48 [initandlisten] ** We suggest launching mongod like this to avoid performance problems: Tue Oct 2 21:33:48 [initandlisten] ** numactl --interleave=all mongod [other options] Tue Oct 2 21:33:48 [initandlisten] Tue Oct 2 21:33:48 [initandlisten] db version v2.2.0, pdfile version 4.5 Tue Oct 2 21:33:48 [initandlisten] git version: f5e83eae9cfbec7fb7a071321928f00d1b0c5207 Tue Oct 2 21:33:48 [initandlisten] build info: Linux ip-10-2-29-40 2.6.21.7-2.ec2.v1.2.fc8xen #1 SMP Fri Nov 20 17:48:28 EST 2009 x86_64 BOOST_LIB_VERSION=1_49 Tue Oct 2 21:33:48 [initandlisten] options: { dbpath: "/users/dnaiel/ma/mongodb/" } Tue Oct 2 21:33:48 [initandlisten] journal dir=users/dnaiel/ma/mongodb/journal Tue Oct 2 21:33:48 [initandlisten] recover begin Tue Oct 2 21:33:48 [initandlisten] info no lsn file in journal/ directory Tue Oct 2 21:33:48 [initandlisten] recover lsn: 0 Tue Oct 2 21:33:48 [initandlisten] recover /seq/epigenome01/allelix/ma/mongodb/journal/j._0 Tue Oct 2 21:33:48 [initandlisten] recover cleaning up Tue Oct 2 21:33:48 [initandlisten] removeJournalFiles Tue Oct 2 21:33:48 [initandlisten] recover done Tue Oct 2 21:33:48 [websvr] admin web console waiting for connections on port 28017 Tue Oct 2 21:33:48 [initandlisten] waiting for connections on port 27017 It basically waits forever and cannot start mongodb. These servers are not webservers but they do have network access, it's a cloud computing LSF environment system. Any advice would be welcome, thanks in advance.

    Read the article

  • Parallel shell loops

    - by brubelsabs
    Hi, I want to process many files and since I've here a bunch of cores I want to do it in parallel: for i in *.myfiles; do do_something $i `derived_params $i` other_params; done I know of a Makefile solution but my commands needs the arguments out of the shell globbing list. What I found is: > function pwait() { > while [ $(jobs -p | wc -l) -ge $1 ]; do > sleep 1 > done > } > To use it, all one has to do is put & after the jobs and a pwait call, the parameter gives the number of parallel processes: > for i in *; do > do_something $i & > pwait 10 > done But this doesn't work very well, e.g. I tried it with e.g. a for loop converting many files but giving me error and left jobs undone. I can't belive that this isn't done yet since the discussion on zsh mailing list is so old by now. So do you know any better?

    Read the article

  • IndexOutofRangeException while using WriteLine in nested Parallel.For loops

    - by Umar Asif
    I am trying to write kinect depth data to a text file using nested Parallel.For loops with the following code. However, it gives IndexOutofRangeException. The code works perfect if using simple for loops but it hangs the UI since the depth format is set to 640x480 causing the loops to write 307200 lines in the text file at 30fps. Therefore, I switched to Parallel. For scheme. If I omit the writeLine command from the nested loops, the code works fine, which indicates that the IndexOutofRangeException is arising at the writeline command. I do not know how to troubleshoot this. Please advise. Any better workarounds to avoid UI freezing? Thanks. using (DepthImageFrame depthImageframe = d.OpenDepthImageFrame()) { if (depthImageframe == null) return; depthImageframe.CopyPixelDataTo(depthPixelData); swDepth = new StreamWriter(@"E:\depthData.txt", false); int i = 0; Parallel.For(0, depthImageframe.Width, delegate(int x) { Parallel.For(0, depthImageframe.Height, delegate(int y) { p[i] = sensor.MapDepthToSkeletonPoint(depthImageframe.Format, x, y, depthPixelData[x + depthImageframe.Width * y]); swDepth.WriteLine(i + "," + p[k].X + "," + p[k].Y + "," + p[k].Z); i++; }); }); swDepth.Close(); } }

    Read the article

  • EL FUTURO DEL CLOUD, A DEBATE EN EL XX CONGRESO NACIONAL DE USUARIOS ORACLE

    - by comunicacion-es_es(at)oracle.com
    Normal 0 21 false false false ES X-NONE X-NONE MicrosoftInternetExplorer4 /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0cm 5.4pt 0cm 5.4pt; mso-para-margin-top:0cm; mso-para-margin-right:0cm; mso-para-margin-bottom:10.0pt; mso-para-margin-left:0cm; line-height:115%; mso-pagination:widow-orphan; font-size:11.0pt; font-family:"Calibri","sans-serif"; mso-ascii-font-family:Calibri; mso-ascii-theme-font:minor-latin; mso-fareast-font-family:"Times New Roman"; mso-fareast-theme-font:minor-fareast; mso-hansi-font-family:Calibri; mso-hansi-theme-font:minor-latin;} ¡Vuelta a un mini Oracle OpenWorld! La Comunidad de Usuarios de Oracle celebrará en Madrid los próximos 16 y 17 de marzo su XX Congreso Nacional, donde estarán representadas TODAS las áreas de Oracle (aplicaciones, tecnología, hardware y canal). Bajo el lema "Agilidad, innovación y optimización del negocio", contaremos con prestigiosos ponentes internacionales como Massimo Pezzini, vicepresidente de Gartner; Rex Wang, experto en Cloud Computing y vicepresidente de marketing de producto de Oracle; y Janny Ekelson, director de aplicaciones y arquitectura FedEx Express Europa. A parte de los más de 15 casos de éxito, en las más de 40 presentaciones programadas, el Cloud Computing será uno de los temas estrella junto a la estrategia en hardware de Oracle tras la adquisición de Sun. ¡Os esperamos!

    Read the article

  • MaxStartups and MaxSessions configurations parameter for ssh connections?

    - by Webby
    I am copying the files from machineB and machineC into machineA as I am running my below shell script on machineA. If the files is not there in machineB then it should be there in machineC for sure so I will try copying the files from machineB first, if it is not there in machineB then I will try copying the same files from machineC. I am copying the files in parallel using GNU Parallel library and it is working fine. Currently I am copying 10 files in parallel. Below is my shell script which I have - #!/bin/bash export PRIMARY=/test01/primary export SECONDARY=/test02/secondary readonly FILERS_LOCATION=(machineB machineC) export FILERS_LOCATION_1=${FILERS_LOCATION[0]} export FILERS_LOCATION_2=${FILERS_LOCATION[1]} PRIMARY_PARTITION=(550 274 2 546 278) # this will have more file numbers SECONDARY_PARTITION=(1643 1103 1372 1096 1369 1568) # this will have more file numbers export dir3=/testing/snapshot/20140103 find "$PRIMARY" -mindepth 1 -delete find "$SECONDARY" -mindepth 1 -delete do_Copy() { el=$1 PRIMSEC=$2 scp david@$FILERS_LOCATION_1:$dir3/new_weekly_2014_"$el"_200003_5.data $PRIMSEC/. || scp david@$FILERS_LOCATION_2:$dir3/new_weekly_2014_"$el"_200003_5.data $PRIMSEC/. } export -f do_Copy parallel --retries 10 -j 10 do_Copy {} $PRIMARY ::: "${PRIMARY_PARTITION[@]}" & parallel --retries 10 -j 10 do_Copy {} $SECONDARY ::: "${SECONDARY_PARTITION[@]}" & wait echo "All files copied." Problem Statement:- With the above script at some point I am getting this exception - ssh_exchange_identification: Connection closed by remote host ssh_exchange_identification: Connection closed by remote host ssh_exchange_identification: Connection closed by remote host And I guess the error is typically caused by too many ssh/scp starting at the same time. That leads me to believe /etc/ssh/sshd_config:MaxStartups and MaxSessions is set too low. But my question is on which server it is pretty low? machineB and machineC or machineA? And on what machines I need to increase the number? On machineA this is what I can find - root@machineA:/home/david# grep MaxStartups /etc/ssh/sshd_config #MaxStartups 10:30:60 root@machineA:/home/david# grep MaxSessions /etc/ssh/sshd_config And on machineB and machineC this is what I can find - [root@machineB ~]$ grep MaxStartups /etc/ssh/sshd_config #MaxStartups 10 [root@machineB ~]$ grep MaxSessions /etc/ssh/sshd_config #MaxSessions 10

    Read the article

  • Bash Parallelization of CPU-intensive processes

    - by ehsanul
    tee forwards its stdin to every single file specified, while pee does the same, but for pipes. These programs send every single line of their stdin to each and every file/pipe specified. However, I was looking for a way to "load balance" the stdin to different pipes, so one line is sent to the first pipe, another line to the second, etc. It would also be nice if the stdout of the pipes are collected into one stream as well. The use case is simple parallelization of CPU intensive processes that work on a line-by-line basis. I was doing a sed on a 14GB file, and it could have run much faster if I could use multiple sed processes. The command was like this: pv infile | sed 's/something//' > outfile To parallelize, the best would be if GNU parallel would support this functionality like so (made up the --demux-stdin option): pv infile | parallel -u -j4 --demux-stdin "sed 's/something//'" > outfile However, there's no option like this and parallel always uses its stdin as arguments for the command it invokes, like xargs. So I tried this, but it's hopelessly slow, and it's clear why: pv infile | parallel -u -j4 "echo {} | sed 's/something//'" > outfile I just wanted to know if there's any other way to do this (short of coding it up myself). If there was a "load-balancing" tee (let's call it lee), I could do this: pv infile | lee >(sed 's/something//' >> outfile) >(sed 's/something//' >> outfile) >(sed 's/something//' >> outfile) >(sed 's/something//' >> outfile) Not pretty, so I'd definitely prefer something like the made up parallel version, but this would work too.

    Read the article

  • While using ConcurrentQueue, trying to dequeue while looping through in parallel

    - by James Black
    I am using the parallel data structures in my .NET 4 application and I have a ConcurrentQueue that gets added to while I am processing through it. I want to do something like: personqueue.AsParallel().WithDegreeOfParallelism(20).ForAll(i => ... ); as I make database calls to save the data, so I am limiting the number of concurrent threads. But, I expect that the ForAll isn't going to dequeue, and I am concerned about just doing ForAll(i => { personqueue.personqueue.TryDequeue(...); ... }); as there is no guarantee that I am popping off the correct one. So, how can I iterate through the collection and dequeue, in a parallel fashion. Or, would it be better to use PLINQ to do this processing, in parallel?

    Read the article

  • Rewinding or resetting Parallel effect in Flex 3

    - by errata
    How can I 'rewind' or force the parallel effect to play from the very beginning once it already started to play? Code sample: <mx:Parallel id="parallelEffect" repeatCount="0"> <mx:Fade alphaTo="1" target="{someTarget}" startDelay="2000" /> <mx:Fade alphaTo="1" target="{someOtherTarget}" startDelay="4000" /> <mx:Fade alphaTo="1" target="{thirdTarget}" startDelay="6000" /> <mx:Fade alphaTo="1" target="{fourthTarget}" startDelay="8000" /> <mx:Fade alphaTo="1" target="{fifthTarget}" startDelay="10000" /> </mx:Parallel>

    Read the article

  • Partition Wise Joins

    - by jean-pierre.dijcks
    Some say they are the holy grail of parallel computing and PWJ is the basis for a shared nothing system and the only join method that is available on a shared nothing system (yes this is oversimplified!). The magic in Oracle is of course that is one of many ways to join data. And yes, this is the old flexibility vs. simplicity discussion all over, so I won't go there... the point is that what you must do in a shared nothing system, you can do in Oracle with the same speed and methods. The Theory A partition wise join is a join between (for simplicity) two tables that are partitioned on the same column with the same partitioning scheme. In shared nothing this is effectively hard partitioning locating data on a specific node / storage combo. In Oracle is is logical partitioning. If you now join the two tables on that partitioned column you can break up the join in smaller joins exactly along the partitions in the data. Since they are partitioned (grouped) into the same buckets, all values required to do the join live in the equivalent bucket on either sides. No need to talk to anyone else, no need to redistribute data to anyone else... in short, the optimal join method for parallel processing of two large data sets. PWJ's in Oracle Since we do not hard partition the data across nodes in Oracle we use the Partitioning option to the database to create the buckets, then set the Degree of Parallelism (or run Auto DOP - see here) and get our PWJs. The main questions always asked are: How many partitions should I create? What should my DOP be? In a shared nothing system the answer is of course, as many partitions as there are nodes which will be your DOP. In Oracle we do want you to look at the workload and concurrency, and once you know that to understand the following rules of thumb. Within Oracle we have more ways of joining of data, so it is important to understand some of the PWJ ideas and what it means if you have an uneven distribution across processes. Assume we have a simple scenario where we partition the data on a hash key resulting in 4 hash partitions (H1 -H4). We have 2 parallel processes that have been tasked with reading these partitions (P1 - P2). The work is evenly divided assuming the partitions are the same size and we can scan this in time t1 as shown below. Now assume that we have changed the system and have a 5th partition but still have our 2 workers P1 and P2. The time it takes is actually 50% more assuming the 5th partition has the same size as the original H1 - H4 partitions. In other words to scan these 5 partitions, the time t2 it takes is not 1/5th more expensive, it is a lot more expensive and some other join plans may now start to look exciting to the optimizer. Just to post the disclaimer, it is not as simple as I state it here, but you get the idea on how much more expensive this plan may now look... Based on this little example there are a few rules of thumb to follow to get the partition wise joins. First, choose a DOP that is a factor of two (2). So always choose something like 2, 4, 8, 16, 32 and so on... Second, choose a number of partitions that is larger or equal to 2* DOP. Third, make sure the number of partitions is divisible through 2 without orphans. This is also known as an even number... Fourth, choose a stable partition count strategy, which is typically hash, which can be a sub partitioning strategy rather than the main strategy (range - hash is a popular one). Fifth, make sure you do this on the join key between the two large tables you want to join (and this should be the obvious one...). Translating this into an example: DOP = 8 (determined based on concurrency or by using Auto DOP with a cap due to concurrency) says that the number of partitions >= 16. Number of hash (sub) partitions = 32, which gives each process four partitions to work on. This number is somewhat arbitrary and depends on your data and system. In this case my main reasoning is that if you get more room on the box you can easily move the DOP for the query to 16 without repartitioning... and of course it makes for no leftovers on the table... And yes, we recommend up-to-date statistics. And before you start complaining, do read this post on a cool way to do stats in 11.

    Read the article

  • How to setup matlab for parallel processing on Amazon EC2?

    - by JohnIdol
    I just setup a Extra Large Heavy Computation EC2 instance to throw it at my Genetic Algorithms problem, hoping to speed up things. This instance has 8 Intel Xeon processors (around 2.4Ghz each) and 7 Gigs of RAM. On my machine I have an Intel Core Duo, and matlab is able to work with my two cores just fine. On the EC2 instance though, matlab only is capable of detecting 1 out of 8 processors. Obviously the difference is that I have my 2 cores on a single processor, while the EC2 instance has 8 distinct processors. My question is, how do I get matlab to work with those 8 processors? I found this paper, but it seems related to setting up matlab with multiple EC2 instances, which is not my problem. Any help appreciated!

    Read the article

< Previous Page | 15 16 17 18 19 20 21 22 23 24 25 26  | Next Page >