Search Results

Search found 6559 results on 263 pages for 'parallel foreach'.

  • SQL Server Optimizer Malfunction?

    - by Tony Davis
    There was a sharp intake of breath from the audience when Adam Machanic declared the SQL Server optimizer to be essentially "stuck in 1997". It was during his fascinating "Query Tuning Mastery: Manhandling Parallelism" session at the recent PASS SQL Summit. Paraphrasing somewhat, Adam (blog | @AdamMachanic) offered a convincing argument that the optimizer often delivers flawed plans based on assumptions that are no longer valid with today’s hardware. In 1997, when Microsoft engineers re-designed the database engine for SQL Server 7.0, SQL Server got its initial implementation of a cost-based optimizer. Up to SQL Server 2000, the developer often had to deploy a steady stream of hints in SQL statements to combat the occasionally wilful plan choices made by the optimizer. However, with each successive release, the optimizer has evolved and improved in its decision-making. It is still prone to the occasional stumble when we tackle difficult problems, join large numbers of tables, perform complex aggregations, and so on, but for most of us, most of the time, the optimizer purrs along efficiently in the background. Adam, however, challenged further any assumption that the current optimizer is competent at providing the most efficient plans for our more complex analytical queries, and in particular of offering up correctly parallelized plans. He painted a picture of a present where complex analytical queries have become ever more prevalent; where disk IO is ever faster so that reads from disk come into buffer cache faster than ever; where the improving RAM-to-data ratio means that we have a better chance of finding our data in cache. Most importantly, we have more CPUs at our disposal than ever before. To get these queries to perform, we not only need to have the right indexes, but also to be able to split the data up into subsets and spread its processing evenly across all these available CPUs. 
Improvements such as support for ColumnStore indexes are taking things in the right direction, but, unfortunately, deficiencies in the current Optimizer mean that SQL Server is yet to be able to exploit properly all those extra CPUs. Adam’s contention was that the current optimizer uses essentially the same costing model for many of its core operations as it did back in the days of SQL Server 7, based on assumptions that are no longer valid. One example he gave was a "slow disk" bias that may have been valid back in 1997 but certainly is not on modern disk systems. Essentially, the optimizer assesses the relative cost of serial versus parallel plans based on the assumption that there is no IO cost benefit from parallelization, only CPU. It assumes that a single request will saturate the IO channel, and so a query would not run any faster if we parallelized IO because the disk system simply wouldn’t be able to handle the extra pressure. As such, the optimizer often decides that a serial plan is lower cost, often in cases where a parallel plan would improve performance dramatically. It was challenging and thought provoking stuff, as were his techniques for driving parallelism through query logic based on subsets of rows that define the "grain" of the query. I highly recommend you catch the session if you missed it. I’m interested to hear though, when and how often people feel the force of the optimizer’s shortcomings. Barring mistakes, such as stale statistics, how often do you feel the Optimizer fails to find the plan you think it should, and what are the most common causes? Is it fighting to induce it toward parallelism? Combating unexpected plans, arising from table partitioning? Something altogether more prosaic? Cheers, Tony.

  • About the K computer

    - by nospam(at)example.com (Joerg Moellenkamp)
    Okay, after getting yet another mail because of the new #1 on the Top500 list, I want to add some comments from my side: Yes, the system is using a SPARC processor. And that is great news for a SPARC fan like me. It is using the SPARC64 VIIIfx processor from Fujitsu clocked at 2 GHz. No, it isn't the only one. Most people are saying there are two in the Top500 list using SPARC (#77 JAXA and #1 K) but in fact there are three. The Tianhe-1 (#2 on the Top500 list) supercomputer contains 2048 Galaxy "FT-1000" 1 GHz 8-core processors. Don't know it? The FeiTeng-1000 is an 8-core, 8-threads-per-core, 1 GHz processor made in China. And it's SPARC based. By the way, this sounds really familiar to me; perhaps the people just took the open-sourced UltraSPARC T2 design, because some of the parameters sound just too similar. However, it looks like Tianhe-1 is using the SPARCs as input nodes and not as compute nodes. No, I don't see it as the next M-series processor. Simple reason: You can't create SMP systems out of them; it simply doesn't have the functionality to do so. Even when there are multiple CPUs on a single board, they are not connected like an SMP/NUMA machine into a shared memory machine; they are connected with the cluster interconnect (in this case the Tofu interconnect) and work like a large cluster. Yes, it has a lot of oomph in Linpack; however, I assume a lot came from the extensions to the SPARCv9 standard. No, Linpack has no relevance for any commercial workload. Linpack is such a special load that even some HPC people are arguing that it isn't really a good benchmark for HPC. It's embarrassingly parallel, and it can work with relatively small interconnects compared to the interconnects in SMP systems (however, we are getting into the spheres where SMP interconnects were a few years ago). Amdahl isn't hitting that hard when running Linpack. Yes, it's a good move to use SPARC.
At some time in the last 10 years, there was an interesting twist in perception: SPARC was considered a proprietary architecture and x86 the open architecture. However, it's the other way round: try to create an x86 clone and you have a lot of intellectual property problems; create a SPARC clone and you have to spend 100 bucks or so to get the specification from the SPARC Foundation and develop your own SPARC processor. Fujitsu has been doing this for a long time now. So they had their own processor, their own know-how. So why was SPARC a good choice? Well, essentially Fujitsu can do what they want with their core as it is their core, for example adding the extensions to the SPARCv9 chipset; getting Intel to create extensions to x86 to help you with your product is a little bit harder. So Fujitsu could do what they needed to do with their processor in order to create such a supercomputer. No, the K is really using no FPGA or GPU as accelerators. The K is really using the CPU to do this job. Yes, it has a significantly enhanced FPU capable of executing 8 instructions in parallel. No, it doesn't run Solaris. Yes, it uses Linux. No, it doesn't hurt me ... as my colleague Roland Rambau (he knows a lot about HPC) once said to me ... it doesn't matter which OS is staying out of the way of the workload in HPC.

  • Faster Memory Allocation Using vmtasks

    - by Steve Sistare
    You may have noticed a new system process called "vmtasks" on Solaris 11 systems:

        % pgrep vmtasks
        8
        % prstat -p 8
           PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP
             8 root        0K    0K sleep   99  -20   9:10:59 0.0% vmtasks/32

    What is vmtasks, and why should you care? In a nutshell, vmtasks accelerates creation, locking, and destruction of pages in shared memory segments. This is particularly helpful for locked memory, as creating a page of physical memory is much more expensive than creating a page of virtual memory. For example, an ISM segment (shmflg & SHM_SHARE_MMU) is locked in memory on the first shmat() call, and a DISM segment (shmflg & SHM_PAGEABLE) is locked using mlock() or memcntl(). Segment operations such as creation and locking are typically single threaded, performed by the thread making the system call. In many applications, the size of a shared memory segment is a large fraction of total physical memory, and the single-threaded initialization is a scalability bottleneck which increases application startup time.

    To break the bottleneck, we apply parallel processing, harnessing the power of the additional CPUs that are always present on modern platforms. For sufficiently large segments, as many as 16 threads of vmtasks are employed to assist an application thread during creation, locking, and destruction operations. The segment is implicitly divided at page boundaries, and each thread is given a chunk of pages to process. The per-page processing time can vary, so for dynamic load balancing, the number of chunks is greater than the number of threads, and threads grab chunks dynamically as they finish their work. Because the threads modify a single application address space in a compressed time interval, contention on the locks protecting VM data structures was a problem, and we had to re-scale a number of VM locks to get good parallel efficiency.

    The vmtasks process has 1 thread per CPU and may accelerate multiple segment operations simultaneously, but each operation gets at most 16 helper threads to avoid monopolizing CPU resources. We may reconsider this limit in the future. Acceleration using vmtasks is enabled out of the box, with no tuning required, and works for all Solaris platform architectures (SPARC sun4u, SPARC sun4v, x86).

    The following tables show the time to create + lock + destroy a large segment, normalized as milliseconds per gigabyte, before and after the introduction of vmtasks:

        ISM
        system  ncpu  before  after  speedup
        ------  ----  ------  -----  -------
        x4600     32    1386    245       6X
        X7560     64    1016    153       7X
        M9000    512    1196    206       6X
        T5240    128    2506    234      11X
        T4-2     128    1197    107      11X

        DISM
        system  ncpu  before  after  speedup
        ------  ----  ------  -----  -------
        x4600     32    1582    265       6X
        X7560     64    1116    158       7X
        M9000    512    1165    152       8X
        T5240    128    2796    198      14X

    (I am missing the data for T4 DISM, for no good reason; it works fine.) The following table separates the creation and destruction times, again in milliseconds per gigabyte:

        ISM, T4-2
                 before  after
                 ------  -----
        create      702     64
        destroy     495     43

    To put this in perspective, consider creating a 512 GB ISM segment on T4-2. Creating the segment would take 6 minutes with the old code, and only 33 seconds with the new. If this is your Oracle SGA, you save over 5 minutes when starting the database, and you also save when shutting it down prior to a restart. Those minutes go directly to your bottom line for service availability.
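
    The dynamic load balancing described above (more chunks than threads, with threads grabbing the next chunk as they finish) can be sketched in user-level Python. This is only an illustration of the scheduling idea, not the Solaris kernel code: the page count, the thread count, the chunks-per-thread ratio and the per-page "work" are all assumptions made up for the example.

```python
import queue
import threading

CHUNKS_PER_THREAD = 8   # assumed ratio: more chunks than threads, for load balancing


def split_into_chunks(n_pages, n_chunks):
    """Split [0, n_pages) into contiguous (start, end) ranges at page boundaries."""
    base, extra = divmod(n_pages, n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + base + (1 if i < extra else 0)
        chunks.append((start, end))
        start = end
    return chunks


def process_segment(n_pages, n_threads):
    """Process every page once, letting helper threads pull chunks dynamically."""
    work = queue.Queue()
    for chunk in split_into_chunks(n_pages, n_threads * CHUNKS_PER_THREAD):
        work.put(chunk)

    touched = [0] * n_pages   # stand-in for the real per-page work

    def helper():
        while True:
            try:
                start, end = work.get_nowait()   # grab the next free chunk
            except queue.Empty:
                return                           # nothing left to do
            for page in range(start, end):
                touched[page] += 1               # chunks are disjoint, so no sharing

    helpers = [threading.Thread(target=helper) for _ in range(n_threads)]
    for t in helpers:
        t.start()
    for t in helpers:
        t.join()
    return touched
```

    A thread that draws cheap chunks simply comes back for more, so slow pages do not leave other threads idle; the cost is some contention on the shared queue, which mirrors the lock-contention concern mentioned in the post.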

  • Is Rails Metal (& Rack) a good way to implement a high traffic web service api?

    - by Greg
    I am working on a very typical web application. The main component of the user experience is a widget that a site owner would install on their front page. Every time their front page loads, the widget talks to our server and displays some of the data that returns. So there are two components to this web application:

      1. the front end UI that the site owner uses to configure their widget
      2. the back end component that responds to the widget's web api call

    Previously we had all of this running in PHP. Now we are experimenting with Rails, which is fantastic for #1 (the front end UI). The question is how to do #2, the back serving of widget information, efficiently. Obviously this is much higher load than the front end, since it is called every time the front page loads on one of our clients' websites. I can see two obvious approaches:

      A. Parallel Stack: Set up a parallel stack that uses something other than rails (e.g. our old PHP-based approach) but accesses the same database as the front end
      B. Rails Metal: Use Rails Metal/Rack to bypass the Rails routing mechanism, but keep the api call responder within the Rails app

    My main question: Is Rails/Metal a reasonable approach for something like this? But also... Will the overhead of loading the Rails environment still be too heavy? Is there a way to get even closer to the metal with Rails, bypassing most of the environment? Will Rails/Metal performance approach the perf of a similar task on straight PHP (just looking for ballpark here)? And... Is there a 'C' option that would be much better than both A and B? That is, something before going to the lengths of C code compiled to binary and installed as an nginx or apache module? Thanks in advance for any insights.

  • solve a classic map-reduce problem with opencl?

    - by liuliu
    I am trying to parallelize a classic map-reduce problem (which parallelizes well with MPI) with OpenCL, namely, the AMD implementation. But the result bothers me. Let me briefly describe the problem first. There are two types of data that flow into the system: the feature set (30 parameters for each) and the sample set (9000+ dimensions for each). It is a classic map-reduce problem in the sense that I need to calculate the score of every feature on every sample (Map), and then sum up the overall score for every feature (Reduce). There are around 10k features and 30k samples. I tried different ways to solve the problem. First, I tried to decompose the problem by features. The problem is that the score calculation consists of random memory access (pick some of the 9000+ dimensions and do addition/subtraction calculations). Since I cannot coalesce memory access, it is costly. Then, I tried to decompose the problem by samples. The problem is that to sum up the overall score, all threads are competing for a few score variables. They keep overwriting the score, which turns out to be incorrect. (I cannot compute the individual scores first and sum them up later because that requires 10k * 30k * 4 bytes.) The first method I tried gives me the same performance as an i7 860 CPU with 8 threads. However, I don't think the problem is unsolvable: it is remarkably similar to the ray tracing problem (for which you carry out calculations of millions of rays against millions of triangles). Any ideas?
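
    One commonly used way around the overwriting problem when decomposing by samples is to give each worker a private accumulator per feature and sum the accumulators afterwards, so memory grows with (chunks x features) rather than (samples x features). The sketch below shows that pattern on the CPU in Python rather than in OpenCL; the `score` rule, the feature layout as (plus, minus) index tuples, and the chunk size are hypothetical stand-ins, not the asker's actual computation.

```python
from concurrent.futures import ThreadPoolExecutor


def score(feature, sample):
    """Hypothetical scoring rule: add some sample dimensions, subtract others."""
    plus, minus = feature
    return sum(sample[i] for i in plus) - sum(sample[i] for i in minus)


def partial_scores(features, sample_chunk):
    """Map step for one chunk of samples, using a private accumulator per feature."""
    acc = [0.0] * len(features)
    for sample in sample_chunk:
        for f, feature in enumerate(features):
            acc[f] += score(feature, sample)   # no other worker touches acc
    return acc


def total_scores(features, samples, workers=4, chunk_size=2):
    """Split samples into chunks, score each chunk privately, then reduce."""
    chunks = [samples[i:i + chunk_size] for i in range(0, len(samples), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda c: partial_scores(features, c), chunks))
    # Reduce step: element-wise sum of the private accumulators.
    return [sum(column) for column in zip(*partials)]
```

    In an OpenCL setting the analogous idea would be one partial-sum buffer per work-group followed by a second reduction pass, but how well that maps onto the asker's memory-access pattern is not something this sketch can show.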

  • animation extender in datalist control in asp.net 2008

    - by BibiBuBu
    Good day! How can I use an animation extender in a DataList control in ASP.NET with C#? I want the animation to run when I click the delete button (the delete button will be in a repeater), so that when I remove one record it shows an animation to bring in the next record. It is in an update panel.

        <cc1:AnimationExtender ID="AnimationExtender1" runat="server" Enabled="True" TargetControlID="btnDeleteId">
          <Animations>
            <OnClick>
              <Sequence>
                <EnableAction Enabled="false" />
                <Parallel Duration=".2">
                  <Resize Height="0" Width="0" Unit="px" />
                  <FadeOut />
                </Parallel>
                <HideAction />
              </Sequence>
            </OnClick>
          </Animations>
        </cc1:AnimationExtender>

    Now if I put my button id in the TargetControlID, it gives an error that it should not be in the same update panel, etc. Overall, nothing is working for the animation. I am binding my DataList in ItemDataBound, e.g.:

        ImageButton imgbtn = (ImageButton)e.Item.FindControl("imgBtnPic");
        Label lblAvatar = (Label)e.Item.FindControl("lblAvatar");
        LinkButton lbName = (LinkButton)e.Item.FindControl("lbtnName");

    Can somebody please suggest something? Thanks.

  • Spatial Rotation in Gmod Expression2.

    - by Fascia
    I'm using expression2 to program behavior in Garry's mod (http://wiki.garrysmod.com/?title=Wire_Expression2). Okay so, to set the precedent: in Gmod I have a block and I am at a complete loss as to how to get it to rotate around the 3 up, forward and right vectors (which are local, i.e. if I pitch it 45 degrees the forward vector is 0.707, 0.707, 0). Essentially, from the 3 vectors I'd like to be able to get local Pitch/Roll/Yaw. By local Pitch/Roll/Yaw I mean that they are completely independent of one another, allowing true 3D rotation. So for example: if I place my craft so its nose is parallel to the floor, the X,Y,Z would be 0,0,0. If I turn it parallel to the floor (World and Local Yaw) 90 degrees it's now 0, 0, 90. If I then pitch it (World Roll, Local Pitch) 180 degrees it's now 180, 0, 90. I've already explored quaternions; however, I don't believe I should post my code here as I think I was re-inventing the wheel. I know I didn't explain that well but I believe the problem is pretty generic. Any help anyone could offer is greatly appreciated. Oh, I'd like to avoid gimbal lock too. Essentially, I want to calculate the rotation around each of the craft's up/forward/right vectors using the up/forward/right vectors. To simplify the question, a generic implementation rather than one specific to Gmod is absolutely fine.
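
    As a generic (non-Gmod) illustration of rotating about the craft's own axes, the sketch below rotates the whole forward/right/up basis about one of its current vectors using Rodrigues' rotation formula. It keeps the orientation as three vectors instead of a pitch/roll/yaw triple, which is one way to compose local rotations without passing through Euler angles. The axis names, the dictionary layout and the coordinate convention are assumptions for the example, not Expression2 API.

```python
import math


def rotate(v, axis, degrees):
    """Rotate vector v about a unit-length axis by the given angle (Rodrigues' formula)."""
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    ax, ay, az = axis
    vx, vy, vz = v
    dot = ax * vx + ay * vy + az * vz
    cross = (ay * vz - az * vy,          # axis x v
             az * vx - ax * vz,
             ax * vy - ay * vx)
    return tuple(c * vi + s * ci + (1 - c) * dot * ai
                 for vi, ci, ai in zip(v, cross, axis))


def rotate_local(basis, which, degrees):
    """Rotate every basis vector about the basis's own `which` axis."""
    axis = basis[which]
    return {name: rotate(vec, axis, degrees) for name, vec in basis.items()}


# Example: yaw about the local up axis, then pitch about the new local right axis.
basis = {"forward": (1.0, 0.0, 0.0), "right": (0.0, 1.0, 0.0), "up": (0.0, 0.0, 1.0)}
basis = rotate_local(basis, "up", 90)
basis = rotate_local(basis, "right", 45)
```

    After many incremental rotations floating-point error accumulates, so in practice the basis would probably need to be re-normalized from time to time; quaternions are the usual alternative representation for the same composition.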

  • Getting browser to make an AJAX call ASAP, while page is still loading

    - by Chris
    I'm looking for tips on how to get the browser to kick off an AJAX call as early as possible while processing a web page, ideally before the page is fully downloaded. Here's my approximate motivation: I have a web page that takes, say, 5 seconds to load. It calls a web service that takes, say, 10 seconds to respond. If loading the page and calling the web service happened sequentially, the user would have to wait 15 seconds to see all the information. However, if I can get the web service call started before the 5 second page load is complete, then at least some of the work can happen in parallel. Ideally I'd like as much of the work to happen in parallel as possible. My initial theory was that I should place the AJAX-calling javascript as high up as possible in the web page HTML source (being mindful of putting it after the jquery.js include, because I'm making the call using jquery ajax). I'm also being mindful not to wrap the AJAX call in a jquery ready event handler. (I mention this because ready events are popular in a lot of jquery example code.) However, the AJAX call still doesn't seem to get kicked off as early as I'm hoping (at least as judged by the Google Chrome "Timeline" feature), so I'm wondering what other considerations apply. One thing that might potentially be detrimental is that the AJAX call goes back to the same web server that's serving the original web page, so I might be in danger of hitting a browser limit on the number of HTTP connections back to that one machine. (The HTML page loads a number of images, css files, etc.)

  • P6 Architecture - Register renaming aside, does the limited user registers result in more ops spent

    - by mrjoltcola
    I'm studying JIT design with regard to dynamic language VM implementation. I haven't done much assembly since the 8086/8088 days, just a little here or there, so be nice if I'm out of sorts. As I understand it, the x86 (IA-32) architecture still has the same basic limited register set today that it always did. The internal register count has grown tremendously, but these internal registers are not generally available and are used with register renaming to achieve parallel pipelining of code that otherwise could not be parallelized. I understand this optimization pretty well, but my feeling is that, while these optimizations help in overall throughput and for parallel algorithms, the limited register set we are still stuck with results in more register spilling overhead, such that if x86 had double or quadruple the registers available to us, there might be significantly fewer push/pop opcodes in a typical instruction stream. Or are there other processor optimizations that also optimize this away that I am unaware of? Basically, if I have a unit of code that has 4 registers to work with for integer work, but my unit has a dozen variables, I've got potentially a push/pop for every 2 or so instructions. Any references to studies, or better yet, personal experiences?

  • How can I improve my real-time behavior in multi-threaded app using pthreads and condition variables

    - by WilliamKF
    I have a multi-threaded application that is using pthreads. I have a mutex lock and condition variables. There are two threads: one thread is producing data for the second thread, a worker, which is trying to process the produced data in a real-time fashion such that one chunk is processed as close to the elapsing of a fixed time period as possible. This works pretty well; however, occasionally when the producer thread releases the condition upon which the worker is waiting, a delay of up to almost a whole second is seen before the worker thread gets control and executes again. I know this because right before the producer releases the condition upon which the worker is waiting, it does a chunk of processing for the worker if it is time to process another chunk; then, immediately upon receiving the condition in the worker thread, the worker also does a chunk of processing if it is time to process another chunk. In this latter case, I am seeing that I am late processing the chunk many times. I'd like to eliminate this lost efficiency and do what I can to keep the chunks ticking away as close as possible to the desired frequency. Is there anything I can do to reduce the delay between the producer releasing the condition and the worker detecting that the condition is released, so that the worker resumes processing sooner? For example, would it help for the producer to call something to force itself to be context switched out? Bottom line is the worker has to wait each time it asks the producer to create work for itself so that the producer can muck with the worker's data structures before telling the worker it is ready to run in parallel again. This period of exclusive access by the producer is meant to be short, but during this period, I am also checking for real-time work to be done by the producer on behalf of the worker while the producer has exclusive access.
Somehow my hand-off back to running in parallel again occasionally results in a significant delay that I would like to avoid. Please suggest how this might be best accomplished.
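
    To make the hand-off pattern concrete, and to show one way of measuring the signal-to-wake-up delay described above, here is a minimal sketch using Python's `threading.Condition` as a stand-in for a pthread mutex plus condition variable. The `Handoff` class and its method names are hypothetical; the point is the explicit predicate checked in a `while` loop and the timestamp taken at signal time. Python's scheduling differs from native pthreads, so the latencies it reports say nothing about the asker's system.

```python
import threading
import time


class Handoff:
    """One-slot hand-off guarded by a condition variable and an explicit predicate."""

    def __init__(self):
        self.cond = threading.Condition()
        self.item = None
        self.done = False

    def put(self, item):
        with self.cond:
            while self.item is not None:       # wait until the worker took the last item
                self.cond.wait()
            self.item = (item, time.perf_counter())   # remember when we signalled
            self.cond.notify_all()

    def close(self):
        with self.cond:
            self.done = True
            self.cond.notify_all()

    def get(self):
        """Return (item, wake_up_latency_seconds), or None once closed and drained."""
        with self.cond:
            while self.item is None and not self.done:   # re-check after every wake-up
                self.cond.wait()
            if self.item is None:
                return None
            item, t_signal = self.item
            self.item = None
            self.cond.notify_all()
            return item, time.perf_counter() - t_signal


def run(n_chunks=5):
    """Producer hands n_chunks to one worker; returns the (chunk, latency) pairs seen."""
    h = Handoff()
    seen = []

    def worker():
        while True:
            got = h.get()
            if got is None:
                return
            seen.append(got)

    t = threading.Thread(target=worker)
    t.start()
    for i in range(n_chunks):
        h.put(i)
    h.close()
    t.join()
    return seen
```

    Logging the measured latency per chunk would at least show whether the long delays correlate with the producer's own extra work inside the critical section or with scheduling of the worker thread.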

  • Where to put external archives to configure running in Eclipse?

    - by Buggieboy
    As a Java/Eclipse noob coming from a Visual Studio .NET background, I'm trying to understand how to set up my run/debug environment in the Eclipse IDE so I can start stepping through code. I have built my application with separate src and bin hierarchies for each library project, and each project has its own jar. For example, in a Windows environment, I have something like:

        c:\myapp\myapp_main\src\com\mycorp\myapp\main

    ...and a parallel "bin" tree like this:

        c:\myapp\myapp_main\bin\com\mycorp\myapp\main

    Other supporting projects are, similarly:

        c:\myapp\myapp_util\src\com\mycorp\myapp\util

    (and a parallel "bin" tree) ... etc. So, I end up with, e.g., myapp_util.jar in the ...\myapp_util\bin... path and then add that as an external archive to my myapp_main project. I also use utilities like gluegen-rt.jar, which I add as external dependencies to the projects requiring them. I have been able to run outside of the Eclipse environment by copying all my project jars, the gluegen-rt DLL, etc., into a "lib" subfolder of some directory and executing something like:

        java -Djava.library.path=lib -DfullScreen=false -cp lib/gluegen-rt.jar;lib/myapp_main.jar;lib/myapp_util.jar; com.mycorp.myapp.Main

    When I first pressed F11 to debug, however, I got a message about something like /com/sun/../gluegen... not being found by the class loader. So, to debug, or even just run, in Eclipse, I tried setting up my VM arguments in the Galileo Debug - (Run/Debug) Configurations to be the command line above, beginning at "-Djava.library.path...". I've put a lib subdirectory, just like the above with all jars and the native gluegen DLL, in various places, such as beneath the folder that my main jar is built in and as a subfolder of my Eclipse starting workspace folder, but now Eclipse can't find the main class:

        java.lang.NoClassDefFoundError: com.mycorp.myapp.Main

    This happens although the Classpath tab says that it is using the "default classpath", whatever that is.
    Bottom line, how do I assemble the constituent files of a multi-project application so that I can run or debug in Eclipse?

  • java class creation dynamically and make it accessible across the network different jvms i.e. serial

    - by inj.rav
    Hi. I have a requirement to create Java classes dynamically and make them accessible to different JVMs across the network. I tried to use reflection and the javassist tool, but nothing worked. Let me explain the scenario: we are using the Coherence distributed cache. It has the power of doing aggregation/filtering in parallel across the cluster. For example, suppose a [dynamic] class has an amount variable and getAmount/setAmount methods. Then if we execute Coherence queries, it will start processing in parallel across the cluster. I tried to create classes at run time by using javassist and reflection. I am able to access such a class from a single JVM, but when I try to access the same class from another JVM [through the Coherence cluster], I get a class-not-found exception [as the remote JVM has no idea of this class]. I can overcome this by creating the same class dynamically on the remote JVM as well and accessing the methods. But Coherence's built-in methods/functions are still not able to find the class. Could someone help me with this matter?

  • MySql Query lag time / deadlock?

    - by Click Upvote
    When there are multiple PHP scripts running in parallel, each making an UPDATE query to the same record in the same table repeatedly, is it possible for there to be a 'lag time' before the table is updated with each query? I have basically 5-6 instances of a PHP script running in parallel, having been launched via cron. Each script gets all the records in the items table, and then loops through them and processes them. However, to avoid processing the same item more than once, I store the id of the last item being processed in a separate table. So this is how my code works:

        function getCurrentItem() {
            $sql = "SELECT currentItemId from settings";
            $result = $this->db->query($sql);
            return $result->get('currentItemId');
        }

        function setCurrentItem($id) {
            $sql = "UPDATE settings SET currentItemId='$id'";
            $this->db->query($sql);
        }

        $currentItem = $this->getCurrentItem();
        $sql = "SELECT * FROM items WHERE status='pending' AND id > $currentItem";
        $result = $this->db->query($sql);
        $items = $result->getAll();

        foreach ($items as $i) {
            // Check if $i has been processed by a different instance of the script,
            // and if so, leave it untouched.
            if ($this->getCurrentItem() > $i->id) continue;
            $this->setCurrentItem($i->id);
            // Process the item here
        }

    But despite all the precautions, most items are being processed more than once, which makes me think that there is some lag time between the update queries being run by the PHP script and when the database actually updates the record. Is that true? And if so, what other mechanism should I use to ensure that the PHP scripts always get only the latest currentItemId even when there are multiple scripts running in parallel? Would using a text file instead of the db help?

  • What's the compelling reason to upgrade to Visual Studio 2010 from VS2008?

    - by Cheeso
    Are there new features in Visual Studio 2010 that are must-haves? If so, which ones? For me, the big draws for VS2008 as compared to VS2005 were LINQ, .NET Framework multitargeting, WCF (REST + Syndication), and general devenv.exe reliability. Granted, some of these features are framework things, and not tool things. For the purposes of this discussion, I'm willing to combine them into one bucket. What is the list of must-have features for VS2010 versus VS2008? Are there any? I am particularly interested in C#.

    Update: I know how to google, so I can get the official list from Microsoft. I guess what I really wanted was the assessment from people using it, as to which things are really notable. Microsoft went on for 3 pages about 2008/3.5 features, and many people sort of boiled it down to LINQ and a few other things. What is that short list for VS2010?

    Summary so far, what people think is cool or compelling:

      Visual Studio engine
        - multi-monitor support
        - new extensibility model
        - based on WPF, prettier and more usable
        - new TFS stuff, incl automated test tools
        - parallel debugging
      .NET Framework
        - parallel extensions for .NET
      C# 4.0
        - generic variance
        - optional and named params
        - easier interop with non-managed environments, like COM or Javascript
      VB 10.0
        - collection and array literals / initializers
        - automatic properties
        - anonymous methods / statement lambdas

    I read up on these at Zander's blog. He described these and other features. Nobody on this list said anything about:

      Visual Studio engine
        - F# support
        - Javascript code-completion
        - JQuery is now included
        - UML
        - better Sharepoint capabilities
        - C++ moves to msbuild project files

  • Quickest algorithm for finding sets with high intersection

    - by conradlee
    I have a large number of user IDs (integers), potentially millions. These users all belong to various groups (sets of integers), such that there are on the order of 10 million groups. To simplify my example and get to the essence of it, let's assume that all groups contain 20 user IDs (i.e., all integer sets have a cardinality of 100). I want to find all pairs of integer sets that have an intersection of 15 or greater. Should I compare every pair of sets? (If I keep a data structure that maps userIDs to set membership, this would not be necessary.) What is the quickest way to do this? That is, what should my underlying data structure be for representing the integer sets? Sorted sets, unsorted? Can hashing somehow help? And what algorithm should I use to compute set intersection? I prefer answers that relate to C/C++ (especially STL), but more general, algorithmic insights are also welcome.

    Update: Also, note that I will be running this in parallel in a shared memory environment, so ideas that cleanly extend to a parallel solution are preferred.
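
    The parenthetical about mapping userIDs to set membership can be turned into a small sketch: build an inverted index from user to groups, count shared users only for group pairs that actually co-occur, and keep the pairs at or above the threshold. This is written in Python rather than C++/STL to stay short; the function name and input layout (a dict of group id to user ids) are assumptions, and the approach can still be expensive when a single user belongs to very many groups.

```python
from collections import defaultdict
from itertools import combinations


def pairs_with_overlap(groups, threshold):
    """Return {(g1, g2): overlap} for group pairs sharing at least `threshold` users."""
    members_of = defaultdict(list)            # user id -> ids of groups containing it
    for gid, users in groups.items():
        for user in set(users):
            members_of[user].append(gid)

    overlap = defaultdict(int)                # (g1, g2) with g1 < g2 -> shared users
    for gids in members_of.values():
        for a, b in combinations(sorted(gids), 2):
            overlap[(a, b)] += 1

    return {pair: n for pair, n in overlap.items() if n >= threshold}
```

    Pairs that share no user are never touched, and the counting loop can be partitioned by user across threads with the per-thread counters merged at the end, which fits the shared-memory requirement mentioned in the update.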

  • InvalidOperationException when executing SqlCommand with transaction

    - by Serhat Özgel
    I have this code, running in parallel in two separate threads. It works fine a few times, but at some random point it throws InvalidOperationException: The transaction is either not associated with the current connection or has been completed. At the point of the exception, I look inside the transaction with Visual Studio and verify that its connection is set normally. Also, command.Transaction._internalTransaction._transactionState is set to Active and the IsZombied property is set to false. This is a test application and I am using Thread.Sleep to create longer transactions and cause overlaps. Why might the exception be thrown, and what can I do about it?

        IDbCommand command = new SqlCommand("Select * From INFO");
        IDbConnection connection = new SqlConnection(connectionString);
        command.Connection = connection;
        IDbTransaction transaction = null;
        try
        {
            connection.Open();
            transaction = connection.BeginTransaction();
            command.Transaction = transaction;
            command.ExecuteNonQuery(); // Sometimes throws exception
            Thread.Sleep(forawhile);   // For overlapping transactions running in parallel
            transaction.Commit();
        }
        catch (ApplicationException exception)
        {
            if (transaction != null)
            {
                transaction.Rollback();
            }
        }
        finally
        {
            connection.Close();
        }

  • How to append to a log file in powershell?

    - by Mark Allison
    Hi there, I am doing some parallel SQL Server 2005 database restores in powershell. The way I have done it is to use cmd.exe and start so that powershell doesn't wait for it to complete. What I need to do is to pipe the output into a log file with append. If I use Add-Content, then powershell waits, which is not what I want. My code snippet is:

        foreach ($line in $database_list) {
            <snip>
            # Create logins
            sqlcmd.exe -S $instance -E -d master -i $loginsFile -o $logFile

            # Read commands from a temp file and execute them in parallel with sqlcmd.exe
            cmd.exe /c start "Restoring $database" /D"$pwd" sqlcmd.exe -S $instance -E -d master -i $tempSQLFile -t 0 -o $logFile
            [void]$logFiles.Add($logFile)
        }

    The problem is that sqlcmd.exe -o overwrites. I've tried doing this to append:

        cmd.exe /c start "Restoring $database" /D"$pwd" sqlcmd.exe -S $instance -E -d master -i $tempSQLFile -t 0 >> $logFile

    But it doesn't work because the output stays in the SQLCMD window and doesn't go to the file. Any suggestions? Thanks, Mark.
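
    The underlying pattern (start several child processes without waiting, with each child's output appended to a log file) can be shown in a few lines of Python. This is only a sketch of the technique, not a PowerShell answer: the helper names are made up, and the child command passed in stands in for the `sqlcmd.exe` invocation.

```python
import subprocess
import sys


def start_logged(cmd, log_path):
    """Start cmd without waiting; append its stdout and stderr to log_path."""
    log = open(log_path, "a")                 # "a" opens for append, never truncates
    proc = subprocess.Popen(cmd, stdout=log, stderr=subprocess.STDOUT)
    log.close()                               # the child keeps its own copy of the handle
    return proc


def run_all(commands, log_path):
    """Launch every command in parallel, then wait and return the exit codes."""
    procs = [start_logged(cmd, log_path) for cmd in commands]
    return [p.wait() for p in procs]


# Hypothetical stand-ins for two restore commands.
demo_commands = [
    [sys.executable, "-c", "print('restore A')"],
    [sys.executable, "-c", "print('restore B')"],
]
```

    The key point is that the redirection is attached to the child process itself rather than to the launcher window, so the output has somewhere to go even though the parent does not wait. When several children share one log file their lines may interleave, so one log file per database is often easier to read.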

    Read the article

  • MySql Query lag time?

    - by Click Upvote
    When there are multiple PHP scripts running in parallel, each making an UPDATE query to the same record in the same table repeatedly, is it possible for there to be a 'lag time' before the table is updated with each query? I have basically 5-6 instances of a PHP script running in parallel, having been launched via cron. Each script gets all the records in the items table, and then loops through them and processes them. However, to avoid processing the same item more than once, I store the id of the last item being processed in a separate table. So this is how my code works: function getCurrentItem() { $sql = "SELECT currentItemId from settings"; $result = $this->db->query($sql); return $result->get('currentItemId'); } function setCurrentItem($id) { $sql = "UPDATE settings SET currentItemId='$id'"; $this->db->query($sql); } $currentItem = $this->getCurrentItem(); $sql = "SELECT * FROM items WHERE status='pending' AND id > $currentItem'"; $result = $this->db->query($sql); $items = $result->getAll(); foreach ($items as $i) { //Check if $i has been processed by a different instance of the script, and if so, //leave it untouched. if ($this->getCurrentItem() > $i->id) continue; $this->setCurrentItem($i->id); // Process the item here } But despite all the precautions, most items are being processed more than once. This makes me think that there is some lag time between the update queries being run by the PHP script and the moment the database actually updates the record. Is it true? And if so, what other mechanism should I use to ensure that the PHP scripts always get only the latest currentItemId even when there are multiple scripts running in parallel? Would using a text file instead of the db help?
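
One common way to avoid the check-then-set race described above is to let the database claim a row in a single conditional UPDATE and then look at the affected-row count, instead of reading a shared marker and writing it back in two steps. The following is only a minimal sketch under stated assumptions: the 'processing' status value and the helper names `claim_item` and `make_demo_db` are hypothetical (they are not in the post), and Python's built-in sqlite3 stands in for MySQL and PHP purely so the example runs on its own.

```python
import sqlite3

def claim_item(conn, item_id):
    """Try to claim one pending item; return True only for the single winner."""
    cur = conn.execute(
        "UPDATE items SET status = 'processing' "  # 'processing' is an assumed status value
        "WHERE id = ? AND status = 'pending'",
        (item_id,),
    )
    conn.commit()
    # rowcount is 1 if this call changed the row, 0 if it was already claimed.
    return cur.rowcount == 1

def make_demo_db():
    """Build a throwaway in-memory table with three pending items (demo data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, status TEXT)")
    conn.executemany(
        "INSERT INTO items (id, status) VALUES (?, 'pending')",
        [(1,), (2,), (3,)],
    )
    conn.commit()
    return conn

if __name__ == "__main__":
    conn = make_demo_db()
    print(claim_item(conn, 2))  # first attempt to claim item 2
    print(claim_item(conn, 2))  # second attempt to claim the same item
```

A worker would process an item only when the claim returns True. The same idea should carry over to MySQL, where the affected-row count of the UPDATE plays the role of `rowcount` here.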

    Read the article

  • How can I determine PerlLogHandler performance impact?

    - by Timmy
    I want to create a custom Apache2 log handler, and the template that is found on the apache site is: #file:MyApache2/LogPerUser.pm #--------------------------- package MyApache2::LogPerUser; use strict; use warnings; use Apache2::RequestRec (); use Apache2::Connection (); use Fcntl qw(:flock); use File::Spec::Functions qw(catfile); use Apache2::Const -compile => qw(OK DECLINED); sub handler { my $r = shift; my ($username) = $r->uri =~ m|^/~([^/]+)|; return Apache2::Const::DECLINED unless defined $username; my $entry = sprintf qq(%s [%s] "%s" %d %d\n), $r->connection->remote_ip, scalar(localtime), $r->uri, $r->status, $r->bytes_sent; my $log_path = catfile Apache2::ServerUtil::server_root, "logs", "$username.log"; open my $fh, ">>$log_path" or die "can't open $log_path: $!"; flock $fh, LOCK_EX; print $fh $entry; close $fh; return Apache2::Const::OK; } 1; What is the performance cost of the flocks? Is this logging process done in parallel, or in serial with the HTTP request? In parallel the performance would not matter as much, but I wouldn't want the user to wait another split second to add something like this.

    Read the article

  • Time with and without OpenMP

    - by was
    I have a question. I tried to improve a well-known algorithm in C, Fox's algorithm for matrix multiplication. The version without OpenMP is here: (http://web.mst.edu/~ercal/387/MPI/ppmpi_c/chap07/fox.c). The initial program used only MPI, and I tried to add OpenMP to the matrix multiplication function in order to reduce the computation time. (The program runs on a cluster whose computers have 2 cores each, so I created 2 threads.) The problem is that there is no difference in run time with and without OpenMP. Sometimes the time with OpenMP is equal to or even greater than the time without it. I tried multiplying two 600x600 matrices. void Local_matrix_multiply( LOCAL_MATRIX_T* local_A /* in */, LOCAL_MATRIX_T* local_B /* in */, LOCAL_MATRIX_T* local_C /* out */) { int i, j, k; chunk = CHUNKSIZE; // 100 #pragma omp parallel shared(local_A, local_B, local_C, chunk, nthreads) private(i,j,k,tid) num_threads(2) { /* tid = omp_get_thread_num(); if(tid == 0){ nthreads = omp_get_num_threads(); printf("Matrix multiplication starts with %d threads\n", nthreads); } printf("Thread %d use the matrix: \n", tid); */ #pragma omp for schedule(static, chunk) for (i = 0; i < Order(local_A); i++) for (j = 0; j < Order(local_A); j++) for (k = 0; k < Order(local_B); k++) Entry(local_C,i,j) = Entry(local_C,i,j) + Entry(local_A,i,k)*Entry(local_B,k,j); } //end pragma omp parallel } /* Local_matrix_multiply */

    Read the article

  • What limits scaling in this simple OpenMP program?

    - by Douglas B. Staple
    I'm trying to understand limits to parallelization on a 48-core system (4xAMD Opteron 6348, 2.8 GHz, 12 cores per CPU). I wrote this tiny OpenMP code to test the speedup in what I thought would be the best possible situation (the task is embarrassingly parallel): // Compile with: gcc scaling.c -std=c99 -fopenmp -O3 #include <stdio.h> #include <stdint.h> int main(){ const uint64_t umin=1; const uint64_t umax=10000000000LL; double sum=0.; #pragma omp parallel for reduction(+:sum) for(uint64_t u=umin; u<umax; u++) sum+=1./u/u; printf("%e\n", sum); } I was surprised to find that the scaling is highly nonlinear. It takes about 2.9s for the code to run with 48 threads, 3.1s with 36 threads, 3.7s with 24 threads, 4.9s with 12 threads, and 57s for the code to run with 1 thread. Unfortunately I have to say that there is one process running on the computer using 100% of one core, so that might be affecting it. It's not my process, so I can't end it to test the difference, but somehow I doubt that's making the difference between a 19~20x speedup and the ideal 48x speedup. To make sure it wasn't an OpenMP issue, I ran two copies of the program at the same time with 24 threads each (one with umin=1, umax=5000000000, and the other with umin=5000000000, umax=10000000000). In that case both copies of the program finish after 2.9s, so it's exactly the same as running 48 threads with a single instance of the program. What's preventing linear scaling with this simple program?
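
The timings quoted above can be turned into speedup and parallel efficiency (speedup divided by thread count), which shows where the scaling falls off. This is a small sketch that uses only the numbers reported in the post; the names `TIMES` and `speedup_table` are made up for illustration.

```python
# Wall-clock times reported in the post, in seconds, keyed by thread count.
TIMES = {1: 57.0, 12: 4.9, 24: 3.7, 36: 3.1, 48: 2.9}

def speedup_table(times):
    """Return (threads, speedup, efficiency) rows relative to the 1-thread run."""
    base = times[1]
    rows = []
    for threads in sorted(times):
        speedup = base / times[threads]
        rows.append((threads, speedup, speedup / threads))
    return rows

if __name__ == "__main__":
    for threads, speedup, eff in speedup_table(TIMES):
        print(f"{threads:2d} threads: speedup {speedup:5.1f}x, efficiency {eff:4.0%}")
```

On these figures the efficiency is about 97% at 12 threads and drops to about 41% at 48 threads, so the scaling is close to linear up to 12 threads and degrades beyond that.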

    Read the article

  • Async task ASP.NET HttpContext.Current.Items is empty - How do I handle this?

    - by GuruC
    We are running a very large web application on ASP.NET MVC with .NET 4.0. Recently we had an audit done, and the performance team reported a lot of null reference exceptions. So I started investigating from the dumps and the event viewer. My understanding is as follows: We are using async Tasks in our controllers. We rely on the HttpContext.Current.Items hashtable to store a lot of application-level values. Task<Articles>.Factory.StartNew(() => { System.Web.HttpContext.Current = ControllerContext.HttpContext.ApplicationInstance.Context; var service = new ArticlesService(page); return service.GetArticles(); }).ContinueWith(t => SetResult(t, "articles")); So we are copying the context object onto the new thread that is spawned from the Task factory. This context.Items is used again in the thread wherever necessary. For example: public class SomeClass { internal static int StreamID { get { if (HttpContext.Current != null) { return (int)HttpContext.Current.Items["StreamID"]; } else { return DEFAULT_STREAM_ID; } } } } This runs fine as long as the number of parallel requests is not too high. My questions are as follows: 1. When the load is higher and there are too many parallel requests, I notice that HttpContext.Current.Items is empty. I cannot figure out the reason for this, and it causes all the null reference exceptions. 2. How do we make sure it is not null? Is there a workaround? NOTE: I searched Stack Overflow and people have questions like HttpContext.Current is null, but in my case it is not null; it is empty. I also read an article whose author says that sometimes the request object is terminated, which may cause problems since Dispose has already been called on its objects. I am making a copy of the Context object, but it's just a shallow copy and not a deep copy.

    Read the article

  • A generic C++ library that provides QtConcurrent functionality?

    - by Lucas
    QtConcurrent is awesome. I'll let the Qt docs speak for themselves: QtConcurrent includes functional programming style APIs for parallel list processing, including a MapReduce and FilterReduce implementation for shared-memory (non-distributed) systems, and classes for managing asynchronous computations in GUI applications. For instance, you give QtConcurrent::map() an iterable sequence and a function that accepts items of the type stored in the sequence, and that function is applied to all the items in the collection. This is done in a multi-threaded manner, with a thread pool sized to the number of logical CPUs on the system. There are plenty of other functions in QtConcurrent, like filter(), filteredReduced(), etc.: the standard CompSci map/reduce functions and the like. I'm totally in love with this, but I'm starting work on an OSS project that will not be using the Qt framework. It's a library, and I don't want to force others to depend on a framework as large as Qt. I'm trying to keep external dependencies to a minimum (it's the decent thing to do). I'm looking for a generic C++ framework that provides me with the same or similar high-level primitives that QtConcurrent does. AFAIK Boost has nothing like this (I may be wrong though). boost::thread is very low-level compared to what I'm looking for. I know C# has something very similar with its Parallel Extensions, so I know this isn't a Qt-only idea. What do you suggest I use?

    Read the article

  • Multithreading for loop while maintaining order

    - by David
    I started messing around with multithreading for a CPU-intensive batch process I'm running. Essentially I'm trying to condense multiple single-page TIFFs into single PDF documents. This works fine with a foreach loop or standard iteration but can be very slow for several 100-page documents. I tried the following, based on some examples I found, to use multithreading, and it gives a significant performance improvement. However, it scrambles the page order: instead of 1,2,3,4 the pages come out as 1,3,4,2,6,5, depending on which thread completes first. My question is how I would use this technique while maintaining the page order, and, if I can, whether it will negate the performance benefit of the multithreading. Thank you in advance. PdfDocument doc = new PdfDocument(); string mail = textBox1.Text; string[] split = mail.Split(new string[] { Environment.NewLine }, StringSplitOptions.None); int counter = split.Count(); // Source must be array or IList. var source = Enumerable.Range(0, 100000).ToArray(); // Partition the entire source array. var rangePartitioner = Partitioner.Create(0, counter); double[] results = new double[counter]; // Loop over the partitions in parallel. Parallel.ForEach(rangePartitioner, (range, loopState) => { // Loop over each range element without a delegate invocation. for (int i = range.Item1; i < range.Item2; i++) { f_prime = split[i].Replace(" " , ""); PdfPage page = doc.AddPage(); XGraphics gfx = XGraphics.FromPdfPage(page); XImage image = XImage.FromFile(f_prime); double x = 0; gfx.DrawImage(image, x, 0); } });
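
One general way to keep the page order while still using several threads is to run only the expensive per-page work in parallel, collect the results in input order, and then append the pages to the document sequentially. The sketch below illustrates that pattern in Python rather than with the PDF library used in the post; `render_page` and `build_document` are hypothetical placeholders, and the "document" is just a list.

```python
from concurrent.futures import ThreadPoolExecutor

def render_page(path):
    """Placeholder for the expensive per-page work (e.g. decoding one TIFF)."""
    return f"rendered:{path.replace(' ', '')}"

def build_document(paths, max_workers=4):
    """Process pages concurrently, then assemble them in the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of `paths`,
        # regardless of which worker finishes first.
        rendered = list(pool.map(render_page, paths))
    document = []
    for page in rendered:  # sequential step, so the page order is deterministic
        document.append(page)
    return document

if __name__ == "__main__":
    print(build_document(["page 1.tif", "page 2.tif", "page 3.tif"]))
```

How much of the speedup survives depends on how much of the total time is spent in the parallel step compared with the sequential assembly, so it would need to be measured on the real workload.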

    Read the article

  • Why don't my threads start at the same time? Java

    - by Ada
    Hi, I have a variable number of threads which are used for parallel downloading. I used this: for(int i = 0; i< sth; i++){ thrList.add(new myThread (parameters)); thrList.get(i).start(); thrList.get(i).join(); } I don't know why, but they wait for each other to complete. When using threads, I am supposed to get mixed printouts, since at any moment there are several threads running that code. However, when I print them out, they are always in order, and one thread waits for the previous one to finish first. I only want them to join the main thread, not wait for each other. I noticed this when I measured the time while downloading in parallel. How can I fix this? Why are they running in order? In my .java file, there is a MyThread class with run, and there is a Downloader class with static methods and variables. Would they be the cause of this? The static methods and variables? How can I fix this problem?
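
Calling join() immediately after start() inside the same loop makes the loop wait for each thread to finish before the next one is even created, which matches the in-order behaviour described above. The usual pattern is to start every thread first and join them all in a second loop. The sketch below shows that pattern with Python's threading module, whose start() and join() have the same meaning as in Java; `download` and `run_all` are hypothetical placeholders, not the poster's classes.

```python
import threading
import time

def download(name, log, lock):
    """Placeholder for one download; records when it starts and finishes."""
    with lock:
        log.append(("start", name))
    time.sleep(0.05)  # simulate time spent waiting on the network
    with lock:
        log.append(("end", name))

def run_all(names):
    """Start one thread per name, then wait for all of them to finish."""
    log, lock = [], threading.Lock()
    threads = [threading.Thread(target=download, args=(n, log, lock)) for n in names]
    for t in threads:  # first loop: start every thread
        t.start()
    for t in threads:  # second loop: wait for all of them
        t.join()
    return log

if __name__ == "__main__":
    for event in run_all(["a", "b", "c"]):
        print(event)
```

With this structure the main thread still waits for all downloads to finish, but the downloads themselves can overlap.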

    Read the article
