Search Results

Search found 1014 results on 41 pages for 'iteration'.

Page 12/41 | < Previous Page | 8 9 10 11 12 13 14 15 16 17 18 19 | Next Page >

Search and Matching algorithm

- by Tony

Hello everyone. I am trying to come up with an algorithm to do the following: I have total 12 cells that I need to fill until program stops. I have 3 rows and each row has 4 columns. As an example, let me illustrate this as in airplane. So you have 3 rows and each row has 4 columns and you have window/aisle seats. Each row will have a window seat, aisle seat, aisle seat and window seat (|WA AW| Just like seat arrangement in airplane). At each iteration (different set of passengers), there would be some number of passengers (between 1 and 12) and I need to seat them closest together possible (Seat together). And I do this for next group (each iteration) until program stops (It will stop when I am done with every group). For example, I have 3 passengers (A,B,and C) and A wants to seat in Window, B wants to seat in Aisle and C wants to seat in Window. Assuming that all the seats (all 12) are available, I could place them like |A# BC| or |CB #A| and mark the seats dirty (so I don’t pick same seats again for next passengers). And I do this for next group (iteration). I am not sure if this right forum, but if somebody can advise me how I should accomplish, I would really appreciate it. Thanks.

Read the article
Calling a constructor to reinitialize variables doesn't seem to work?

- by Matt

I wanted to run 1,000 iterations of a program, so set a counter for 1000 in main. I needed to reinitialize various variables after each iteration, and since the class constructor had all the initializations already written out - I decided to call that after each iteration, with the result of each iteration being stored in a variable in main. However, when I called the constructor, it had no effect...it took me a while to figure out - but it didn't reinitialize anything! I created a function exactly like the constructor - so the object would have its own version. When I called that, it reinitialized everything as I expected. int main() { Class MyClass() int counter = 0; while ( counter < 1000 ) { stuff happens } Class(); // This is how I tried to call the constructor initially. // After doing some reading here, I tried: // Class::Class(); // - but that didn't work either /* Later I used... MyClass.function_like_my_constructor; // this worked perfectly */ } ...Could someone try to explain why what I did was wrong, or didn't work, or was silly or what have you? I mean - mentally, I just figured - crap, I can call this constructor and have all this stuff reinitialized. Are constructors (ideally) ONLY called when an object is created?

Read the article
Picking good first estimates for Goldschmidt division

- by Mads Elvheim

I'm calculating fixedpoint reciprocals in Q22.10 with Goldschmidt division for use in my software rasterizer on ARM. This is done by just setting the nominator to 1, i.e the nominator becomes the scalar on the first iteration. To be honest, I'm kind of following the wikipedia algorithm blindly here. The article says that if the denominator is scaled in the half-open range (0.5, 1.0], a good first estimate can be based on the denominator alone: Let F be the estimated scalar and D be the denominator, then F = 2 - D. But when doing this, I lose a lot of precision. Say if I want to find the reciprocal of 512.00002f. In order to scale the number down, I lose 10 bits of precision in the fraction part, which is shifted out. So, my questions are: Is there a way to pick a better estimate which does not require normalization? Also, is it possible to pre-calculate the first estimates so the series converges faster? Right now, it converges after the 4th iteration on average. On ARM this is about ~50 cycles worst case, and that's not taking emulation of clz/bsr into account, nor memory lookups. Here is my testcase. Note: The software implementation of clz on line 13 is from my post here. You can replace it with an intrinsic if you want. #include <stdio.h> #include <stdint.h> const unsigned int BASE = 22ULL; static unsigned int divfp(unsigned int val, int* iter) { /* Nominator, denominator, estimate scalar and previous denominator */ unsigned long long N,D,F, DPREV; int bitpos; *iter = 1; D = val; /* Get the shift amount + is right-shift, - is left-shift. */ bitpos = 31 - clz(val) - BASE; /* Normalize into the half-range (0.5, 1.0] */ if(0 < bitpos) D >>= bitpos; else D <<= (-bitpos); /* (FNi / FDi) == (FN(i+1) / FD(i+1)) */ /* F = 2 - D */ F = (2ULL<<BASE) - D; /* N = F for the first iteration, because the nominator is simply 1. So don't waste a 64-bit UMULL on a multiply with 1 */ N = F; D = ((unsigned long long)D*F)>>BASE; while(1){ DPREV = D; F = (2<<(BASE)) - D; D = ((unsigned long long)D*F)>>BASE; /* Bail when we get the same value for two denominators in a row. This means that the error is too small to make any further progress. */ if(D == DPREV) break; N = ((unsigned long long)N*F)>>BASE; *iter = *iter + 1; } if(0 < bitpos) N >>= bitpos; else N <<= (-bitpos); return N; } int main(int argc, char* argv[]) { double fv, fa; int iter; unsigned int D, result; sscanf(argv[1], "%lf", &fv); D = fv*(double)(1<<BASE); result = divfp(D, &iter); fa = (double)result / (double)(1UL << BASE); printf("Value: %8.8lf 1/value: %8.8lf FP value: 0x%.8X\n", fv, fa, result); printf("iteration: %d\n",iter); return 0; }

Read the article
Parallelism in .NET – Part 3, Imperative Data Parallelism: Early Termination

- by Reed

Although simple data parallelism allows us to easily parallelize many of our iteration statements, there are cases that it does not handle well. In my previous discussion, I focused on data parallelism with no shared state, and where every element is being processed exactly the same. Unfortunately, there are many common cases where this does not happen. If we are dealing with a loop that requires early termination, extra care is required when parallelizing. Often, while processing in a loop, once a certain condition is met, it is no longer necessary to continue processing. This may be a matter of finding a specific element within the collection, or reaching some error case. The important distinction here is that, it is often impossible to know until runtime, what set of elements needs to be processed. In my initial discussion of data parallelism, I mentioned that this technique is a candidate when you can decompose the problem based on the data involved, and you wish to apply a single operation concurrently on all of the elements of a collection. This covers many of the potential cases, but sometimes, after processing some of the elements, we need to stop processing. As an example, lets go back to our previous Parallel.ForEach example with contacting a customer. However, this time, we’ll change the requirements slightly. In this case, we’ll add an extra condition – if the store is unable to email the customer, we will exit gracefully. The thinking here, of course, is that if the store is currently unable to email, the next time this operation runs, it will handle the same situation, so we can just skip our processing entirely. The original, serial case, with this extra condition, might look something like the following: foreach(var customer in customers) { // Run some process that takes some time... DateTime lastContact = theStore.GetLastContact(customer); TimeSpan timeSinceContact = DateTime.Now - lastContact; // If it's been more than two weeks, send an email, and update... if (timeSinceContact.Days > 14) { // Exit gracefully if we fail to email, since this // entire process can be repeated later without issue. if (theStore.EmailCustomer(customer) == false) break; customer.LastEmailContact = DateTime.Now; } } .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } Here, we’re processing our loop, but at any point, if we fail to send our email successfully, we just abandon this process, and assume that it will get handled correctly the next time our routine is run. If we try to parallelize this using Parallel.ForEach, as we did previously, we’ll run into an error almost immediately: the break statement we’re using is only valid when enclosed within an iteration statement, such as foreach. When we switch to Parallel.ForEach, we’re no longer within an iteration statement – we’re a delegate running in a method. This needs to be handled slightly differently when parallelized. Instead of using the break statement, we need to utilize a new class in the Task Parallel Library: ParallelLoopState. The ParallelLoopState class is intended to allow concurrently running loop bodies a way to interact with each other, and provides us with a way to break out of a loop. In order to use this, we will use a different overload of Parallel.ForEach which takes an IEnumerable<T> and an Action<T, ParallelLoopState> instead of an Action<T>. Using this, we can parallelize the above operation by doing: Parallel.ForEach(customers, (customer, parallelLoopState) => { // Run some process that takes some time... DateTime lastContact = theStore.GetLastContact(customer); TimeSpan timeSinceContact = DateTime.Now - lastContact; // If it's been more than two weeks, send an email, and update... if (timeSinceContact.Days > 14) { // Exit gracefully if we fail to email, since this // entire process can be repeated later without issue. if (theStore.EmailCustomer(customer) == false) parallelLoopState.Break(); else customer.LastEmailContact = DateTime.Now; } }); There are a couple of important points here. First, we didn’t actually instantiate the ParallelLoopState instance. It was provided directly to us via the Parallel class. All we needed to do was change our lambda expression to reflect that we want to use the loop state, and the Parallel class creates an instance for our use. We also needed to change our logic slightly when we call Break(). Since Break() doesn’t stop the program flow within our block, we needed to add an else case to only set the property in customer when we succeeded. This same technique can be used to break out of a Parallel.For loop. That being said, there is a huge difference between using ParallelLoopState to cause early termination and to use break in a standard iteration statement. When dealing with a loop serially, break will immediately terminate the processing within the closest enclosing loop statement. Calling ParallelLoopState.Break(), however, has a very different behavior. The issue is that, now, we’re no longer processing one element at a time. If we break in one of our threads, there are other threads that will likely still be executing. This leads to an important observation about termination of parallel code: Early termination in parallel routines is not immediate. Code will continue to run after you request a termination. This may seem problematic at first, but it is something you just need to keep in mind while designing your routine. ParallelLoopState.Break() should be thought of as a request. We are telling the runtime that no elements that were in the collection past the element we’re currently processing need to be processed, and leaving it up to the runtime to decide how to handle this as gracefully as possible. Although this may seem problematic at first, it is a good thing. If the runtime tried to immediately stop processing, many of our elements would be partially processed. It would be like putting a return statement in a random location throughout our loop body – which could have horrific consequences to our code’s maintainability. In order to understand and effectively write parallel routines, we, as developers, need a subtle, but profound shift in our thinking. We can no longer think in terms of sequential processes, but rather need to think in terms of requests to the system that may be handled differently than we’d first expect. This is more natural to developers who have dealt with asynchronous models previously, but is an important distinction when moving to concurrent programming models. As an example, I’ll discuss the Break() method. ParallelLoopState.Break() functions in a way that may be unexpected at first. When you call Break() from a loop body, the runtime will continue to process all elements of the collection that were found prior to the element that was being processed when the Break() method was called. This is done to keep the behavior of the Break() method as close to the behavior of the break statement as possible. We can see the behavior in this simple code: var collection = Enumerable.Range(0, 20); var pResult = Parallel.ForEach(collection, (element, state) => { if (element > 10) { Console.WriteLine("Breaking on {0}", element); state.Break(); } Console.WriteLine(element); }); If we run this, we get a result that may seem unexpected at first: 0 2 1 5 6 3 4 10 Breaking on 11 11 Breaking on 12 12 9 Breaking on 13 13 7 8 Breaking on 15 15 What is occurring here is that we loop until we find the first element where the element is greater than 10. In this case, this was found, the first time, when one of our threads reached element 11. It requested that the loop stop by calling Break() at this point. However, the loop continued processing until all of the elements less than 11 were completed, then terminated. This means that it will guarantee that elements 9, 7, and 8 are completed before it stops processing. You can see our other threads that were running each tried to break as well, but since Break() was called on the element with a value of 11, it decides which elements (0-10) must be processed. If this behavior is not desirable, there is another option. Instead of calling ParallelLoopState.Break(), you can call ParallelLoopState.Stop(). The Stop() method requests that the runtime terminate as soon as possible , without guaranteeing that any other elements are processed. Stop() will not stop the processing within an element, so elements already being processed will continue to be processed. It will prevent new elements, even ones found earlier in the collection, from being processed. Also, when Stop() is called, the ParallelLoopState’s IsStopped property will return true. This lets longer running processes poll for this value, and return after performing any necessary cleanup. The basic rule of thumb for choosing between Break() and Stop() is the following. Use ParallelLoopState.Stop() when possible, since it terminates more quickly. This is particularly useful in situations where you are searching for an element or a condition in the collection. Once you’ve found it, you do not need to do any other processing, so Stop() is more appropriate. Use ParallelLoopState.Break() if you need to more closely match the behavior of the C# break statement. Both methods behave differently than our C# break statement. Unfortunately, when parallelizing a routine, more thought and care needs to be put into every aspect of your routine than you may otherwise expect. This is due to my second observation: Parallelizing a routine will almost always change its behavior. This sounds crazy at first, but it’s a concept that’s so simple its easy to forget. We’re purposely telling the system to process more than one thing at the same time, which means that the sequence in which things get processed is no longer deterministic. It is easy to change the behavior of your routine in very subtle ways by introducing parallelism. Often, the changes are not avoidable, even if they don’t have any adverse side effects. This leads to my final observation for this post: Parallelization is something that should be handled with care and forethought, added by design, and not just introduced casually.

Read the article
How to conciliate OOAD and Database Design?

- by user1620696

Recently I've studied about object oriented analysis and design and I liked a lot about it. In every place I've read people say that the idea is to start with the minimum set of requirements and go improving along the way, revisiting this each iteration and making it better as we contiuously develop and contact the customer interested in the software. In particular, one course from Lynda.com said a lot of that: we don't want to spend a lot of time planing everything upfront, we just want to have the minimum to get started and then improve this each iteration. Now, I've also seem a course from the same guy about database design, and there he says differently. He says that although when working with object orientation he likes the agile iterative approach, for database design we should really spend a lot of time planing things upfront instead of just going along the way with the minimum. But this confuses me a little. Indeed, the database will persist important data from our domain model and perhaps configurations of the software and so on. Now, if I'm going to continuously revist the analysis and design of the model, it seems the database design should change also. In the same way, if we plan all the database upfront it seems we are also planing all the model upfront, so the two ideas seems to be incompatible. I really like agile iterative approach, but I'm also looking at getting better design for the database also, so when working with agile iterative approach, how should we deal with the database design?

Read the article
How do you keep from running into the same problems over and over?

- by Stephen Furlani

I keep running into the same problems. The problem is irrelevant, but the fact that I keep running into is completely frustrating. The problem only happens once every, 3-6 months or so as I stub out a new iteration of the project. I keep a journal every time, but I spend at least a day or two each iteration trying to get the issue resolved. How do you guys keep from making the same mistakes over and over? I've tried a journal but it apparently doesn't work for me. [Edit] A few more details about the issue: Each time I make a new project to hold the files, I import a particular library. The library is a C++ library which imports glew.h and glx.h GLX redefines BOOL and that's not kosher since BOOL is a keyword for ObjC. I had a fix the last time I went through this. I #ifndef the header in the library to exclude GLEW and GLX and everything worked hunky-dory. This time, however, I do the same thing, use the same #ifndef block but now it throws a bunch of errors. I go back to the old project, and it works. New project no-worky. It seems like it does this every time, and my solution to it is new each time for some reason. I know #defines and #includes are one of the trickiest areas of C++ (and cross-language with Objective-C), but I had this working and now it's not.

Read the article
As an indie game dev, what processes are the best for soliciting feedback on my design/spec/idea? [closed]

- by Jess Telford

Background I have worked in a professional environment where the process usually goes like the following: Brain storm idea Solidify the game mechanics / design Iterate on design/idea to create a more solid experience Spec out the details of the design/idea Build it Step 3. is generally done with the stakeholders of the game (developers, designers, investors, publishers, etc) to reach an 'agreement' which meets the goals of all involved. Due to this process involving a series of often opposing and unique view points, creative solutions can surface through discussion / iteration. This is backed up by a process for collating the changes / new ideas, as well as structured time for discussion. As a (now) indie developer, I have to play the role of all the stakeholders (developers, designers, investors, publishers, etc), and often find myself too close to the idea / design to do more than minor changes, which I feel to be local maxima when it comes to the best result (I'm looking for the global maxima, of course). I have read that ideas / game designs / unique mechanics are merely multipliers of execution, and that keeping them secret is just silly. In sharing the idea with others outside the realm of my own thinking, I hope to replicate the influence other stakeholders have. I am struggling with the collation of changes / new ideas, and any kind of structured method of receiving feedback. My question: As an indie game developer, how and where can I share my ideas/designs to receive meaningful / constructive feedback? How can I successfully collate the feedback into a new iteration of the design? Are there any specialized websites, etc?

Read the article
Marking Discussions as Answered

As a contributor to a number of projects on CodePlex I really like the fact that the discussions feature exists but also I need ways to help me sort the discussions threads so I can make sure no-one is getting forgotten about. Seems like a lot of you agreed as the feature request Provide feature to allow Coordinators to mark Discussions threads as 'Answered' is our number 2 voted feature right now with 178 votes. Today we rolled out the first iteration of “answer” support to discussions. In this first iteration we wanted to keep it simple and lightweight. The original poster of the thread along with project owners, developers or editors can mark any post to the thread as an answer. You can have any number of answers marked in a thread and it’s very quick to mark or unmark a post as an answer. We deliberately keep the answers in the originally posted order so that you can see them in context with the discussion thread. When viewing discussions the default view is still to see everything, but you can easily filter by “Unanswered”. You can even save that as a bookmark so as someone interested in the project can quickly jump to the unanswered discussion threads to go help out on. As I mention, we kept this first pass of the answering feature as simple and as lightweight as possible so that we can get some feedback on it. Head on over to the issue tracking this feature if you have any thoughts once you have used it for a bit or feel free to respond in the comments. I already have a couple of things I think we want to do such as a refresh of the look and feel of discussions in general along, make it easier to navigate to posts that are marked an answered and surface posts that you do that were marked as answered in your profile page - but if you have ideas then please let us know.

Read the article
Day in the Life of Agile - The Forge Michigan November 27, 2012

- by csmith18119

Went to training at The Forge yesterday and did a Day in the life of Agile with Pillar. It was pretty good. Check them out at: http://pillartechnology.com/ Abstract: A single-day agile project simulation that is engaging, educational, provocative, and fun. This simulation introduces concepts like time-boxed iterations, User Stories, collective estimation, commitment to a product owner for iteration scope, formal verification ritual at iteration conclusion, tracking velocity, and making results big and visible through charts. The exercise is designed to simulate not only how agile teams and practices work, but the inevitable challenges that arise as teams attempt to adopt such practices. One of the best parts of this training was getting some hands on experience with agile. We used a program called Scratch to create an arcade video game. Our team chose Frogger. We had 3 iterations at 20 minutes each. I think we did pretty good but in the panic of trying to get a bunch done in only 20 minutes made it interesting. To check out our project, I uploaded it to my CodePlex site Download Source Code (Under Scratch/Frogger) Cool class! I highly recommend if you get the opportunity.

Read the article
How John Got 15x Improvement Without Really Trying

- by rchrd

The following article was published on a Sun Microsystems website a number of years ago by John Feo. It is still useful and worth preserving. So I'm republishing it here. How I Got 15x Improvement Without Really Trying John Feo, Sun Microsystems Taking ten "personal" program codes used in scientific and engineering research, the author was able to get from 2 to 15 times performance improvement easily by applying some simple general optimization techniques. Introduction Scientific research based on computer simulation depends on the simulation for advancement. The research can advance only as fast as the computational codes can execute. The codes' efficiency determines both the rate and quality of results. In the same amount of time, a faster program can generate more results and can carry out a more detailed simulation of physical phenomena than a slower program. Highly optimized programs help science advance quickly and insure that monies supporting scientific research are used as effectively as possible. Scientific computer codes divide into three broad categories: ISV, community, and personal. ISV codes are large, mature production codes developed and sold commercially. The codes improve slowly over time both in methods and capabilities, and they are well tuned for most vendor platforms. Since the codes are mature and complex, there are few opportunities to improve their performance solely through code optimization. Improvements of 10% to 15% are typical. Examples of ISV codes are DYNA3D, Gaussian, and Nastran. Community codes are non-commercial production codes used by a particular research field. Generally, they are developed and distributed by a single academic or research institution with assistance from the community. Most users just run the codes, but some develop new methods and extensions that feed back into the general release. The codes are available on most vendor platforms. Since these codes are younger than ISV codes, there are more opportunities to optimize the source code. Improvements of 50% are not unusual. Examples of community codes are AMBER, CHARM, BLAST, and FASTA. Personal codes are those written by single users or small research groups for their own use. These codes are not distributed, but may be passed from professor-to-student or student-to-student over several years. They form the primordial ocean of applications from which community and ISV codes emerge. Government research grants pay for the development of most personal codes. This paper reports on the nature and performance of this class of codes. Over the last year, I have looked at over two dozen personal codes from more than a dozen research institutions. The codes cover a variety of scientific fields, including astronomy, atmospheric sciences, bioinformatics, biology, chemistry, geology, and physics. The sources range from a few hundred lines to more than ten thousand lines, and are written in Fortran, Fortran 90, C, and C++. For the most part, the codes are modular, documented, and written in a clear, straightforward manner. They do not use complex language features, advanced data structures, programming tricks, or libraries. I had little trouble understanding what the codes did or how data structures were used. Most came with a makefile. Surprisingly, only one of the applications is parallel. All developers have access to parallel machines, so availability is not an issue. Several tried to parallelize their applications, but stopped after encountering difficulties. Lack of education and a perception that parallelism is difficult prevented most from trying. I parallelized several of the codes using OpenMP, and did not judge any of the codes as difficult to parallelize. Even more surprising than the lack of parallelism is the inefficiency of the codes. I was able to get large improvements in performance in a matter of a few days applying simple optimization techniques. Table 1 lists ten representative codes [names and affiliation are omitted to preserve anonymity]. Improvements on one processor range from 2x to 15.5x with a simple average of 4.75x. I did not use sophisticated performance tools or drill deep into the program's execution character as one would do when tuning ISV or community codes. Using only a profiler and source line timers, I identified inefficient sections of code and improved their performance by inspection. The changes were at a high level. I am sure there is another factor of 2 or 3 in each code, and more if the codes are parallelized. The study’s results show that personal scientific codes are running many times slower than they should and that the problem is pervasive. Computational scientists are not sloppy programmers; however, few are trained in the art of computer programming or code optimization. I found that most have a working knowledge of some programming language and standard software engineering practices; but they do not know, or think about, how to make their programs run faster. They simply do not know the standard techniques used to make codes run faster. In fact, they do not even perceive that such techniques exist. The case studies described in this paper show that applying simple, well known techniques can significantly increase the performance of personal codes. It is important that the scientific community and the Government agencies that support scientific research find ways to better educate academic scientific programmers. The inefficiency of their codes is so bad that it is retarding both the quality and progress of scientific research. # cacheperformance redundantoperations loopstructures performanceimprovement 1 x x 15.5 2 x 2.8 3 x x 2.5 4 x 2.1 5 x x 2.0 6 x 5.0 7 x 5.8 8 x 6.3 9 2.2 10 x x 3.3 Table 1 — Area of improvement and performance gains of 10 codes The remainder of the paper is organized as follows: sections 2, 3, and 4 discuss the three most common sources of inefficiencies in the codes studied. These are cache performance, redundant operations, and loop structures. Each section includes several examples. The last section summaries the work and suggests a possible solution to the issues raised. Optimizing cache performance Commodity microprocessor systems use caches to increase memory bandwidth and reduce memory latencies. Typical latencies from processor to L1, L2, local, and remote memory are 3, 10, 50, and 200 cycles, respectively. Moreover, bandwidth falls off dramatically as memory distances increase. Programs that do not use cache effectively run many times slower than programs that do. When optimizing for cache, the biggest performance gains are achieved by accessing data in cache order and reusing data to amortize the overhead of cache misses. Secondary considerations are prefetching, associativity, and replacement; however, the understanding and analysis required to optimize for the latter are probably beyond the capabilities of the non-expert. Much can be gained simply by accessing data in the correct order and maximizing data reuse. 6 out of the 10 codes studied here benefited from such high level optimizations. Array Accesses The most important cache optimization is the most basic: accessing Fortran array elements in column order and C array elements in row order. Four of the ten codes—1, 2, 4, and 10—got it wrong. Compilers will restructure nested loops to optimize cache performance, but may not do so if the loop structure is too complex, or the loop body includes conditionals, complex addressing, or function calls. In code 1, the compiler failed to invert a key loop because of complex addressing do I = 0, 1010, delta_x IM = I - delta_x IP = I + delta_x do J = 5, 995, delta_x JM = J - delta_x JP = J + delta_x T1 = CA1(IP, J) + CA1(I, JP) T2 = CA1(IM, J) + CA1(I, JM) S1 = T1 + T2 - 4 * CA1(I, J) CA(I, J) = CA1(I, J) + D * S1 end do end do In code 2, the culprit is conditionals do I = 1, N do J = 1, N If (IFLAG(I,J) .EQ. 0) then T1 = Value(I, J-1) T2 = Value(I-1, J) T3 = Value(I, J) T4 = Value(I+1, J) T5 = Value(I, J+1) Value(I,J) = 0.25 * (T1 + T2 + T5 + T4) Delta = ABS(T3 - Value(I,J)) If (Delta .GT. MaxDelta) MaxDelta = Delta endif enddo enddo I fixed both programs by inverting the loops by hand. Code 10 has three-dimensional arrays and triply nested loops. The structure of the most computationally intensive loops is too complex to invert automatically or by hand. The only practical solution is to transpose the arrays so that the dimension accessed by the innermost loop is in cache order. The arrays can be transposed at construction or prior to entering a computationally intensive section of code. The former requires all array references to be modified, while the latter is cost effective only if the cost of the transpose is amortized over many accesses. I used the second approach to optimize code 10. Code 5 has four-dimensional arrays and loops are nested four deep. For all of the reasons cited above the compiler is not able to restructure three key loops. Assume C arrays and let the four dimensions of the arrays be i, j, k, and l. In the original code, the index structure of the three loops is L1: for i L2: for i L3: for i for l for l for j for k for j for k for j for k for l So only L3 accesses array elements in cache order. L1 is a very complex loop—much too complex to invert. I brought the loop into cache alignment by transposing the second and fourth dimensions of the arrays. Since the code uses a macro to compute all array indexes, I effected the transpose at construction and changed the macro appropriately. The dimensions of the new arrays are now: i, l, k, and j. L3 is a simple loop and easily inverted. L2 has a loop-carried scalar dependence in k. By promoting the scalar name that carries the dependence to an array, I was able to invert the third and fourth subloops aligning the loop with cache. Code 5 is by far the most difficult of the four codes to optimize for array accesses; but the knowledge required to fix the problems is no more than that required for the other codes. I would judge this code at the limits of, but not beyond, the capabilities of appropriately trained computational scientists. Array Strides When a cache miss occurs, a line (64 bytes) rather than just one word is loaded into the cache. If data is accessed stride 1, than the cost of the miss is amortized over 8 words. Any stride other than one reduces the cost savings. Two of the ten codes studied suffered from non-unit strides. The codes represent two important classes of "strided" codes. Code 1 employs a multi-grid algorithm to reduce time to convergence. The grids are every tenth, fifth, second, and unit element. Since time to convergence is inversely proportional to the distance between elements, coarse grids converge quickly providing good starting values for finer grids. The better starting values further reduce the time to convergence. The downside is that grids of every nth element, n > 1, introduce non-unit strides into the computation. In the original code, much of the savings of the multi-grid algorithm were lost due to this problem. I eliminated the problem by compressing (copying) coarse grids into continuous memory, and rewriting the computation as a function of the compressed grid. On convergence, I copied the final values of the compressed grid back to the original grid. The savings gained from unit stride access of the compressed grid more than paid for the cost of copying. Using compressed grids, the loop from code 1 included in the previous section becomes do j = 1, GZ do i = 1, GZ T1 = CA(i+0, j-1) + CA(i-1, j+0) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) S1 = T1 + T4 - 4 * CA1(i+0, j+0) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 enddo enddo where CA and CA1 are compressed arrays of size GZ. Code 7 traverses a list of objects selecting objects for later processing. The labels of the selected objects are stored in an array. The selection step has unit stride, but the processing steps have irregular stride. A fix is to save the parameters of the selected objects in temporary arrays as they are selected, and pass the temporary arrays to the processing functions. The fix is practical if the same parameters are used in selection as in processing, or if processing comprises a series of distinct steps which use overlapping subsets of the parameters. Both conditions are true for code 7, so I achieved significant improvement by copying parameters to temporary arrays during selection. Data reuse In the previous sections, we optimized for spatial locality. It is also important to optimize for temporal locality. Once read, a datum should be used as much as possible before it is forced from cache. Loop fusion and loop unrolling are two techniques that increase temporal locality. Unfortunately, both techniques increase register pressure—as loop bodies become larger, the number of registers required to hold temporary values grows. Once register spilling occurs, any gains evaporate quickly. For multiprocessors with small register sets or small caches, the sweet spot can be very small. In the ten codes presented here, I found no opportunities for loop fusion and only two opportunities for loop unrolling (codes 1 and 3). In code 1, unrolling the outer and inner loop one iteration increases the number of result values computed by the loop body from 1 to 4, do J = 1, GZ-2, 2 do I = 1, GZ-2, 2 T1 = CA1(i+0, j-1) + CA1(i-1, j+0) T2 = CA1(i+1, j-1) + CA1(i+0, j+0) T3 = CA1(i+0, j+0) + CA1(i-1, j+1) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) T5 = CA1(i+2, j+0) + CA1(i+1, j+1) T6 = CA1(i+1, j+1) + CA1(i+0, j+2) T7 = CA1(i+2, j+1) + CA1(i+1, j+2) S1 = T1 + T4 - 4 * CA1(i+0, j+0) S2 = T2 + T5 - 4 * CA1(i+1, j+0) S3 = T3 + T6 - 4 * CA1(i+0, j+1) S4 = T4 + T7 - 4 * CA1(i+1, j+1) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 CA(i+1, j+0) = CA1(i+1, j+0) + DD * S2 CA(i+0, j+1) = CA1(i+0, j+1) + DD * S3 CA(i+1, j+1) = CA1(i+1, j+1) + DD * S4 enddo enddo The loop body executes 12 reads, whereas as the rolled loop shown in the previous section executes 20 reads to compute the same four values. In code 3, two loops are unrolled 8 times and one loop is unrolled 4 times. Here is the before for (k = 0; k < NK[u]; k++) { sum = 0.0; for (y = 0; y < NY; y++) { sum += W[y][u][k] * delta[y]; } backprop[i++]=sum; } and after code for (k = 0; k < KK - 8; k+=8) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (y = 0; y < NY; y++) { sum0 += W[y][0][k+0] * delta[y]; sum1 += W[y][0][k+1] * delta[y]; sum2 += W[y][0][k+2] * delta[y]; sum3 += W[y][0][k+3] * delta[y]; sum4 += W[y][0][k+4] * delta[y]; sum5 += W[y][0][k+5] * delta[y]; sum6 += W[y][0][k+6] * delta[y]; sum7 += W[y][0][k+7] * delta[y]; } backprop[k+0] = sum0; backprop[k+1] = sum1; backprop[k+2] = sum2; backprop[k+3] = sum3; backprop[k+4] = sum4; backprop[k+5] = sum5; backprop[k+6] = sum6; backprop[k+7] = sum7; } for one of the loops unrolled 8 times. Optimizing for temporal locality is the most difficult optimization considered in this paper. The concepts are not difficult, but the sweet spot is small. Identifying where the program can benefit from loop unrolling or loop fusion is not trivial. Moreover, it takes some effort to get it right. Still, educating scientific programmers about temporal locality and teaching them how to optimize for it will pay dividends. Reducing instruction count Execution time is a function of instruction count. Reduce the count and you usually reduce the time. The best solution is to use a more efficient algorithm; that is, an algorithm whose order of complexity is smaller, that converges quicker, or is more accurate. Optimizing source code without changing the algorithm yields smaller, but still significant, gains. This paper considers only the latter because the intent is to study how much better codes can run if written by programmers schooled in basic code optimization techniques. The ten codes studied benefited from three types of "instruction reducing" optimizations. The two most prevalent were hoisting invariant memory and data operations out of inner loops. The third was eliminating unnecessary data copying. The nature of these inefficiencies is language dependent. Memory operations The semantics of C make it difficult for the compiler to determine all the invariant memory operations in a loop. The problem is particularly acute for loops in functions since the compiler may not know the values of the function's parameters at every call site when compiling the function. Most compilers support pragmas to help resolve ambiguities; however, these pragmas are not comprehensive and there is no standard syntax. To guarantee that invariant memory operations are not executed repetitively, the user has little choice but to hoist the operations by hand. The problem is not as severe in Fortran programs because in the absence of equivalence statements, it is a violation of the language's semantics for two names to share memory. Codes 3 and 5 are C programs. In both cases, the compiler did not hoist all invariant memory operations from inner loops. Consider the following loop from code 3 for (y = 0; y < NY; y++) { i = 0; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += delta[y] * I1[i++]; } } } Since dW[y][u] can point to the same memory space as delta for one or more values of y and u, assignment to dW[y][u][k] may change the value of delta[y]. In reality, dW and delta do not overlap in memory, so I rewrote the loop as for (y = 0; y < NY; y++) { i = 0; Dy = delta[y]; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += Dy * I1[i++]; } } } Failure to hoist invariant memory operations may be due to complex address calculations. If the compiler can not determine that the address calculation is invariant, then it can hoist neither the calculation nor the associated memory operations. As noted above, code 5 uses a macro to address four-dimensional arrays #define MAT4D(a,q,i,j,k) (double *)((a)->data + (q)*(a)->strides[0] + (i)*(a)->strides[3] + (j)*(a)->strides[2] + (k)*(a)->strides[1]) The macro is too complex for the compiler to understand and so, it does not identify any subexpressions as loop invariant. The simplest way to eliminate the address calculation from the innermost loop (over i) is to define a0 = MAT4D(a,q,0,j,k) before the loop and then replace all instances of *MAT4D(a,q,i,j,k) in the loop with a0[i] A similar problem appears in code 6, a Fortran program. The key loop in this program is do n1 = 1, nh nx1 = (n1 - 1) / nz + 1 nz1 = n1 - nz * (nx1 - 1) do n2 = 1, nh nx2 = (n2 - 1) / nz + 1 nz2 = n2 - nz * (nx2 - 1) ndx = nx2 - nx1 ndy = nz2 - nz1 gxx = grn(1,ndx,ndy) gyy = grn(2,ndx,ndy) gxy = grn(3,ndx,ndy) balance(n1,1) = balance(n1,1) + (force(n2,1) * gxx + force(n2,2) * gxy) * h1 balance(n1,2) = balance(n1,2) + (force(n2,1) * gxy + force(n2,2) * gyy)*h1 end do end do The programmer has written this loop well—there are no loop invariant operations with respect to n1 and n2. However, the loop resides within an iterative loop over time and the index calculations are independent with respect to time. Trading space for time, I precomputed the index values prior to the entering the time loop and stored the values in two arrays. I then replaced the index calculations with reads of the arrays. Data operations Ways to reduce data operations can appear in many forms. Implementing a more efficient algorithm produces the biggest gains. The closest I came to an algorithm change was in code 4. This code computes the inner product of K-vectors A(i) and B(j), 0 = i < N, 0 = j < M, for most values of i and j. Since the program computes most of the NM possible inner products, it is more efficient to compute all the inner products in one triply-nested loop rather than one at a time when needed. The savings accrue from reading A(i) once for all B(j) vectors and from loop unrolling. for (i = 0; i < N; i+=8) { for (j = 0; j < M; j++) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (k = 0; k < K; k++) { sum0 += A[i+0][k] * B[j][k]; sum1 += A[i+1][k] * B[j][k]; sum2 += A[i+2][k] * B[j][k]; sum3 += A[i+3][k] * B[j][k]; sum4 += A[i+4][k] * B[j][k]; sum5 += A[i+5][k] * B[j][k]; sum6 += A[i+6][k] * B[j][k]; sum7 += A[i+7][k] * B[j][k]; } C[i+0][j] = sum0; C[i+1][j] = sum1; C[i+2][j] = sum2; C[i+3][j] = sum3; C[i+4][j] = sum4; C[i+5][j] = sum5; C[i+6][j] = sum6; C[i+7][j] = sum7; }} This change requires knowledge of a typical run; i.e., that most inner products are computed. The reasons for the change, however, derive from basic optimization concepts. It is the type of change easily made at development time by a knowledgeable programmer. In code 5, we have the data version of the index optimization in code 6. Here a very expensive computation is a function of the loop indices and so cannot be hoisted out of the loop; however, the computation is invariant with respect to an outer iterative loop over time. We can compute its value for each iteration of the computation loop prior to entering the time loop and save the values in an array. The increase in memory required to store the values is small in comparison to the large savings in time. The main loop in Code 8 is doubly nested. The inner loop includes a series of guarded computations; some are a function of the inner loop index but not the outer loop index while others are a function of the outer loop index but not the inner loop index for (j = 0; j < N; j++) { for (i = 0; i < M; i++) { r = i * hrmax; R = A[j]; temp = (PRM[3] == 0.0) ? 1.0 : pow(r, PRM[3]); high = temp * kcoeff * B[j] * PRM[2] * PRM[4]; low = high * PRM[6] * PRM[6] / (1.0 + pow(PRM[4] * PRM[6], 2.0)); kap = (R > PRM[6]) ? high * R * R / (1.0 + pow(PRM[4]*r, 2.0) : low * pow(R/PRM[6], PRM[5]); < rest of loop omitted > }} Note that the value of temp is invariant to j. Thus, we can hoist the computation for temp out of the loop and save its values in an array. for (i = 0; i < M; i++) { r = i * hrmax; TEMP[i] = pow(r, PRM[3]); } [N.B. – the case for PRM[3] = 0 is omitted and will be reintroduced later.] We now hoist out of the inner loop the computations invariant to i. Since the conditional guarding the value of kap is invariant to i, it behooves us to hoist the computation out of the inner loop, thereby executing the guard once rather than M times. The final version of the code is for (j = 0; j < N; j++) { R = rig[j] / 1000.; tmp1 = kcoeff * par[2] * beta[j] * par[4]; tmp2 = 1.0 + (par[4] * par[4] * par[6] * par[6]); tmp3 = 1.0 + (par[4] * par[4] * R * R); tmp4 = par[6] * par[6] / tmp2; tmp5 = R * R / tmp3; tmp6 = pow(R / par[6], par[5]); if ((par[3] == 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp5; } else if ((par[3] == 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp4 * tmp6; } else if ((par[3] != 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp5; } else if ((par[3] != 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp4 * tmp6; } for (i = 0; i < M; i++) { kap = KAP[i]; r = i * hrmax; < rest of loop omitted > } } Maybe not the prettiest piece of code, but certainly much more efficient than the original loop, Copy operations Several programs unnecessarily copy data from one data structure to another. This problem occurs in both Fortran and C programs, although it manifests itself differently in the two languages. Code 1 declares two arrays—one for old values and one for new values. At the end of each iteration, the array of new values is copied to the array of old values to reset the data structures for the next iteration. This problem occurs in Fortran programs not included in this study and in both Fortran 77 and Fortran 90 code. Introducing pointers to the arrays and swapping pointer values is an obvious way to eliminate the copying; but pointers is not a feature that many Fortran programmers know well or are comfortable using. An easy solution not involving pointers is to extend the dimension of the value array by 1 and use the last dimension to differentiate between arrays at different times. For example, if the data space is N x N, declare the array (N, N, 2). Then store the problem’s initial values in (_, _, 2) and define the scalar names new = 2 and old = 1. At the start of each iteration, swap old and new to reset the arrays. The old–new copy problem did not appear in any C program. In programs that had new and old values, the code swapped pointers to reset data structures. Where unnecessary coping did occur is in structure assignment and parameter passing. Structures in C are handled much like scalars. Assignment causes the data space of the right-hand name to be copied to the data space of the left-hand name. Similarly, when a structure is passed to a function, the data space of the actual parameter is copied to the data space of the formal parameter. If the structure is large and the assignment or function call is in an inner loop, then copying costs can grow quite large. While none of the ten programs considered here manifested this problem, it did occur in programs not included in the study. A simple fix is always to refer to structures via pointers. Optimizing loop structures Since scientific programs spend almost all their time in loops, efficient loops are the key to good performance. Conditionals, function calls, little instruction level parallelism, and large numbers of temporary values make it difficult for the compiler to generate tightly packed, highly efficient code. Conditionals and function calls introduce jumps that disrupt code flow. Users should eliminate or isolate conditionls to their own loops as much as possible. Often logical expressions can be substituted for if-then-else statements. For example, code 2 includes the following snippet MaxDelta = 0.0 do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) if (Delta > MaxDelta) MaxDelta = Delta enddo enddo if (MaxDelta .gt. 0.001) goto 200 Since the only use of MaxDelta is to control the jump to 200 and all that matters is whether or not it is greater than 0.001, I made MaxDelta a boolean and rewrote the snippet as MaxDelta = .false. do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) MaxDelta = MaxDelta .or. (Delta .gt. 0.001) enddo enddo if (MaxDelta) goto 200 thereby, eliminating the conditional expression from the inner loop. A microprocessor can execute many instructions per instruction cycle. Typically, it can execute one or more memory, floating point, integer, and jump operations. To be executed simultaneously, the operations must be independent. Thick loops tend to have more instruction level parallelism than thin loops. Moreover, they reduce memory traffice by maximizing data reuse. Loop unrolling and loop fusion are two techniques to increase the size of loop bodies. Several of the codes studied benefitted from loop unrolling, but none benefitted from loop fusion. This observation is not too surpising since it is the general tendency of programmers to write thick loops. As loops become thicker, the number of temporary values grows, increasing register pressure. If registers spill, then memory traffic increases and code flow is disrupted. A thick loop with many temporary values may execute slower than an equivalent series of thin loops. The biggest gain will be achieved if the thick loop can be split into a series of independent loops eliminating the need to write and read temporary arrays. I found such an occasion in code 10 where I split the loop do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do into two disjoint loops do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) end do end do do i = 1, n do j = 1, m C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do Conclusions Over the course of the last year, I have had the opportunity to work with over two dozen academic scientific programmers at leading research universities. Their research interests span a broad range of scientific fields. Except for two programs that relied almost exclusively on library routines (matrix multiply and fast Fourier transform), I was able to improve significantly the single processor performance of all codes. Improvements range from 2x to 15.5x with a simple average of 4.75x. Changes to the source code were at a very high level. I did not use sophisticated techniques or programming tools to discover inefficiencies or effect the changes. Only one code was parallel despite the availability of parallel systems to all developers. Clearly, we have a problem—personal scientific research codes are highly inefficient and not running parallel. The developers are unaware of simple optimization techniques to make programs run faster. They lack education in the art of code optimization and parallel programming. I do not believe we can fix the problem by publishing additional books or training manuals. To date, the developers in questions have not studied the books or manual available, and are unlikely to do so in the future. Short courses are a possible solution, but I believe they are too concentrated to be much use. The general concepts can be taught in a three or four day course, but that is not enough time for students to practice what they learn and acquire the experience to apply and extend the concepts to their codes. Practice is the key to becoming proficient at optimization. I recommend that graduate students be required to take a semester length course in optimization and parallel programming. We would never give someone access to state-of-the-art scientific equipment costing hundreds of thousands of dollars without first requiring them to demonstrate that they know how to use the equipment. Yet the criterion for time on state-of-the-art supercomputers is at most an interesting project. Requestors are never asked to demonstrate that they know how to use the system, or can use the system effectively. A semester course would teach them the required skills. Government agencies that fund academic scientific research pay for most of the computer systems supporting scientific research as well as the development of most personal scientific codes. These agencies should require graduate schools to offer a course in optimization and parallel programming as a requirement for funding. About the Author John Feo received his Ph.D. in Computer Science from The University of Texas at Austin in 1986. After graduate school, Dr. Feo worked at Lawrence Livermore National Laboratory where he was the Group Leader of the Computer Research Group and principal investigator of the Sisal Language Project. In 1997, Dr. Feo joined Tera Computer Company where he was project manager for the MTA, and oversaw the programming and evaluation of the MTA at the San Diego Supercomputer Center. In 2000, Dr. Feo joined Sun Microsystems as an HPC application specialist. He works with university research groups to optimize and parallelize scientific codes. Dr. Feo has published over two dozen research articles in the areas of parallel parallel programming, parallel programming languages, and application performance.

Read the article
Python: Random is barely random at all?

- by orokusaki

I did this to test the randomness of randint: >>> from random import randint >>> >>> uniques = [] >>> for i in range(4500): # You can see I optimistic. ... x = randint(500, 5000) ... if x in uniques: ... raise Exception('We duped ' + str(x) + ' at iteration number ' + str(i)) ... uniques.append(x) ... Traceback (most recent call last): File "(stdin)", line 4, in (module) Exception: 'We duped 4061 at iteration number 67 I tried about 10 times more and the best result I got was 121 iterations before a repeater. Is this the best sort of result you can get from the standard library?

Read the article
Masked Edit extender inside a Repeater

- by Viswa

How can i dynamically create a textBox and a Masked Edit extender inside a Panel. My code is something like this: In the ASPX page: In the Aspx.cs page Private DataView Function1() { Dataview dv =new dataview(); return dv; } Private void ShowProducts_OntemDataBound(object sender, RepeaterEventItem e) { //Consider For the First Iteration of the Repeater I am Creating a Simple Text Box Dynamically Textbox txt = new textbox(); txt.Text = "8888888888"; txt.Id = "TextBox1"; //Consider For the Second Iteration of the Repeater I am Creating another TextBox and a Textbox txt1 = new textBox(); txt1.text="2223334444"; txt1.Id = "TextBox2"; MaskedEditExtender mskEdit = (MaskedEditExtender)e.Item.FindControl("MskEdit"); mskEdit.TargetControlId = txt1.Id; Panel panel1 = (Panel)e.item.Findcontrol("Panel1"); panel1.Controls.Add(txt1); } When running the above code it is giving me "Null Reference Exception for MaskedEditExtender".Please suggest me some way for this.

Read the article
I am working in JSF and Richfaces technology.

- by sankar

I need to club rich panel menu with iteration to fetch data from database. kindly help

Read the article
NHybernate, the Parallel Framework, and SQL Server

- by andy

hey guys, we have a loop that: 1.Loops over several thousand xml files. Altogether we're parsing millions of "user" nodes. 2.In each iteration we parse a "user" xml, do custom deserialization 3.finally, in each iteration, we send our object to nhibernate for saving. We use: .SaveOrUpdateAndFlush(user); This is a lengthy process, and we thought it would be a perfect candidate for testing out the .NET 4.0 Parallel libraries. So we wrapped the loop in a: Parallel.ForEach(); After doing this, we start getting "random" Timeout Exceptions from SQL Server, and finally, after leaving it running all night, OutOfMemory unhandled exceptions. I haven't done deep debugging on this yet, but what do you guys think. Is this simply a limitation of SQL Server, or could it be our NHibernate setup, or what? cheers andy

Read the article
How can I skip the current item and the next in a Python loop?

- by uberjumper

This might be a really dumb question, however I've looked around online, etc. And have not seen a solid answer. Is there a simple way to do something like this? lines = open('something.txt', 'r').readlines() for line in lines: if line == '!': # force iteration forward twice line.next().next() <etc> It's easy to do in C++; just increment the iterator an extra time. Is there an easy way to do that in Python? I would just like to point, out the main purpose of this question is not about "reading files and such" and skipping things. I was more looking for C++ iterator style iteration. Also the new title is kinda dumb, and i dont really think it reflects the nature of my question.

Read the article
NHibernate, the Parallel Framework, and SQL Server

- by andy

hey guys, we have a loop that: 1.Loops over several thousand xml files. Altogether we're parsing millions of "user" nodes. 2.In each iteration we parse a "user" xml, do custom deserialization 3.finally, in each iteration, we send our object to nhibernate for saving. We use: .SaveOrUpdateAndFlush(user); This is a lengthy process, and we thought it would be a perfect candidate for testing out the .NET 4.0 Parallel libraries. So we wrapped the loop in a: Parallel.ForEach(); After doing this, we start getting "random" Timeout Exceptions from SQL Server, and finally, after leaving it running all night, OutOfMemory unhandled exceptions. I haven't done deep debugging on this yet, but what do you guys think. Is this simply a limitation of SQL Server, or could it be our NHibernate setup, or what? cheers andy

Read the article
appending to cursor in oracle

- by Omnipresent

I asked a question yesterday which got answers but didnt answer the main point. I wanted to reduce amount of time it took to do a MINUS operation. Now, I'm thinking about doing MINUS operation in blocks of 5000, appending each iterations results to the cursor and finally returning the cursor. I have following: V_CNT NUMBER :=0; V_INTERVAL NUMBER := 5000; begin select count(1) into v_cnt from TABLE_1 while (v_cnt > 0) loop open cv_1 for SELECT A.HEAD,A.EFFECTIVE_DATE, FROM TABLE_1 A WHERE A.TYPE_OF_ACTION='6' AND A.EFFECTIVE_DATE >= ADD_MONTHS(SYSDATE,-15) AND A.ROWNUM <= V_INTERVAL MINUS SELECT B.head,B.EFFECTIVE_DATE, FROM TABLE_2 B AND B.ROWNUM <= V_INTERVAL V_CNT := V_CNT - V_INTERVAL; END LOOP; end; However, as you see...in each iteration the cursor is overwritten. How can I change the code so that in each iteration it appends to cv_1 cursor rather than overwriting?

Read the article
Runge-Kutta Method with adaptive step

- by infoholic_anonymous

I am implementing Runge-Kutta method with adaptive step in matlab. I get different results as compared to matlab's own ode45 and my own implementation of Runge-Kutta method with fixed step. What am I doing wrong in my code? Is it possible? function [ result ] = rk4_modh( f, int, init, h, h_min ) % % f - function handle % int - interval - pair (x_min, x_max) % init - initial conditions - pair (y1(0),y2(0)) % h_min - lower limit for h (step length) % h - initial step length % x - independent variable ( for example time ) % y - dependent variable - vertical vector - in our case ( y1, y2 ) function [ k1, k2, k3, k4, ka, y ] = iteration( f, h, x, y ) % core functionality performed within loop k1 = h * f(x,y); k2 = h * f(x+h/2, y+k1/2); k3 = h * f(x+h/2, y+k2/2); k4 = h * f(x+h, y+k3); ka = (k1 + 2*k2 + 2*k3 + k4)/6; y = y + ka; end % constants % relative error eW = 1e-10; % absolute error eB = 1e-10; s = 0.9; b = 5; % initialization i = 1; x = int(1); y = init; while true hy = y; hx = x; %algorithm [ k1, k2, k3, k4, ka, y ] = iteration( f, h, x, y ); % error estimation for j=1:2 [ hk1, hk2, hk3, hk4, hka, hy ] = iteration( f, h/2, hx, hy ); hx = hx + h/2; end err(:,i) = abs(hy - y); % step adjustment e = abs( hy ) * eW + eB; a = min( e ./ err(:,i) )^(0.2); mul = a * s; if mul >= 1 % step length admitted keepH(i) = h; k(:,:,i) = [ k1, k2, k3, k4, ka ]; previous(i,:) = [ x+h, y' ]; %' i = i + 1; if floor( x + h + eB ) == int(2) break; else h = min( [mul*h, b*h, int(2)-x] ); x = x + keepH(i-1); end else % step length requires further adjustments h = mul * h; if ( h < h_min ) error('Computation with given precision impossible'); end end end result = struct( 'val', previous, 'k', k, 'err', err, 'h', keepH ); end The function in question is: function [ res ] = fun( x, y ) % res(1) = y(2) + y(1) * ( 0.9 - y(1)^2 - y(2)^2 ); res(2) = -y(1) + y(2) * ( 0.9 - y(1)^2 - y(2)^2 ); res = res'; %' end The call is: res = rk4( @fun, [0,20], [0.001; 0.001], 0.008 ); The resulting plot for x1 : The result of ode45( @fun, [0, 20], [0.001, 0.001] ) is:

Read the article
What to do of exceptions when implementing java.lang.Iterator

- by Vincent Robert

The java.lang.Iterator interface has 3 methods: hasNext, next and remove. In order to implement a read-only iterator, you have to provide an implementation for 2 of those: hasNext and next. My problem is that these methods does not declare any exceptions. So if my code inside the iteration process declares exceptions, I must enclose my iteration code inside a try/catch block. My current policy has been to rethrow the exception enclosed in a RuntimeException. But this has issues because the checked exceptions are lost and the client code no longer can catch those exceptions explicitly. How can I work around this limitation in the Iterator class? Here is a sample code for clarity: class MyIterator implements Iterator { @Override public boolean hasNext() { try { return implementation.testForNext(); } catch ( SomethingBadException e ) { throw new RuntimeException(e); } } @Override public boolean next() { try { return implementation.getNext(); } catch ( SomethingBadException e ) { throw new RuntimeException(e); } } ... }

Read the article
Chain call in clojure?

- by Konrad Garus

I'm trying to implement sieve of Eratosthenes in Clojure. One approach I would like to test is this: Get range (2 3 4 5 6 ... N) For 2 <= i <= N Pass my range through filter that removes multiplies of i For i+1th iteration, use result of the previous filtering I know I could do it with loop/recur, but this is causing stack overflow errors (for some reason tail call optimization is not applied). How can I do it iteratively? I mean invoking N calls to the same routine, passing result of ith iteration to i+1th.

Read the article
How to commit inside a CURSOR Loop?

- by user320587

Hi, I am trying to see if its possible to perform Update within a cursor loop and this updated data gets reflected during the second iteration in the loop. DECLARE cur CURSOR FOR SELECT [Product], [Customer], [Date], [Event] FROM MyTable WHERE [Event] IS NULL OPEN cur FETCH NEXT INTO @Product, @Customer, @Date, @Event WHILE @@FETCH_STATUS = 0 BEGIN SELECT * FROM MyTable WHERE [Event] = 'No Event' AND [Date] < @DATE -- Now I update my Event value to 'No Event' for records whose date is less than @Date UPDATE MyTable SET [Event] = 'No Event' WHERE [Product] = @Product AND [Customer] = @Customer AND [Date] < @DATE FETCH NEXT INTO @Product, @Customer, @Date, @Event END CLOSE cur DEALLOCATE cur Assume when the sql executes the Event column is NULL for all records In the above sql, I am doing a select inside the cursor loop to query MyTable where Event value is 'No Event' but the query returns no value even though I am doing an update in the next line. So, I am thinking if it is even possible to update a table and the updated data get reflected in the next iteration of the cursor loop. Thanks for any help, Javid

Read the article
Is do-notation specific to "base:GHC.Base.Monad"?

- by yairchu

The idea that the standard Monad class is flawed and that it should actually extend Functor or Pointed is floating around. I'm not necessarily claiming that it is the right thing to do, but suppose that one was trying to do it: import Prelude hiding (Monad(..)) class Functor m => Monad m where return :: a -> m a join :: m (m a) -> m a join = (>>= id) (>>=) :: m a -> (a -> m b) -> m b a >>= t = join (fmap t a) (>>) :: m a -> m b -> m b a >> b = a >>= const b So far so good, but then when trying to use do-notation: whileM :: Monad m => m Bool -> m () whileM iteration = do done <- iteration if done then return () else whileM iteration The compiler complains: Could not deduce (base:GHC.Base.Monad m) from the context (Monad m) Question: Does do-notation work only for base:GHC.Base.Monad? Is there a way to make it work with an alternative Monad class? Extra context: What I really want to do is replace base:Control.Arrow.Arrow with a "generalized" Arrow class: {-# LANGUAGE TypeFamilies #-} class Category a => Arrow a where type Pair a :: * -> * -> * arr :: (b -> c) -> a b c first :: a b c -> a (Pair a b d) (Pair a c d) second :: a b c -> a (Pair a d b) (Pair a d c) (***) :: a b c -> a b' c' -> a (Pair a b b') (Pair a c c') (&&&) :: a b c -> a b c' -> a b (Pair a c c') And then use the Arrow's proc-notation with my Arrow class, but that fails like in the example above of do-notation and Monad. I'll use mostly Either as my pair type constructor and not the (,) type constructor as with the current Arrow class. This might allow to make the code of my toy RTS game (cabal install DefendTheKind) much prettier.

Read the article
How can I eliminate an element in a vector if a condition is met

- by michael

Hi, I have a vector of Rect: vector<Rect> myRecVec; I would like to remove the ones which are overlapping in the vector: So I have 2 nested loop like this: vector<Rect>::iterator iter1 = myRecVec.begin(); vector<Rect>::iterator iter2 = myRecVec.begin(); while( iter1 != myRecVec.end() ) { Rectangle r1 = *iter1; while( iter2 != myRecVec.end() ) { Rectangle r2 = *iter1; if (r1 != r2) { if (r1.intersects(r2)) { // remove r2 from myRectVec } } } } My question is how can I remove r2 from the myRectVect without screwing up both my iterators? Since I am iterating a vector and modifying the vector at the same time? I have thought about putting r2 in a temp rectVect and then remove them from the rectVect later (after the iteration). But how can I skip the ones in this temp rectVect during iteration?

Read the article
How to return arrays with the biggest elements in C#?

- by theateist

I have multiple int arrays: 1) [1 , 202,4 ,55] 2) [40, 7] 3) [2 , 48 ,5] 4) [40, 8 ,90] I need to get the array that has the biggest numbers in all positions. In my case that would be array #4. Explanation: arrays #2, #4 have the biggest number in 1st position, so after the first iteration these 2 arrays will be returned ([40, 7] and [40, 8 ,90]) now after comparing 2nd position of the returned array from previous iteration we will get array #4 because 8 7 and so on... Can you suggest an efficient algorithm for this? Doing with Linq will be preferable. UPDATE There is no limitation for the length, but as soon as some number in any position is greater, so this array is the biggest.

Read the article
XSLT Similar functionality to SQL "WHERE LIKE"

- by Fishcake

I have the following XML <Answer> <Label>FCVCMileage</Label> <Value>3258</Value> <Iteration>0</Iteration> <DataType>NUMBER</DataType> </Answer> And need to get the Value underneath the Mileage label. However the 4 letter prefix to Mileage could be any 1 of 8 different prefixes. I know I could find the value by testing for every combination using xsl:if or xsl:choose but is there a more elegant way that would work similar to the following SQL and also wouldn't need changes to code being made if other prefixes were added. WHERE label LIKE '%Mileage' NB. There will only ever be 1 label element containing the word Mileage Cheers!

Read the article

< Previous Page | 8 9 10 11 12 13 14 15 16 17 18 19 | Next Page >