Search Results

Search found 192 results on 8 pages for 'iterative'.

Page 6/8 | < Previous Page | 2 3 4 5 6 7 8  | Next Page >

  • Purple Cows, Copernicus, and Shampoo – Lessons in Customer Experience

    - by Christina McKeon
    What makes a great customer experience? And, why should you or your organization care? These are the questions that set the stage for the Oracle Customer Experience Summit, which kicked off yesterday in San Francisco. Day 1: The first day was filled with demos and insights from customer experience experts and Oracle customers sharing what it takes to deliver great customer experiences. Author Seth Godin delivered an entertaining presentation that included an in-depth exploration of the always-connected, always-sharing experience revolution that we are witnessing and yes, talked about the purple cow. It turns out that customer experience is your way to be the purple cow. Before everyone headed out to see Pearl Jam and Kings of Leon at the Oracle customer appreciation event, the day wrapped up with a discussion around building a customer-centric culture. Where do you start? Whom does it involve? What are some pitfalls to avoid? Day 2: The second day addressed the details behind all the questions brought up at the end of Day 1. Before you start on a customer experience initiative, Paul Hagen noted that you must understand you will forge a path similar to Copernicus. You will be proposing ideas and approaches that challenge current thinking in your organization. Just as Copernicus' heliocentric theory started a scientific revolution, your customer-centric efforts will start an experience revolution. If you think customer experience is like a traditional marketing approach, think again. It’s not about controlling your customers and leading them where you want them to go. It might sound like heresy to some, but your customers are already in control, whether or not your company realizes and acknowledges it. And, to survive and thrive, you'll have to focus on customers by thinking outside-in and working towards a brand that is better and more authentic. We learned how Vail Resorts takes this customer-centric approach. Employees must experience the mountain themselves and understand the experience from the guest’s standpoint. This has created a culture where employees do things for guests that are not expected. We also learned a valuable lesson in designing and innovating customer-centered experiences from Kerry Bodine. First you make the thing, and then you make the thing right. In this case, the thing is customer experience. Getting customer experience right means iterative prototyping and testing of your ideas. This is where shampoo comes in—think lather, rinse, repeat. Be prepared to keep repeating until the customer experience is right. Many of these sessions will be posted to YouTube in the coming weeks so be sure to subscribe to our CX channel.

    Read the article

  • The Use-Case Driven Approach to Change Management

    - by Lauren Clark
    In the third entry of the series on OUM and PMI’s Pulse of the Profession, we took a look at the continued importance of change management and risk management. The topic of change management and OUM’s use-case driven approach has come up in few recent conversations. So I thought I would jot down a few thoughts on how the use-case driven approach aids a project team in managing the project’s scope. The use-case model is one of several tools in OUM that is used to establish and manage the project's scope.  Because a use-case model can be understood by both business and IT project team members, it can serve as a bridge for ongoing collaboration as well as a visual diagram that encapsulates all agreed-upon functionality. This makes it a vital artifact in identifying changes to the project’s scope. Here are some of the primary benefits of using the use-case model as part of the effort for establishing and managing project scope: The use-case model quickly communicates scope in a straightforward manner. All project stakeholders can have a common foundation for the decisions regarding architecture and design and how they relate to the project's objectives. Once agreed upon, the model can be put under change control and any updates to the model can then be quickly identified as potentially affecting the project’s scope.  Changes requested or discovered later in the project can be analyzed objectively for their impact on project's budget, resources and schedule. A modular foundation for the design of the software solution can be established in Elaboration.  This permits work to be divided up effectively and executed in so that the most important and riskiest use-cases can be tackled early in the project. The use-case model helps the team make informed decisions about implementation priorities, which allows effective allocation of limited project resources.  This is very helpful in not only managing scope, but in doing iterative and incremental planning which relies heavily on the ability to identify project priorities. Bottom line is that the use-case model gives the project team solid understanding of scope early in the project.  Combine this understanding with effective project management and communication and you have an effective tool for reducing the risk of overruns in budget and/or time due to out of control scope changes. Now that you’ve had a chance to read these thoughts on the use-case model and project scope, please let me know your feedback based on your experience.

    Read the article

  • Innovation for Retailers

    - by David Dorf
    One of my main objectives for this blog is to point out emerging technologies and how they might apply to the retail industry.  But ideas are just the beginning; retailers either have to rely on vendors or have their own lab to explore these ideas and see which ones work.  (A healthy dose of both is probably the best solution.)  The Nordstrom Innovation Lab is a fine example of dedicating resources to cultivate ideas and test prototypes. The video below, from 2011, is a case study in which the team builds an iPad app that helps customers purchase sunglasses in the store.  Customers take pictures of themselves wearing different sunglasses, then can do side-by-side comparisons. There are a few interesting take-aways from their process.  First, they are working in the store alongside employees and customers.  There's no concept of documenting all the requirements then building the product.  Instead, they work closely with those that will be using the app in order to fully understand what's needed.  When they find an issue, they change the software onsite and try again.  This iterative prototyping ensures their product hits the mark.  Feels like Extreme Programming if you recall that movement. Second, they have time-boxed the project to one week.  Either it works or it doesn't, and either way they've only expended a week's worth of resources.  Innovation always entails failure, and those that succeed are often good at detecting failure quickly then adjusting.  Fail fast and fail often. Third, its not always about technology.  I was impressed they used paper designs to walk through user stories and help understand the needs of the customer.  Pen and paper is the innovator's most powerful tool. Our Retail Applied Research (RAR) team uses some of these concepts in our development process.  (Calling it a process is probably overkill.)  We try to give life to concepts quickly so the rest of organization can help us decide if we're heading the right direction.  It takes many failures before finding a successful product.

    Read the article

  • Are these non-standard applications of rendering practical in games?

    - by maul
    I've recently got into 3D and I came up with a few different "tricky" rendering techniques. Unfortunately I don't have the time to work on this myself, but I'd like to know if these are known methods and if they can be used in practice. Hybrid rendering Now I know that ray-tracing is still not fast enough for real-time rendering, at least on home computers. I also know that hybrid rendering (a combination of rasterization and ray-tracing) is a well known theory. However I had the following idea: one could separate a scene into "important" and "not important" objects. First you render the "not important" objects using traditional rasterization. In this pass you also render the "important" objects using a special shader that simply marks these parts on the image using a special color, or some stencil/depth buffer trickery. Then in the second pass you read back the results of the first pass and start ray tracing, but only from the pixels that were marked by the "important" object's shader. This would allow you to only ray-trace exactly what you need to. Could this be fast enough for real-time effects? Rendered physics I'm specifically talking about bullet physics - intersection of a very small object (point/bullet) that travels across a straight line with other, relatively slow-moving, fairly constant objects. More specifically: hit detection. My idea is that you could render the scene from the point of view of the gun (or the bullet). Every object in the scene would draw a different color. You only need to render a 1x1 pixel window - the center of the screen (again, from the gun's point of view). Then you simply check that central pixel and the color tells you what you hit. This is pixel-perfect hit detection based on the graphical representation of objects, which is not common in games. Afaik traditional OpenGL "picking" is a similar method. This could be extended in a few ways: For larger (non-bullet) objects you render a larger portion of the screen. If you put a special-colored plane in the middle of the scene (exactly where the bullet will be after the current frame) you get a method that works as the traditional slow-moving iterative physics test as well. You could simulate objects that the bullet can pass through (with decreased velocity) using alpha blending or some similar trick. So are these techniques in use anywhere, and/or are they practical at all?

    Read the article

  • Tiling Problem Solutions for Various Size "Dominoes"

    - by user67081
    I've got an interesting tiling problem, I have a large square image (size 128k so 131072 squares) with dimensons 256x512... I want to fill this image with certain grain types (a 1x1 tile, a 1x2 strip, a 2x1 strip, and 2x2 square) and have no overlap, no holes, and no extension past the image boundary. Given some probability for each of these grain types, a list of the number required to be placed is generated for each. Obviously an iterative/brute force method doesn't work well here if we just randomly place the pieces, instead a certain algorithm is required. 1) all 2x2 square grains are randomly placed until exhaustion. 2) 1x2 and 2x1 grains are randomly placed alternatively until exhaustion 3) the remaining 1x1 tiles are placed to fill in all holes. It turns out this algorithm works pretty well for some cases and has no problem filling the entire image, however as you might guess, increasing the probability (and thus number) of 1x2 and 2x1 grains eventually causes the placement to stall (since there are too many holes created by the strips and not all them can be placed). My approach to this solution has been as follows: 1) Create a mini-image of size 8x8 or 16x16. 2) Fill this image randomly and following the algorithm specified above so that the desired probability of the entire image is realized in the mini-image. 3) Create N of these mini-images and then randomly successively place them in the large image. Unfortunately there are some downfalls to this simplification. 1) given the small size of the mini-images, nailing an exact probability for the entire image is not possible. Example if I want p(2x1)=P(1x2)=0.4, the mini image may only give 0.41 as the closes probability. 2) The mini-images create a pseudo boundary where no overlaps occur which isn't really descriptive of the model this is being used for. 3) There is only a fixed number of mini-images so i'm not sure how random this really is. I'm really just looking to brainstorm about possible solutions to this. My main concern is really to nail down closer probabilities, now one might suggest I just increase the mini-image size. Well I have, and it turns out that in certain cases(p(1x2)=p(2x1)=0.5) the mini-image 16x16 isn't even iteratively solvable.. So it's pretty obvious how difficult it is to randomly solve this for anything greater than 8x8 sizes.. So I'd love to hear some ideas. Thanks

    Read the article

  • Distributing processing for an application that wasn't designed with that in mind

    - by Tim
    We've got the application at work that just sits and does a whole bunch of iterative processing on some data files to perform some simulations. This is done by an "old" Win32 application that isn't multi-processor aware, so new(ish) computers and workstations are mostly sitting idle running this application. However, since it's installed by a typical Windows Install Shield installer, I can't seem to install and run multiple copies of the application. The work can be split up manually before processing, enabling the work to be distributed across multiple machines, but we still can't take advantage of multiple core CPUs. The results can be joined back together after processing to make a complete simulation. Is there a product out there that would let me "compartmentalize" an installation (or 4) so I can take advantage of a multi-core CPU? I had thought of using MS Softgrid, but I believe that still depends on a remote server to do the heavy lifting (though please correct me if I'm wrong). Furthermore, is there a way I can distribute the workload off the one machine? So an input could be split into 50 chunks, handed out to 50 machines, and worked on? All without really changing the initial application? In a perfect world, I'd get the application to take advantage of a DesktopGrid (BOINC), but like most "mission critical corporate applications", the need is there, but the money is not. Thank you in advance (and sorry if this isn't appropriate for serverfault).

    Read the article

  • What is the fastest cyclic synchronization in Java (ExecutorService vs. CyclicBarrier vs. X)?

    - by Alex Dunlop
    Which Java synchronization construct is likely to provide the best performance for a concurrent, iterative processing scenario with a fixed number of threads like the one outlined below? After experimenting on my own for a while (using ExecutorService and CyclicBarrier) and being somewhat surprised by the results, I would be grateful for some expert advice and maybe some new ideas. Existing questions here do not seem to focus primarily on performance, hence this new one. Thanks in advance! The core of the app is a simple iterative data processing algorithm, parallelized to the spread the computational load across 8 cores on a Mac Pro, running OS X 10.6 and Java 1.6.0_07. The data to be processed is split into 8 blocks and each block is fed to a Runnable to be executed by one of a fixed number of threads. Parallelizing the algorithm was fairly straightforward, and it functionally works as desired, but its performance is not yet what I think it could be. The app seems to spend a lot of time in system calls synchronizing, so after some profiling I wonder whether I selected the most appropriate synchronization mechanism(s). A key requirement of the algorithm is that it needs to proceed in stages, so the threads need to sync up at the end of each stage. The main thread prepares the work (very low overhead), passes it to the threads, lets them work on it, then proceeds when all threads are done, rearranges the work (again very low overhead) and repeats the cycle. The machine is dedicated to this task, Garbage Collection is minimized by using per-thread pools of pre-allocated items, and the number of threads can be fixed (no incoming requests or the like, just one thread per CPU core). V1 - ExecutorService My first implementation used an ExecutorService with 8 worker threads. The program creates 8 tasks holding the work and then lets them work on it, roughly like this: // create one thread per CPU executorService = Executors.newFixedThreadPool( 8 ); ... // now process data in cycles while( ...) { // package data into 8 work items ... // create one Callable task per work item ... // submit the Callables to the worker threads executorService.invokeAll( taskList ); } This works well functionally (it does what it should), and for very large work items indeed all 8 CPUs become highly loaded, as much as the processing algorithm would be expected to allow (some work items will finish faster than others, then idle). However, as the work items become smaller (and this is not really under the program's control), the user CPU load shrinks dramatically: blocksize | system | user | cycles/sec 256k 1.8% 85% 1.30 64k 2.5% 77% 5.6 16k 4% 64% 22.5 4096 8% 56% 86 1024 13% 38% 227 256 17% 19% 420 64 19% 17% 948 16 19% 13% 1626 Legend: - block size = size of the work item (= computational steps) - system = system load, as shown in OS X Activity Monitor (red bar) - user = user load, as shown in OS X Activity Monitor (green bar) - cycles/sec = iterations through the main while loop, more is better The primary area of concern here is the high percentage of time spent in the system, which appears to be driven by thread synchronization calls. As expected, for smaller work items, ExecutorService.invokeAll() will require relatively more effort to sync up the threads versus the amount of work being performed in each thread. But since ExecutorService is more generic than it would need to be for this use case (it can queue tasks for threads if there are more tasks than cores), I though maybe there would be a leaner synchronization construct. V2 - CyclicBarrier The next implementation used a CyclicBarrier to sync up the threads before receiving work and after completing it, roughly as follows: main() { // create the barrier barrier = new CyclicBarrier( 8 + 1 ); // create Runable for thread, tell it about the barrier Runnable task = new WorkerThreadRunnable( barrier ); // start the threads for( int i = 0; i < 8; i++ ) { // create one thread per core new Thread( task ).start(); } while( ... ) { // tell threads about the work ... // N threads + this will call await(), then system proceeds barrier.await(); // ... now worker threads work on the work... // wait for worker threads to finish barrier.await(); } } class WorkerThreadRunnable implements Runnable { CyclicBarrier barrier; WorkerThreadRunnable( CyclicBarrier barrier ) { this.barrier = barrier; } public void run() { while( true ) { // wait for work barrier.await(); // do the work ... // wait for everyone else to finish barrier.await(); } } } Again, this works well functionally (it does what it should), and for very large work items indeed all 8 CPUs become highly loaded, as before. However, as the work items become smaller, the load still shrinks dramatically: blocksize | system | user | cycles/sec 256k 1.9% 85% 1.30 64k 2.7% 78% 6.1 16k 5.5% 52% 25 4096 9% 29% 64 1024 11% 15% 117 256 12% 8% 169 64 12% 6.5% 285 16 12% 6% 377 For large work items, synchronization is negligible and the performance is identical to V1. But unexpectedly, the results of the (highly specialized) CyclicBarrier seem MUCH WORSE than those for the (generic) ExecutorService: throughput (cycles/sec) is only about 1/4th of V1. A preliminary conclusion would be that even though this seems to be the advertised ideal use case for CyclicBarrier, it performs much worse than the generic ExecutorService. V3 - Wait/Notify + CyclicBarrier It seemed worth a try to replace the first cyclic barrier await() with a simple wait/notify mechanism: main() { // create the barrier // create Runable for thread, tell it about the barrier // start the threads while( ... ) { // tell threads about the work // for each: workerThreadRunnable.setWorkItem( ... ); // ... now worker threads work on the work... // wait for worker threads to finish barrier.await(); } } class WorkerThreadRunnable implements Runnable { CyclicBarrier barrier; @NotNull volatile private Callable<Integer> workItem; WorkerThreadRunnable( CyclicBarrier barrier ) { this.barrier = barrier; this.workItem = NO_WORK; } final protected void setWorkItem( @NotNull final Callable<Integer> callable ) { synchronized( this ) { workItem = callable; notify(); } } public void run() { while( true ) { // wait for work while( true ) { synchronized( this ) { if( workItem != NO_WORK ) break; try { wait(); } catch( InterruptedException e ) { e.printStackTrace(); } } } // do the work ... // wait for everyone else to finish barrier.await(); } } } Again, this works well functionally (it does what it should). blocksize | system | user | cycles/sec 256k 1.9% 85% 1.30 64k 2.4% 80% 6.3 16k 4.6% 60% 30.1 4096 8.6% 41% 98.5 1024 12% 23% 202 256 14% 11.6% 299 64 14% 10.0% 518 16 14.8% 8.7% 679 The throughput for small work items is still much worse than that of the ExecutorService, but about 2x that of the CyclicBarrier. Eliminating one CyclicBarrier eliminates half of the gap. V4 - Busy wait instead of wait/notify Since this app is the primary one running on the system and the cores idle anyway if they're not busy with a work item, why not try a busy wait for work items in each thread, even if that spins the CPU needlessly. The worker thread code changes as follows: class WorkerThreadRunnable implements Runnable { // as before final protected void setWorkItem( @NotNull final Callable<Integer> callable ) { workItem = callable; } public void run() { while( true ) { // busy-wait for work while( true ) { if( workItem != NO_WORK ) break; } // do the work ... // wait for everyone else to finish barrier.await(); } } } Also works well functionally (it does what it should). blocksize | system | user | cycles/sec 256k 1.9% 85% 1.30 64k 2.2% 81% 6.3 16k 4.2% 62% 33 4096 7.5% 40% 107 1024 10.4% 23% 210 256 12.0% 12.0% 310 64 11.9% 10.2% 550 16 12.2% 8.6% 741 For small work items, this increases throughput by a further 10% over the CyclicBarrier + wait/notify variant, which is not insignificant. But it is still much lower-throughput than V1 with the ExecutorService. V5 - ? So what is the best synchronization mechanism for such a (presumably not uncommon) problem? I am weary of writing my own sync mechanism to completely replace ExecutorService (assuming that it is too generic and there has to be something that can still be taken out to make it more efficient). It is not my area of expertise and I'm concerned that I'd spend a lot of time debugging it (since I'm not even sure my wait/notify and busy wait variants are correct) for uncertain gain. Any advice would be greatly appreciated.

    Read the article

  • How John Got 15x Improvement Without Really Trying

    - by rchrd
    The following article was published on a Sun Microsystems website a number of years ago by John Feo. It is still useful and worth preserving. So I'm republishing it here.  How I Got 15x Improvement Without Really Trying John Feo, Sun Microsystems Taking ten "personal" program codes used in scientific and engineering research, the author was able to get from 2 to 15 times performance improvement easily by applying some simple general optimization techniques. Introduction Scientific research based on computer simulation depends on the simulation for advancement. The research can advance only as fast as the computational codes can execute. The codes' efficiency determines both the rate and quality of results. In the same amount of time, a faster program can generate more results and can carry out a more detailed simulation of physical phenomena than a slower program. Highly optimized programs help science advance quickly and insure that monies supporting scientific research are used as effectively as possible. Scientific computer codes divide into three broad categories: ISV, community, and personal. ISV codes are large, mature production codes developed and sold commercially. The codes improve slowly over time both in methods and capabilities, and they are well tuned for most vendor platforms. Since the codes are mature and complex, there are few opportunities to improve their performance solely through code optimization. Improvements of 10% to 15% are typical. Examples of ISV codes are DYNA3D, Gaussian, and Nastran. Community codes are non-commercial production codes used by a particular research field. Generally, they are developed and distributed by a single academic or research institution with assistance from the community. Most users just run the codes, but some develop new methods and extensions that feed back into the general release. The codes are available on most vendor platforms. Since these codes are younger than ISV codes, there are more opportunities to optimize the source code. Improvements of 50% are not unusual. Examples of community codes are AMBER, CHARM, BLAST, and FASTA. Personal codes are those written by single users or small research groups for their own use. These codes are not distributed, but may be passed from professor-to-student or student-to-student over several years. They form the primordial ocean of applications from which community and ISV codes emerge. Government research grants pay for the development of most personal codes. This paper reports on the nature and performance of this class of codes. Over the last year, I have looked at over two dozen personal codes from more than a dozen research institutions. The codes cover a variety of scientific fields, including astronomy, atmospheric sciences, bioinformatics, biology, chemistry, geology, and physics. The sources range from a few hundred lines to more than ten thousand lines, and are written in Fortran, Fortran 90, C, and C++. For the most part, the codes are modular, documented, and written in a clear, straightforward manner. They do not use complex language features, advanced data structures, programming tricks, or libraries. I had little trouble understanding what the codes did or how data structures were used. Most came with a makefile. Surprisingly, only one of the applications is parallel. All developers have access to parallel machines, so availability is not an issue. Several tried to parallelize their applications, but stopped after encountering difficulties. Lack of education and a perception that parallelism is difficult prevented most from trying. I parallelized several of the codes using OpenMP, and did not judge any of the codes as difficult to parallelize. Even more surprising than the lack of parallelism is the inefficiency of the codes. I was able to get large improvements in performance in a matter of a few days applying simple optimization techniques. Table 1 lists ten representative codes [names and affiliation are omitted to preserve anonymity]. Improvements on one processor range from 2x to 15.5x with a simple average of 4.75x. I did not use sophisticated performance tools or drill deep into the program's execution character as one would do when tuning ISV or community codes. Using only a profiler and source line timers, I identified inefficient sections of code and improved their performance by inspection. The changes were at a high level. I am sure there is another factor of 2 or 3 in each code, and more if the codes are parallelized. The study’s results show that personal scientific codes are running many times slower than they should and that the problem is pervasive. Computational scientists are not sloppy programmers; however, few are trained in the art of computer programming or code optimization. I found that most have a working knowledge of some programming language and standard software engineering practices; but they do not know, or think about, how to make their programs run faster. They simply do not know the standard techniques used to make codes run faster. In fact, they do not even perceive that such techniques exist. The case studies described in this paper show that applying simple, well known techniques can significantly increase the performance of personal codes. It is important that the scientific community and the Government agencies that support scientific research find ways to better educate academic scientific programmers. The inefficiency of their codes is so bad that it is retarding both the quality and progress of scientific research. # cacheperformance redundantoperations loopstructures performanceimprovement 1 x x 15.5 2 x 2.8 3 x x 2.5 4 x 2.1 5 x x 2.0 6 x 5.0 7 x 5.8 8 x 6.3 9 2.2 10 x x 3.3 Table 1 — Area of improvement and performance gains of 10 codes The remainder of the paper is organized as follows: sections 2, 3, and 4 discuss the three most common sources of inefficiencies in the codes studied. These are cache performance, redundant operations, and loop structures. Each section includes several examples. The last section summaries the work and suggests a possible solution to the issues raised. Optimizing cache performance Commodity microprocessor systems use caches to increase memory bandwidth and reduce memory latencies. Typical latencies from processor to L1, L2, local, and remote memory are 3, 10, 50, and 200 cycles, respectively. Moreover, bandwidth falls off dramatically as memory distances increase. Programs that do not use cache effectively run many times slower than programs that do. When optimizing for cache, the biggest performance gains are achieved by accessing data in cache order and reusing data to amortize the overhead of cache misses. Secondary considerations are prefetching, associativity, and replacement; however, the understanding and analysis required to optimize for the latter are probably beyond the capabilities of the non-expert. Much can be gained simply by accessing data in the correct order and maximizing data reuse. 6 out of the 10 codes studied here benefited from such high level optimizations. Array Accesses The most important cache optimization is the most basic: accessing Fortran array elements in column order and C array elements in row order. Four of the ten codes—1, 2, 4, and 10—got it wrong. Compilers will restructure nested loops to optimize cache performance, but may not do so if the loop structure is too complex, or the loop body includes conditionals, complex addressing, or function calls. In code 1, the compiler failed to invert a key loop because of complex addressing do I = 0, 1010, delta_x IM = I - delta_x IP = I + delta_x do J = 5, 995, delta_x JM = J - delta_x JP = J + delta_x T1 = CA1(IP, J) + CA1(I, JP) T2 = CA1(IM, J) + CA1(I, JM) S1 = T1 + T2 - 4 * CA1(I, J) CA(I, J) = CA1(I, J) + D * S1 end do end do In code 2, the culprit is conditionals do I = 1, N do J = 1, N If (IFLAG(I,J) .EQ. 0) then T1 = Value(I, J-1) T2 = Value(I-1, J) T3 = Value(I, J) T4 = Value(I+1, J) T5 = Value(I, J+1) Value(I,J) = 0.25 * (T1 + T2 + T5 + T4) Delta = ABS(T3 - Value(I,J)) If (Delta .GT. MaxDelta) MaxDelta = Delta endif enddo enddo I fixed both programs by inverting the loops by hand. Code 10 has three-dimensional arrays and triply nested loops. The structure of the most computationally intensive loops is too complex to invert automatically or by hand. The only practical solution is to transpose the arrays so that the dimension accessed by the innermost loop is in cache order. The arrays can be transposed at construction or prior to entering a computationally intensive section of code. The former requires all array references to be modified, while the latter is cost effective only if the cost of the transpose is amortized over many accesses. I used the second approach to optimize code 10. Code 5 has four-dimensional arrays and loops are nested four deep. For all of the reasons cited above the compiler is not able to restructure three key loops. Assume C arrays and let the four dimensions of the arrays be i, j, k, and l. In the original code, the index structure of the three loops is L1: for i L2: for i L3: for i for l for l for j for k for j for k for j for k for l So only L3 accesses array elements in cache order. L1 is a very complex loop—much too complex to invert. I brought the loop into cache alignment by transposing the second and fourth dimensions of the arrays. Since the code uses a macro to compute all array indexes, I effected the transpose at construction and changed the macro appropriately. The dimensions of the new arrays are now: i, l, k, and j. L3 is a simple loop and easily inverted. L2 has a loop-carried scalar dependence in k. By promoting the scalar name that carries the dependence to an array, I was able to invert the third and fourth subloops aligning the loop with cache. Code 5 is by far the most difficult of the four codes to optimize for array accesses; but the knowledge required to fix the problems is no more than that required for the other codes. I would judge this code at the limits of, but not beyond, the capabilities of appropriately trained computational scientists. Array Strides When a cache miss occurs, a line (64 bytes) rather than just one word is loaded into the cache. If data is accessed stride 1, than the cost of the miss is amortized over 8 words. Any stride other than one reduces the cost savings. Two of the ten codes studied suffered from non-unit strides. The codes represent two important classes of "strided" codes. Code 1 employs a multi-grid algorithm to reduce time to convergence. The grids are every tenth, fifth, second, and unit element. Since time to convergence is inversely proportional to the distance between elements, coarse grids converge quickly providing good starting values for finer grids. The better starting values further reduce the time to convergence. The downside is that grids of every nth element, n > 1, introduce non-unit strides into the computation. In the original code, much of the savings of the multi-grid algorithm were lost due to this problem. I eliminated the problem by compressing (copying) coarse grids into continuous memory, and rewriting the computation as a function of the compressed grid. On convergence, I copied the final values of the compressed grid back to the original grid. The savings gained from unit stride access of the compressed grid more than paid for the cost of copying. Using compressed grids, the loop from code 1 included in the previous section becomes do j = 1, GZ do i = 1, GZ T1 = CA(i+0, j-1) + CA(i-1, j+0) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) S1 = T1 + T4 - 4 * CA1(i+0, j+0) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 enddo enddo where CA and CA1 are compressed arrays of size GZ. Code 7 traverses a list of objects selecting objects for later processing. The labels of the selected objects are stored in an array. The selection step has unit stride, but the processing steps have irregular stride. A fix is to save the parameters of the selected objects in temporary arrays as they are selected, and pass the temporary arrays to the processing functions. The fix is practical if the same parameters are used in selection as in processing, or if processing comprises a series of distinct steps which use overlapping subsets of the parameters. Both conditions are true for code 7, so I achieved significant improvement by copying parameters to temporary arrays during selection. Data reuse In the previous sections, we optimized for spatial locality. It is also important to optimize for temporal locality. Once read, a datum should be used as much as possible before it is forced from cache. Loop fusion and loop unrolling are two techniques that increase temporal locality. Unfortunately, both techniques increase register pressure—as loop bodies become larger, the number of registers required to hold temporary values grows. Once register spilling occurs, any gains evaporate quickly. For multiprocessors with small register sets or small caches, the sweet spot can be very small. In the ten codes presented here, I found no opportunities for loop fusion and only two opportunities for loop unrolling (codes 1 and 3). In code 1, unrolling the outer and inner loop one iteration increases the number of result values computed by the loop body from 1 to 4, do J = 1, GZ-2, 2 do I = 1, GZ-2, 2 T1 = CA1(i+0, j-1) + CA1(i-1, j+0) T2 = CA1(i+1, j-1) + CA1(i+0, j+0) T3 = CA1(i+0, j+0) + CA1(i-1, j+1) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) T5 = CA1(i+2, j+0) + CA1(i+1, j+1) T6 = CA1(i+1, j+1) + CA1(i+0, j+2) T7 = CA1(i+2, j+1) + CA1(i+1, j+2) S1 = T1 + T4 - 4 * CA1(i+0, j+0) S2 = T2 + T5 - 4 * CA1(i+1, j+0) S3 = T3 + T6 - 4 * CA1(i+0, j+1) S4 = T4 + T7 - 4 * CA1(i+1, j+1) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 CA(i+1, j+0) = CA1(i+1, j+0) + DD * S2 CA(i+0, j+1) = CA1(i+0, j+1) + DD * S3 CA(i+1, j+1) = CA1(i+1, j+1) + DD * S4 enddo enddo The loop body executes 12 reads, whereas as the rolled loop shown in the previous section executes 20 reads to compute the same four values. In code 3, two loops are unrolled 8 times and one loop is unrolled 4 times. Here is the before for (k = 0; k < NK[u]; k++) { sum = 0.0; for (y = 0; y < NY; y++) { sum += W[y][u][k] * delta[y]; } backprop[i++]=sum; } and after code for (k = 0; k < KK - 8; k+=8) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (y = 0; y < NY; y++) { sum0 += W[y][0][k+0] * delta[y]; sum1 += W[y][0][k+1] * delta[y]; sum2 += W[y][0][k+2] * delta[y]; sum3 += W[y][0][k+3] * delta[y]; sum4 += W[y][0][k+4] * delta[y]; sum5 += W[y][0][k+5] * delta[y]; sum6 += W[y][0][k+6] * delta[y]; sum7 += W[y][0][k+7] * delta[y]; } backprop[k+0] = sum0; backprop[k+1] = sum1; backprop[k+2] = sum2; backprop[k+3] = sum3; backprop[k+4] = sum4; backprop[k+5] = sum5; backprop[k+6] = sum6; backprop[k+7] = sum7; } for one of the loops unrolled 8 times. Optimizing for temporal locality is the most difficult optimization considered in this paper. The concepts are not difficult, but the sweet spot is small. Identifying where the program can benefit from loop unrolling or loop fusion is not trivial. Moreover, it takes some effort to get it right. Still, educating scientific programmers about temporal locality and teaching them how to optimize for it will pay dividends. Reducing instruction count Execution time is a function of instruction count. Reduce the count and you usually reduce the time. The best solution is to use a more efficient algorithm; that is, an algorithm whose order of complexity is smaller, that converges quicker, or is more accurate. Optimizing source code without changing the algorithm yields smaller, but still significant, gains. This paper considers only the latter because the intent is to study how much better codes can run if written by programmers schooled in basic code optimization techniques. The ten codes studied benefited from three types of "instruction reducing" optimizations. The two most prevalent were hoisting invariant memory and data operations out of inner loops. The third was eliminating unnecessary data copying. The nature of these inefficiencies is language dependent. Memory operations The semantics of C make it difficult for the compiler to determine all the invariant memory operations in a loop. The problem is particularly acute for loops in functions since the compiler may not know the values of the function's parameters at every call site when compiling the function. Most compilers support pragmas to help resolve ambiguities; however, these pragmas are not comprehensive and there is no standard syntax. To guarantee that invariant memory operations are not executed repetitively, the user has little choice but to hoist the operations by hand. The problem is not as severe in Fortran programs because in the absence of equivalence statements, it is a violation of the language's semantics for two names to share memory. Codes 3 and 5 are C programs. In both cases, the compiler did not hoist all invariant memory operations from inner loops. Consider the following loop from code 3 for (y = 0; y < NY; y++) { i = 0; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += delta[y] * I1[i++]; } } } Since dW[y][u] can point to the same memory space as delta for one or more values of y and u, assignment to dW[y][u][k] may change the value of delta[y]. In reality, dW and delta do not overlap in memory, so I rewrote the loop as for (y = 0; y < NY; y++) { i = 0; Dy = delta[y]; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += Dy * I1[i++]; } } } Failure to hoist invariant memory operations may be due to complex address calculations. If the compiler can not determine that the address calculation is invariant, then it can hoist neither the calculation nor the associated memory operations. As noted above, code 5 uses a macro to address four-dimensional arrays #define MAT4D(a,q,i,j,k) (double *)((a)->data + (q)*(a)->strides[0] + (i)*(a)->strides[3] + (j)*(a)->strides[2] + (k)*(a)->strides[1]) The macro is too complex for the compiler to understand and so, it does not identify any subexpressions as loop invariant. The simplest way to eliminate the address calculation from the innermost loop (over i) is to define a0 = MAT4D(a,q,0,j,k) before the loop and then replace all instances of *MAT4D(a,q,i,j,k) in the loop with a0[i] A similar problem appears in code 6, a Fortran program. The key loop in this program is do n1 = 1, nh nx1 = (n1 - 1) / nz + 1 nz1 = n1 - nz * (nx1 - 1) do n2 = 1, nh nx2 = (n2 - 1) / nz + 1 nz2 = n2 - nz * (nx2 - 1) ndx = nx2 - nx1 ndy = nz2 - nz1 gxx = grn(1,ndx,ndy) gyy = grn(2,ndx,ndy) gxy = grn(3,ndx,ndy) balance(n1,1) = balance(n1,1) + (force(n2,1) * gxx + force(n2,2) * gxy) * h1 balance(n1,2) = balance(n1,2) + (force(n2,1) * gxy + force(n2,2) * gyy)*h1 end do end do The programmer has written this loop well—there are no loop invariant operations with respect to n1 and n2. However, the loop resides within an iterative loop over time and the index calculations are independent with respect to time. Trading space for time, I precomputed the index values prior to the entering the time loop and stored the values in two arrays. I then replaced the index calculations with reads of the arrays. Data operations Ways to reduce data operations can appear in many forms. Implementing a more efficient algorithm produces the biggest gains. The closest I came to an algorithm change was in code 4. This code computes the inner product of K-vectors A(i) and B(j), 0 = i < N, 0 = j < M, for most values of i and j. Since the program computes most of the NM possible inner products, it is more efficient to compute all the inner products in one triply-nested loop rather than one at a time when needed. The savings accrue from reading A(i) once for all B(j) vectors and from loop unrolling. for (i = 0; i < N; i+=8) { for (j = 0; j < M; j++) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (k = 0; k < K; k++) { sum0 += A[i+0][k] * B[j][k]; sum1 += A[i+1][k] * B[j][k]; sum2 += A[i+2][k] * B[j][k]; sum3 += A[i+3][k] * B[j][k]; sum4 += A[i+4][k] * B[j][k]; sum5 += A[i+5][k] * B[j][k]; sum6 += A[i+6][k] * B[j][k]; sum7 += A[i+7][k] * B[j][k]; } C[i+0][j] = sum0; C[i+1][j] = sum1; C[i+2][j] = sum2; C[i+3][j] = sum3; C[i+4][j] = sum4; C[i+5][j] = sum5; C[i+6][j] = sum6; C[i+7][j] = sum7; }} This change requires knowledge of a typical run; i.e., that most inner products are computed. The reasons for the change, however, derive from basic optimization concepts. It is the type of change easily made at development time by a knowledgeable programmer. In code 5, we have the data version of the index optimization in code 6. Here a very expensive computation is a function of the loop indices and so cannot be hoisted out of the loop; however, the computation is invariant with respect to an outer iterative loop over time. We can compute its value for each iteration of the computation loop prior to entering the time loop and save the values in an array. The increase in memory required to store the values is small in comparison to the large savings in time. The main loop in Code 8 is doubly nested. The inner loop includes a series of guarded computations; some are a function of the inner loop index but not the outer loop index while others are a function of the outer loop index but not the inner loop index for (j = 0; j < N; j++) { for (i = 0; i < M; i++) { r = i * hrmax; R = A[j]; temp = (PRM[3] == 0.0) ? 1.0 : pow(r, PRM[3]); high = temp * kcoeff * B[j] * PRM[2] * PRM[4]; low = high * PRM[6] * PRM[6] / (1.0 + pow(PRM[4] * PRM[6], 2.0)); kap = (R > PRM[6]) ? high * R * R / (1.0 + pow(PRM[4]*r, 2.0) : low * pow(R/PRM[6], PRM[5]); < rest of loop omitted > }} Note that the value of temp is invariant to j. Thus, we can hoist the computation for temp out of the loop and save its values in an array. for (i = 0; i < M; i++) { r = i * hrmax; TEMP[i] = pow(r, PRM[3]); } [N.B. – the case for PRM[3] = 0 is omitted and will be reintroduced later.] We now hoist out of the inner loop the computations invariant to i. Since the conditional guarding the value of kap is invariant to i, it behooves us to hoist the computation out of the inner loop, thereby executing the guard once rather than M times. The final version of the code is for (j = 0; j < N; j++) { R = rig[j] / 1000.; tmp1 = kcoeff * par[2] * beta[j] * par[4]; tmp2 = 1.0 + (par[4] * par[4] * par[6] * par[6]); tmp3 = 1.0 + (par[4] * par[4] * R * R); tmp4 = par[6] * par[6] / tmp2; tmp5 = R * R / tmp3; tmp6 = pow(R / par[6], par[5]); if ((par[3] == 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp5; } else if ((par[3] == 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp4 * tmp6; } else if ((par[3] != 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp5; } else if ((par[3] != 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp4 * tmp6; } for (i = 0; i < M; i++) { kap = KAP[i]; r = i * hrmax; < rest of loop omitted > } } Maybe not the prettiest piece of code, but certainly much more efficient than the original loop, Copy operations Several programs unnecessarily copy data from one data structure to another. This problem occurs in both Fortran and C programs, although it manifests itself differently in the two languages. Code 1 declares two arrays—one for old values and one for new values. At the end of each iteration, the array of new values is copied to the array of old values to reset the data structures for the next iteration. This problem occurs in Fortran programs not included in this study and in both Fortran 77 and Fortran 90 code. Introducing pointers to the arrays and swapping pointer values is an obvious way to eliminate the copying; but pointers is not a feature that many Fortran programmers know well or are comfortable using. An easy solution not involving pointers is to extend the dimension of the value array by 1 and use the last dimension to differentiate between arrays at different times. For example, if the data space is N x N, declare the array (N, N, 2). Then store the problem’s initial values in (_, _, 2) and define the scalar names new = 2 and old = 1. At the start of each iteration, swap old and new to reset the arrays. The old–new copy problem did not appear in any C program. In programs that had new and old values, the code swapped pointers to reset data structures. Where unnecessary coping did occur is in structure assignment and parameter passing. Structures in C are handled much like scalars. Assignment causes the data space of the right-hand name to be copied to the data space of the left-hand name. Similarly, when a structure is passed to a function, the data space of the actual parameter is copied to the data space of the formal parameter. If the structure is large and the assignment or function call is in an inner loop, then copying costs can grow quite large. While none of the ten programs considered here manifested this problem, it did occur in programs not included in the study. A simple fix is always to refer to structures via pointers. Optimizing loop structures Since scientific programs spend almost all their time in loops, efficient loops are the key to good performance. Conditionals, function calls, little instruction level parallelism, and large numbers of temporary values make it difficult for the compiler to generate tightly packed, highly efficient code. Conditionals and function calls introduce jumps that disrupt code flow. Users should eliminate or isolate conditionls to their own loops as much as possible. Often logical expressions can be substituted for if-then-else statements. For example, code 2 includes the following snippet MaxDelta = 0.0 do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) if (Delta > MaxDelta) MaxDelta = Delta enddo enddo if (MaxDelta .gt. 0.001) goto 200 Since the only use of MaxDelta is to control the jump to 200 and all that matters is whether or not it is greater than 0.001, I made MaxDelta a boolean and rewrote the snippet as MaxDelta = .false. do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) MaxDelta = MaxDelta .or. (Delta .gt. 0.001) enddo enddo if (MaxDelta) goto 200 thereby, eliminating the conditional expression from the inner loop. A microprocessor can execute many instructions per instruction cycle. Typically, it can execute one or more memory, floating point, integer, and jump operations. To be executed simultaneously, the operations must be independent. Thick loops tend to have more instruction level parallelism than thin loops. Moreover, they reduce memory traffice by maximizing data reuse. Loop unrolling and loop fusion are two techniques to increase the size of loop bodies. Several of the codes studied benefitted from loop unrolling, but none benefitted from loop fusion. This observation is not too surpising since it is the general tendency of programmers to write thick loops. As loops become thicker, the number of temporary values grows, increasing register pressure. If registers spill, then memory traffic increases and code flow is disrupted. A thick loop with many temporary values may execute slower than an equivalent series of thin loops. The biggest gain will be achieved if the thick loop can be split into a series of independent loops eliminating the need to write and read temporary arrays. I found such an occasion in code 10 where I split the loop do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do into two disjoint loops do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) end do end do do i = 1, n do j = 1, m C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do Conclusions Over the course of the last year, I have had the opportunity to work with over two dozen academic scientific programmers at leading research universities. Their research interests span a broad range of scientific fields. Except for two programs that relied almost exclusively on library routines (matrix multiply and fast Fourier transform), I was able to improve significantly the single processor performance of all codes. Improvements range from 2x to 15.5x with a simple average of 4.75x. Changes to the source code were at a very high level. I did not use sophisticated techniques or programming tools to discover inefficiencies or effect the changes. Only one code was parallel despite the availability of parallel systems to all developers. Clearly, we have a problem—personal scientific research codes are highly inefficient and not running parallel. The developers are unaware of simple optimization techniques to make programs run faster. They lack education in the art of code optimization and parallel programming. I do not believe we can fix the problem by publishing additional books or training manuals. To date, the developers in questions have not studied the books or manual available, and are unlikely to do so in the future. Short courses are a possible solution, but I believe they are too concentrated to be much use. The general concepts can be taught in a three or four day course, but that is not enough time for students to practice what they learn and acquire the experience to apply and extend the concepts to their codes. Practice is the key to becoming proficient at optimization. I recommend that graduate students be required to take a semester length course in optimization and parallel programming. We would never give someone access to state-of-the-art scientific equipment costing hundreds of thousands of dollars without first requiring them to demonstrate that they know how to use the equipment. Yet the criterion for time on state-of-the-art supercomputers is at most an interesting project. Requestors are never asked to demonstrate that they know how to use the system, or can use the system effectively. A semester course would teach them the required skills. Government agencies that fund academic scientific research pay for most of the computer systems supporting scientific research as well as the development of most personal scientific codes. These agencies should require graduate schools to offer a course in optimization and parallel programming as a requirement for funding. About the Author John Feo received his Ph.D. in Computer Science from The University of Texas at Austin in 1986. After graduate school, Dr. Feo worked at Lawrence Livermore National Laboratory where he was the Group Leader of the Computer Research Group and principal investigator of the Sisal Language Project. In 1997, Dr. Feo joined Tera Computer Company where he was project manager for the MTA, and oversaw the programming and evaluation of the MTA at the San Diego Supercomputer Center. In 2000, Dr. Feo joined Sun Microsystems as an HPC application specialist. He works with university research groups to optimize and parallelize scientific codes. Dr. Feo has published over two dozen research articles in the areas of parallel parallel programming, parallel programming languages, and application performance.

    Read the article

  • Developing an Implementation Plan with Iterations by Russ Pitts

    - by user535886
    Developing an Implementation Plan with Iterations by Russ Pitts  Ok, so you have come to grips with understanding that applying the iterative concept, as defined by OUM is simply breaking up the project effort you have estimated for each phase into one or more six week calendar duration blocks of work. Idea being the business user(s) or key recipient(s) of work product(s) being developed never go longer than six weeks without having some sort of review or prototyping of the work results for an iteration…”think-a-little”, “do-a-little”, and “show-a-little” in a six week or less timeframe…ideally the business user(s) or key recipients(s) are involved throughout. You also understand the OUM concept that you only plan for that which you have knowledge of. The concept further defined, a project plan initially is developed at a high-level, and becomes more detailed as project knowledge grows. Agreeing to this concept means you also have to admit to the fallacy that one can plan with precision beyond six weeks into a project…Anything beyond six weeks is a best guess in most cases when dealing with software implementation projects. Project planning, as defined by OUM begins with the Implementation Plan view, which is a very high-level perspective of the effort estimated for each of the five OUM phases, as well as the number of iterations within each phase. You might wonder how can you predict the number of iterations for each phase at this early point in the project. Remember project planning is not an exact science, and initially is high-level and abstract in nature, and then becomes more detailed and precise as the project proceeds. So where do you start in defining iterations for each phase for a project? The following are three easy steps to initially define the number of iterations for each phase: Step 1 => Start with identifying the known factors… …Prior to starting a project you should know: · The agreed upon time-period for an iteration (e.g 6 weeks, or 4 weeks, or…) within a phase (recommend keeping iteration time-period consistent within a phase, if not for the entire project) · The number of resources available for the project · The number of total number of man-day (effort) you have estimated for each of the five OUM phases of the project · The number of work days for a week Step 2 => Calculate the man-days of effort required for an iteration within a phase… Lets assume for the sake of this example there are 10 project resources, and you have estimated 2,536 man-days of work effort which will need to occur for the elaboration phase of the project. Let’s also assume a week for this project is defined as 5 business days, and that each iteration in the elaboration phase will last a calendar duration of 6 weeks. A simple calculation is performed to calculate the daily burn rate for a single iteration, which produces a result of… ((Number of resources * days per week) * duration of iteration) = Number of days required per iteration ((10 resources * 5 days/week) * 6 weeks) = 300 man days of effort required per iteration Step 3 => Calculate the number of iterations that can occur within a phase Next calculate the number of iterations that can occur for the amount of man-days of effort estimated for the phase being considered… (number of man-days of effort estimated / number of man-days required per iteration) = # of iterations for phase (2,536 man-days of estimated effort for phase / 300 man days of effort required per iteration) = 8.45 iterations, which should be rounded to a whole number such as 9 iterations* *Note - It is important to note this is an approximate calculation, not an exact science. This particular example is a simple one, which assumes all resources are utilized throughout the phase, including tech resources, etc. (rounding down or up to a whole number based on project factor considerations). It is also best in many cases to round up to higher number, as this provides some calendar scheduling contingency.

    Read the article

  • How can I thoroughly evaluate a prospective employer?

    - by glenviewjeff
    We hear much about code smells, test smells, and even project smells, but I have heard no discussion about employer "smells" outside of the Joel Test. After much frustration working for employers with a bouquet of unpleasant corporate-culture odors, I believe it's time for me to actively seek a more mature development environment. I've started assembling a list of questions to help vet employers by identifying issues during a job interview, and am looking for additional ideas. I suppose this list could easily be modified by an employer to vet an employee as well, but please answer from the interviewee's perspective. I think it would be important to ask many of these questions of multiple people to find out if consistent answers are given. For the most part, I tried to put the questions in each section in the order they could be asked. An undesired answer to an early question will often make follow-ups moot. Values What constitutes "well-written" software? What attributes does a good developer have? Same question for manager. Process Do you have a development process? How rigorously do you follow it? How do you decide how much process to apply to each project? Describe a typical project lifecycle. Ask the following if they don't come up otherwise: Waterfall/iterative: How much time is spent in upfront requirements gathering? upfront design? Testing Who develops tests (developers or separate test engineers?) When are they developed? When are the tests executed? How long do they take to execute? What makes a good test? How do you know you've tested enough? What percentage of code is tested? Review What is the review process like? What percentage of code is reviewed? Design? How frequently can I expect to participate as code/design reviewer/reviewee? What are the criteria applied to review and where do the criteria come from? Improvement What new tools and techniques have you evaluated or deployed in the past year? What training courses have your employees been given in the past year? What will I be doing for the first six months in your company (hinting at what kind of organized mentorship/training has been thought through, if any) What changes to your development process have been made in the past year? How do you improve and learn from your mistakes as an organization? What was your organizations biggest mistake in the past year, and how was it addressed? What feedback have you given to management lately? Was it implemented? If not, why? How does your company use "best practices?" How do you seek them out from the outside or within, and how do you share them with each other? Ethics Tell me about an ethical problem you or your employees experienced recently and how was it resolved? Do you use open-source software? What open-source contributions have you made? Follow-Ups I liked what @jim-leonardo said on this Stack Overflow question: Really a thing to ask yourself: "Does this person seem like they are trying to recruit me and make me interested?" I think this is one of the most important bits. If they seem to be taking the attitude that the only one being interviewed is you, then they probably will treat you poorly. Good interviewers understand they have to sell the position as much as the candidate needs to sell themselves. @SethP added: Glassdoor.com is a good web site for researching potential employers. It contains information about how specific companies conduct interviews...

    Read the article

  • How to Build Services from Legacy Applications

    - by Chris Falter
    The SOA consultants invaded the executive suite at your company or agency, preached the true religion, and converted the unbelievers. Now by divine imperative you must convert your legacy applications into a suite of reusable services.  But as usual, you lack the time and resources that you need in order to develop the services properly.  So you googled or bing’ed, found this blog post, and began crying in gratitude.  Yes, as the title implies, I am going to reveal my easy, 3-step, works-every-time process for converting silos of legacy applications into the inventory of services your CIO has been dreaming about.  So just close your eyes and count to 3 … now open them … and here it is…. Not. While wishful thinking is too often the coin of the IT realm, even the most naive practitioner knows that converting legacy applications into reusable services requires more than a magic wand.  The reason is simple: if your starting point is your legacy applications, then you will simply be bolting a web service technology layer on top of your legacy API.  And that legacy API is built in the image of the silo applications.  Enter the wide gate of the legacy API, follow the broad path of generating service interfaces from existing code, and you will arrive at the siloed enterprise destruction that you thought you were escaping. The Straight and Narrow Path This past week I had the opportunity to learn how the FBI Criminal Justice Information Systems department has been transitioning from silo applications to a service inventory.  Lafe Hutcheson, IT Specialist in the architecture group and fellow attendee at an SOA Architect Certification Workshop, was my guide.  Lafe has survived the chaos of an SOA initiative, so it is not surprising that he was able to return from a US Army deployment to Kabul, Afghanistan with nary a scratch.  According to Lafe, building their service inventory is a three-phase process: Model a business process.  This requires intense collaboration between the IT and business wings of the organization, of course.  The FBI uses IBM Websphere tools to model the process with BPMN. Identify candidate services to facilitate the business process. Convert the BPMN to an executable BPEL orchestration, model and develop the services, and use a BPEL engine to run the process.  The FBI uses ActiveVOS for orchestration services. The 12 Step Program to End Your Legacy API Addiction Thomas Erl has documented a process for building a web service inventory that is quite similar to the FBI process. Erl’s process adds a technology architecture definition phase, which allows for the technology environment to influence the inventory blueprint.  For example, if you are using an enterprise service bus, you will probably not need to build your own utility services for logging or intermediate routing.  Erl also lists a service-oriented analysis phase that highlights the 12-step process of applying the principles of service orientation to modeling your services.  Erl depicts the modeling of a service inventory as an iterative process: model a business process, define the relevant technology architecture, define the service inventory blueprint, analyze the services, then model another business process, rinse and repeat.  (Astute readers will note that Erl’s diagram, restricted to analysis and modeling process, does not include the implementation phase that concludes the FBI service development methodology.) The service-oriented analysis phase is where you find the 12 steps that will free you from your legacy API addiction. In a nutshell, you identify the steps in the process that need services; identify the different types of services (agnostic entity services, service compositions, and utility services) that are required; apply service-orientation principles; and normalize the inventory into cohesive service models. Rather than discuss each of the 12 steps individually, I will close by simply referring my readers to Erl’s explanation.

    Read the article

  • Open source adventures with... wait for it... Microsoft

    - by Jeff
    Last week, Microsoft announced that it was going to open source the rest of the ASP.NET MVC Web stack. The core MVC framework has been open source for a long time now, but the other pieces around it are also now out in the wild. Not only that, but it's not what I call "big bang" open source, where you release the source with each version. No, they're actually committing in real time to a public repository. They're also taking contributions where it makes sense. If that weren't exciting enough, CodePlex, which used to be a part of the team I was on, has been re-org'd to a different part of the company where it is getting the love and attention (and apparently money) that it deserves. For a period of several months, I lobbied to get a PM gig with that product, but got nowhere. A year and a half later, I'm happy to see it finally treated right. In any case, I found a bug in Razor, the rendering engine, before the beta came out. I informally sent the bug info to some people, but it wasn't fixed for the beta. Now, with the project being developed in the open, I was able to submit the issue, and went back and forth with the developer who wrote the code (I met him once at a meet up in Bellevue, I think), and he committed a fix. I tried it a day later, and the bug was gone. There's a lot to learn from all of this. That open source software is surprisingly efficient and often of high quality is one part of it. For me the win is that it demonstrates how open and collaborative processes, as light as possible, lead to better software. In other words, even if this were a project being developed internally, at a bank or something, getting stakeholders involved early and giving people the ability to respond leads to awesomeness. While there is always a place for big thinking, experience has shown time and time again that trying to figure everything out up front takes too long, and rarely meets expectations. This is a lesson that probably half of Microsoft has yet to learn, including the team I was on before I split. It's the reason that team still hasn't shipped anything to general availability. But I've seen what an open and iterative development style can do for teams, at Microsoft and other places that I've worked. When you can have a conversation with people, and take ideas and turn them into code quickly, you're winning. So why don't people like winning? I think there are a lot of reasons, and they can generally be categorized into fear, skepticism and bad experiences. I can't give the Web stack teams enough credit. Not only did they dream big, but they changed a culture that often seems immovable and hopelessly stuck. This is a very public example of this culture change, but it's starting to happen at every scale in Microsoft. It's really interesting to see in a company that has been written off as dead the last decade.

    Read the article

  • Valuing "Working Software over Comprehensive Documentation"

    - by tom.spitz
    Normal 0 false false false EN-US X-NONE X-NONE /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin-top:0in; mso-para-margin-right:0in; mso-para-margin-bottom:10.0pt; mso-para-margin-left:0in; line-height:115%; mso-pagination:widow-orphan; font-size:11.0pt; font-family:"Calibri","sans-serif"; mso-ascii-font-family:Calibri; mso-ascii-theme-font:minor-latin; mso-fareast-font-family:"Times New Roman"; mso-fareast-theme-font:minor-fareast; mso-hansi-font-family:Calibri; mso-hansi-theme-font:minor-latin;} I subscribe to the tenets put forth in the Manifesto for Agile Software Development - http://agilemanifesto.org. As Oracle's chief methodologist, that might seem a self-deprecating attitude. After all, the agile manifesto tells us that we should value "individuals and interactions" over "processes and tools." My job includes process development. I also subscribe to ideas put forth in a number of subsequent works including Balancing Agility and Discipline: A Guide for the Perplexed (Boehm/Turner, Addison-Wesley) and Agile Project Management: Creating Innovative Products (Highsmith, Addison-Wesley). Both of these books talk about finding the right balance between "agility and discipline" or between a "predictive and adaptive" project approach. So there still seems to be a place for us in creating the Oracle Unified Method (OUM) to become the "single method framework that supports the successful implementation of every Oracle product." After all, the real idea is to apply just enough ceremony and produce just enough documentation to suit the needs of the particular project that supports an enterprise in moving toward its desired future state. The thing I've been struggling with - and the thing I'd like to hear from you about right now - is the prevalence of an ongoing obsession with "documents." OUM provides a comprehensive set of guidance for an iterative and incremental approach to engineering and implementing software systems. Our intent is first to support the information technology system implementation and, as necessary, support the creation of documentation. OUM, therefore, includes a supporting set of document templates. Our guidance is to employ those templates, sparingly, as needed; not create piles of documentation that you're not gonna (sic) need. In other words, don't serve the method, make the method serve you. Yet, there seems to be a "gimme" mentality in some circles that if you give me a sample document - or better yet - a repository of samples - then I will be able to do anything cheaply and quickly. The notion is certainly appealing AND reuse can save time. Plus, documents are a lowest common denominator way of packaging reusable stuff. However, without sustained investment and management I've seen "reuse repositories" turn quickly into garbage heaps. So, I remain a skeptic. I agree that providing document examples that promote consistency is helpful. However, there may be too much emphasis on the documents themselves and not enough on creating a system that meets the evolving needs of the business. How can we shift the emphasis toward working software and away from our dependency on documents - especially on large, complex implementation projects - while still supporting the need for documentation? I'd like to hear your thoughts.

    Read the article

  • When will EBS 12.2 be released?

    - by Steven Chan (Oracle Development)
    The most frequently asked question at OpenWorld this year was, "When will EBS 12.2 be released?" Sadly, Oracle's communication policies prohibit us from speculating about release dates for unreleased software. We are not permitted to give estimates, rough timelines, guesses, or anything else that remotely resembles specific guidance on release dates. You can monitor My Oracle Support and this blog for updates on EBS 12.2.  I'll post them here as soon as they're available.  I'm embedding an old favourite from 2007 in its entirety here, since it applies equally to new releases as well as certifications. "Loose Lips Sink Ships" (March 20, 2007)If I were to sort emails in my inbox into groups, the biggest -- by far -- would be the one for emails that start with, "When will _____ be certified with the E-Business Suite?"  I answer these dutifully but know that my replies can sometimes be maddening, for two reasons:  technical uncertainty, and Oracle's rules for such communications. On the Spiral Model of CertificationsTechnology stack certifications tend to be highly iterative in nature.  As a result, statements about certification dates tend to be accurate only when made in hindsight.  Laypeople are horrified to hear this, but it's the ugly truth.  Uncertainty is simply inherent to the process.  I've become inured to it over the years, but it might come as a surprise to you that it can take many cycles to get fully-released software to work together.  Take this scenario: We test a particular combination of Component A and B. If we encounter a problem, say, with Component A, we log a bug. We receive a new version of Component A. The process iterates again. The reality is this: until a certification is completed and released, there's no accurate way of telling how many iterations are yet to come.  This is true regardless of the number of iterations that have already been completed.  Our Lips Are SealedGenerally, people understand that things are subject to change, so the second reason I can't say anything specific is actually much more important than the first.  "Loose lips might sink ships" was coined in World War II in an effort to remind people that careless talk can have serious consequences.  Curiously, this applies to Oracle's communications about upcoming features, configurations, and releases, too.  As a publicly traded company, we have very strict policies that prohibit us from linking specific releases to specific dates.  If you've ever listened to an earnings call with analysts, you'll often hear them asking, "Can you add a little more color to that statement?"  For certifications, color is usually the only thing that I have.  Sometimes I can provide a bit more information about the technical nature of the certification in question, such as expected footprints or version levels.  I can occasionally share technical issues that we've found, too, to convey the degree of risk or complexity involved in the certification.  Aside from that, there's little additional information about specific dates, date ranges, or even speculation about dates that I can provide... that is, without having one of those uncomfortable conversations with Oracle Legal.  So, as much as it pains me to do so, when it comes to dates, I'm always forced to conclude with a generic reply that blandly states one of the following: We're working on that certification right now That certification is in the pipeline but hasn't been started yet We don't have plans for that certification Don't Shoot the MessengerThankfully, I've developed a thick skin over the years -- which is a good thing, considering the colorful and energetic responses I've received over the years after answering these questions.  However, on behalf of my Oracle colleagues who are faced with these questions every day in the field, I urge you to remember that they're required to follow these same corporate rules about date disclosures.  It never hurts to ask, but don't be too disappointed if we can't provide you with a detailed answer.  The Go-Go's had it right, after all.  Related Articles Webcast Replay Available: Technical Preview of EBS 12.2 Online Patching

    Read the article

  • Solving Diophantine Equations Using Python

    - by HARSHITH
    In mathematics, a Diophantine equation (named for Diophantus of Alexandria, a third century Greek mathematician) is a polynomial equation where the variables can only take on integer values. Although you may not realize it, you have seen Diophantine equations before: one of the most famous Diophantine equations is: We are not certain that McDonald's knows about Diophantine equations (actually we doubt that they do), but they use them! McDonald's sells Chicken McNuggets in packages of 6, 9 or 20 McNuggets. Thus, it is possible, for example, to buy exactly 15 McNuggets (with one package of 6 and a second package of 9), but it is not possible to buy exactly 16 nuggets, since no non- negative integer combination of 6's, 9's and 20's adds up to 16. To determine if it is possible to buy exactly n McNuggets, one has to solve a Diophantine equation: find non-negative integer values of a, b, and c, such that 6a + 9b + 20c = n. Write an iterative program that finds the largest number of McNuggets that cannot be bought in exact quantity. Your program should print the answer in the following format (where the correct number is provided in place of n): "Largest number of McNuggets that cannot be bought in exact quantity: n"

    Read the article

  • Improving long-polling Ajax performance

    - by Bears will eat you
    I'm writing a webapp (Firefox-compatible only) which uses long polling (via jQuery's ajax abilities) to send more-or-less constant updates from the server to the client. I'm concerned about the effects of leaving this running for long periods of time, say, all day or overnight. The basic code skeleton is this: function processResults(xml) { // do stuff with the xml from the server } function fetch() { setTimeout(function () { $.ajax({ type: 'GET', url: 'foo/bar/baz', dataType: 'xml', success: function (xml) { processResults(xml); fetch(); }, error: function (xhr, type, exception) { if (xhr.status === 0) { console.log('XMLHttpRequest cancelled'); } else { console.debug(xhr); fetch(); } } }); }, 500); } (The half-second "sleep" is so that the client doesn't hammer the server if the updates are coming back to the client quickly - which they usually are.) After leaving this running overnight, it tends to make Firefox crawl. I'd been thinking that this could be partially caused by a large stack depth since I've basically written an infinitely recursive function. However, if I use Firebug and throw a breakpoint into fetch, it looks like this is not the case. The stack that Firebug shows me is only about 4 or 5 frames deep, even after an hour. One of the solutions I'm considering is changing my recursive function to an iterative one, but I can't figure out how I would insert the delay in between Ajax requests without spinning. I've looked at the JS 1.7 "yield" keyword but I can't quite wrap my head around it, to figure out if it's what I need here. Is the best solution just to do a hard refresh on the page periodically, say, once every hour? Is there a better/leaner long-polling design pattern that won't put a hurt on the browser even after running for 8 or 12 hours? Or should I just skip the long polling altogether and use a different "constant update" pattern since I usually know how frequently the server will have a response for me?

    Read the article

  • Python progression path - From apprentice to guru

    - by Morlock
    Hi all, I've been learning, working, and playing with Python for a year and a half now. As a biologist slowly making the turn to bio-informatics, this language has been a the very core of all the major contributions I have made in the lab. (bash and R scripts have helped some too. My C++ capabilities are very not functional yet). I more or less fell in love with the way Python permits me to express beautiful solutions and also with the semantics of the language that allows such a natural flow from thoughts to workable code. What I would like to know from you is your answer to a kind of question I have seldom seen in this or other forums. Let me sum up what I do NOT want to ask first ;) I don't want to know how to QUICKLY learn Python Nor do I want to find out the best way to get acquainted with the language Finally, I don't want to know a 'one trick that does it all' approach. What I do want to know your opinion about, is: What are the steps YOU would recommend to a Python journeyman, from apprenticeship to guru status (feel free to stop wherever your expertise dictates it), in order that one IMPROVES CONSTANTLY, becoming a better and better Python coder, one step at a time. The kind of answers I would enjoy (but feel free to surprise the readership :P ), is formatted more or less like this: Read this (eg: python tutorial), pay attention to that kind of details Code for so manytime/problems/lines of code Then, read this (eg: this or that book), but this time, pay attention to this Tackle a few real-life problems Then, proceed to reading Y. Be sure to grasp these concepts Code for X time Come back to such and such basics or move further to... (you get the point :) This process depicts an iterative Learn/Code cycle, and I really care about knowing your opinion on what exactly one should pay attention to, at various stages, in order to progress CONSTANTLY (with due efforts, of course). If you come from a specific field of expertise, discuss the path you see as appropriate in this field. Thanks a lot for sharing your opinions and good Python coding!

    Read the article

  • Disco/MapReduce: Using results of previous iteration as input to new iteration

    - by muckabout
    Currently am implementing PageRank on Disco. As an iterative algorithm, the results of one iteration are used as input to the next iteration. I have a large file which represents all the links, with each row representing a page and the values in the row representing the pages to which it links. For Disco, I break this file into N chunks, then run MapReduce for one round. As a result, I get a set of (page, rank) tuples. I'd like to feed this rank to the next iteration. However, now my mapper needs two inputs: the graph file, and the pageranks. I would like to "zip" together the graph file and the page ranks, such that each line represents a page, it's rank, and it's out links. Since this graph file is separated into N chunks, I need to split the pagerank vector into N parallel chunks, and zip the regions of the pagerank vectors to the graph chunks This all seems more complicated than necessary, and as a pretty straightforward operation (with the quintessential mapreduce algorithm), it seems I'm missing something about Disco that could really simplify the approach. Any thoughts?

    Read the article

  • Project Euler Question 14 (Collatz Problem)

    - by paradox
    The following iterative sequence is defined for the set of positive integers: n -n/2 (n is even) n -3n + 1 (n is odd) Using the rule above and starting with 13, we generate the following sequence: 13 40 20 10 5 16 8 4 2 1 It can be seen that this sequence (starting at 13 and finishing at 1) contains 10 terms. Although it has not been proved yet (Collatz Problem), it is thought that all starting numbers finish at 1. Which starting number, under one million, produces the longest chain? NOTE: Once the chain starts the terms are allowed to go above one million. I tried coding a solution to this in C using the bruteforce method. However, it seems that my program stalls when trying to calculate 113383. Please advise :) #include <stdio.h> #define LIMIT 1000000 int iteration(int value) { if(value%2==0) return (value/2); else return (3*value+1); } int count_iterations(int value) { int count=1; //printf("%d\n", value); while(value!=1) { value=iteration(value); //printf("%d\n", value); count++; } return count; } int main() { int iteration_count=0, max=0; int i,count; for (i=1; i<LIMIT; i++) { printf("Current iteration : %d\n", i); iteration_count=count_iterations(i); if (iteration_count>max) { max=iteration_count; count=i; } } //iteration_count=count_iterations(113383); printf("Count = %d\ni = %d\n",max,count); }

    Read the article

  • java BufferedReader specific length returns NUL characters

    - by Bastien
    I have a TCP socket client receiving messages (data) from a server. messages are of the type length (2 bytes) + data (length bytes), delimited by STX & ETX characters. I'm using a bufferedReader to retrieve the two first bytes, decode the length, then read again from the same bufferedReader the appropriate length and put the result in a char array. most of the time, I have no problem, but SOMETIMES (1 out of thousands of messages received), when attempting to read (length) bytes from the reader, I get only part of it, the rest of my array being filled with "NUL" characters. I imagine it's because the buffer has not yet been filled. char[] bufLen = new char[2]; _bufferedReader.read(bufLen); int len = decodeLength(bufLen); char[] _rawMsg = new char[len]; _bufferedReader.read(_rawMsg); return _rawMsg; I solved the problem in several iterative ways: first I tested the last char of my array: if it wasn't ETX I would read chars from the bufferedReader one by one until I would reach ETX, then start over my regular routine. the consequence is that I would basically DROP one message. then, in order to still retrieve that message, I would find the first occurence of the NUL char in my "truncated" message, read & store additional characters one at a time until I reached ETX, and append them to my "truncated" messages, confirming length is ok. it works also, but I'm really thinking there's something I could do better, like checking if the total number of characters I need are available in the buffer before reading it, but can't find the right way to do it... any idea / pointer ? thanks !

    Read the article

  • How to perform spatial partitioning in n-dimensions?

    - by kevin42
    I'm trying to design an implementation of Vector Quantization as a c++ template class that can handle different types and dimensions of vectors (e.g. 16 dimension vectors of bytes, or 4d vectors of doubles, etc). I've been reading up on the algorithms, and I understand most of it: here and here I want to implement the Linde-Buzo-Gray (LBG) Algorithm, but I'm having difficulty figuring out the general algorithm for partitioning the clusters. I think I need to define a plane (hyperplane?) that splits the vectors in a cluster so there is an equal number on each side of the plane. [edit to add more info] This is an iterative process, but I think I start by finding the centroid of all the vectors, then use that centroid to define the splitting plane, get the centroid of each of the sides of the plane, continuing until I have the number of clusters needed for the VQ algorithm (iterating to optimize for less distortion along the way). The animation in the first link above shows it nicely. My questions are: What is an algorithm to find the plane once I have the centroid? How can I test a vector to see if it is on either side of that plane?

    Read the article

  • Windows Batch file to echo a specific line number

    - by Lee
    So for the second part of my current dilemma, I have a list of folders in c:\file_list.txt. I need to be able to extract them (well, echo them with some mods) based on the line number because this batch script is being called by an iterative macro process. I'm passing the line number as a parameter. @echo off setlocal enabledelayedexpansion set /a counter=0 set /a %%a = "" for /f "usebackq delims=" %%a in (c:\file_list.txt) do ( if "!counter!"=="%1" goto :printme & set /a counter+=1 ) :printme echo %%a which gives me an output of %a. Doh! So, I've tried echoing !a! (result: ECHO is off.); I've tried echoing %a (result: a) I figured the easy thing to do would be to modify the head.bat code found here: http://stackoverflow.com/questions/130116/dos-batch-commands-to-read-first-line-from-text-file except rather than echoing every line - I'd just echo the last line found. Not as simple as one might think. I've noticed that my counter is staying at zero for some reason; I'm wondering if the set /a counter+=1 is doing what I think it's doing.

    Read the article

  • converting a Tree to newick format. java

    - by Esmond
    I'm having problems converting a binary rooted tree to newick format. The full explanation for such a format can be found: http://code.google.com/p/mrsrf/wiki/NewickTree An example of a newick format would be as follows: for a tree T such as http://www.cs.mcgill.ca/~cs251/OldCourses/1997/topic8/images/completetreetwo.gif the newick representation would be: (((8,9),(10,11)),((12,13),(14,15))) the internal node will become the commas while the leaves will be retained. such trees have internal nodes which will always have 2 children. I have a problem using recursion to come out with this newick format. The output contains far too many nodes and braces. Any comments to resolve this problem is appreciated or even an iterative algorithm would be welcomed import java.util.Stack; public class Tree { .... public String inOrderNewick(Node root, String output) throws ItemNotFoundException { if (root.hasChild()) { output += "("; output += inOrderNewick(root.child1, output); output += ","; output += inOrderNewick(root.child2, output); output += ")"; return output; } else { output += root.getSeq(); return output; } } }

    Read the article

  • dotNet Templated, Repeating, Databound ServerControl: Modifying underlying ServerControl data per te

    - by Campbeln
    I have a server control that wraps an underlying class which manages a number of indexes to track where it is in a dataset (ie: RenderedRecordCount, ErroredRecordCount, NewRecordCount, etc.). I've got the server control rendering great, but OnDataBinding I'm having an issue as to seems to happen after CreateChildControls and before Render (both of which properly manage the iteration of the underlying indexes). While I'm somewhat familiar with the ASP.NET page lifecycle, this one seems to be beyond me at the moment. So... How do I hook into the iterative process OnDataBinding uses so I can manage the underlying indexes? Will I have to iterate over the ITemplates myself, managing the indexes as I go or is there an easier solution? [edit: Agh... writing the problem out is very cathartic... I'm thinking this is exactly what I will need to do...] Also... I implemented the iteration of the underlying indexes during CreateChildControls originally in the belief that was the proper place to hook in for events like OnDataBinding (thinking it was done as the controls were being .Add'd). Now it seems that this may actually be unnecessary. So I guess the secondary question is: What happens during CreateChildControls? Are the unadulterated (read: with various <%-tags in place) controls added to the .Controls collection without any other processing?

    Read the article

  • Performance question: Inverting an array of pointers in-place vs array of values

    - by Anders
    The background for asking this question is that I am solving a linearized equation system (Ax=b), where A is a matrix (typically of dimension less than 100x100) and x and b are vectors. I am using a direct method, meaning that I first invert A, then find the solution by x=A^(-1)b. This step is repated in an iterative process until convergence. The way I'm doing it now, using a matrix library (MTL4): For every iteration I copy all coeffiecients of A (values) in to the matrix object, then invert. This the easiest and safest option. Using an array of pointers instead: For my particular case, the coefficients of A happen to be updated between each iteration. These coefficients are stored in different variables (some are arrays, some are not). Would there be a potential for performance gain if I set up A as an array containing pointers to these coefficient variables, then inverting A in-place? The nice thing about the last option is that once I have set up the pointers in A before the first iteration, I would not need to copy any values between successive iterations. The values which are pointed to in A would automatically be updated between iterations. So the performance question boils down to this, as I see it: - The matrix inversion process takes roughly the same amount of time, assuming de-referencing of pointers is non-expensive. - The array of pointers does not need the extra memory for matrix A containing values. - The array of pointers option does not have to copy all NxN values of A between each iteration. - The values that are pointed to the array of pointers option are generally NOT ordered in memory. Hopefully, all values lie relatively close in memory, but *A[0][1] is generally not next to *A[0][0] etc. Any comments to this? Will the last remark affect performance negatively, thus weighing up for the positive performance effects?

    Read the article

< Previous Page | 2 3 4 5 6 7 8  | Next Page >