flemish bee cycle - Page 64

What is the fastest cyclic synchronization in Java (ExecutorService vs. CyclicBarrier vs. X)?

- by Alex Dunlop

Which Java synchronization construct is likely to provide the best performance for a concurrent, iterative processing scenario with a fixed number of threads like the one outlined below? After experimenting on my own for a while (using ExecutorService and CyclicBarrier) and being somewhat surprised by the results, I would be grateful for some expert advice and maybe some new ideas. Existing questions here do not seem to focus primarily on performance, hence this new one. Thanks in advance! The core of the app is a simple iterative data processing algorithm, parallelized to the spread the computational load across 8 cores on a Mac Pro, running OS X 10.6 and Java 1.6.0_07. The data to be processed is split into 8 blocks and each block is fed to a Runnable to be executed by one of a fixed number of threads. Parallelizing the algorithm was fairly straightforward, and it functionally works as desired, but its performance is not yet what I think it could be. The app seems to spend a lot of time in system calls synchronizing, so after some profiling I wonder whether I selected the most appropriate synchronization mechanism(s). A key requirement of the algorithm is that it needs to proceed in stages, so the threads need to sync up at the end of each stage. The main thread prepares the work (very low overhead), passes it to the threads, lets them work on it, then proceeds when all threads are done, rearranges the work (again very low overhead) and repeats the cycle. The machine is dedicated to this task, Garbage Collection is minimized by using per-thread pools of pre-allocated items, and the number of threads can be fixed (no incoming requests or the like, just one thread per CPU core). V1 - ExecutorService My first implementation used an ExecutorService with 8 worker threads. The program creates 8 tasks holding the work and then lets them work on it, roughly like this: // create one thread per CPU executorService = Executors.newFixedThreadPool( 8 ); ... // now process data in cycles while( ...) { // package data into 8 work items ... // create one Callable task per work item ... // submit the Callables to the worker threads executorService.invokeAll( taskList ); } This works well functionally (it does what it should), and for very large work items indeed all 8 CPUs become highly loaded, as much as the processing algorithm would be expected to allow (some work items will finish faster than others, then idle). However, as the work items become smaller (and this is not really under the program's control), the user CPU load shrinks dramatically: blocksize | system | user | cycles/sec 256k 1.8% 85% 1.30 64k 2.5% 77% 5.6 16k 4% 64% 22.5 4096 8% 56% 86 1024 13% 38% 227 256 17% 19% 420 64 19% 17% 948 16 19% 13% 1626 Legend: - block size = size of the work item (= computational steps) - system = system load, as shown in OS X Activity Monitor (red bar) - user = user load, as shown in OS X Activity Monitor (green bar) - cycles/sec = iterations through the main while loop, more is better The primary area of concern here is the high percentage of time spent in the system, which appears to be driven by thread synchronization calls. As expected, for smaller work items, ExecutorService.invokeAll() will require relatively more effort to sync up the threads versus the amount of work being performed in each thread. But since ExecutorService is more generic than it would need to be for this use case (it can queue tasks for threads if there are more tasks than cores), I though maybe there would be a leaner synchronization construct. V2 - CyclicBarrier The next implementation used a CyclicBarrier to sync up the threads before receiving work and after completing it, roughly as follows: main() { // create the barrier barrier = new CyclicBarrier( 8 + 1 ); // create Runable for thread, tell it about the barrier Runnable task = new WorkerThreadRunnable( barrier ); // start the threads for( int i = 0; i < 8; i++ ) { // create one thread per core new Thread( task ).start(); } while( ... ) { // tell threads about the work ... // N threads + this will call await(), then system proceeds barrier.await(); // ... now worker threads work on the work... // wait for worker threads to finish barrier.await(); } } class WorkerThreadRunnable implements Runnable { CyclicBarrier barrier; WorkerThreadRunnable( CyclicBarrier barrier ) { this.barrier = barrier; } public void run() { while( true ) { // wait for work barrier.await(); // do the work ... // wait for everyone else to finish barrier.await(); } } } Again, this works well functionally (it does what it should), and for very large work items indeed all 8 CPUs become highly loaded, as before. However, as the work items become smaller, the load still shrinks dramatically: blocksize | system | user | cycles/sec 256k 1.9% 85% 1.30 64k 2.7% 78% 6.1 16k 5.5% 52% 25 4096 9% 29% 64 1024 11% 15% 117 256 12% 8% 169 64 12% 6.5% 285 16 12% 6% 377 For large work items, synchronization is negligible and the performance is identical to V1. But unexpectedly, the results of the (highly specialized) CyclicBarrier seem MUCH WORSE than those for the (generic) ExecutorService: throughput (cycles/sec) is only about 1/4th of V1. A preliminary conclusion would be that even though this seems to be the advertised ideal use case for CyclicBarrier, it performs much worse than the generic ExecutorService. V3 - Wait/Notify + CyclicBarrier It seemed worth a try to replace the first cyclic barrier await() with a simple wait/notify mechanism: main() { // create the barrier // create Runable for thread, tell it about the barrier // start the threads while( ... ) { // tell threads about the work // for each: workerThreadRunnable.setWorkItem( ... ); // ... now worker threads work on the work... // wait for worker threads to finish barrier.await(); } } class WorkerThreadRunnable implements Runnable { CyclicBarrier barrier; @NotNull volatile private Callable<Integer> workItem; WorkerThreadRunnable( CyclicBarrier barrier ) { this.barrier = barrier; this.workItem = NO_WORK; } final protected void setWorkItem( @NotNull final Callable<Integer> callable ) { synchronized( this ) { workItem = callable; notify(); } } public void run() { while( true ) { // wait for work while( true ) { synchronized( this ) { if( workItem != NO_WORK ) break; try { wait(); } catch( InterruptedException e ) { e.printStackTrace(); } } } // do the work ... // wait for everyone else to finish barrier.await(); } } } Again, this works well functionally (it does what it should). blocksize | system | user | cycles/sec 256k 1.9% 85% 1.30 64k 2.4% 80% 6.3 16k 4.6% 60% 30.1 4096 8.6% 41% 98.5 1024 12% 23% 202 256 14% 11.6% 299 64 14% 10.0% 518 16 14.8% 8.7% 679 The throughput for small work items is still much worse than that of the ExecutorService, but about 2x that of the CyclicBarrier. Eliminating one CyclicBarrier eliminates half of the gap. V4 - Busy wait instead of wait/notify Since this app is the primary one running on the system and the cores idle anyway if they're not busy with a work item, why not try a busy wait for work items in each thread, even if that spins the CPU needlessly. The worker thread code changes as follows: class WorkerThreadRunnable implements Runnable { // as before final protected void setWorkItem( @NotNull final Callable<Integer> callable ) { workItem = callable; } public void run() { while( true ) { // busy-wait for work while( true ) { if( workItem != NO_WORK ) break; } // do the work ... // wait for everyone else to finish barrier.await(); } } } Also works well functionally (it does what it should). blocksize | system | user | cycles/sec 256k 1.9% 85% 1.30 64k 2.2% 81% 6.3 16k 4.2% 62% 33 4096 7.5% 40% 107 1024 10.4% 23% 210 256 12.0% 12.0% 310 64 11.9% 10.2% 550 16 12.2% 8.6% 741 For small work items, this increases throughput by a further 10% over the CyclicBarrier + wait/notify variant, which is not insignificant. But it is still much lower-throughput than V1 with the ExecutorService. V5 - ? So what is the best synchronization mechanism for such a (presumably not uncommon) problem? I am weary of writing my own sync mechanism to completely replace ExecutorService (assuming that it is too generic and there has to be something that can still be taken out to make it more efficient). It is not my area of expertise and I'm concerned that I'd spend a lot of time debugging it (since I'm not even sure my wait/notify and busy wait variants are correct) for uncertain gain. Any advice would be greatly appreciated.

Read the article

Red Gate Coder interviews: Alex Davies

- by Michael Williamson

Alex Davies has been a software engineer at Red Gate since graduating from university, and is currently busy working on .NET Demon. We talked about tackling parallel programming with his actors framework, a scientific approach to debugging, and how JavaScript is going to affect the programming languages we use in years to come. So, if we start at the start, how did you get started in programming? When I was seven or eight, I was given a BBC Micro for Christmas. I had asked for a Game Boy, but my dad thought it would be better to give me a proper computer. For a year or so, I only played games on it, but then I found the user guide for writing programs in it. I gradually started doing more stuff on it and found it fun. I liked creating. As I went into senior school I continued to write stuff on there, trying to write games that weren’t very good. I got a real computer when I was fourteen and found ways to write BASIC on it. Visual Basic to start with, and then something more interesting than that. How did you learn to program? Was there someone helping you out? Absolutely not! I learnt out of a book, or by experimenting. I remember the first time I found a loop, I was like “Oh my God! I don’t have to write out the same line over and over and over again any more. It’s amazing!” When did you think this might be something that you actually wanted to do as a career? For a long time, I thought it wasn’t something that you would do as a career, because it was too much fun to be a career. I thought I’d do chemistry at university and some kind of career based on chemical engineering. And then I went to a careers fair at school when I was seventeen or eighteen, and it just didn’t interest me whatsoever. I thought “I could be a programmer, and there’s loads of money there, and I’m good at it, and it’s fun”, but also that I shouldn’t spoil my hobby. Now I don’t really program in my spare time any more, which is a bit of a shame, but I program all the rest of the time, so I can live with it. Do you think you learnt much about programming at university? Yes, definitely! I went into university knowing how to make computers do anything I wanted them to do. However, I didn’t have the language to talk about algorithms, so the algorithms course in my first year was massively important. Learning other language paradigms like functional programming was really good for breadth of understanding. Functional programming influences normal programming through design rather than actually using it all the time. I draw inspiration from it to write imperative programs which I think is actually becoming really fashionable now, but I’ve been doing it for ages. I did it first! There were also some courses on really odd programming languages, a bit of Prolog, a little bit of C. Having a little bit of each of those is something that I would have never done on my own, so it was important. And then there are knowledge-based courses which are about not programming itself but things that have been programmed like TCP. Those are really important for examples for how to approach things. Did you do any internships while you were at university? Yeah, I spent both of my summers at the same company. I thought I could code well before I went there. Looking back at the crap that I produced, it was only surpassed in its crappiness by all of the other code already in that company. I’m so much better at writing nice code now than I used to be back then. Was there just not a culture of looking after your code? There was, they just didn’t hire people for their abilities in that area. They hired people for raw IQ. The first indicator of it going wrong was that they didn’t have any computer scientists, which is a bit odd in a programming company. But even beyond that they didn’t have people who learnt architecture from anyone else. Most of them had started straight out of university, so never really had experience or mentors to learn from. There wasn’t the experience to draw from to teach each other. In the second half of my second internship, I was being given tasks like looking at new technologies and teaching people stuff. Interns shouldn’t be teaching people how to do their jobs! All interns are going to have little nuggets of things that you don’t know about, but they shouldn’t consistently be the ones who know the most. It’s not a good environment to learn. I was going to ask how you found working with people who were more experienced than you… When I reached Red Gate, I found some people who were more experienced programmers than me, and that was difficult. I’ve been coding since I was tiny. At university there were people who were cleverer than me, but there weren’t very many who were more experienced programmers than me. During my internship, I didn’t find anyone who I classed as being a noticeably more experienced programmer than me. So, it was a shock to the system to have valid criticisms rather than just formatting criticisms. However, Red Gate’s not so big on the actual code review, at least it wasn’t when I started. We did an entire product release and then somebody looked over all of the UI of that product which I’d written and say what they didn’t like. By that point, it was way too late and I’d disagree with them. Do you think the lack of code reviews was a bad thing? I think if there’s going to be any oversight of new people, then it should be continuous rather than chunky. For me I don’t mind too much, I could go out and get oversight if I wanted it, and in those situations I felt comfortable without it. If I was managing the new person, then maybe I’d be keener on oversight and then the right way to do it is continuously and in very, very small chunks. Have you had any significant projects you’ve worked on outside of a job? When I was a teenager I wrote all sorts of stuff. I used to write games, I derived how to do isomorphic projections myself once. I didn’t know what the word was so I couldn’t Google for it, so I worked it out myself. It was horrifically complicated. But it sort of tailed off when I started at university, and is now basically zero. If I do side-projects now, they tend to be work-related side projects like my actors framework, NAct, which I started in a down tools week. Could you explain a little more about NAct? It is a little C# framework for writing parallel code more easily. Parallel programming is difficult when you need to write to shared data. Sometimes parallel programming is easy because you don’t need to write to shared data. When you do need to access shared data, you could just have your threads pile in and do their work, but then you would screw up the data because the threads would trample on each other’s toes. You could lock, but locks are really dangerous if you’re using more than one of them. You get interactions like deadlocks, and that’s just nasty. Actors instead allows you to say this piece of data belongs to this thread of execution, and nobody else can read it. If you want to read it, then ask that thread of execution for a piece of it by sending a message, and it will send the data back by a message. And that avoids deadlocks as long as you follow some obvious rules about not making your actors sit around waiting for other actors to do something. There are lots of ways to write actors, NAct allows you to do it as if it was method calls on other objects, which means you get all the strong type-safety that C# programmers like. Do you think that this is suitable for the majority of parallel programming, or do you think it’s only suitable for specific cases? It’s suitable for most difficult parallel programming. If you’ve just got a hundred web requests which are all independent of each other, then I wouldn’t bother because it’s easier to just spin them up in separate threads and they can proceed independently of each other. But where you’ve got difficult parallel programming, where you’ve got multiple threads accessing multiple bits of data in multiple ways at different times, then actors is at least as good as all other ways, and is, I reckon, easier to think about. When you’re using actors, you presumably still have to write your code in a different way from you would otherwise using single-threaded code. You can’t use actors with any methods that have return types, because you’re not allowed to call into another actor and wait for it. If you want to get a piece of data out of another actor, then you’ve got to use tasks so that you can use “async” and “await” to await asynchronously for it. But other than that, you can still stick things in classes so it’s not too different really. Rather than having thousands of objects with mutable state, you can use component-orientated design, where there are only a few mutable classes which each have a small number of instances. Then there can be thousands of immutable objects. If you tend to do that anyway, then actors isn’t much of a jump. If I’ve already built my system without any parallelism, how hard is it to add actors to exploit all eight cores on my desktop? Usually pretty easy. If you can identify even one boundary where things look like messages and you have components where some objects live on one side and these other objects live on the other side, then you can have a granddaddy object on one side be an actor and it will parallelise as it goes across that boundary. Not too difficult. If we do get 1000-core desktop PCs, do you think actors will scale up? It’s hard. There are always in the order of twenty to fifty actors in my whole program because I tend to write each component as actors, and I tend to have one instance of each component. So this won’t scale to a thousand cores. What you can do is write data structures out of actors. I use dictionaries all over the place, and if you need a dictionary that is going to be accessed concurrently, then you could build one of those out of actors in no time. You can use queuing to marshal requests between different slices of the dictionary which are living on different threads. So it’s like a distributed hash table but all of the chunks of it are on the same machine. That means that each of these thousand processors has cached one small piece of the dictionary. I reckon it wouldn’t be too big a leap to start doing proper parallelism. Do you think it helps if actors get baked into the language, similarly to Erlang? Erlang is excellent in that it has thread-local garbage collection. C# doesn’t, so there’s a limit to how well C# actors can possibly scale because there’s a single garbage collected heap shared between all of them. When you do a global garbage collection, you’ve got to stop all of the actors, which is seriously expensive, whereas in Erlang garbage collections happen per-actor, so they’re insanely cheap. However, Erlang deviated from all the sensible language design that people have used recently and has just come up with crazy stuff. You can definitely retrofit thread-local garbage collection to .NET, and then it’s quite well-suited to support actors, even if it’s not baked into the language. Speaking of language design, do you have a favourite programming language? I’ll choose a language which I’ve never written before. I like the idea of Scala. It sounds like C#, only with some of the niggles gone. I enjoy writing static types. It means you don’t have to writing tests so much. When you say it doesn’t have some of the niggles? C# doesn’t allow the use of a property as a method group. It doesn’t have Scala case classes, or sum types, where you can do a switch statement and the compiler checks that you’ve checked all the cases, which is really useful in functional-style programming. Pattern-matching, in other words. That’s actually the major niggle. C# is pretty good, and I’m quite happy with C#. And what about going even further with the type system to remove the need for tests to something like Haskell? Or is that a step too far? I’m quite a pragmatist, I don’t think I could deal with trying to write big systems in languages with too few other users, especially when learning how to structure things. I just don’t know anyone who can teach me, and the Internet won’t teach me. That’s the main reason I wouldn’t use it. If I turned up at a company that writes big systems in Haskell, I would have no objection to that, but I wouldn’t instigate it. What about things in C#? For instance, there’s contracts in C#, so you can try to statically verify a bit more about your code. Do you think that’s useful, or just not worthwhile? I’ve not really tried it. My hunch is that it needs to be built into the language and be quite mathematical for it to work in real life, and that doesn’t seem to have ended up true for C# contracts. I don’t think anyone who’s tried them thinks they’re any good. I might be wrong. On a slightly different note, how do you like to debug code? I think I’m quite an odd debugger. I use guesswork extremely rarely, especially if something seems quite difficult to debug. I’ve been bitten spending hours and hours on guesswork and not being scientific about debugging in the past, so now I’m scientific to a fault. What I want is to see the bug happening in the debugger, to step through the bug happening. To watch the program going from a valid state to an invalid state. When there’s a bug and I can’t work out why it’s happening, I try to find some piece of evidence which places the bug in one section of the code. From that experiment, I binary chop on the possible causes of the bug. I suppose that means binary chopping on places in the code, or binary chopping on a stage through a processing cycle. Basically, I’m very stupid about how I debug. I won’t make any guesses, I won’t use any intuition, I will only identify the experiment that’s going to binary chop most effectively and repeat rather than trying to guess anything. I suppose it’s quite top-down. Is most of the time then spent in the debugger? Absolutely, if at all possible I will never debug using print statements or logs. I don’t really hold much stock in outputting logs. If there’s any bug which can be reproduced locally, I’d rather do it in the debugger than outputting logs. And with SmartAssembly error reporting, there’s not a lot that can’t be either observed in an error report and just fixed, or reproduced locally. And in those other situations, maybe I’ll use logs. But I hate using logs. You stare at the log, trying to guess what’s going on, and that’s exactly what I don’t like doing. You have to just look at it and see does this look right or wrong. We’ve covered how you get to grip with bugs. How do you get to grips with an entire codebase? I watch it in the debugger. I find little bugs and then try to fix them, and mostly do it by watching them in the debugger and gradually getting an understanding of how the code works using my process of binary chopping. I have to do a lot of reading and watching code to choose where my slicing-in-half experiment is going to be. The last time I did it was SmartAssembly. The old code was a complete mess, but at least it did things top to bottom. There wasn’t too much of some of the big abstractions where flow of control goes all over the place, into a base class and back again. Code’s really hard to understand when that happens. So I like to choose a little bug and try to fix it, and choose a bigger bug and try to fix it. Definitely learn by doing. I want to always have an aim so that I get a little achievement after every few hours of debugging. Once I’ve learnt the codebase I might be able to fix all the bugs in an hour, but I’d rather be using them as an aim while I’m learning the codebase. If I was a maintainer of a codebase, what should I do to make it as easy as possible for you to understand? Keep distinct concepts in different places. And name your stuff so that it’s obvious which concepts live there. You shouldn’t have some variable that gets set miles up the top of somewhere, and then is read miles down to choose some later behaviour. I’m talking from a very much SmartAssembly point of view because the old SmartAssembly codebase had tons and tons of these things, where it would read some property of the code and then deal with it later. Just thousands of variables in scope. Loads of things to think about. If you can keep concepts separate, then it aids me in my process of fixing bugs one at a time, because each bug is going to more or less be understandable in the one place where it is. And what about tests? Do you think they help at all? I’ve never had the opportunity to learn a codebase which has had tests, I don’t know what it’s like! What about when you’re actually developing? How useful do you find tests in finding bugs or regressions? Finding regressions, absolutely. Running bits of code that would be quite hard to run otherwise, definitely. It doesn’t happen very often that a test finds a bug in the first place. I don’t really buy nebulous promises like tests being a good way to think about the spec of the code. My thinking goes something like “This code works at the moment, great, ship it! Ah, there’s a way that this code doesn’t work. Okay, write a test, demonstrate that it doesn’t work, fix it, use the test to demonstrate that it’s now fixed, and keep the test for future regressions.” The most valuable tests are for bugs that have actually happened at some point, because bugs that have actually happened at some point, despite the fact that you think you’ve fixed them, are way more likely to appear again than new bugs are. Does that mean that when you write your code the first time, there are no tests? Often. The chance of there being a bug in a new feature is relatively unaffected by whether I’ve written a test for that new feature because I’m not good enough at writing tests to think of bugs that I would have written into the code. So not writing regression tests for all of your code hasn’t affected you too badly? There are different kinds of features. Some of them just always work, and are just not flaky, they just continue working whatever you throw at them. Maybe because the type-checker is particularly effective around them. Writing tests for those features which just tend to always work is a waste of time. And because it’s a waste of time I’ll tend to wait until a feature has demonstrated its flakiness by having bugs in it before I start trying to test it. You can get a feel for whether it’s going to be flaky code as you’re writing it. I try to write it to make it not flaky, but there are some things that are just inherently flaky. And very occasionally, I’ll think “this is going to be flaky” as I’m writing, and then maybe do a test, but not most of the time. How do you think your programming style has changed over time? I’ve got clearer about what the right way of doing things is. I used to flip-flop a lot between different ideas. Five years ago I came up with some really good ideas and some really terrible ideas. All of them seemed great when I thought of them, but they were quite diverse ideas, whereas now I have a smaller set of reliable ideas that are actually good for structuring code. So my code is probably more similar to itself than it used to be back in the day, when I was trying stuff out. I’ve got more disciplined about encapsulation, I think. There are operational things like I use actors more now than I used to, and that forces me to use immutability more than I used to. The first code that I wrote in Red Gate was the memory profiler UI, and that was an actor, I just didn’t know the name of it at the time. I don’t really use object-orientation. By object-orientation, I mean having n objects of the same type which are mutable. I want a constant number of objects that are mutable, and they should be different types. I stick stuff in dictionaries and then have one thing that owns the dictionary and puts stuff in and out of it. That’s definitely a pattern that I’ve seen recently. I think maybe I’m doing functional programming. Possibly. It’s plausible. If you had to summarise the essence of programming in a pithy sentence, how would you do it? Programming is the form of art that, without losing any of the beauty of architecture or fine art, allows you to produce things that people love and you make money from. So you think it’s an art rather than a science? It’s a little bit of engineering, a smidgeon of maths, but it’s not science. Like architecture, programming is on that boundary between art and engineering. If you want to do it really nicely, it’s mostly art. You can get away with doing architecture and programming entirely by having a good engineering mind, but you’re not going to produce anything nice. You’re not going to have joy doing it if you’re an engineering mind. Architects who are just engineering minds are not going to enjoy their job. I suppose engineering is the foundation on which you build the art. Exactly. How do you think programming is going to change over the next ten years? There will be an unfortunate shift towards dynamically-typed languages, because of JavaScript. JavaScript has an unfair advantage. JavaScript’s unfair advantage will cause more people to be exposed to dynamically-typed languages, which means other dynamically-typed languages crop up and the best features go into dynamically-typed languages. Then people conflate the good features with the fact that it’s dynamically-typed, and more investment goes into dynamically-typed languages. They end up better, so people use them. What about the idea of compiling other languages, possibly statically-typed, to JavaScript? It’s a reasonable idea. I would like to do it, but I don’t think enough people in the world are going to do it to make it pick up. The hordes of beginners are the lifeblood of a language community. They are what makes there be good tools and what makes there be vibrant community websites. And any particular thing which is the same as JavaScript only with extra stuff added to it, although it might be technically great, is not going to have the hordes of beginners. JavaScript is always to be quickest and easiest way for a beginner to start programming in the browser. And dynamically-typed languages are great for beginners. Compilers are pretty scary and beginners don’t write big code. And having your errors come up in the same place, whether they’re statically checkable errors or not, is quite nice for a beginner. If someone asked me to teach them some programming, I’d teach them JavaScript. If dynamically-typed languages are great for beginners, when do you think the benefits of static typing start to kick in? The value of having a statically typed program is in the tools that rely on the static types to produce a smooth IDE experience rather than actually telling me my compile errors. And only once you’re experienced enough a programmer that having a really smooth IDE experience makes a blind bit of difference, does static typing make a blind bit of difference. So it’s not really about size of codebase. If I go and write up a tiny program, I’m still going to get value out of writing it in C# using ReSharper because I’m experienced with C# and ReSharper enough to be able to write code five times faster if I have that help. Any other visions of the future? Nobody’s going to use actors. Because everyone’s going to be running on single-core VMs connected over network-ready protocols like JSON over HTTP. So, parallelism within one operating system is going to die. But until then, you should use actors. More Red Gater Coder interviews

Read the article

How John Got 15x Improvement Without Really Trying

- by rchrd

The following article was published on a Sun Microsystems website a number of years ago by John Feo. It is still useful and worth preserving. So I'm republishing it here. How I Got 15x Improvement Without Really Trying John Feo, Sun Microsystems Taking ten "personal" program codes used in scientific and engineering research, the author was able to get from 2 to 15 times performance improvement easily by applying some simple general optimization techniques. Introduction Scientific research based on computer simulation depends on the simulation for advancement. The research can advance only as fast as the computational codes can execute. The codes' efficiency determines both the rate and quality of results. In the same amount of time, a faster program can generate more results and can carry out a more detailed simulation of physical phenomena than a slower program. Highly optimized programs help science advance quickly and insure that monies supporting scientific research are used as effectively as possible. Scientific computer codes divide into three broad categories: ISV, community, and personal. ISV codes are large, mature production codes developed and sold commercially. The codes improve slowly over time both in methods and capabilities, and they are well tuned for most vendor platforms. Since the codes are mature and complex, there are few opportunities to improve their performance solely through code optimization. Improvements of 10% to 15% are typical. Examples of ISV codes are DYNA3D, Gaussian, and Nastran. Community codes are non-commercial production codes used by a particular research field. Generally, they are developed and distributed by a single academic or research institution with assistance from the community. Most users just run the codes, but some develop new methods and extensions that feed back into the general release. The codes are available on most vendor platforms. Since these codes are younger than ISV codes, there are more opportunities to optimize the source code. Improvements of 50% are not unusual. Examples of community codes are AMBER, CHARM, BLAST, and FASTA. Personal codes are those written by single users or small research groups for their own use. These codes are not distributed, but may be passed from professor-to-student or student-to-student over several years. They form the primordial ocean of applications from which community and ISV codes emerge. Government research grants pay for the development of most personal codes. This paper reports on the nature and performance of this class of codes. Over the last year, I have looked at over two dozen personal codes from more than a dozen research institutions. The codes cover a variety of scientific fields, including astronomy, atmospheric sciences, bioinformatics, biology, chemistry, geology, and physics. The sources range from a few hundred lines to more than ten thousand lines, and are written in Fortran, Fortran 90, C, and C++. For the most part, the codes are modular, documented, and written in a clear, straightforward manner. They do not use complex language features, advanced data structures, programming tricks, or libraries. I had little trouble understanding what the codes did or how data structures were used. Most came with a makefile. Surprisingly, only one of the applications is parallel. All developers have access to parallel machines, so availability is not an issue. Several tried to parallelize their applications, but stopped after encountering difficulties. Lack of education and a perception that parallelism is difficult prevented most from trying. I parallelized several of the codes using OpenMP, and did not judge any of the codes as difficult to parallelize. Even more surprising than the lack of parallelism is the inefficiency of the codes. I was able to get large improvements in performance in a matter of a few days applying simple optimization techniques. Table 1 lists ten representative codes [names and affiliation are omitted to preserve anonymity]. Improvements on one processor range from 2x to 15.5x with a simple average of 4.75x. I did not use sophisticated performance tools or drill deep into the program's execution character as one would do when tuning ISV or community codes. Using only a profiler and source line timers, I identified inefficient sections of code and improved their performance by inspection. The changes were at a high level. I am sure there is another factor of 2 or 3 in each code, and more if the codes are parallelized. The study’s results show that personal scientific codes are running many times slower than they should and that the problem is pervasive. Computational scientists are not sloppy programmers; however, few are trained in the art of computer programming or code optimization. I found that most have a working knowledge of some programming language and standard software engineering practices; but they do not know, or think about, how to make their programs run faster. They simply do not know the standard techniques used to make codes run faster. In fact, they do not even perceive that such techniques exist. The case studies described in this paper show that applying simple, well known techniques can significantly increase the performance of personal codes. It is important that the scientific community and the Government agencies that support scientific research find ways to better educate academic scientific programmers. The inefficiency of their codes is so bad that it is retarding both the quality and progress of scientific research. # cacheperformance redundantoperations loopstructures performanceimprovement 1 x x 15.5 2 x 2.8 3 x x 2.5 4 x 2.1 5 x x 2.0 6 x 5.0 7 x 5.8 8 x 6.3 9 2.2 10 x x 3.3 Table 1 — Area of improvement and performance gains of 10 codes The remainder of the paper is organized as follows: sections 2, 3, and 4 discuss the three most common sources of inefficiencies in the codes studied. These are cache performance, redundant operations, and loop structures. Each section includes several examples. The last section summaries the work and suggests a possible solution to the issues raised. Optimizing cache performance Commodity microprocessor systems use caches to increase memory bandwidth and reduce memory latencies. Typical latencies from processor to L1, L2, local, and remote memory are 3, 10, 50, and 200 cycles, respectively. Moreover, bandwidth falls off dramatically as memory distances increase. Programs that do not use cache effectively run many times slower than programs that do. When optimizing for cache, the biggest performance gains are achieved by accessing data in cache order and reusing data to amortize the overhead of cache misses. Secondary considerations are prefetching, associativity, and replacement; however, the understanding and analysis required to optimize for the latter are probably beyond the capabilities of the non-expert. Much can be gained simply by accessing data in the correct order and maximizing data reuse. 6 out of the 10 codes studied here benefited from such high level optimizations. Array Accesses The most important cache optimization is the most basic: accessing Fortran array elements in column order and C array elements in row order. Four of the ten codes—1, 2, 4, and 10—got it wrong. Compilers will restructure nested loops to optimize cache performance, but may not do so if the loop structure is too complex, or the loop body includes conditionals, complex addressing, or function calls. In code 1, the compiler failed to invert a key loop because of complex addressing do I = 0, 1010, delta_x IM = I - delta_x IP = I + delta_x do J = 5, 995, delta_x JM = J - delta_x JP = J + delta_x T1 = CA1(IP, J) + CA1(I, JP) T2 = CA1(IM, J) + CA1(I, JM) S1 = T1 + T2 - 4 * CA1(I, J) CA(I, J) = CA1(I, J) + D * S1 end do end do In code 2, the culprit is conditionals do I = 1, N do J = 1, N If (IFLAG(I,J) .EQ. 0) then T1 = Value(I, J-1) T2 = Value(I-1, J) T3 = Value(I, J) T4 = Value(I+1, J) T5 = Value(I, J+1) Value(I,J) = 0.25 * (T1 + T2 + T5 + T4) Delta = ABS(T3 - Value(I,J)) If (Delta .GT. MaxDelta) MaxDelta = Delta endif enddo enddo I fixed both programs by inverting the loops by hand. Code 10 has three-dimensional arrays and triply nested loops. The structure of the most computationally intensive loops is too complex to invert automatically or by hand. The only practical solution is to transpose the arrays so that the dimension accessed by the innermost loop is in cache order. The arrays can be transposed at construction or prior to entering a computationally intensive section of code. The former requires all array references to be modified, while the latter is cost effective only if the cost of the transpose is amortized over many accesses. I used the second approach to optimize code 10. Code 5 has four-dimensional arrays and loops are nested four deep. For all of the reasons cited above the compiler is not able to restructure three key loops. Assume C arrays and let the four dimensions of the arrays be i, j, k, and l. In the original code, the index structure of the three loops is L1: for i L2: for i L3: for i for l for l for j for k for j for k for j for k for l So only L3 accesses array elements in cache order. L1 is a very complex loop—much too complex to invert. I brought the loop into cache alignment by transposing the second and fourth dimensions of the arrays. Since the code uses a macro to compute all array indexes, I effected the transpose at construction and changed the macro appropriately. The dimensions of the new arrays are now: i, l, k, and j. L3 is a simple loop and easily inverted. L2 has a loop-carried scalar dependence in k. By promoting the scalar name that carries the dependence to an array, I was able to invert the third and fourth subloops aligning the loop with cache. Code 5 is by far the most difficult of the four codes to optimize for array accesses; but the knowledge required to fix the problems is no more than that required for the other codes. I would judge this code at the limits of, but not beyond, the capabilities of appropriately trained computational scientists. Array Strides When a cache miss occurs, a line (64 bytes) rather than just one word is loaded into the cache. If data is accessed stride 1, than the cost of the miss is amortized over 8 words. Any stride other than one reduces the cost savings. Two of the ten codes studied suffered from non-unit strides. The codes represent two important classes of "strided" codes. Code 1 employs a multi-grid algorithm to reduce time to convergence. The grids are every tenth, fifth, second, and unit element. Since time to convergence is inversely proportional to the distance between elements, coarse grids converge quickly providing good starting values for finer grids. The better starting values further reduce the time to convergence. The downside is that grids of every nth element, n > 1, introduce non-unit strides into the computation. In the original code, much of the savings of the multi-grid algorithm were lost due to this problem. I eliminated the problem by compressing (copying) coarse grids into continuous memory, and rewriting the computation as a function of the compressed grid. On convergence, I copied the final values of the compressed grid back to the original grid. The savings gained from unit stride access of the compressed grid more than paid for the cost of copying. Using compressed grids, the loop from code 1 included in the previous section becomes do j = 1, GZ do i = 1, GZ T1 = CA(i+0, j-1) + CA(i-1, j+0) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) S1 = T1 + T4 - 4 * CA1(i+0, j+0) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 enddo enddo where CA and CA1 are compressed arrays of size GZ. Code 7 traverses a list of objects selecting objects for later processing. The labels of the selected objects are stored in an array. The selection step has unit stride, but the processing steps have irregular stride. A fix is to save the parameters of the selected objects in temporary arrays as they are selected, and pass the temporary arrays to the processing functions. The fix is practical if the same parameters are used in selection as in processing, or if processing comprises a series of distinct steps which use overlapping subsets of the parameters. Both conditions are true for code 7, so I achieved significant improvement by copying parameters to temporary arrays during selection. Data reuse In the previous sections, we optimized for spatial locality. It is also important to optimize for temporal locality. Once read, a datum should be used as much as possible before it is forced from cache. Loop fusion and loop unrolling are two techniques that increase temporal locality. Unfortunately, both techniques increase register pressure—as loop bodies become larger, the number of registers required to hold temporary values grows. Once register spilling occurs, any gains evaporate quickly. For multiprocessors with small register sets or small caches, the sweet spot can be very small. In the ten codes presented here, I found no opportunities for loop fusion and only two opportunities for loop unrolling (codes 1 and 3). In code 1, unrolling the outer and inner loop one iteration increases the number of result values computed by the loop body from 1 to 4, do J = 1, GZ-2, 2 do I = 1, GZ-2, 2 T1 = CA1(i+0, j-1) + CA1(i-1, j+0) T2 = CA1(i+1, j-1) + CA1(i+0, j+0) T3 = CA1(i+0, j+0) + CA1(i-1, j+1) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) T5 = CA1(i+2, j+0) + CA1(i+1, j+1) T6 = CA1(i+1, j+1) + CA1(i+0, j+2) T7 = CA1(i+2, j+1) + CA1(i+1, j+2) S1 = T1 + T4 - 4 * CA1(i+0, j+0) S2 = T2 + T5 - 4 * CA1(i+1, j+0) S3 = T3 + T6 - 4 * CA1(i+0, j+1) S4 = T4 + T7 - 4 * CA1(i+1, j+1) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 CA(i+1, j+0) = CA1(i+1, j+0) + DD * S2 CA(i+0, j+1) = CA1(i+0, j+1) + DD * S3 CA(i+1, j+1) = CA1(i+1, j+1) + DD * S4 enddo enddo The loop body executes 12 reads, whereas as the rolled loop shown in the previous section executes 20 reads to compute the same four values. In code 3, two loops are unrolled 8 times and one loop is unrolled 4 times. Here is the before for (k = 0; k < NK[u]; k++) { sum = 0.0; for (y = 0; y < NY; y++) { sum += W[y][u][k] * delta[y]; } backprop[i++]=sum; } and after code for (k = 0; k < KK - 8; k+=8) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (y = 0; y < NY; y++) { sum0 += W[y][0][k+0] * delta[y]; sum1 += W[y][0][k+1] * delta[y]; sum2 += W[y][0][k+2] * delta[y]; sum3 += W[y][0][k+3] * delta[y]; sum4 += W[y][0][k+4] * delta[y]; sum5 += W[y][0][k+5] * delta[y]; sum6 += W[y][0][k+6] * delta[y]; sum7 += W[y][0][k+7] * delta[y]; } backprop[k+0] = sum0; backprop[k+1] = sum1; backprop[k+2] = sum2; backprop[k+3] = sum3; backprop[k+4] = sum4; backprop[k+5] = sum5; backprop[k+6] = sum6; backprop[k+7] = sum7; } for one of the loops unrolled 8 times. Optimizing for temporal locality is the most difficult optimization considered in this paper. The concepts are not difficult, but the sweet spot is small. Identifying where the program can benefit from loop unrolling or loop fusion is not trivial. Moreover, it takes some effort to get it right. Still, educating scientific programmers about temporal locality and teaching them how to optimize for it will pay dividends. Reducing instruction count Execution time is a function of instruction count. Reduce the count and you usually reduce the time. The best solution is to use a more efficient algorithm; that is, an algorithm whose order of complexity is smaller, that converges quicker, or is more accurate. Optimizing source code without changing the algorithm yields smaller, but still significant, gains. This paper considers only the latter because the intent is to study how much better codes can run if written by programmers schooled in basic code optimization techniques. The ten codes studied benefited from three types of "instruction reducing" optimizations. The two most prevalent were hoisting invariant memory and data operations out of inner loops. The third was eliminating unnecessary data copying. The nature of these inefficiencies is language dependent. Memory operations The semantics of C make it difficult for the compiler to determine all the invariant memory operations in a loop. The problem is particularly acute for loops in functions since the compiler may not know the values of the function's parameters at every call site when compiling the function. Most compilers support pragmas to help resolve ambiguities; however, these pragmas are not comprehensive and there is no standard syntax. To guarantee that invariant memory operations are not executed repetitively, the user has little choice but to hoist the operations by hand. The problem is not as severe in Fortran programs because in the absence of equivalence statements, it is a violation of the language's semantics for two names to share memory. Codes 3 and 5 are C programs. In both cases, the compiler did not hoist all invariant memory operations from inner loops. Consider the following loop from code 3 for (y = 0; y < NY; y++) { i = 0; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += delta[y] * I1[i++]; } } } Since dW[y][u] can point to the same memory space as delta for one or more values of y and u, assignment to dW[y][u][k] may change the value of delta[y]. In reality, dW and delta do not overlap in memory, so I rewrote the loop as for (y = 0; y < NY; y++) { i = 0; Dy = delta[y]; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += Dy * I1[i++]; } } } Failure to hoist invariant memory operations may be due to complex address calculations. If the compiler can not determine that the address calculation is invariant, then it can hoist neither the calculation nor the associated memory operations. As noted above, code 5 uses a macro to address four-dimensional arrays #define MAT4D(a,q,i,j,k) (double *)((a)->data + (q)*(a)->strides[0] + (i)*(a)->strides[3] + (j)*(a)->strides[2] + (k)*(a)->strides[1]) The macro is too complex for the compiler to understand and so, it does not identify any subexpressions as loop invariant. The simplest way to eliminate the address calculation from the innermost loop (over i) is to define a0 = MAT4D(a,q,0,j,k) before the loop and then replace all instances of *MAT4D(a,q,i,j,k) in the loop with a0[i] A similar problem appears in code 6, a Fortran program. The key loop in this program is do n1 = 1, nh nx1 = (n1 - 1) / nz + 1 nz1 = n1 - nz * (nx1 - 1) do n2 = 1, nh nx2 = (n2 - 1) / nz + 1 nz2 = n2 - nz * (nx2 - 1) ndx = nx2 - nx1 ndy = nz2 - nz1 gxx = grn(1,ndx,ndy) gyy = grn(2,ndx,ndy) gxy = grn(3,ndx,ndy) balance(n1,1) = balance(n1,1) + (force(n2,1) * gxx + force(n2,2) * gxy) * h1 balance(n1,2) = balance(n1,2) + (force(n2,1) * gxy + force(n2,2) * gyy)*h1 end do end do The programmer has written this loop well—there are no loop invariant operations with respect to n1 and n2. However, the loop resides within an iterative loop over time and the index calculations are independent with respect to time. Trading space for time, I precomputed the index values prior to the entering the time loop and stored the values in two arrays. I then replaced the index calculations with reads of the arrays. Data operations Ways to reduce data operations can appear in many forms. Implementing a more efficient algorithm produces the biggest gains. The closest I came to an algorithm change was in code 4. This code computes the inner product of K-vectors A(i) and B(j), 0 = i < N, 0 = j < M, for most values of i and j. Since the program computes most of the NM possible inner products, it is more efficient to compute all the inner products in one triply-nested loop rather than one at a time when needed. The savings accrue from reading A(i) once for all B(j) vectors and from loop unrolling. for (i = 0; i < N; i+=8) { for (j = 0; j < M; j++) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (k = 0; k < K; k++) { sum0 += A[i+0][k] * B[j][k]; sum1 += A[i+1][k] * B[j][k]; sum2 += A[i+2][k] * B[j][k]; sum3 += A[i+3][k] * B[j][k]; sum4 += A[i+4][k] * B[j][k]; sum5 += A[i+5][k] * B[j][k]; sum6 += A[i+6][k] * B[j][k]; sum7 += A[i+7][k] * B[j][k]; } C[i+0][j] = sum0; C[i+1][j] = sum1; C[i+2][j] = sum2; C[i+3][j] = sum3; C[i+4][j] = sum4; C[i+5][j] = sum5; C[i+6][j] = sum6; C[i+7][j] = sum7; }} This change requires knowledge of a typical run; i.e., that most inner products are computed. The reasons for the change, however, derive from basic optimization concepts. It is the type of change easily made at development time by a knowledgeable programmer. In code 5, we have the data version of the index optimization in code 6. Here a very expensive computation is a function of the loop indices and so cannot be hoisted out of the loop; however, the computation is invariant with respect to an outer iterative loop over time. We can compute its value for each iteration of the computation loop prior to entering the time loop and save the values in an array. The increase in memory required to store the values is small in comparison to the large savings in time. The main loop in Code 8 is doubly nested. The inner loop includes a series of guarded computations; some are a function of the inner loop index but not the outer loop index while others are a function of the outer loop index but not the inner loop index for (j = 0; j < N; j++) { for (i = 0; i < M; i++) { r = i * hrmax; R = A[j]; temp = (PRM[3] == 0.0) ? 1.0 : pow(r, PRM[3]); high = temp * kcoeff * B[j] * PRM[2] * PRM[4]; low = high * PRM[6] * PRM[6] / (1.0 + pow(PRM[4] * PRM[6], 2.0)); kap = (R > PRM[6]) ? high * R * R / (1.0 + pow(PRM[4]*r, 2.0) : low * pow(R/PRM[6], PRM[5]); < rest of loop omitted > }} Note that the value of temp is invariant to j. Thus, we can hoist the computation for temp out of the loop and save its values in an array. for (i = 0; i < M; i++) { r = i * hrmax; TEMP[i] = pow(r, PRM[3]); } [N.B. – the case for PRM[3] = 0 is omitted and will be reintroduced later.] We now hoist out of the inner loop the computations invariant to i. Since the conditional guarding the value of kap is invariant to i, it behooves us to hoist the computation out of the inner loop, thereby executing the guard once rather than M times. The final version of the code is for (j = 0; j < N; j++) { R = rig[j] / 1000.; tmp1 = kcoeff * par[2] * beta[j] * par[4]; tmp2 = 1.0 + (par[4] * par[4] * par[6] * par[6]); tmp3 = 1.0 + (par[4] * par[4] * R * R); tmp4 = par[6] * par[6] / tmp2; tmp5 = R * R / tmp3; tmp6 = pow(R / par[6], par[5]); if ((par[3] == 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp5; } else if ((par[3] == 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp4 * tmp6; } else if ((par[3] != 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp5; } else if ((par[3] != 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp4 * tmp6; } for (i = 0; i < M; i++) { kap = KAP[i]; r = i * hrmax; < rest of loop omitted > } } Maybe not the prettiest piece of code, but certainly much more efficient than the original loop, Copy operations Several programs unnecessarily copy data from one data structure to another. This problem occurs in both Fortran and C programs, although it manifests itself differently in the two languages. Code 1 declares two arrays—one for old values and one for new values. At the end of each iteration, the array of new values is copied to the array of old values to reset the data structures for the next iteration. This problem occurs in Fortran programs not included in this study and in both Fortran 77 and Fortran 90 code. Introducing pointers to the arrays and swapping pointer values is an obvious way to eliminate the copying; but pointers is not a feature that many Fortran programmers know well or are comfortable using. An easy solution not involving pointers is to extend the dimension of the value array by 1 and use the last dimension to differentiate between arrays at different times. For example, if the data space is N x N, declare the array (N, N, 2). Then store the problem’s initial values in (_, _, 2) and define the scalar names new = 2 and old = 1. At the start of each iteration, swap old and new to reset the arrays. The old–new copy problem did not appear in any C program. In programs that had new and old values, the code swapped pointers to reset data structures. Where unnecessary coping did occur is in structure assignment and parameter passing. Structures in C are handled much like scalars. Assignment causes the data space of the right-hand name to be copied to the data space of the left-hand name. Similarly, when a structure is passed to a function, the data space of the actual parameter is copied to the data space of the formal parameter. If the structure is large and the assignment or function call is in an inner loop, then copying costs can grow quite large. While none of the ten programs considered here manifested this problem, it did occur in programs not included in the study. A simple fix is always to refer to structures via pointers. Optimizing loop structures Since scientific programs spend almost all their time in loops, efficient loops are the key to good performance. Conditionals, function calls, little instruction level parallelism, and large numbers of temporary values make it difficult for the compiler to generate tightly packed, highly efficient code. Conditionals and function calls introduce jumps that disrupt code flow. Users should eliminate or isolate conditionls to their own loops as much as possible. Often logical expressions can be substituted for if-then-else statements. For example, code 2 includes the following snippet MaxDelta = 0.0 do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) if (Delta > MaxDelta) MaxDelta = Delta enddo enddo if (MaxDelta .gt. 0.001) goto 200 Since the only use of MaxDelta is to control the jump to 200 and all that matters is whether or not it is greater than 0.001, I made MaxDelta a boolean and rewrote the snippet as MaxDelta = .false. do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) MaxDelta = MaxDelta .or. (Delta .gt. 0.001) enddo enddo if (MaxDelta) goto 200 thereby, eliminating the conditional expression from the inner loop. A microprocessor can execute many instructions per instruction cycle. Typically, it can execute one or more memory, floating point, integer, and jump operations. To be executed simultaneously, the operations must be independent. Thick loops tend to have more instruction level parallelism than thin loops. Moreover, they reduce memory traffice by maximizing data reuse. Loop unrolling and loop fusion are two techniques to increase the size of loop bodies. Several of the codes studied benefitted from loop unrolling, but none benefitted from loop fusion. This observation is not too surpising since it is the general tendency of programmers to write thick loops. As loops become thicker, the number of temporary values grows, increasing register pressure. If registers spill, then memory traffic increases and code flow is disrupted. A thick loop with many temporary values may execute slower than an equivalent series of thin loops. The biggest gain will be achieved if the thick loop can be split into a series of independent loops eliminating the need to write and read temporary arrays. I found such an occasion in code 10 where I split the loop do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do into two disjoint loops do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) end do end do do i = 1, n do j = 1, m C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do Conclusions Over the course of the last year, I have had the opportunity to work with over two dozen academic scientific programmers at leading research universities. Their research interests span a broad range of scientific fields. Except for two programs that relied almost exclusively on library routines (matrix multiply and fast Fourier transform), I was able to improve significantly the single processor performance of all codes. Improvements range from 2x to 15.5x with a simple average of 4.75x. Changes to the source code were at a very high level. I did not use sophisticated techniques or programming tools to discover inefficiencies or effect the changes. Only one code was parallel despite the availability of parallel systems to all developers. Clearly, we have a problem—personal scientific research codes are highly inefficient and not running parallel. The developers are unaware of simple optimization techniques to make programs run faster. They lack education in the art of code optimization and parallel programming. I do not believe we can fix the problem by publishing additional books or training manuals. To date, the developers in questions have not studied the books or manual available, and are unlikely to do so in the future. Short courses are a possible solution, but I believe they are too concentrated to be much use. The general concepts can be taught in a three or four day course, but that is not enough time for students to practice what they learn and acquire the experience to apply and extend the concepts to their codes. Practice is the key to becoming proficient at optimization. I recommend that graduate students be required to take a semester length course in optimization and parallel programming. We would never give someone access to state-of-the-art scientific equipment costing hundreds of thousands of dollars without first requiring them to demonstrate that they know how to use the equipment. Yet the criterion for time on state-of-the-art supercomputers is at most an interesting project. Requestors are never asked to demonstrate that they know how to use the system, or can use the system effectively. A semester course would teach them the required skills. Government agencies that fund academic scientific research pay for most of the computer systems supporting scientific research as well as the development of most personal scientific codes. These agencies should require graduate schools to offer a course in optimization and parallel programming as a requirement for funding. About the Author John Feo received his Ph.D. in Computer Science from The University of Texas at Austin in 1986. After graduate school, Dr. Feo worked at Lawrence Livermore National Laboratory where he was the Group Leader of the Computer Research Group and principal investigator of the Sisal Language Project. In 1997, Dr. Feo joined Tera Computer Company where he was project manager for the MTA, and oversaw the programming and evaluation of the MTA at the San Diego Supercomputer Center. In 2000, Dr. Feo joined Sun Microsystems as an HPC application specialist. He works with university research groups to optimize and parallelize scientific codes. Dr. Feo has published over two dozen research articles in the areas of parallel parallel programming, parallel programming languages, and application performance.

Read the article

testing Clojure in Maven

- by Ralph

I am new at Maven and even newer at Clojure. As an exercise to learn the language, I am writing a spider solitaire player program. I also plan on writing a similar program in Scala to compare the implementations (see my post http://stackoverflow.com/questions/2571267/modern-java-alternatives-closed). I have configured a Maven directory structure containing the usual src/main/clojure and src/test/clojure directories. My pom.xml file includes the clojure-maven-plugin. When I run "mvn test", it displays "No tests to run", despite my having test code in the src/test/clojure directory. As I misnaming something? Here is my pom.xml file: <?xml version="1.0" encoding="UTF-8"?> <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"> <modelVersion>4.0.0</modelVersion> <groupId>SpiderPlayer</groupId> <artifactId>SpiderPlayer</artifactId> <version>1.0.0-SNAPSHOT</version> <inceptionYear>2010</inceptionYear> <packaging>jar</packaging> <properties> <maven.build.timestamp.format>yyMMdd.HHmm</maven.build.timestamp.format> <main.dir>org/dogdaze/spider_player</main.dir> <main.package>org.dogdaze.spider_player</main.package> <main.class>${main.package}.Main</main.class> </properties> <build> <sourceDirectory>src/main/clojure</sourceDirectory> <testSourceDirectory>src/main/clojure</testSourceDirectory> <plugins> <plugin> <groupId>com.theoryinpractise</groupId> <artifactId>clojure-maven-plugin</artifactId> <version>1.3.1</version> </plugin> <plugin> <groupId>org.apache.maven.plugins</groupId> <artifactId>maven-antrun-plugin</artifactId> <version>1.3</version> <executions> <execution> <goals> <goal>run</goal> </goals> <phase>generate-sources</phase> <configuration> <tasks> <echo file="${project.build.sourceDirectory}/${main.dir}/Version.clj" message="(ns ${main.package})${line.separator}"/> <echo file="${project.build.sourceDirectory}/${main.dir}/Version.clj" append="true" message="(def version "${maven.build.timestamp}")${line.separator}"/> </tasks> </configuration> </execution> </executions> </plugin> <plugin> <groupId>org.apache.maven.plugins</groupId> <artifactId>maven-assembly-plugin</artifactId> <version>2.1</version> <executions> <execution> <goals> <goal>single</goal> </goals> <phase>package</phase> <configuration> <descriptorRefs> <descriptorRef>jar-with-dependencies</descriptorRef> </descriptorRefs> <archive> <manifest> <mainClass>${main.class}</mainClass> </manifest> </archive> </configuration> </execution> </executions> </plugin> <plugin> <groupId>org.apache.maven.plugins</groupId> <artifactId>maven-surefire-plugin</artifactId> <configuration> <redirectTestOutputToFile>true</redirectTestOutputToFile> <skipTests>false</skipTests> <skip>false</skip> </configuration> <executions> <execution> <id>surefire-it</id> <phase>integration-test</phase> <goals> <goal>test</goal> </goals> <configuration> <skip>false</skip> </configuration> </execution> </executions> </plugin> </plugins> </build> <dependencies> <dependency> <groupId>commons-cli</groupId> <artifactId>commons-cli</artifactId> <version>1.2</version> <scope>compile</scope> </dependency> </dependencies> </project> Here is my Clojure source file (src/main/clojure/org/dogdaze/spider_player/Deck.clj): ; Copyright 2010 Dogdaze (ns org.dogdaze.spider_player.Deck (:use [clojure.contrib.seq-utils :only (shuffle)])) (def suits [:clubs :diamonds :hearts :spades]) (def ranks [:ace :two :three :four :five :six :seven :eight :nine :ten :jack :queen :king]) (defn suit-seq "Return 4 suits: if number-of-suits == 1: :clubs :clubs :clubs :clubs if number-of-suits == 2: :clubs :diamonds :clubs :diamonds if number-of-suits == 4: :clubs :diamonds :hearts :spades." [number-of-suits] (take 4 (cycle (take number-of-suits suits)))) (defstruct card :rank :suit) (defn unshuffled-deck "Create an unshuffled deck containing all cards from the number of suits specified." [number-of-suits] (for [rank ranks suit (suit-seq number-of-suits)] (struct card rank suit))) (defn deck "Create a shuffled deck containing all cards from the number of suits specified." [number-of-suits] (shuffle (unshuffled-deck number-of-suits))) Here is my test case (src/test/clojure/org/dogdaze/spider_player/TestDeck.clj): ; Copyright 2010 Dogdaze (ns org.dogdaze.spider_player (:use clojure.set clojure.test org.dogdaze.spider_player.Deck)) (deftest test-suit-seq (is (= (suit-seq 1) [:clubs :clubs :clubs :clubs])) (is (= (suit-seq 2) [:clubs :diamonds :clubs :diamonds])) (is (= (suit-seq 4) [:clubs :diamonds :hearts :spades]))) (def one-suit-deck [{:rank :ace, :suit :clubs} {:rank :ace, :suit :clubs} {:rank :ace, :suit :clubs} {:rank :ace, :suit :clubs} {:rank :two, :suit :clubs} {:rank :two, :suit :clubs} {:rank :two, :suit :clubs} {:rank :two, :suit :clubs} {:rank :three, :suit :clubs} {:rank :three, :suit :clubs} {:rank :three, :suit :clubs} {:rank :three, :suit :clubs} {:rank :four, :suit :clubs} {:rank :four, :suit :clubs} {:rank :four, :suit :clubs} {:rank :four, :suit :clubs} {:rank :five, :suit :clubs} {:rank :five, :suit :clubs} {:rank :five, :suit :clubs} {:rank :five, :suit :clubs} {:rank :six, :suit :clubs} {:rank :six, :suit :clubs} {:rank :six, :suit :clubs} {:rank :six, :suit :clubs} {:rank :seven, :suit :clubs} {:rank :seven, :suit :clubs} {:rank :seven, :suit :clubs} {:rank :seven, :suit :clubs} {:rank :eight, :suit :clubs} {:rank :eight, :suit :clubs} {:rank :eight, :suit :clubs} {:rank :eight, :suit :clubs} {:rank :nine, :suit :clubs} {:rank :nine, :suit :clubs} {:rank :nine, :suit :clubs} {:rank :nine, :suit :clubs} {:rank :ten, :suit :clubs} {:rank :ten, :suit :clubs} {:rank :ten, :suit :clubs} {:rank :ten, :suit :clubs} {:rank :jack, :suit :clubs} {:rank :jack, :suit :clubs} {:rank :jack, :suit :clubs} {:rank :jack, :suit :clubs} {:rank :queen, :suit :clubs} {:rank :queen, :suit :clubs} {:rank :queen, :suit :clubs} {:rank :queen, :suit :clubs} {:rank :king, :suit :clubs} {:rank :king, :suit :clubs} {:rank :king, :suit :clubs} {:rank :king, :suit :clubs}]) (def two-suits-deck [{:rank :ace, :suit :clubs} {:rank :ace, :suit :diamonds} {:rank :ace, :suit :clubs} {:rank :ace, :suit :diamonds} {:rank :two, :suit :clubs} {:rank :two, :suit :diamonds} {:rank :two, :suit :clubs} {:rank :two, :suit :diamonds} {:rank :three, :suit :clubs} {:rank :three, :suit :diamonds} {:rank :three, :suit :clubs} {:rank :three, :suit :diamonds} {:rank :four, :suit :clubs} {:rank :four, :suit :diamonds} {:rank :four, :suit :clubs} {:rank :four, :suit :diamonds} {:rank :five, :suit :clubs} {:rank :five, :suit :diamonds} {:rank :five, :suit :clubs} {:rank :five, :suit :diamonds} {:rank :six, :suit :clubs} {:rank :six, :suit :diamonds} {:rank :six, :suit :clubs} {:rank :six, :suit :diamonds} {:rank :seven, :suit :clubs} {:rank :seven, :suit :diamonds} {:rank :seven, :suit :clubs} {:rank :seven, :suit :diamonds} {:rank :eight, :suit :clubs} {:rank :eight, :suit :diamonds} {:rank :eight, :suit :clubs} {:rank :eight, :suit :diamonds} {:rank :nine, :suit :clubs} {:rank :nine, :suit :diamonds} {:rank :nine, :suit :clubs} {:rank :nine, :suit :diamonds} {:rank :ten, :suit :clubs} {:rank :ten, :suit :diamonds} {:rank :ten, :suit :clubs} {:rank :ten, :suit :diamonds} {:rank :jack, :suit :clubs} {:rank :jack, :suit :diamonds} {:rank :jack, :suit :clubs} {:rank :jack, :suit :diamonds} {:rank :queen, :suit :clubs} {:rank :queen, :suit :diamonds} {:rank :queen, :suit :clubs} {:rank :queen, :suit :diamonds} {:rank :king, :suit :clubs} {:rank :king, :suit :diamonds} {:rank :king, :suit :clubs} {:rank :king, :suit :diamonds}]) (def four-suits-deck [{:rank :ace, :suit :clubs} {:rank :ace, :suit :diamonds} {:rank :ace, :suit :hearts} {:rank :ace, :suit :spades} {:rank :two, :suit :clubs} {:rank :two, :suit :diamonds} {:rank :two, :suit :hearts} {:rank :two, :suit :spades} {:rank :three, :suit :clubs} {:rank :three, :suit :diamonds} {:rank :three, :suit :hearts} {:rank :three, :suit :spades} {:rank :four, :suit :clubs} {:rank :four, :suit :diamonds} {:rank :four, :suit :hearts} {:rank :four, :suit :spades} {:rank :five, :suit :clubs} {:rank :five, :suit :diamonds} {:rank :five, :suit :hearts} {:rank :five, :suit :spades} {:rank :six, :suit :clubs} {:rank :six, :suit :diamonds} {:rank :six, :suit :hearts} {:rank :six, :suit :spades} {:rank :seven, :suit :clubs} {:rank :seven, :suit :diamonds} {:rank :seven, :suit :hearts} {:rank :seven, :suit :spades} {:rank :eight, :suit :clubs} {:rank :eight, :suit :diamonds} {:rank :eight, :suit :hearts} {:rank :eight, :suit :spades} {:rank :nine, :suit :clubs} {:rank :nine, :suit :diamonds} {:rank :nine, :suit :hearts} {:rank :nine, :suit :spades} {:rank :ten, :suit :clubs} {:rank :ten, :suit :diamonds} {:rank :ten, :suit :hearts} {:rank :ten, :suit :spades} {:rank :jack, :suit :clubs} {:rank :jack, :suit :diamonds} {:rank :jack, :suit :hearts} {:rank :jack, :suit :spades} {:rank :queen, :suit :clubs} {:rank :queen, :suit :diamonds} {:rank :queen, :suit :hearts} {:rank :queen, :suit :spades} {:rank :king, :suit :clubs} {:rank :king, :suit :diamonds} {:rank :king, :suit :hearts} {:rank :king, :suit :spades}]) (deftest test-unshuffled-deck (is (= (unshuffled-deck 1) one-suit-deck)) (is (= (unshuffled-deck 2) two-suits-deck)) (is (= (unshuffled-deck 4) four-suits-deck))) (deftest test-shuffled-deck (is (= (set (deck 1)) (set one-suit-deck))) (is (= (set (deck 2)) (set two-suits-deck))) (is (= (set (deck 4)) (set four-suits-deck)))) (run-tests) Any idea why the test is not running? BTW, feel free to suggest improvements to the Clojure code. Thanks, Ralph

Read the article

Div not floating left

- by Davey

Can't seem to get this div to move to the left. Using wordpress. I tried a lot of things but am at a loss. Here is the css for the div: #portfolio li img { position: absolute; float: left; margin: 34px 50px 0 0; width: 942px; } Here is the header.php: <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">  <html xmlns="http://www.w3.org/1999/xhtml"> <head> <meta http-equiv="Content-Type" content="text/html; charset=<?php bloginfo('charset'); ?>" /> <link rel="stylesheet" type="text/css" href="wp-content/themes/zenlite/layout.css" media="screen" /> <link rel="stylesheet" type="text/css" href="<?php bloginfo('template_directory'); ?>/print.css" media="print" /> <link rel="stylesheet" type="text/css" href="wp-content/themes/zenlite/color.css" /> <script type="text/javascript" src="js/jquery.js"></script> <script type="text/javascript" src="js/jquery.kwicks-1.5.1.js"></script> <script type="text/javascript" src="js/jquery.innerfade.js"></script> <script type="text/javascript" src="js/custom.js"></script> <title> Wildfire </title> <script type="text/javascript" src="http://wfithaca.com/js/jquery.lavalamp.js"></script> <script type="text/javascript" src="http://wfithaca.com/js/jquery.easing.1.1.js"></script> <script type="text/javascript" src="http://wfithaca.com/js/jquery.cycle.all.min.js"></script> <script type="text/javascript"> function my_kwicks(){ $('.kwicks').kwicks({ duration: 300, max: 200, spacing: 0 }); } $(document).ready(function(){ my_kwicks(); }); </script> <script type="text/javascript"> $(document).ready( function(){ $('ul#portfolio').innerfade({ speed: 1000, timeout: 5000, type: 'sequence', }); }); </script> </script> <script type="text/javascript"> $(document).ready(function(){ $('li.headlink').hover( function() { $('ul', this).css('display', 'block'); }, function() { $('ul', this).css('display', 'none'); }); }); </script> </head> <body> <div id="wrapper"> <div id="header"> <ul id="portfolio"> <li> <img src="http://wfithaca.com/images/banner1.png" /> </li> <li> <img src="http://wfithaca.com/images/banner1.png" /> </li> <li> <img src="http://wfithaca.com/images/banner1.png" /> </li> </ul> </div> <div id="navigation"> <div id="kwickbar"> <ul class="kwicks"> <li id="kwick1"><a href="#">Home</a></li> <li id="kwick2"><a href="#">Menu</a></li> <li id="kwick3"><a href="#">Events</a></li> <li id="kwick4"><a href="#">Friends</a></li> <li id="kwick5"><a href="#">Contact</a></li> </ul> </div> </div> Here is the stylesheet: html,body { font-family:Tahoma, Verdana,Arial, Helvetica, sans-serif; font-size:100%; padding:0; color:#fff; border-style:none; } a { text-decoration:none; } a:hover,a:active,a:focus { text-decoration:none; } ul li { list-style-type:none; } ul.dbem_events_list a:link {color: #A32725; text-decoration: underline; } ul.dbem_events_list a:visited {color: #A32725; text-decoration: underline; } ul.dbem_events_list a:hover {color: #ffffff; text-decoration: none; } ul.dbem_events_list{text-decoration:none; list-style-type:none;} ul li ul li { list-style-type:none; } ul li ul li ul li { list-style-type: none; } q:before, q:after { content:""; } #wrapper { width:986px; margin: 0 auto; } #header { background-image:url('images/headframe.png'); width:986px; height:271px; } #kwickbar { padding: 25px 0 0 25px; } #navigation { width:984px; height: 100px; background-color: #000000; text-decoration:none; margin-left:1px; } .update-post { float:left; width:100px; } #content { float:left; height:100%; width:984px; background-color: #000000; text-decoration:none; margin-left:1px; } #postcontent{ height:100%; width:100%; } #content .post { float:left; width:90px; } #content .page,#content .attachment,.postcontent { color:#fff; width:720px; margin-top:15px; margin-left:30px; float:left; text-decoration:none; } .photo { width: 250px; height:700px; background-color:#000000; margin:0 0 0 880px; } .slideshow { height: 232px; width: 232px; margin:0 0 0 880px; } .slideshow img { border: 5px solid #000; } .post-title { margin:0; padding:0; } .post-title a { text-decoration:none; } .post-title a:hover,.post-title a:active,.post-title a:focus { text-decoration:underline; } #content .meta li,#content .prevnext li,#content .gallery li { list-style-image:none; list-style:none; } .meta { margin:5px 0 0; padding:0; font-size:.85em; } .meta ul,.meta li { margin:0; padding:0; } .meta ul { display:inline; } .meta li li { display:inline; padding-right:.3em; } .postfoot { clear:both; margin-bottom:20px; padding-bottom:10px; line-height:1.2em; } .author .posts-by { padding-top:10px; } #footer { clear:both; margin:0; padding:0 0 5px; text-align:center; font-size:.8em; border: 0; width:960px; } #footer ul { clear:both; margin:0; padding:0; } #footer li { display:inline; margin:0; padding:0 5px; } #footer li.rss { position:relative; top:3px; } .copyright { padding:50px 0 0 0; font-family:verdana; color:#ffffff; text-align:left; width:800px; font-size:0.8em; } .copyright a { text-decoration:none; color:#7E0000; font-weight:600; } .copyright a:hover { color:#C0D341; } . .postcontent p { text-decoration:none; border:0; border-style:none; } .postcontent p a:hover { color:#fff; } .kwicks { list-style-type: none; list-style-position:outside; position: relative; margin: 0; padding: 0; } .kwicks li{ display: block; overflow: hidden; padding: 0; cursor: pointer; float: left; width: 125px; height: 40px; margin-right: 0px; background-image:url('http://wfithaca.com/images/kwicks.jpg'); background-repeat:no-repeat; } .kwicks a{ display:block; height:40px; text-indent:-9999px; outline:none; } #kwick1 { background-position:0px 0px; } #kwick2 { background-position:-200px 0px; } #kwick3 { background-position:-400px 0px; } #kwick4 { background-position:-600px 0px; } #kwick5 { background-position:-800px 0px; } #kwick1.active, #kwick1:hover { background-position: 0 bottom; } #kwick2.active, #kwick2:hover{ background-position: -200px bottom; } #kwick3.active, #kwick3:hover { background-position: -400px bottom; } #kwick4.active, #kwick4:hover { background-position: -600px bottom; } #kwick5.active, #kwick5:hover { background-position: -800px bottom; } #portfolio li img { position: absolute; float: left; margin: 34px 50px 0 0; width: 942px; } Just want the #portfolio li img div to move to the left a bit. any help would be greatly appreciated.

Read the article

How to Add Attribute in a XML CDATA? XSLT

- by iCeR

Xml: <Frames> <bannerFrame1> <![CDATA[ <iframe src="https://image.domain.com/promobanner1.html" height="320" width="629" scrolling="no" frameborder="0" marginwidth="0" marginheight="0"></iframe> ]]> </bannerFrame1> <bannerFrame2> <![CDATA[ <iframe src="https://image.domain.com/promobanner2.html" height="320" width="629" scrolling="no" frameborder="0" marginwidth="0" marginheight="0"></iframe> ]]> </bannerFrame2> </Frames> XML: bannerFrame1/iframe CDATA source Value may look like something like this <html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en"> <body> <div> <img src="banners/banner1.gif" border="0" alt="Banner"/> </div> </body> </html> XSLT: Im getting the CDATA value of the bannerFrame1 <xsl:template match="/"> <xsl:value-of select="Frames/bannerFrame1" disable-output-escaping="yes" /> </xsl:template> Im trying to add a google analytic event tracking code on the image from CDATA "bannerFrame1". How can I add an onclick attribute on <img src="banners/banner1.gif" border="0" alt="Banner"> while getting the CDATA value of <bannerFrame1>? Is it really possible? Thanks in advance. Expected output: <iframe> <html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en"> <body> <div> <img src="banners/banner1.gif" border="0" alt="Banner" onclick ="GoogleEventTracker();"/> </div> </body> </html> </iframe> Iframe source: <html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en"><head> <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> <meta http-equiv="Pragma" content="no-cache"> <link rel="stylesheet" type="text/css" href="css/style.css"> <link rel="stylesheet" type="text/css" href="css/ui-lightness/jquery-ui-1.7.2.custom.css"> <script type="text/javascript" src="js/jquery-1.3.2.min.js"></script> <script type="text/javascript" src="custombanner/jquery.min.js"> </script> <script type="text/javascript" src="custombanner/jquery.cycle.all.2.74.js"></script> <script type="text/javascript" src="js/jquery-ui-1.7.2.custom.min.js"></script>  <title>domain - It's time everyone flies</title> <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"> <meta name="script" http-equiv="Content-Script-Type" content="text/javascript"> <meta name="script" http-equiv="Content-Style-Type" content="text/css">  <style type="text/css"> img, div, a, input { behavior: url(iepngfix.htc); } #nav { float: left; left: 8px; margin: 15px; position: absolute; top: 235px; padding-left: 242px; /* for 4 frames */ /*padding-left: 278px; /* for 3 frames */ /*padding-left: 312px; /* for 2 frames */ /*padding-left: 242px; /* for 4 frames */ margin-left: 1px; margin-right: 1px; margin-bottom: 1px; height: 40px; /*background :url("banners/gradient.gif") repeat-x scroll 0 0 transparent;*/ } #nav li { display: block; float: left; list-style: none outside none; margin: 2px; padding: 2px; padding-right: 4px; margin-top: 8px; width: 25px; } #nav a { border: 1px solid #ffffff; display: block; padding: 0; width: 25px; } #nav img { border: medium none; display: block; height: 20px; width: 25px; opacity: 0.5; filter: alpha(opacity=50); } #nav li.activeLI { background: #ff0000; } #nav li.activeLI img { opacity: 1; filter: alpha(opacity=100); } </style> </head> <body marginwidth="0" marginheight="0"> <div class="hero"> <div class="slideshow mainpromo" id="slideshow" style="margin-bottom: 50px;">  <img src="banners/banner1.gif" border="0" alt="Banner"> </div> <ul id="nav"> </ul> <div class="rightpane">  <img src="banners/lite-fares.jpg" width="206" height="214" border="0" alt="Thumbnail"> <a href="https://book.domain.com/Register.aspx" target="_top"><img src="images/registernow.gif" width="203" height="20" border="0" alt="See all Low Fares"></a>   </div> <div id="alerts"> <p> <strong>Seat Sale Alert! </strong>Be the first to know thePromos. <a target="_top" href="http://www.domain.com/pages/emms-signup.aspx">SUBSCRIBE NOW »</a> </p> <p>  </p> <div class="clear"> </div> </div> <div style="clear: both;"> </div> <div id="mascot"> <img src="images/mascot.png" alt=""> </div> <div style="clear: both;"> </div> </div> </body></html>

Read the article

Delphi: EInvalidOp in neural network class (TD-lambda)

- by user89818

I have the following draft for a neural network class. This neural network should learn with TD-lambda. It is started by calling the getRating() function. But unfortunately, there is an EInvalidOp (invalid floading point operation) error after about 1000 iterations in the following lines: neuronsHidden[j] := neuronsHidden[j]+neuronsInput[t][i]*weightsInput[i][j]; // input -> hidden weightsHidden[j][k] := weightsHidden[j][k]+LEARNING_RATE_HIDDEN*tdError[k]*eligibilityTraceOutput[j][k]; // adjust hidden->output weights according to TD-lambda Why is this error? I can't find the mistake in my code :( Can you help me? Thank you very much in advance! unit uNeuronalesNetz; interface uses Windows, Messages, SysUtils, Variants, Classes, Graphics, Controls, Forms, Dialogs, ExtCtrls, StdCtrls, Grids, Menus, Math; const NEURONS_INPUT = 43; // number of neurons in the input layer NEURONS_HIDDEN = 60; // number of neurons in the hidden layer NEURONS_OUTPUT = 1; // number of neurons in the output layer NEURONS_TOTAL = NEURONS_INPUT+NEURONS_HIDDEN+NEURONS_OUTPUT; // total number of neurons in the network MAX_TIMESTEPS = 42; // maximum number of timesteps possible (after 42 moves: board is full) LEARNING_RATE_INPUT = 0.25; // in ideal case: decrease gradually in course of training LEARNING_RATE_HIDDEN = 0.15; // in ideal case: decrease gradually in course of training GAMMA = 0.9; LAMBDA = 0.7; // decay parameter for eligibility traces type TFeatureVector = Array[1..43] of SmallInt; // definition of the array type TFeatureVector TArtificialNeuralNetwork = class // definition of the class TArtificialNeuralNetwork private // GENERAL SETTINGS START learningMode: Boolean; // does the network learn and change its weights? // GENERAL SETTINGS END // NETWORK CONFIGURATION START neuronsInput: Array[1..MAX_TIMESTEPS] of Array[1..NEURONS_INPUT] of Extended; // array of all input neurons (their values) for every timestep neuronsHidden: Array[1..NEURONS_HIDDEN] of Extended; // array of all hidden neurons (their values) neuronsOutput: Array[1..NEURONS_OUTPUT] of Extended; // array of output neurons (their values) weightsInput: Array[1..NEURONS_INPUT] of Array[1..NEURONS_HIDDEN] of Extended; // array of weights: input->hidden weightsHidden: Array[1..NEURONS_HIDDEN] of Array[1..NEURONS_OUTPUT] of Extended; // array of weights: hidden->output // NETWORK CONFIGURATION END // LEARNING SETTINGS START outputBefore: Array[1..NEURONS_OUTPUT] of Extended; // the network's output value in the last timestep (the one before) eligibilityTraceHidden: Array[1..NEURONS_INPUT] of Array[1..NEURONS_HIDDEN] of Array[1..NEURONS_OUTPUT] of Extended; // array of eligibility traces: hidden layer eligibilityTraceOutput: Array[1..NEURONS_TOTAL] of Array[1..NEURONS_TOTAL] of Extended; // array of eligibility traces: output layer reward: Array[1..MAX_TIMESTEPS] of Array[1..NEURONS_OUTPUT] of Extended; // the reward value for all output neurons in every timestep tdError: Array[1..NEURONS_OUTPUT] of Extended; // the network's error value for every single output neuron t: Byte; // current timestep cyclesTrained: Integer; // number of cycles trained so far (learning rates could be decreased accordingly) last50errors: Array[1..50] of Extended; // LEARNING SETTINGS END public constructor Create; // create the network object and do the initialization procedure UpdateEligibilityTraces; // update the eligibility traces for the hidden and output layer procedure tdLearning; // learning algorithm: adjust the network's weights procedure ForwardPropagation; // propagate the input values through the network to the output layer function getRating(state: TFeatureVector; explorative: Boolean): Extended; // get the rating for a given state (feature vector) function HyperbolicTangent(x: Extended): Extended; // calculate the hyperbolic tangent [-1;1] procedure StartNewCycle; // start a new cycle with everything set to default except for the weights procedure setLearningMode(activated: Boolean=TRUE); // switch the learning mode on/off procedure setInputs(state: TFeatureVector); // transfer the given feature vector to the input layer (set input neurons' values) procedure setReward(currentReward: SmallInt); // set the reward for the current timestep (with learning then or without) procedure nextTimeStep; // increase timestep t function getCyclesTrained(): Integer; // get the number of cycles trained so far procedure Visualize(imgHidden: Pointer); // visualize the neural network's hidden layer end; implementation procedure TArtificialNeuralNetwork.UpdateEligibilityTraces; var i, j, k: Integer; begin // how worthy is a weight to be adjusted? for j := 1 to NEURONS_HIDDEN do begin for k := 1 to NEURONS_OUTPUT do begin eligibilityTraceOutput[j][k] := LAMBDA*eligibilityTraceOutput[j][k]+(neuronsOutput[k]*(1-neuronsOutput[k]))*neuronsHidden[j]; for i := 1 to NEURONS_INPUT do begin eligibilityTraceHidden[i][j][k] := LAMBDA*eligibilityTraceHidden[i][j][k]+(neuronsOutput[k]*(1-neuronsOutput[k]))*weightsHidden[j][k]*neuronsHidden[j]*(1-neuronsHidden[j])*neuronsInput[t][i]; end; end; end; end; procedure TArtificialNeuralNetwork.setReward; VAR i: Integer; begin for i := 1 to NEURONS_OUTPUT do begin // +1 = player A wins // 0 = draw // -1 = player B wins reward[t][i] := currentReward; end; end; procedure TArtificialNeuralNetwork.tdLearning; var i, j, k: Integer; begin if learningMode then begin for k := 1 to NEURONS_OUTPUT do begin if reward[t][k] = 0 then begin tdError[k] := GAMMA*neuronsOutput[k]-outputBefore[k]; // network's error value when reward is 0 end else begin tdError[k] := reward[t][k]-outputBefore[k]; // network's error value in the final state (reward received) end; for j := 1 to NEURONS_HIDDEN do begin weightsHidden[j][k] := weightsHidden[j][k]+LEARNING_RATE_HIDDEN*tdError[k]*eligibilityTraceOutput[j][k]; // adjust hidden->output weights according to TD-lambda for i := 1 to NEURONS_INPUT do begin weightsInput[i][j] := weightsInput[i][j]+LEARNING_RATE_INPUT*tdError[k]*eligibilityTraceHidden[i][j][k]; // adjust input->hidden weights according to TD-lambda end; end; end; end; end; procedure TArtificialNeuralNetwork.ForwardPropagation; var i, j, k: Integer; begin for j := 1 to NEURONS_HIDDEN do begin neuronsHidden[j] := 0; for i := 1 to NEURONS_INPUT do begin neuronsHidden[j] := neuronsHidden[j]+neuronsInput[t][i]*weightsInput[i][j]; // input -> hidden end; neuronsHidden[j] := HyperbolicTangent(neuronsHidden[j]); // activation of hidden neuron j end; for k := 1 to NEURONS_OUTPUT do begin neuronsOutput[k] := 0; for j := 1 to NEURONS_HIDDEN do begin neuronsOutput[k] := neuronsOutput[k]+neuronsHidden[j]*weightsHidden[j][k]; // hidden -> output end; neuronsOutput[k] := HyperbolicTangent(neuronsOutput[k]); // activation of output neuron k end; end; procedure TArtificialNeuralNetwork.setLearningMode; begin learningMode := activated; end; constructor TArtificialNeuralNetwork.Create; var i, j, k: Integer; begin inherited Create; Randomize; // initialize random numbers generator learningMode := TRUE; cyclesTrained := -2; // only set to -2 because it will be increased twice in the beginning StartNewCycle; for j := 1 to NEURONS_HIDDEN do begin for k := 1 to NEURONS_OUTPUT do begin weightsHidden[j][k] := abs(Random-0.5); // initialize weights: 0 <= random < 0.5 end; for i := 1 to NEURONS_INPUT do begin weightsInput[i][j] := abs(Random-0.5); // initialize weights: 0 <= random < 0.5 end; end; for i := 1 to 50 do begin last50errors[i] := 0; end; end; procedure TArtificialNeuralNetwork.nextTimeStep; begin t := t+1; end; procedure TArtificialNeuralNetwork.StartNewCycle; var i, j, k, m: Integer; begin t := 1; // start in timestep 1 cyclesTrained := cyclesTrained+1; // increase the number of cycles trained so far for j := 1 to NEURONS_HIDDEN do begin neuronsHidden[j] := 0; for k := 1 to NEURONS_OUTPUT do begin eligibilityTraceOutput[j][k] := 0; outputBefore[k] := 0; neuronsOutput[k] := 0; for m := 1 to MAX_TIMESTEPS do begin reward[m][k] := 0; end; end; for i := 1 to NEURONS_INPUT do begin for k := 1 to NEURONS_OUTPUT do begin eligibilityTraceHidden[i][j][k] := 0; end; end; end; end; function TArtificialNeuralNetwork.getCyclesTrained; begin result := cyclesTrained; end; procedure TArtificialNeuralNetwork.setInputs; var k: Integer; begin for k := 1 to NEURONS_INPUT do begin neuronsInput[t][k] := state[k]; end; end; function TArtificialNeuralNetwork.getRating; begin setInputs(state); ForwardPropagation; result := neuronsOutput[1]; if not explorative then begin tdLearning; // adjust the weights according to TD-lambda ForwardPropagation; // calculate the network's output again outputBefore[1] := neuronsOutput[1]; // set outputBefore which will then be used in the next timestep UpdateEligibilityTraces; // update the eligibility traces for the next timestep nextTimeStep; // go to the next timestep end; end; function TArtificialNeuralNetwork.HyperbolicTangent; begin if x > 5500 then // prevent overflow result := 1 else result := (Exp(2*x)-1)/(Exp(2*x)+1); end; end.

Read the article

StackOverFlowError while creating Mac object on AS400/Java

- by Prasanna K Rao

Hello all, I am a newbie to AS400-Java programming. I am trying to create my first program to test the implementation of Message Authentication Code (MAC). I am trying to use the HMACSHA1 hash function. My (Java 1.4) program runs fine on a dev box (V5R4).But fails terribly on the QA box (V5R3). My program is as below: ===================================================== import java.security.InvalidKeyException; import java.security.NoSuchAlgorithmException; import java.security.Security; import java.security.Provider; import javax.crypto.Mac; import javax.crypto.spec.SecretKeySpec; import javax.crypto.SecretKey; public class Test01 { private static final String HMAC_SHA1_ALGORITHM = "HmacSHA1"; public static void main (String [] arguments) { byte[] key = { 1,2,3,4,5,6,7,8}; SecretKeySpec SHA1key = new SecretKeySpec(key, "HmacSHA1"); Mac hmac; String strFinalRslt = ""; try { hmac = Mac.getInstance("HmacSHA1"); hmac.init(SHA1key); byte[] result = hmac.doFinal(); strFinalRslt = toHexString(result); }catch (NoSuchAlgorithmException e) { // TODO Auto-generated catch block e.printStackTrace(); }catch (InvalidKeyException e) { // TODO Auto-generated catch block e.printStackTrace(); }catch(StackOverflowError e){ e.printStackTrace(); } System.out.println(strFinalRslt); System.out.println("All done!!!"); } public static byte[] fromHexString ( String s ) { int stringLength = s.length(); if ( (stringLength & 0x1) != 0 ) { throw new IllegalArgumentException ( "fromHexString requires an even number of hex characters" ); } byte[] b = new byte[stringLength / 2]; for ( int i=0,j=0; i 4] ); //look up low nibble char sb.append( hexChar [b[i] & 0x0f] ); } return sb.toString(); } static char[] hexChar = { '0' , '1' , '2' , '3' , '4' , '5' , '6' , '7' , '8' , '9' , 'a' , 'b' , 'c' , 'd' , 'e' , 'f'}; } This program compiles fine and gets the correct response on my win-xp client and also my dev box. But, fails with the following error on the QA box: java.lang.StackOverflowError at java.lang.Throwable.(Throwable.java:180) at java.lang.Error.(Error.java:37) at java.lang.StackOverflowError.(StackOverflowError.java:24) at java.io.Os400FileSystem.list(Native method) at java.io.File.list(File.java:922) at javax.crypto.b.e(Unknown source) at javax.crypto.b.a(Unknown source) at javax.crypto.b.c(Unknown source) at javax.crypto.b£0.run(Unknown source) at javax.crypto.b.(Unknown source) at javax.crypto.Mac.getInstance(Unknown source) I have verified the java.security file and entry corresponding to the jce files are all ok. The DMPJVM command gives me the following response: Thu Jun 03 12:25:34 E Java Virtual Machine Information 016822/QPGMR/11111 ........................................................................ . Classpath . ........................................................................ java.version=1.4 sun.boot.class.path=/QIBM/ProdData/OS400/Java400/jdk/lib/jdkptf14.zip:/QIBM /ProdData/OS400/Java400/ext/ibmjssefw.jar:/QIBM/ProdData/CAP/ibmjsseprovide r.jar:/QIBM/ProdData/OS400/Java400/ext/ibmjsseprovider2.jar:/QIBM/ProdData/ OS400/Java400/ext/ibmpkcs11impl.jar:/QIBM/ProdData/CAP/ibmjssefips.jar:/QIB M/ProdData/OS400/Java400/jdk/lib/IBMiSeriesJSSE.jar:/QIBM/ProdData/OS400/Ja va400/jdk/lib/jce.jar:/QIBM/ProdData/OS400/Java400/jdk/lib/jaas.jar:/QIBM/P rodData/OS400/Java400/jdk/lib/ibmcertpathfw.jar:/QIBM/ProdData/OS400/Java40 0/jdk/lib/ibmcertpathprovider.jar:/QIBM/ProdData/OS400/Java400/ext/ibmpkcs. jar:/QIBM/ProdData/OS400/Java400/jdk/lib/ibmjgssfw.jar:/QIBM/ProdData/OS400 /Java400/jdk/lib/ibmjgssprovider.jar:/QIBM/ProdData/OS400/Java400/jdk/lib/s ecurity.jar:/QIBM/ProdData/OS400/Java400/jdk/lib/charsets.jar:/QIBM/ProdDat a/OS400/Java400/jdk/lib/resources.jar:/QIBM/ProdData/OS400/Java400/jdk/lib/ rt.jar:/QIBM/ProdData/OS400/Java400/jdk/lib/sunrsasign.jar:/QIBM/ProdData/O S400/Java400/ext/IBMmisc.jar:/QIBM/ProdData/Java400/ java.class.path=/myhome/lib/commons-codec-1.3.jar:/myhome/lib/commons-httpc lient-3.1.jar:/myhome/lib/commons-logging-1.1.jar:/myhome/lib/log4j-1.2.15.jar:/myhome/lib/log4j-core.jar ; java.ext.dirs=/QIBM/ProdData/OS400/Java400/jdk/lib/ext:/QIBM/UserData/Java4 00/ext:/QIBM/ProdData/Java400/jdk14/lib/ext java.library.path=/QSYS.LIB/ROBOTLIB.LIB:/QSYS.LIB/QTEMP.LIB:/QSYS.LIB/ODIP GM.LIB:/QSYS.LIB/QGPL.LIB ........................................................................ . Garbage Collection . ........................................................................ Garbage collector parameters Initial size: 16384 K Max size: 240000000 K Current values Heap size: 437952 K Garbage collections: 58 Additional values JIT heap size: 53824 K JVM heap size: 55752 K Last GC cycle time: 1333 ms ........................................................................ . Thread information . ........................................................................ Information for 4 thread(s) of 4 thread(s) processed Thread: 00000004 Thread-0 TDE: B00380000BAA0000 Thread priority: 5 Thread status: Running Thread group: main Runnable: java/lang/Thread Stack: java/io/Os400FileSystem.list(Ljava/io/File;)[Ljava/lang/String;+0 (Os400FileSystem.java:0) java/io/File.list()[Ljava/lang/String;+19 (File.java:922) javax/crypto/b.e()[B+127 (:0) javax/crypto/b.a(Ljava/security/cert/X509Certificate;)V+7 (:0) javax/crypto/b.access$500(Ljava/security/cert/X509Certificate;)V+1 (:0) javax/crypto/b$0.run()Ljava/lang/Object;+98 (:0) javax/crypto/b.()V+507 (:0) javax/crypto/Mac.getInstance(Ljava/lang/String;)Ljavax/crypto/Mac;+10 (:0) Locks: None Thread: 00000007 jitcompilethread TDE: B00380000BD58000 Thread priority: 5 Thread status: Java wait Thread group: system Runnable: java/lang/Thread Stack: None Locks: None Thread: 00000005 Reference Handler TDE: B00380000BAAC000 Thread priority: 10 Thread status: Waiting Wait object: java/lang/ref/Reference$Lock Thread group: system Runnable: java/lang/ref/Reference$ReferenceHandler Stack: java/lang/Object.wait()V+1 (Object.java:452) java/lang/ref/Reference$ReferenceHandler.run()V+47 (Reference.java:169) Locks: None Thread: 00000006 Finalizer TDE: B00380000BAB3000 Thread priority: 8 Thread status: Waiting Wait object: java/lang/ref/ReferenceQueue$Lock Thread group: system Runnable: java/lang/ref/Finalizer$FinalizerThread Stack: java/lang/ref/ReferenceQueue.remove(J)Ljava/lang/ref/Reference;+43 (ReferenceQueue.java:111) java/lang/ref/ReferenceQueue.remove()Ljava/lang/ref/Reference;+1 (ReferenceQueue.java:127) java/lang/ref/Finalizer$FinalizerThread.run()V+3 (Finalizer.java:171) Locks: None ........................................................................ . Class loader information . ........................................................................ 0 Default class loader 1 sun/reflect/DelegatingClassLoader 2 sun/misc/Launcher$ExtClassLoader ........................................................................ . GC heap information . ........................................................................ Loader Objects Class name ------ ------- ---------- 0 1493 [C 0 2122181 java/lang/String 0 47 [Ljava/util/Hashtable$Entry; 0 68 [Ljava/lang/Object; 0 1016 java/lang/Class 0 31 java/util/HashMap 0 37 java/util/Hashtable 0 2 java/lang/ThreadGroup 0 2 java/lang/RuntimePermission 0 2 java/lang/ref/ReferenceQueue$Null 0 5 java/lang/ref/ReferenceQueue 0 50 java/util/Vector 0 4 java/util/Stack 0 3 sun/misc/SoftCache 0 1 [Ljava/lang/ThreadGroup; 0 5 [Ljava/io/ObjectStreamField; 0 1 sun/reflect/ReflectionFactory 0 7 java/lang/ref/ReferenceQueue$Lock 0 10 java/lang/Object 0 1 java/lang/String$CaseInsensitiveComparator 0 1 java/util/Hashtable$EmptyEnumerator 0 1 java/util/Hashtable$EmptyIterator 0 33 [Ljava/util/HashMap$Entry; 0 19210 [J 0 1 sun/nio/cs/StandardCharsets 0 5 java/util/TreeMap 0 1075 java/util/TreeMap$Entry 0 469 [Ljava/lang/String; 0 1 java/lang/StringBuffer 0 2 java/io/FileInputStream 0 2 java/io/FileOutputStream 0 2 java/io/BufferedOutputStream 0 1 java/lang/reflect/ReflectPermission 0 1 [[Ljava/lang/ref/SoftReference; 0 2 [Ljava/lang/ref/SoftReference; 0 2 sun/nio/cs/Surrogate$Parser 0 3 sun/misc/Signal 0 1 [Ljava/io/File; 0 6 java/io/File 0 1 java/util/BitSet 0 17 sun/reflect/NativeConstructorAccessorImpl 0 2 java/net/URLClassLoader$ClassFinder 0 12 java/util/ArrayList 0 32 java/io/RandomAccessFile 0 16 java/lang/Thread 0 1 java/lang/ref/Reference$ReferenceHandler 0 1 java/lang/ref/Finalizer$FinalizerThread 0 266 [B 0 2 java/util/Properties 0 71 java/lang/ref/Finalizer 0 2 com/ibm/nio/cs/DirectEncoder 0 38 java/lang/reflect/Constructor 0 33 java/util/jar/JarFile 0 19200 java/lang/StackOverflowError 0 5 java/security/AccessControlContext 0 2 [Ljava/lang/Thread; 0 4 java/lang/OutOfMemoryError 0 1065 java/util/Hashtable$Entry 0 1 java/io/BufferedInputStream 0 2 java/io/PrintStream 0 2 java/io/OutputStreamWriter 0 428 [I 0 3 java/lang/ClassLoader$NativeLibrary 0 25 java/util/Locale 0 3 sun/misc/URLClassPath 0 30 java/util/zip/Inflater 0 612 java/util/HashMap$Entry 0 2 java/io/FilePermission 0 10 java/io/ObjectStreamField 0 1 java/security/BasicPermissionCollection 0 2 java/security/ProtectionDomain 0 1 java/lang/Integer$1 0 1 java/lang/ref/Reference$Lock 0 1 java/lang/Shutdown$Lock 0 1 java/lang/Runtime 0 36 java/io/FileDescriptor 0 1 java/lang/Long$1 0 202 java/lang/Long 0 3 java/lang/ThreadLocal 0 3 java/nio/charset/CodingErrorAction 0 2 java/nio/charset/CoderResult 0 1 java/nio/charset/CoderResult$1 0 1 java/nio/charset/CoderResult$2 0 1 sun/misc/Unsafe 0 2 java/nio/ByteOrder 0 1 java/io/Os400FileSystem 0 3 java/lang/Boolean 0 1 java/lang/Terminator$1 0 23 java/lang/Integer 0 2 sun/misc/NativeSignalHandler 0 1 sun/misc/Launcher$Factory 0 1 sun/misc/Launcher 0 53 [Ljava/lang/Class; 0 1 java/lang/reflect/ReflectAccess 0 18 sun/reflect/DelegatingConstructorAccessorImpl 0 1 sun/net/www/protocol/file/Handler 0 3 java/util/HashSet 0 3 sun/net/www/protocol/jar/Handler 0 1 java/util/jar/JavaUtilJarAccessImpl 0 1 java/net/UnknownContentHandler 0 2 [Ljava/security/Principal; 0 10 [Ljava/security/cert/Certificate; 0 2 sun/misc/AtomicLongCSImpl 0 3 sun/reflect/DelegatingMethodAccessorImpl 0 1 sun/security/util/ByteArrayLexOrder 0 1 sun/security/util/ByteArrayTagOrder 0 7 sun/security/x509/CertificateVersion 0 7 sun/security/x509/CertificateSerialNumber 0 7 sun/security/x509/SerialNumber 0 7 sun/security/x509/CertificateAlgorithmId 0 7 sun/security/x509/CertificateIssuerName 0 60 sun/security/x509/RDN 0 60 [Lsun/security/x509/AVA; 0 67 sun/security/util/DerInputStream 0 3 [Ljava/math/BigInteger; 0 2 com/ibm/nio/cs/Converter 0 2 sun/nio/cs/StreamEncoder$CharsetSE 0 35 java/lang/ref/SoftReference 0 2 java/nio/HeapByteBuffer 0 2 java/io/BufferedWriter 0 33 sun/misc/URLClassPath$JarLoader 0 4 java/lang/ThreadLocal$ThreadLocalMap$Entry 0 76 java/net/URL 0 1 sun/misc/Launcher$ExtClassLoader 0 1 sun/misc/Launcher$AppClassLoader 0 4 java/lang/Throwable 0 7 java/lang/reflect/Method 0 2 sun/misc/URLClassPath$FileLoader 0 2 java/security/CodeSource 0 2 java/security/Permissions 0 2 java/io/FilePermissionCollection 0 1 java/lang/ThreadLocal$ThreadLocalMap 0 1 javax/crypto/spec/SecretKeySpec 0 17 java/util/jar/Attributes$Name 0 1 [Ljava/lang/ThreadLocal$ThreadLocalMap$Entry; 0 1 java/security/SecureRandom 0 2 sun/security/provider/Sun 0 1 java/util/jar/JarFile$JarFileEntry 0 1 java/util/jar/JarVerifier 0 3 sun/reflect/NativeMethodAccessorImpl 0 116 sun/security/util/ObjectIdentifier 0 1 java/lang/Package 0 2 [S 0 104 java/math/BigInteger 0 20 sun/security/x509/AlgorithmId 0 14 sun/security/x509/X500Name 0 14 [Lsun/security/x509/RDN; 0 60 sun/security/x509/AVA 0 67 sun/security/util/DerValue 0 67 sun/security/util/DerInputBuffer 0 21 sun/security/x509/AVAKeyword 0 6 sun/security/x509/X509CertImpl 0 7 sun/security/x509/X509CertInfo 0 1 [Lsun/security/util/ObjectIdentifier; 0 1 [[Ljava/lang/Byte; 0 3 [[B 0 7 sun/security/provider/DSAPublicKey 0 7 sun/security/x509/AuthorityKeyIdentifierExtension 0 12 [Ljava/lang/Byte; 0 14 java/lang/Byte 0 7 sun/security/x509/CertificateSubjectName 0 7 sun/security/x509/CertificateX509Key 0 14 sun/security/x509/KeyIdentifier 0 4 [Z 0 5 sun/text/Normalizer$Mode 0 7 sun/security/x509/CertificateValidity 0 14 java/util/Date 0 7 sun/security/provider/DSAParameters 0 7 sun/security/util/BitArray 0 7 sun/security/x509/CertificateExtensions 0 7 java/security/AlgorithmParameters 0 7 sun/security/x509/SubjectKeyIdentifierExtension 0 5 sun/security/x509/BasicConstraintsExtension 0 2 sun/security/x509/KeyUsageExtension 0 1 sun/text/CompactCharArray 0 1 sun/text/CompactByteArray 0 1 sun/net/www/protocol/jar/JarFileFactory 0 1 java/util/Collections$EmptySet 0 1 java/util/Collections$EmptyList 0 1 java/util/Collections$ReverseComparator 0 1 com/ibm/security/jgss/i18n/PropertyResource 0 1 javax/crypto/b$0 0 1 sun/security/provider/X509Factory 0 1 sun/reflect/BootstrapConstructorAccessorImpl 1 1 sun/reflect/GeneratedConstructorAccessor3202134454 2 1 com/ibm/crypto/provider/IBMJCE 0 6 java/util/ResourceBundle$LoaderReference 0 1 [Lsun/security/x509/NetscapeCertTypeExtension$MapEntry; 0 1 com/sun/rsajca/Provider 0 1 com/ibm/security/cert/IBMCertPath 0 1 com/ibm/as400/ibmonly/net/ssl/Provider 0 1 com/ibm/jsse/IBMJSSEProvider 0 1 com/ibm/security/jgss/IBMJGSSProvider 0 5 org/ietf/jgss/Oid 0 1 java/util/PropertyResourceBundle 0 7 java/util/ResourceBundle$ResourceCacheKey 0 2 sun/net/www/protocol/jar/URLJarFile 0 6 sun/misc/SoftCache$ValueCell 0 1 java/util/Random 0 1 java/util/Collections$EmptyMap 0 112 com/ibm/security/util/ObjectIdentifier 0 5 java/security/Security$ProviderProperty 0 1 java/security/cert/CertificateFactory 0 1 sun/security/provider/SecureRandom 0 2 java/security/MessageDigest$Delegate 0 2 sun/security/provider/SHA 0 1 sun/util/calendar/ZoneInfo 0 4 com/ibm/security/x509/X500Name 0 2 [Ljava/security/cert/X509Certificate; 0 1 sun/reflect/DelegatingClassLoader 0 1 sun/security/x509/NetscapeCertTypeExtension 0 7 sun/security/x509/NetscapeCertTypeExtension$MapEntry 0 3 [[Ljava/lang/String; 0 3 java/util/Arrays$ArrayList 0 7 com/ibm/security/x509/NetscapeCertTypeExtension$MapEntry 0 1 com/ibm/security/validator/EndEntityChecker 0 1 java/util/AbstractList$Itr 0 1 com/ibm/security/util/ByteArrayLexOrder 0 1 com/ibm/security/util/ByteArrayTagOrder 0 18 [Lcom/ibm/security/x509/AVA; 0 18 com/ibm/security/util/DerInputStream 0 5 com/ibm/security/util/text/Normalizer$Mode 0 1 com/ibm/security/validator/SimpleValidator 0 1 [Lcom/ibm/security/x509/NetscapeCertTypeExtension$MapEntry; 0 4 [Lcom/ibm/security/x509/RDN; 0 1 java/util/Hashtable$Enumerator 0 4 java/util/LinkedHashMap$Entry 0 1 sun/text/resources/LocaleElements 0 1 sun/text/resources/LocaleElements_en 0 22 com/ibm/security/x509/AVAKeyword 0 4 javax/security/auth/x500/X500Principal 0 18 com/ibm/security/x509/RDN 0 18 com/ibm/security/x509/AVA 0 18 com/ibm/security/util/DerInputBuffer 0 18 com/ibm/security/util/DerValue 0 1 com/ibm/security/util/text/CompactCharArray 0 1 com/ibm/security/util/text/CompactByteArray 0 2 java/util/LinkedHashMap 0 1 java/net/InetAddress$1 0 2 [Ljava/net/InetAddress; 0 2 java/net/InetAddress$Cache 0 1 java/net/Inet4AddressImpl 0 3 java/net/Inet4Address 0 2 java/net/InetAddress$CacheEntry ........................................................................ . Global registry information . ........................................................................ Loader Objects Class name ------ ------- ---------- 0 23 [C 0 1017 java/lang/Class 0 1 java/lang/ref/Reference$ReferenceHandler 0 1 java/lang/ref/Finalizer$FinalizerThread 0 1 sun/misc/Launcher$AppClassLoader 0 32 java/io/RandomAccessFile 0 32 [B Can someone please advise me? Thanks a lot, Prasanna

Read the article

Is there a Telecommunications Reference Architecture?

- by raul.goycoolea

@font-face { font-family: "Arial"; }@font-face { font-family: "Courier New"; }@font-face { font-family: "Wingdings"; }@font-face { font-family: "Cambria"; }p.MsoNormal, li.MsoNormal, div.MsoNormal { margin: 0cm 0cm 0.0001pt; font-size: 12pt; font-family: "Times New Roman"; }p.MsoListParagraph, li.MsoListParagraph, div.MsoListParagraph { margin: 0cm 0cm 0.0001pt 36pt; font-size: 12pt; font-family: "Times New Roman"; }p.MsoListParagraphCxSpFirst, li.MsoListParagraphCxSpFirst, div.MsoListParagraphCxSpFirst { margin: 0cm 0cm 0.0001pt 36pt; font-size: 12pt; font-family: "Times New Roman"; }p.MsoListParagraphCxSpMiddle, li.MsoListParagraphCxSpMiddle, div.MsoListParagraphCxSpMiddle { margin: 0cm 0cm 0.0001pt 36pt; font-size: 12pt; font-family: "Times New Roman"; }p.MsoListParagraphCxSpLast, li.MsoListParagraphCxSpLast, div.MsoListParagraphCxSpLast { margin: 0cm 0cm 0.0001pt 36pt; font-size: 12pt; font-family: "Times New Roman"; }div.Section1 { page: Section1; }ol { margin-bottom: 0cm; }ul { margin-bottom: 0cm; } Abstract Reference architecture provides needed architectural information that can be provided in advance to an enterprise to enable consistent architectural best practices. Enterprise Reference Architecture helps business owners to actualize their strategies, vision, objectives, and principles. It evaluates the IT systems, based on Reference Architecture goals, principles, and standards. It helps to reduce IT costs by increasing functionality, availability, scalability, etc. Telecom Reference Architecture provides customers with the flexibility to view bundled service bills online with the provision of multiple services. It provides real-time, flexible billing and charging systems, to handle complex promotions, discounts, and settlements with multiple parties. This paper attempts to describe the Reference Architecture for the Telecom Enterprises. It lays the foundation for a Telecom Reference Architecture by articulating the requirements, drivers, and pitfalls for telecom service providers. It describes generic reference architecture for telecom enterprises and moves on to explain how to achieve Enterprise Reference Architecture by using SOA. Introduction A Reference Architecture provides a methodology, set of practices, template, and standards based on a set of successful solutions implemented earlier. These solutions have been generalized and structured for the depiction of both a logical and a physical architecture, based on the harvesting of a set of patterns that describe observations in a number of successful implementations. It helps as a reference for the various architectures that an enterprise can implement to solve various problems. It can be used as the starting point or the point of comparisons for various departments/business entities of a company, or for the various companies for an enterprise. It provides multiple views for multiple stakeholders. Major artifacts of the Enterprise Reference Architecture are methodologies, standards, metadata, documents, design patterns, etc. Purpose of Reference Architecture In most cases, architects spend a lot of time researching, investigating, defining, and re-arguing architectural decisions. It is like reinventing the wheel as their peers in other organizations or even the same organization have already spent a lot of time and effort defining their own architectural practices. This prevents an organization from learning from its own experiences and applying that knowledge for increased effectiveness. Reference architecture provides missing architectural information that can be provided in advance to project team members to enable consistent architectural best practices. Enterprise Reference Architecture helps an enterprise to achieve the following at the abstract level: · Reference architecture is more of a communication channel to an enterprise · Helps the business owners to accommodate to their strategies, vision, objectives, and principles. · Evaluates the IT systems based on Reference Architecture Principles · Reduces IT spending through increasing functionality, availability, scalability, etc · A Real-time Integration Model helps to reduce the latency of the data updates Is used to define a single source of Information · Provides a clear view on how to manage information and security · Defines the policy around the data ownership, product boundaries, etc. · Helps with cost optimization across project and solution portfolios by eliminating unused or duplicate investments and assets · Has a shorter implementation time and cost Once the reference architecture is in place, the set of architectural principles, standards, reference models, and best practices ensure that the aligned investments have the greatest possible likelihood of success in both the near term and the long term (TCO). Common pitfalls for Telecom Service Providers Telecom Reference Architecture serves as the first step towards maturity for a telecom service provider. During the course of our assignments/experiences with telecom players, we have come across the following observations – Some of these indicate a lack of maturity of the telecom service provider: · In markets that are growing and not so mature, it has been observed that telcos have a significant amount of in-house or home-grown applications. In some of these markets, the growth has been so rapid that IT has been unable to cope with business demands. Telcos have shown a tendency to come up with workarounds in their IT applications so as to meet business needs. · Even for core functions like provisioning or mediation, some telcos have tried to manage with home-grown applications. · Most of the applications do not have the required scalability or maintainability to sustain growth in volumes or functionality. · Applications face interoperability issues with other applications in the operator's landscape. Integrating a new application or network element requires considerable effort on the part of the other applications. · Application boundaries are not clear, and functionality that is not in the initial scope of that application gets pushed onto it. This results in the development of the multiple, small applications without proper boundaries. · Usage of Legacy OSS/BSS systems, poor Integration across Multiple COTS Products and Internal Systems. Most of the Integrations are developed on ad-hoc basis and Point-to-Point Integration. · Redundancy of the business functions in different applications • Fragmented data across the different applications and no integrated view of the strategic data • Lot of performance Issues due to the usage of the complex integration across OSS and BSS systems However, this is where the maturity of the telecom industry as a whole can be of help. The collaborative efforts of telcos to overcome some of these problems have resulted in bodies like the TM Forum. They have come up with frameworks for business processes, data, applications, and technology for telecom service providers. These could be a good starting point for telcos to clean up their enterprise landscape. Industry Trends in Telecom Reference Architecture Telecom reference architectures are evolving rapidly because telcos are facing business and IT challenges. “The reality is that there probably is no killer application, no silver bullet that the telcos can latch onto to carry them into a 21st Century.... Instead, there are probably hundreds – perhaps thousands – of niche applications.... And the only way to find which of these works for you is to try out lots of them, ramp up the ones that work, and discontinue the ones that fail.” – Martin Creaner President & CTO TM Forum. The following trends have been observed in telecom reference architecture: · Transformation of business structures to align with customer requirements · Adoption of more Internet-like technical architectures. The Web 2.0 concept is increasingly being used. · Virtualization of the traditional operations support system (OSS) · Adoption of SOA to support development of IP-based services · Adoption of frameworks like Service Delivery Platforms (SDPs) and IP Multimedia Subsystem · (IMS) to enable seamless deployment of various services over fixed and mobile networks · Replacement of in-house, customized, and stove-piped OSS/BSS with standards-based COTS products · Compliance with industry standards and frameworks like eTOM, SID, and TAM to enable seamless integration with other standards-based products Drivers of Reference Architecture The drivers of the Reference Architecture are Reference Architecture Goals, Principles, and Enterprise Vision and Telecom Transformation. The details are depicted below diagram. @font-face { font-family: "Cambria"; }p.MsoNormal, li.MsoNormal, div.MsoNormal { margin: 0cm 0cm 0.0001pt; font-size: 12pt; font-family: "Times New Roman"; }p.MsoCaption, li.MsoCaption, div.MsoCaption { margin: 0cm 0cm 10pt; font-size: 9pt; font-family: "Times New Roman"; color: rgb(79, 129, 189); font-weight: bold; }div.Section1 { page: Section1; } Figure 1. Drivers for Reference Architecture @font-face { font-family: "Arial"; }@font-face { font-family: "Courier New"; }@font-face { font-family: "Wingdings"; }@font-face { font-family: "Cambria"; }p.MsoNormal, li.MsoNormal, div.MsoNormal { margin: 0cm 0cm 0.0001pt; font-size: 12pt; font-family: "Times New Roman"; }p.MsoListParagraph, li.MsoListParagraph, div.MsoListParagraph { margin: 0cm 0cm 0.0001pt 36pt; font-size: 12pt; font-family: "Times New Roman"; }p.MsoListParagraphCxSpFirst, li.MsoListParagraphCxSpFirst, div.MsoListParagraphCxSpFirst { margin: 0cm 0cm 0.0001pt 36pt; font-size: 12pt; font-family: "Times New Roman"; }p.MsoListParagraphCxSpMiddle, li.MsoListParagraphCxSpMiddle, div.MsoListParagraphCxSpMiddle { margin: 0cm 0cm 0.0001pt 36pt; font-size: 12pt; font-family: "Times New Roman"; }p.MsoListParagraphCxSpLast, li.MsoListParagraphCxSpLast, div.MsoListParagraphCxSpLast { margin: 0cm 0cm 0.0001pt 36pt; font-size: 12pt; font-family: "Times New Roman"; }div.Section1 { page: Section1; }ol { margin-bottom: 0cm; }ul { margin-bottom: 0cm; } Today’s telecom reference architectures should seamlessly integrate traditional legacy-based applications and transition to next-generation network technologies (e.g., IP multimedia subsystems). This has resulted in new requirements for flexible, real-time billing and OSS/BSS systems and implications on the service provider’s organizational requirements and structure. Telecom reference architectures are today expected to: · Integrate voice, messaging, email and other VAS over fixed and mobile networks, back end systems · Be able to provision multiple services and service bundles • Deliver converged voice, video and data services · Leverage the existing Network Infrastructure · Provide real-time, flexible billing and charging systems to handle complex promotions, discounts, and settlements with multiple parties. · Support charging of advanced data services such as VoIP, On-Demand, Services (e.g. Video), IMS/SIP Services, Mobile Money, Content Services and IPTV. · Help in faster deployment of new services • Serve as an effective platform for collaboration between network IT and business organizations · Harness the potential of converging technology, networks, devices and content to develop multimedia services and solutions of ever-increasing sophistication on a single Internet Protocol (IP) · Ensure better service delivery and zero revenue leakage through real-time balance and credit management · Lower operating costs to drive profitability Enterprise Reference Architecture The Enterprise Reference Architecture (RA) fills the gap between the concepts and vocabulary defined by the reference model and the implementation. Reference architecture provides detailed architectural information in a common format such that solutions can be repeatedly designed and deployed in a consistent, high-quality, supportable fashion. This paper attempts to describe the Reference Architecture for the Telecom Application Usage and how to achieve the Enterprise Level Reference Architecture using SOA. • Telecom Reference Architecture • Enterprise SOA based Reference Architecture Telecom Reference Architecture Tele Management Forum’s New Generation Operations Systems and Software (NGOSS) is an architectural framework for organizing, integrating, and implementing telecom systems. NGOSS is a component-based framework consisting of the following elements: · The enhanced Telecom Operations Map (eTOM) is a business process framework. · The Shared Information Data (SID) model provides a comprehensive information framework that may be specialized for the needs of a particular organization. · The Telecom Application Map (TAM) is an application framework to depict the functional footprint of applications, relative to the horizontal processes within eTOM. · The Technology Neutral Architecture (TNA) is an integrated framework. TNA is an architecture that is sustainable through technology changes. NGOSS Architecture Standards are: · Centralized data · Loosely coupled distributed systems · Application components/re-use · A technology-neutral system framework with technology specific implementations · Interoperability to service provider data/processes · Allows more re-use of business components across multiple business scenarios · Workflow automation The traditional operator systems architecture consists of four layers, · Business Support System (BSS) layer, with focus toward customers and business partners. Manages order, subscriber, pricing, rating, and billing information. · Operations Support System (OSS) layer, built around product, service, and resource inventories. · Networks layer – consists of Network elements and 3rd Party Systems. · Integration Layer – to maximize application communication and overall solution flexibility. Reference architecture for telecom enterprises is depicted below. @font-face { font-family: "Arial"; }@font-face { font-family: "Courier New"; }@font-face { font-family: "Wingdings"; }@font-face { font-family: "Cambria"; }p.MsoNormal, li.MsoNormal, div.MsoNormal { margin: 0cm 0cm 0.0001pt; font-size: 12pt; font-family: "Times New Roman"; }p.MsoCaption, li.MsoCaption, div.MsoCaption { margin: 0cm 0cm 10pt; font-size: 9pt; font-family: "Times New Roman"; color: rgb(79, 129, 189); font-weight: bold; }p.MsoListParagraph, li.MsoListParagraph, div.MsoListParagraph { margin: 0cm 0cm 0.0001pt 36pt; font-size: 12pt; font-family: "Times New Roman"; }p.MsoListParagraphCxSpFirst, li.MsoListParagraphCxSpFirst, div.MsoListParagraphCxSpFirst { margin: 0cm 0cm 0.0001pt 36pt; font-size: 12pt; font-family: "Times New Roman"; }p.MsoListParagraphCxSpMiddle, li.MsoListParagraphCxSpMiddle, div.MsoListParagraphCxSpMiddle { margin: 0cm 0cm 0.0001pt 36pt; font-size: 12pt; font-family: "Times New Roman"; }p.MsoListParagraphCxSpLast, li.MsoListParagraphCxSpLast, div.MsoListParagraphCxSpLast { margin: 0cm 0cm 0.0001pt 36pt; font-size: 12pt; font-family: "Times New Roman"; }div.Section1 { page: Section1; }ol { margin-bottom: 0cm; }ul { margin-bottom: 0cm; } Figure 2. Telecom Reference Architecture The major building blocks of any Telecom Service Provider architecture are as follows: 1. Customer Relationship Management CRM encompasses the end-to-end lifecycle of the customer: customer initiation/acquisition, sales, ordering, and service activation, customer care and support, proactive campaigns, cross sell/up sell, and retention/loyalty. CRM also includes the collection of customer information and its application to personalize, customize, and integrate delivery of service to a customer, as well as to identify opportunities for increasing the value of the customer to the enterprise. The key functionalities related to Customer Relationship Management are · Manage the end-to-end lifecycle of a customer request for products. · Create and manage customer profiles. · Manage all interactions with customers – inquiries, requests, and responses. · Provide updates to Billing and other south bound systems on customer/account related updates such as customer/ account creation, deletion, modification, request bills, final bill, duplicate bills, credit limits through Middleware. · Work with Order Management System, Product, and Service Management components within CRM. · Manage customer preferences – Involve all the touch points and channels to the customer, including contact center, retail stores, dealers, self service, and field service, as well as via any media (phone, face to face, web, mobile device, chat, email, SMS, mail, the customer's bill, etc.). · Support single interface for customer contact details, preferences, account details, offers, customer premise equipment, bill details, bill cycle details, and customer interactions. CRM applications interact with customers through customer touch points like portals, point-of-sale terminals, interactive voice response systems, etc. The requests by customers are sent via fulfillment/provisioning to billing system for ordering processing. 2. Billing and Revenue Management Billing and Revenue Management handles the collection of appropriate usage records and production of timely and accurate bills – for providing pre-bill usage information and billing to customers; for processing their payments; and for performing payment collections. In addition, it handles customer inquiries about bills, provides billing inquiry status, and is responsible for resolving billing problems to the customer's satisfaction in a timely manner. This process grouping also supports prepayment for services. The key functionalities provided by these applications are · To ensure that enterprise revenue is billed and invoices delivered appropriately to customers. · To manage customers’ billing accounts, process their payments, perform payment collections, and monitor the status of the account balance. · To ensure the timely and effective fulfillment of all customer bill inquiries and complaints. · Collect the usage records from mediation and ensure appropriate rating and discounting of all usage and pricing. · Support revenue sharing; split charging where usage is guided to an account different from the service consumer. · Support prepaid and post-paid rating. · Send notification on approach / exceeding the usage thresholds as enforced by the subscribed offer, and / or as setup by the customer. · Support prepaid, post paid, and hybrid (where some services are prepaid and the rest of the services post paid) customers and conversion from post paid to prepaid, and vice versa. · Support different billing function requirements like charge prorating, promotion, discount, adjustment, waiver, write-off, account receivable, GL Interface, late payment fee, credit control, dunning, account or service suspension, re-activation, expiry, termination, contract violation penalty, etc. · Initiate direct debit to collect payment against an invoice outstanding. · Send notification to Middleware on different events; for example, payment receipt, pre-suspension, threshold exceed, etc. Billing systems typically get usage data from mediation systems for rating and billing. They get provisioning requests from order management systems and inquiries from CRM systems. Convergent and real-time billing systems can directly get usage details from network elements. 3. Mediation Mediation systems transform/translate the Raw or Native Usage Data Records into a general format that is acceptable to billing for their rating purposes. The following lists the high-level roles and responsibilities executed by the Mediation system in the end-to-end solution. · Collect Usage Data Records from different data sources – like network elements, routers, servers – via different protocol and interfaces. · Process Usage Data Records – Mediation will process Usage Data Records as per the source format. · Validate Usage Data Records from each source. · Segregates Usage Data Records coming from each source to multiple, based on the segregation requirement of end Application. · Aggregates Usage Data Records based on the aggregation rule if any from different sources. · Consolidates multiple Usage Data Records from each source. · Delivers formatted Usage Data Records to different end application like Billing, Interconnect, Fraud Management, etc. · Generates audit trail for incoming Usage Data Records and keeps track of all the Usage Data Records at various stages of mediation process. · Checks duplicate Usage Data Records across files for a given time window. 4. Fulfillment This area is responsible for providing customers with their requested products in a timely and correct manner. It translates the customer's business or personal need into a solution that can be delivered using the specific products in the enterprise's portfolio. This process informs the customers of the status of their purchase order, and ensures completion on time, as well as ensuring a delighted customer. These processes are responsible for accepting and issuing orders. They deal with pre-order feasibility determination, credit authorization, order issuance, order status and tracking, customer update on customer order activities, and customer notification on order completion. Order management and provisioning applications fall into this category. The key functionalities provided by these applications are · Issuing new customer orders, modifying open customer orders, or canceling open customer orders; · Verifying whether specific non-standard offerings sought by customers are feasible and supportable; · Checking the credit worthiness of customers as part of the customer order process; · Testing the completed offering to ensure it is working correctly; · Updating of the Customer Inventory Database to reflect that the specific product offering has been allocated, modified, or cancelled; · Assigning and tracking customer provisioning activities; · Managing customer provisioning jeopardy conditions; and · Reporting progress on customer orders and other processes to customer. These applications typically get orders from CRM systems. They interact with network elements and billing systems for fulfillment of orders. 5. Enterprise Management This process area includes those processes that manage enterprise-wide activities and needs, or have application within the enterprise as a whole. They encompass all business management processes that · Are necessary to support the whole of the enterprise, including processes for financial management, legal management, regulatory management, process, cost, and quality management, etc.; · Are responsible for setting corporate policies, strategies, and directions, and for providing guidelines and targets for the whole of the business, including strategy development and planning for areas, such as Enterprise Architecture, that are integral to the direction and development of the business; · Occur throughout the enterprise, including processes for project management, performance assessments, cost assessments, etc. (i) Enterprise Risk Management: Enterprise Risk Management focuses on assuring that risks and threats to the enterprise value and/or reputation are identified, and appropriate controls are in place to minimize or eliminate the identified risks. The identified risks may be physical or logical/virtual. Successful risk management ensures that the enterprise can support its mission critical operations, processes, applications, and communications in the face of serious incidents such as security threats/violations and fraud attempts. Two key areas covered in Risk Management by telecom operators are: · Revenue Assurance: Revenue assurance system will be responsible for identifying revenue loss scenarios across components/systems, and will help in rectifying the problems. The following lists the high-level roles and responsibilities executed by the Revenue Assurance system in the end-to-end solution. o Identify all usage information dropped when networks are being upgraded. o Interconnect bill verification. o Identify where services are routinely provisioned but never billed. o Identify poor sales policies that are intensifying collections problems. o Find leakage where usage is sent to error bucket and never billed for. o Find leakage where field service, CRM, and network build-out are not optimized. · Fraud Management: Involves collecting data from different systems to identify abnormalities in traffic patterns, usage patterns, and subscription patterns to report suspicious activity that might suggest fraudulent usage of resources, resulting in revenue losses to the operator. The key roles and responsibilities of the system component are as follows: o Fraud management system will capture and monitor high usage (over a certain threshold) in terms of duration, value, and number of calls for each subscriber. The threshold for each subscriber is decided by the system and fixed automatically. o Fraud management will be able to detect the unauthorized access to services for certain subscribers. These subscribers may have been provided unauthorized services by employees. The component will raise the alert to the operator the very first time of such illegal calls or calls which are not billed. o The solution will be to have an alarm management system that will deliver alarms to the operator/provider whenever it detects a fraud, thus minimizing fraud by catching it the first time it occurs. o The Fraud Management system will be capable of interfacing with switches, mediation systems, and billing systems (ii) Knowledge Management This process focuses on knowledge management, technology research within the enterprise, and the evaluation of potential technology acquisitions. Key responsibilities of knowledge base management are to · Maintain knowledge base – Creation and updating of knowledge base on ongoing basis. · Search knowledge base – Search of knowledge base on keywords or category browse · Maintain metadata – Management of metadata on knowledge base to ensure effective management and search. · Run report generator. · Provide content – Add content to the knowledge base, e.g., user guides, operational manual, etc. (iii) Document Management It focuses on maintaining a repository of all electronic documents or images of paper documents relevant to the enterprise using a system. (iv) Data Management It manages data as a valuable resource for any enterprise. For telecom enterprises, the typical areas covered are Master Data Management, Data Warehousing, and Business Intelligence. It is also responsible for data governance, security, quality, and database management. Key responsibilities of Data Management are · Using ETL, extract the data from CRM, Billing, web content, ERP, campaign management, financial, network operations, asset management info, customer contact data, customer measures, benchmarks, process data, e.g., process inputs, outputs, and measures, into Enterprise Data Warehouse. · Management of data traceability with source, data related business rules/decisions, data quality, data cleansing data reconciliation, competitors data – storage for all the enterprise data (customer profiles, products, offers, revenues, etc.) · Get online update through night time replication or physical backup process at regular frequency. · Provide the data access to business intelligence and other systems for their analysis, report generation, and use. (v) Business Intelligence It uses the Enterprise Data to provide the various analysis and reports that contain prospects and analytics for customer retention, acquisition of new customers due to the offers, and SLAs. It will generate right and optimized plans – bolt-ons for the customers. The following lists the high-level roles and responsibilities executed by the Business Intelligence system at the Enterprise Level: · It will do Pattern analysis and reports problem. · It will do Data Analysis – Statistical analysis, data profiling, affinity analysis of data, customer segment wise usage patterns on offers, products, service and revenue generation against services and customer segments. · It will do Performance (business, system, and forecast) analysis, churn propensity, response time, and SLAs analysis. · It will support for online and offline analysis, and report drill down capability. · It will collect, store, and report various SLA data. · It will provide the necessary intelligence for marketing and working on campaigns, etc., with cost benefit analysis and predictions. It will advise on customer promotions with additional services based on loyalty and credit history of customer · It will Interface with Enterprise Data Management system for data to run reports and analysis tasks. It will interface with the campaign schedules, based on historical success evidence. (vi) Stakeholder and External Relations Management It manages the enterprise's relationship with stakeholders and outside entities. Stakeholders include shareholders, employee organizations, etc. Outside entities include regulators, local community, and unions. Some of the processes within this grouping are Shareholder Relations, External Affairs, Labor Relations, and Public Relations. (vii) Enterprise Resource Planning It is used to manage internal and external resources, including tangible assets, financial resources, materials, and human resources. Its purpose is to facilitate the flow of information between all business functions inside the boundaries of the enterprise and manage the connections to outside stakeholders. ERP systems consolidate all business operations into a uniform and enterprise wide system environment. The key roles and responsibilities for Enterprise System are given below: · It will handle responsibilities such as core accounting, financial, and management reporting. · It will interface with CRM for capturing customer account and details. · It will interface with billing to capture the billing revenue and other financial data. · It will be responsible for executing the dunning process. Billing will send the required feed to ERP for execution of dunning. · It will interface with the CRM and Billing through batch interfaces. Enterprise management systems are like horizontals in the enterprise and typically interact with all major telecom systems. E.g., an ERP system interacts with CRM, Fulfillment, and Billing systems for different kinds of data exchanges. 6. External Interfaces/Touch Points The typical external parties are customers, suppliers/partners, employees, shareholders, and other stakeholders. External interactions from/to a Service Provider to other parties can be achieved by a variety of mechanisms, including: · Exchange of emails or faxes · Call Centers · Web Portals · Business-to-Business (B2B) automated transactions These applications provide an Internet technology driven interface to external parties to undertake a variety of business functions directly for themselves. These can provide fully or partially automated service to external parties through various touch points. Typical characteristics of these touch points are · Pre-integrated self-service system, including stand-alone web framework or integration front end with a portal engine · Self services layer exposing atomic web services/APIs for reuse by multiple systems across the architectural environment · Portlets driven connectivity exposing data and services interoperability through a portal engine or web application These touch points mostly interact with the CRM systems for requests, inquiries, and responses. 7. Middleware The component will be primarily responsible for integrating the different systems components under a common platform. It should provide a Standards-Based Platform for building Service Oriented Architecture and Composite Applications. The following lists the high-level roles and responsibilities executed by the Middleware component in the end-to-end solution. · As an integration framework, covering to and fro interfaces · Provide a web service framework with service registry. · Support SOA framework with SOA service registry. · Each of the interfaces from / to Middleware to other components would handle data transformation, translation, and mapping of data points. · Receive data from the caller / activate and/or forward the data to the recipient system in XML format. · Use standard XML for data exchange. · Provide the response back to the service/call initiator. · Provide a tracking until the response completion. · Keep a store transitional data against each call/transaction. · Interface through Middleware to get any information that is possible and allowed from the existing systems to enterprise systems; e.g., customer profile and customer history, etc. · Provide the data in a common unified format to the SOA calls across systems, and follow the Enterprise Architecture directive. · Provide an audit trail for all transactions being handled by the component. 8. Network Elements The term Network Element means a facility or equipment used in the provision of a telecommunications service. Such terms also includes features, functions, and capabilities that are provided by means of such facility or equipment, including subscriber numbers, databases, signaling systems, and information sufficient for billing and collection or used in the transmission, routing, or other provision of a telecommunications service. Typical network elements in a GSM network are Home Location Register (HLR), Intelligent Network (IN), Mobile Switching Center (MSC), SMS Center (SMSC), and network elements for other value added services like Push-to-talk (PTT), Ring Back Tone (RBT), etc. Network elements are invoked when subscribers use their telecom devices for any kind of usage. These elements generate usage data and pass it on to downstream systems like mediation and billing system for rating and billing. They also integrate with provisioning systems for order/service fulfillment. 9. 3rd Party Applications 3rd Party systems are applications like content providers, payment gateways, point of sale terminals, and databases/applications maintained by the Government. Depending on applicability and the type of functionality provided by 3rd party applications, the integration with different telecom systems like CRM, provisioning, and billing will be done. 10. Service Delivery Platform A service delivery platform (SDP) provides the architecture for the rapid deployment, provisioning, execution, management, and billing of value added telecom services. SDPs are based on the concept of SOA and layered architecture. They support the delivery of voice, data services, and content in network and device-independent fashion. They allow application developers to aggregate network capabilities, services, and sources of content. SDPs typically contain layers for web services exposure, service application development, and network abstraction. SOA Reference Architecture SOA concept is based on the principle of developing reusable business service and building applications by composing those services, instead of building monolithic applications in silos. It’s about bridging the gap between business and IT through a set of business-aligned IT services, using a set of design principles, patterns, and techniques. In an SOA, resources are made available to participants in a value net, enterprise, line of business (typically spanning multiple applications within an enterprise or across multiple enterprises). It consists of a set of business-aligned IT services that collectively fulfill an organization’s business processes and goals. We can choreograph these services into composite applications and invoke them through standard protocols. SOA, apart from agility and reusability, enables: · The business to specify processes as orchestrations of reusable services · Technology agnostic business design, with technology hidden behind service interface · A contractual-like interaction between business and IT, based on service SLAs · Accountability and governance, better aligned to business services · Applications interconnections untangling by allowing access only through service interfaces, reducing the daunting side effects of change · Reduced pressure to replace legacy and extended lifetime for legacy applications, through encapsulation in services · A Cloud Computing paradigm, using web services technologies, that makes possible service outsourcing on an on-demand, utility-like, pay-per-usage basis The following section represents the Reference Architecture of logical view for the Telecom Solution. The new custom built application needs to align with this logical architecture in the long run to achieve EA benefits. Packaged implementation applications, such as ERP billing applications, need to expose their functions as service providers (as other applications consume) and interact with other applications as service consumers. COT applications need to expose services through wrappers such as adapters to utilize existing resources and at the same time achieve Enterprise Architecture goal and objectives. The following are the various layers for Enterprise level deployment of SOA. This diagram captures the abstract view of Enterprise SOA layers and important components of each layer. Layered architecture means decomposition of services such that most interactions occur between adjacent layers. However, there is no strict rule that top layers should not directly communicate with bottom layers. The diagram below represents the important logical pieces that would result from overall SOA transformation. @font-face { font-family: "Arial"; }@font-face { font-family: "Courier New"; }@font-face { font-family: "Wingdings"; }@font-face { font-family: "Cambria"; }p.MsoNormal, li.MsoNormal, div.MsoNormal { margin: 0cm 0cm 0.0001pt; font-size: 12pt; font-family: "Times New Roman"; }p.MsoCaption, li.MsoCaption, div.MsoCaption { margin: 0cm 0cm 10pt; font-size: 9pt; font-family: "Times New Roman"; color: rgb(79, 129, 189); font-weight: bold; }p.MsoListParagraph, li.MsoListParagraph, div.MsoListParagraph { margin: 0cm 0cm 0.0001pt 36pt; font-size: 12pt; font-family: "Times New Roman"; }p.MsoListParagraphCxSpFirst, li.MsoListParagraphCxSpFirst, div.MsoListParagraphCxSpFirst { margin: 0cm 0cm 0.0001pt 36pt; font-size: 12pt; font-family: "Times New Roman"; }p.MsoListParagraphCxSpMiddle, li.MsoListParagraphCxSpMiddle, div.MsoListParagraphCxSpMiddle { margin: 0cm 0cm 0.0001pt 36pt; font-size: 12pt; font-family: "Times New Roman"; }p.MsoListParagraphCxSpLast, li.MsoListParagraphCxSpLast, div.MsoListParagraphCxSpLast { margin: 0cm 0cm 0.0001pt 36pt; font-size: 12pt; font-family: "Times New Roman"; }div.Section1 { page: Section1; }ol { margin-bottom: 0cm; }ul { margin-bottom: 0cm; } Figure 3. Enterprise SOA Reference Architecture 1. Operational System Layer: This layer consists of all packaged applications like CRM, ERP, custom built applications, COTS based applications like Billing, Revenue Management, Fulfilment, and the Enterprise databases that are essential and contribute directly or indirectly to the Enterprise OSS/BSS Transformation. ERP holds the data of Asset Lifecycle Management, Supply Chain, and Advanced Procurement and Human Capital Management, etc. CRM holds the data related to Order, Sales, and Marketing, Customer Care, Partner Relationship Management, Loyalty, etc. Content Management handles Enterprise Search and Query. Billing application consists of the following components: · Collections Management, Customer Billing Management, Invoices, Real-Time Rating, Discounting, and Applying of Charges · Enterprise databases will hold both the application and service data, whether structured or unstructured. MDM - Master data majorly consists of Customer, Order, Product, and Service Data. 2. Enterprise Component Layer: This layer consists of the Application Services and Common Services that are responsible for realizing the functionality and maintaining the QoS of the exposed services. This layer uses container-based technologies such as application servers to implement the components, workload management, high availability, and load balancing. Application Services: This Service Layer enables application, technology, and database abstraction so that the complex accessing logic is hidden from the other service layers. This is a basic service layer, which exposes application functionalities and data as reusable services. The three types of the Application access services are: · Application Access Service: This Service Layer exposes application level functionalities as a reusable service between BSS to BSS and BSS to OSS integration. This layer is enabled using disparate technology such as Web Service, Integration Servers, and Adaptors, etc. · Data Access Service: This Service Layer exposes application data services as a reusable reference data service. This is done via direct interaction with application data. and provides the federated query. · Network Access Service: This Service Layer exposes provisioning layer as a reusable service from OSS to OSS integration. This integration service emphasizes the need for high performance, stateless process flows, and distributed design. Common Services encompasses management of structured, semi-structured, and unstructured data such as information services, portal services, interaction services, infrastructure services, and security services, etc. 3. Integration Layer: This consists of service infrastructure components like service bus, service gateway for partner integration, service registry, service repository, and BPEL processor. Service bus will carry the service invocation payloads/messages between consumers and providers. The other important functions expected from it are itinerary based routing, distributed caching of routing information, transformations, and all qualities of service for messaging-like reliability, scalability, and availability, etc. Service registry will hold all contracts (wsdl) of services, and it helps developers to locate or discover service during design time or runtime. • BPEL processor would be useful in orchestrating the services to compose a complex business scenario or process. • Workflow and business rules management are also required to support manual triggering of certain activities within business process. based on the rules setup and also the state machine information. Application, data, and service mediation layer typically forms the overall composite application development framework or SOA Framework. 4. Business Process Layer: These are typically the intermediate services layer and represent Shared Business Process Services. At Enterprise Level, these services are from Customer Management, Order Management, Billing, Finance, and Asset Management application domains. 5. Access Layer: This layer consists of portals for Enterprise and provides a single view of Enterprise information management and dashboard services. 6. Channel Layer: This consists of various devices; applications that form part of extended enterprise; browsers through which users access the applications. 7. Client Layer: This designates the different types of users accessing the enterprise applications. The type of user typically would be an important factor in determining the level of access to applications. 8. Vertical pieces like management, monitoring, security, and development cut across all horizontal layers Management and monitoring involves all aspects of SOA-like services, SLAs, and other QoS lifecycle processes for both applications and services surrounding SOA governance. 9. EA Governance, Reference Architecture, Roadmap, Principles, and Best Practices: EA Governance is important in terms of providing the overall direction to SOA implementation within the enterprise. This involves board-level involvement, in addition to business and IT executives. At a high level, this involves managing the SOA projects implementation, managing SOA infrastructure, and controlling the entire effort through all fine-tuned IT processes in accordance with COBIT (Control Objectives for Information Technology). Devising tools and techniques to promote reuse culture, and the SOA way of doing things needs competency centers to be established in addition to training the workforce to take up new roles that are suited to SOA journey. Conclusions Reference Architectures can serve as the basis for disparate architecture efforts throughout the organization, even if they use different tools and technologies. Reference architectures provide best practices and approaches in the independent way a vendor deals with technology and standards. Reference Architectures model the abstract architectural elements for an enterprise independent of the technologies, protocols, and products that are used to implement an SOA. Telecom enterprises today are facing significant business and technology challenges due to growing competition, a multitude of services, and convergence. Adopting architectural best practices could go a long way in meeting these challenges. The use of SOA-based architecture for communication to each of the external systems like Billing, CRM, etc., in OSS/BSS system has made the architecture very loosely coupled, with greater flexibility. Any change in the external systems would be absorbed at the Integration Layer without affecting the rest of the ecosystem. The use of a Business Process Management (BPM) tool makes the management and maintenance of the business processes easy, with better performance in terms of lead time, quality, and cost. Since the Architecture is based on standards, it will lower the cost of deploying and managing OSS/BSS applications over their lifecycles.

Read the article

Agile Development

- by James Oloo Onyango

Alot of literature has and is being written about agile developement and its surrounding philosophies. In my quest to find the best way to express the importance of agile methodologies, i have found Robert C. Martin's "A Satire Of Two Companies" to be both the most concise and thorough! Enjoy the read! Rufus Inc Project Kick Off Your name is Bob. The date is January 3, 2001, and your head still aches from the recent millennial revelry. You are sitting in a conference room with several managers and a group of your peers. You are a project team leader. Your boss is there, and he has brought along all of his team leaders. His boss called the meeting. "We have a new project to develop," says your boss's boss. Call him BB. The points in his hair are so long that they scrape the ceiling. Your boss's points are just starting to grow, but he eagerly awaits the day when he can leave Brylcream stains on the acoustic tiles. BB describes the essence of the new market they have identified and the product they want to develop to exploit this market. "We must have this new project up and working by fourth quarter October 1," BB demands. "Nothing is of higher priority, so we are cancelling your current project." The reaction in the room is stunned silence. Months of work are simply going to be thrown away. Slowly, a murmur of objection begins to circulate around the conference table. His points give off an evil green glow as BB meets the eyes of everyone in the room. One by one, that insidious stare reduces each attendee to quivering lumps of protoplasm. It is clear that he will brook no discussion on this matter. Once silence has been restored, BB says, "We need to begin immediately. How long will it take you to do the analysis?" You raise your hand. Your boss tries to stop you, but his spitwad misses you and you are unaware of his efforts. "Sir, we can't tell you how long the analysis will take until we have some requirements." "The requirements document won't be ready for 3 or 4 weeks," BB says, his points vibrating with frustration. "So, pretend that you have the requirements in front of you now. How long will you require for analysis?" No one breathes. Everyone looks around to see whether anyone has some idea. "If analysis goes beyond April 1, we have a problem. Can you finish the analysis by then?" Your boss visibly gathers his courage: "We'll find a way, sir!" His points grow 3 mm, and your headache increases by two Tylenol. "Good." BB smiles. "Now, how long will it take to do the design?" "Sir," you say. Your boss visibly pales. He is clearly worried that his 3 mms are at risk. "Without an analysis, it will not be possible to tell you how long design will take." BB's expression shifts beyond austere. "PRETEND you have the analysis already!" he says, while fixing you with his vacant, beady little eyes. "How long will it take you to do the design?" Two Tylenol are not going to cut it. Your boss, in a desperate attempt to save his new growth, babbles: "Well, sir, with only six months left to complete the project, design had better take no longer than 3 months." "I'm glad you agree, Smithers!" BB says, beaming. Your boss relaxes. He knows his points are secure. After a while, he starts lightly humming the Brylcream jingle. BB continues, "So, analysis will be complete by April 1, design will be complete by July 1, and that gives you 3 months to implement the project. This meeting is an example of how well our new consensus and empowerment policies are working. Now, get out there and start working. I'll expect to see TQM plans and QIT assignments on my desk by next week. Oh, and don't forget that your crossfunctional team meetings and reports will be needed for next month's quality audit." "Forget the Tylenol," you think to yourself as you return to your cubicle. "I need bourbon." Visibly excited, your boss comes over to you and says, "Gosh, what a great meeting. I think we're really going to do some world shaking with this project." You nod in agreement, too disgusted to do anything else. "Oh," your boss continues, "I almost forgot." He hands you a 30-page document. "Remember that the SEI is coming to do an evaluation next week. This is the evaluation guide. You need to read through it, memorize it, and then shred it. It tells you how to answer any questions that the SEI auditors ask you. It also tells you what parts of the building you are allowed to take them to and what parts to avoid. We are determined to be a CMM level 3 organization by June!" You and your peers start working on the analysis of the new project. This is difficult because you have no requirements. But from the 10-minute introduction given by BB on that fateful morning, you have some idea of what the product is supposed to do. Corporate process demands that you begin by creating a use case document. You and your team begin enumerating use cases and drawing oval and stick diagrams. Philosophical debates break out among the team members. There is disagreement as to whether certain use cases should be connected with <<extends>> or <<includes>> relationships. Competing models are created, but nobody knows how to evaluate them. The debate continues, effectively paralyzing progress. After a week, somebody finds the iceberg.com Web site, which recommends disposing entirely of <<extends>> and <<includes>> and replacing them with <<precedes>> and <<uses>>. The documents on this Web site, authored by Don Sengroiux, describes a method known as stalwart-analysis, which claims to be a step-by-step method for translating use cases into design diagrams. More competing use case models are created using this new scheme, but again, people can't agree on how to evaluate them. The thrashing continues. More and more, the use case meetings are driven by emotion rather than by reason. If it weren't for the fact that you don't have requirements, you'd be pretty upset by the lack of progress you are making. The requirements document arrives on February 15. And then again on February 20, 25, and every week thereafter. Each new version contradicts the previous one. Clearly, the marketing folks who are writing the requirements, empowered though they might be, are not finding consensus. At the same time, several new competing use case templates have been proposed by the various team members. Each template presents its own particularly creative way of delaying progress. The debates rage on. On March 1, Prudence Putrigence, the process proctor, succeeds in integrating all the competing use case forms and templates into a single, all-encompassing form. Just the blank form is 15 pages long. She has managed to include every field that appeared on all the competing templates. She also presents a 159- page document describing how to fill out the use case form. All current use cases must be rewritten according to the new standard. You marvel to yourself that it now requires 15 pages of fill-in-the-blank and essay questions to answer the question: What should the system do when the user presses Return? The corporate process (authored by L. E. Ott, famed author of "Holistic Analysis: A Progressive Dialectic for Software Engineers") insists that you discover all primary use cases, 87 percent of all secondary use cases, and 36.274 percent of all tertiary use cases before you can complete analysis and enter the design phase. You have no idea what a tertiary use case is. So in an attempt to meet this requirement, you try to get your use case document reviewed by the marketing department, which you hope will know what a tertiary use case is. Unfortunately, the marketing folks are too busy with sales support to talk to you. Indeed, since the project started, you have not been able to get a single meeting with marketing, which has provided a never-ending stream of changing and contradictory requirements documents. While one team has been spinning endlessly on the use case document, another team has been working out the domain model. Endless variations of UML documents are pouring out of this team. Every week, the model is reworked. The team members can't decide whether to use <<interfaces>> or <<types>> in the model. A huge disagreement has been raging on the proper syntax and application of OCL. Others on the team just got back from a 5-day class on catabolism, and have been producing incredibly detailed and arcane diagrams that nobody else can fathom. On March 27, with one week to go before analysis is to be complete, you have produced a sea of documents and diagrams but are no closer to a cogent analysis of the problem than you were on January 3. **** And then, a miracle happens. **** On Saturday, April 1, you check your e-mail from home. You see a memo from your boss to BB. It states unequivocally that you are done with the analysis! You phone your boss and complain. "How could you have told BB that we were done with the analysis?" "Have you looked at a calendar lately?" he responds. "It's April 1!" The irony of that date does not escape you. "But we have so much more to think about. So much more to analyze! We haven't even decided whether to use <<extends>> or <<precedes>>!" "Where is your evidence that you are not done?" inquires your boss, impatiently. "Whaaa . . . ." But he cuts you off. "Analysis can go on forever; it has to be stopped at some point. And since this is the date it was scheduled to stop, it has been stopped. Now, on Monday, I want you to gather up all existing analysis materials and put them into a public folder. Release that folder to Prudence so that she can log it in the CM system by Monday afternoon. Then get busy and start designing." As you hang up the phone, you begin to consider the benefits of keeping a bottle of bourbon in your bottom desk drawer. They threw a party to celebrate the on-time completion of the analysis phase. BB gave a colon-stirring speech on empowerment. And your boss, another 3 mm taller, congratulated his team on the incredible show of unity and teamwork. Finally, the CIO takes the stage to tell everyone that the SEI audit went very well and to thank everyone for studying and shredding the evaluation guides that were passed out. Level 3 now seems assured and will be awarded by June. (Scuttlebutt has it that managers at the level of BB and above are to receive significant bonuses once the SEI awards level 3.) As the weeks flow by, you and your team work on the design of the system. Of course, you find that the analysis that the design is supposedly based on is flawedno, useless; no, worse than useless. But when you tell your boss that you need to go back and work some more on the analysis to shore up its weaker sections, he simply states, "The analysis phase is over. The only allowable activity is design. Now get back to it." So, you and your team hack the design as best you can, unsure of whether the requirements have been properly analyzed. Of course, it really doesn't matter much, since the requirements document is still thrashing with weekly revisions, and the marketing department still refuses to meet with you. The design is a nightmare. Your boss recently misread a book named The Finish Line in which the author, Mark DeThomaso, blithely suggested that design documents should be taken down to code-level detail. "If we are going to be working at that level of detail," you ask, "why don't we simply write the code instead?" "Because then you wouldn't be designing, of course. And the only allowable activity in the design phase is design!" "Besides," he continues, "we have just purchased a companywide license for Dandelion! This tool enables 'Round the Horn Engineering!' You are to transfer all design diagrams into this tool. It will automatically generate our code for us! It will also keep the design diagrams in sync with the code!" Your boss hands you a brightly colored shrinkwrapped box containing the Dandelion distribution. You accept it numbly and shuffle off to your cubicle. Twelve hours, eight crashes, one disk reformatting, and eight shots of 151 later, you finally have the tool installed on your server. You consider the week your team will lose while attending Dandelion training. Then you smile and think, "Any week I'm not here is a good week." Design diagram after design diagram is created by your team. Dandelion makes it very difficult to draw these diagrams. There are dozens and dozens of deeply nested dialog boxes with funny text fields and check boxes that must all be filled in correctly. And then there's the problem of moving classes between packages. At first, these diagram are driven from the use cases. But the requirements are changing so often that the use cases rapidly become meaningless. Debates rage about whether VISITOR or DECORATOR design patterns should be used. One developer refuses to use VISITOR in any form, claiming that it's not a properly object-oriented construct. Someone refuses to use multiple inheritance, since it is the spawn of the devil. Review meetings rapidly degenerate into debates about the meaning of object orientation, the definition of analysis versus design, or when to use aggregation versus association. Midway through the design cycle, the marketing folks announce that they have rethought the focus of the system. Their new requirements document is completely restructured. They have eliminated several major feature areas and replaced them with feature areas that they anticipate customer surveys will show to be more appropriate. You tell your boss that these changes mean that you need to reanalyze and redesign much of the system. But he says, "The analysis phase is system. But he says, "The analysis phase is over. The only allowable activity is design. Now get back to it." You suggest that it might be better to create a simple prototype to show to the marketing folks and even some potential customers. But your boss says, "The analysis phase is over. The only allowable activity is design. Now get back to it." Hack, hack, hack, hack. You try to create some kind of a design document that might reflect the new requirements documents. However, the revolution of the requirements has not caused them to stop thrashing. Indeed, if anything, the wild oscillations of the requirements document have only increased in frequency and amplitude. You slog your way through them. On June 15, the Dandelion database gets corrupted. Apparently, the corruption has been progressive. Small errors in the DB accumulated over the months into bigger and bigger errors. Eventually, the CASE tool just stopped working. Of course, the slowly encroaching corruption is present on all the backups. Calls to the Dandelion technical support line go unanswered for several days. Finally, you receive a brief e-mail from Dandelion, informing you that this is a known problem and that the solution is to purchase the new version, which they promise will be ready some time next quarter, and then reenter all the diagrams by hand. **** Then, on July 1 another miracle happens! You are done with the design! Rather than go to your boss and complain, you stock your middle desk drawer with some vodka. **** They threw a party to celebrate the on-time completion of the design phase and their graduation to CMM level 3. This time, you find BB's speech so stirring that you have to use the restroom before it begins. New banners and plaques are all over your workplace. They show pictures of eagles and mountain climbers, and they talk about teamwork and empowerment. They read better after a few scotches. That reminds you that you need to clear out your file cabinet to make room for the brandy. You and your team begin to code. But you rapidly discover that the design is lacking in some significant areas. Actually, it's lacking any significance at all. You convene a design session in one of the conference rooms to try to work through some of the nastier problems. But your boss catches you at it and disbands the meeting, saying, "The design phase is over. The only allowable activity is coding. Now get back to it." **** The code generated by Dandelion is really hideous. It turns out that you and your team were using association and aggregation the wrong way, after all. All the generated code has to be edited to correct these flaws. Editing this code is extremely difficult because it has been instrumented with ugly comment blocks that have special syntax that Dandelion needs in order to keep the diagrams in sync with the code. If you accidentally alter one of these comments, the diagrams will be regenerated incorrectly. It turns out that "Round the Horn Engineering" requires an awful lot of effort. The more you try to keep the code compatible with Dandelion, the more errors Dandelion generates. In the end, you give up and decide to keep the diagrams up to date manually. A second later, you decide that there's no point in keeping the diagrams up to date at all. Besides, who has time? Your boss hires a consultant to build tools to count the number of lines of code that are being produced. He puts a big thermometer graph on the wall with the number 1,000,000 on the top. Every day, he extends the red line to show how many lines have been added. Three days after the thermometer appears on the wall, your boss stops you in the hall. "That graph isn't growing quickly enough. We need to have a million lines done by October 1." "We aren't even sh-sh-sure that the proshect will require a m-million linezh," you blather. "We have to have a million lines done by October 1," your boss reiterates. His points have grown again, and the Grecian formula he uses on them creates an aura of authority and competence. "Are you sure your comment blocks are big enough?" Then, in a flash of managerial insight, he says, "I have it! I want you to institute a new policy among the engineers. No line of code is to be longer than 20 characters. Any such line must be split into two or more preferably more. All existing code needs to be reworked to this standard. That'll get our line count up!" You decide not to tell him that this will require two unscheduled work months. You decide not to tell him anything at all. You decide that intravenous injections of pure ethanol are the only solution. You make the appropriate arrangements. Hack, hack, hack, and hack. You and your team madly code away. By August 1, your boss, frowning at the thermometer on the wall, institutes a mandatory 50-hour workweek. Hack, hack, hack, and hack. By September 1st, the thermometer is at 1.2 million lines and your boss asks you to write a report describing why you exceeded the coding budget by 20 percent. He institutes mandatory Saturdays and demands that the project be brought back down to a million lines. You start a campaign of remerging lines. Hack, hack, hack, and hack. Tempers are flaring; people are quitting; QA is raining trouble reports down on you. Customers are demanding installation and user manuals; salespeople are demanding advance demonstrations for special customers; the requirements document is still thrashing, the marketing folks are complaining that the product isn't anything like they specified, and the liquor store won't accept your credit card anymore. Something has to give. On September 15, BB calls a meeting. As he enters the room, his points are emitting clouds of steam. When he speaks, the bass overtones of his carefully manicured voice cause the pit of your stomach to roll over. "The QA manager has told me that this project has less than 50 percent of the required features implemented. He has also informed me that the system crashes all the time, yields wrong results, and is hideously slow. He has also complained that he cannot keep up with the continuous train of daily releases, each more buggy than the last!" He stops for a few seconds, visibly trying to compose himself. "The QA manager estimates that, at this rate of development, we won't be able to ship the product until December!" Actually, you think it's more like March, but you don't say anything. "December!" BB roars with such derision that people duck their heads as though he were pointing an assault rifle at them. "December is absolutely out of the question. Team leaders, I want new estimates on my desk in the morning. I am hereby mandating 65-hour work weeks until this project is complete. And it better be complete by November 1." As he leaves the conference room, he is heard to mutter: "Empowermentbah!" * * * Your boss is bald; his points are mounted on BB's wall. The fluorescent lights reflecting off his pate momentarily dazzle you. "Do you have anything to drink?" he asks. Having just finished your last bottle of Boone's Farm, you pull a bottle of Thunderbird from your bookshelf and pour it into his coffee mug. "What's it going to take to get this project done? " he asks. "We need to freeze the requirements, analyze them, design them, and then implement them," you say callously. "By November 1?" your boss exclaims incredulously. "No way! Just get back to coding the damned thing." He storms out, scratching his vacant head. A few days later, you find that your boss has been transferred to the corporate research division. Turnover has skyrocketed. Customers, informed at the last minute that their orders cannot be fulfilled on time, have begun to cancel their orders. Marketing is re-evaluating whether this product aligns with the overall goals of the company. Memos fly, heads roll, policies change, and things are, overall, pretty grim. Finally, by March, after far too many sixty-five hour weeks, a very shaky version of the software is ready. In the field, bug-discovery rates are high, and the technical support staff are at their wits' end, trying to cope with the complaints and demands of the irate customers. Nobody is happy. In April, BB decides to buy his way out of the problem by licensing a product produced by Rupert Industries and redistributing it. The customers are mollified, the marketing folks are smug, and you are laid off. Rupert Industries: Project Alpha Your name is Robert. The date is January 3, 2001. The quiet hours spent with your family this holiday have left you refreshed and ready for work. You are sitting in a conference room with your team of professionals. The manager of the division called the meeting. "We have some ideas for a new project," says the division manager. Call him Russ. He is a high-strung British chap with more energy than a fusion reactor. He is ambitious and driven but understands the value of a team. Russ describes the essence of the new market opportunity the company has identified and introduces you to Jane, the marketing manager, who is responsible for defining the products that will address it. Addressing you, Jane says, "We'd like to start defining our first product offering as soon as possible. When can you and your team meet with me?" You reply, "We'll be done with the current iteration of our project this Friday. We can spare a few hours for you between now and then. After that, we'll take a few people from the team and dedicate them to you. We'll begin hiring their replacements and the new people for your team immediately." "Great," says Russ, "but I want you to understand that it is critical that we have something to exhibit at the trade show coming up this July. If we can't be there with something significant, we'll lose the opportunity." "I understand," you reply. "I don't yet know what it is that you have in mind, but I'm sure we can have something by July. I just can't tell you what that something will be right now. In any case, you and Jane are going to have complete control over what we developers do, so you can rest assured that by July, you'll have the most important things that can be accomplished in that time ready to exhibit." Russ nods in satisfaction. He knows how this works. Your team has always kept him advised and allowed him to steer their development. He has the utmost confidence that your team will work on the most important things first and will produce a high-quality product. * * * "So, Robert," says Jane at their first meeting, "How does your team feel about being split up?" "We'll miss working with each other," you answer, "but some of us were getting pretty tired of that last project and are looking forward to a change. So, what are you people cooking up?" Jane beams. "You know how much trouble our customers currently have . . ." And she spends a half hour or so describing the problem and possible solution. "OK, wait a second" you respond. "I need to be clear about this." And so you and Jane talk about how this system might work. Some of her ideas aren't fully formed. You suggest possible solutions. She likes some of them. You continue discussing. During the discussion, as each new topic is addressed, Jane writes user story cards. Each card represents something that the new system has to do. The cards accumulate on the table and are spread out in front of you. Both you and Jane point at them, pick them up, and make notes on them as you discuss the stories. The cards are powerful mnemonic devices that you can use to represent complex ideas that are barely formed. At the end of the meeting, you say, "OK, I've got a general idea of what you want. I'm going to talk to the team about it. I imagine they'll want to run some experiments with various database structures and presentation formats. Next time we meet, it'll be as a group, and we'll start identifying the most important features of the system." A week later, your nascent team meets with Jane. They spread the existing user story cards out on the table and begin to get into some of the details of the system. The meeting is very dynamic. Jane presents the stories in the order of their importance. There is much discussion about each one. The developers are concerned about keeping the stories small enough to estimate and test. So they continually ask Jane to split one story into several smaller stories. Jane is concerned that each story have a clear business value and priority, so as she splits them, she makes sure that this stays true. The stories accumulate on the table. Jane writes them, but the developers make notes on them as needed. Nobody tries to capture everything that is said; the cards are not meant to capture everything but are simply reminders of the conversation. As the developers become more comfortable with the stories, they begin writing estimates on them. These estimates are crude and budgetary, but they give Jane an idea of what the story will cost. At the end of the meeting, it is clear that many more stories could be discussed. It is also clear that the most important stories have been addressed and that they represent several months worth of work. Jane closes the meeting by taking the cards with her and promising to have a proposal for the first release in the morning. * * * The next morning, you reconvene the meeting. Jane chooses five cards and places them on the table. "According to your estimates, these cards represent about one perfect team-week's worth of work. The last iteration of the previous project managed to get one perfect team-week done in 3 real weeks. If we can get these five stories done in 3 weeks, we'll be able to demonstrate them to Russ. That will make him feel very comfortable about our progress." Jane is pushing it. The sheepish look on her face lets you know that she knows it too. You reply, "Jane, this is a new team, working on a new project. It's a bit presumptuous to expect that our velocity will be the same as the previous team's. However, I met with the team yesterday afternoon, and we all agreed that our initial velocity should, in fact, be set to one perfectweek for every 3 real-weeks. So you've lucked out on this one." "Just remember," you continue, "that the story estimates and the story velocity are very tentative at this point. We'll learn more when we plan the iteration and even more when we implement it." Jane looks over her glasses at you as if to say "Who's the boss around here, anyway?" and then smiles and says, "Yeah, don't worry. I know the drill by now."Jane then puts 15 more cards on the table. She says, "If we can get all these cards done by the end of March, we can turn the system over to our beta test customers. And we'll get good feedback from them." You reply, "OK, so we've got our first iteration defined, and we have the stories for the next three iterations after that. These four iterations will make our first release." "So," says Jane, can you really do these five stories in the next 3 weeks?" "I don't know for sure, Jane," you reply. "Let's break them down into tasks and see what we get." So Jane, you, and your team spend the next several hours taking each of the five stories that Jane chose for the first iteration and breaking them down into small tasks. The developers quickly realize that some of the tasks can be shared between stories and that other tasks have commonalities that can probably be taken advantage of. It is clear that potential designs are popping into the developers' heads. From time to time, they form little discussion knots and scribble UML diagrams on some cards. Soon, the whiteboard is filled with the tasks that, once completed, will implement the five stories for this iteration. You start the sign-up process by saying, "OK, let's sign up for these tasks." "I'll take the initial database generation." Says Pete. "That's what I did on the last project, and this doesn't look very different. I estimate it at two of my perfect workdays." "OK, well, then, I'll take the login screen," says Joe. "Aw, darn," says Elaine, the junior member of the team, "I've never done a GUI, and kinda wanted to try that one." "Ah, the impatience of youth," Joe says sagely, with a wink in your direction. "You can assist me with it, young Jedi." To Jane: "I think it'll take me about three of my perfect workdays." One by one, the developers sign up for tasks and estimate them in terms of their own perfect workdays. Both you and Jane know that it is best to let the developers volunteer for tasks than to assign the tasks to them. You also know full well that you daren't challenge any of the developers' estimates. You know these people, and you trust them. You know that they are going to do the very best they can. The developers know that they can't sign up for more perfect workdays than they finished in the last iteration they worked on. Once each developer has filled his or her schedule for the iteration, they stop signing up for tasks. Eventually, all the developers have stopped signing up for tasks. But, of course, tasks are still left on the board. "I was worried that that might happen," you say, "OK, there's only one thing to do, Jane. We've got too much to do in this iteration. What stories or tasks can we remove?" Jane sighs. She knows that this is the only option. Working overtime at the beginning of a project is insane, and projects where she's tried it have not fared well. So Jane starts to remove the least-important functionality. "Well, we really don't need the login screen just yet. We can simply start the system in the logged-in state." "Rats!" cries Elaine. "I really wanted to do that." "Patience, grasshopper." says Joe. "Those who wait for the bees to leave the hive will not have lips too swollen to relish the honey." Elaine looks confused. Everyone looks confused. "So . . .," Jane continues, "I think we can also do away with . . ." And so, bit by bit, the list of tasks shrinks. Developers who lose a task sign up for one of the remaining ones. The negotiation is not painless. Several times, Jane exhibits obvious frustration and impatience. Once, when tensions are especially high, Elaine volunteers, "I'll work extra hard to make up some of the missing time." You are about to correct her when, fortunately, Joe looks her in the eye and says, "When once you proceed down the dark path, forever will it dominate your destiny." In the end, an iteration acceptable to Jane is reached. It's not what Jane wanted. Indeed, it is significantly less. But it's something the team feels that can be achieved in the next 3 weeks. And, after all, it still addresses the most important things that Jane wanted in the iteration. "So, Jane," you say when things had quieted down a bit, "when can we expect acceptance tests from you?" Jane sighs. This is the other side of the coin. For every story the development team implements, Jane must supply a suite of acceptance tests that prove that it works. And the team needs these long before the end of the iteration, since they will certainly point out differences in the way Jane and the developers imagine the system's behaviour. "I'll get you some example test scripts today," Jane promises. "I'll add to them every day after that. You'll have the entire suite by the middle of the iteration." * * * The iteration begins on Monday morning with a flurry of Class, Responsibilities, Collaborators sessions. By midmorning, all the developers have assembled into pairs and are rapidly coding away. "And now, my young apprentice," Joe says to Elaine, "you shall learn the mysteries of test-first design!" "Wow, that sounds pretty rad," Elaine replies. "How do you do it?" Joe beams. It's clear that he has been anticipating this moment. "OK, what does the code do right now?" "Huh?" replied Elaine, "It doesn't do anything at all; there is no code." "So, consider our task; can you think of something the code should do?" "Sure," Elaine said with youthful assurance, "First, it should connect to the database." "And thereupon, what must needs be required to connecteth the database?" "You sure talk weird," laughed Elaine. "I think we'd have to get the database object from some registry and call the Connect() method. "Ah, astute young wizard. Thou perceives correctly that we requireth an object within which we can cacheth the database object." "Is 'cacheth' really a word?" "It is when I say it! So, what test can we write that we know the database registry should pass?" Elaine sighs. She knows she'll just have to play along. "We should be able to create a database object and pass it to the registry in a Store() method. And then we should be able to pull it out of the registry with a Get() method and make sure it's the same object." "Oh, well said, my prepubescent sprite!" "Hay!" "So, now, let's write a test function that proves your case." "But shouldn't we write the database object and registry object first?" "Ah, you've much to learn, my young impatient one. Just write the test first." "But it won't even compile!" "Are you sure? What if it did?" "Uh . . ." "Just write the test, Elaine. Trust me." And so Joe, Elaine, and all the other developers began to code their tasks, one test case at a time. The room in which they worked was abuzz with the conversations between the pairs. The murmur was punctuated by an occasional high five when a pair managed to finish a task or a difficult test case. As development proceeded, the developers changed partners once or twice a day. Each developer got to see what all the others were doing, and so knowledge of the code spread generally throughout the team. Whenever a pair finished something significant whether a whole task or simply an important part of a task they integrated what they had with the rest of the system. Thus, the code base grew daily, and integration difficulties were minimized. The developers communicated with Jane on a daily basis. They'd go to her whenever they had a question about the functionality of the system or the interpretation of an acceptance test case. Jane, good as her word, supplied the team with a steady stream of acceptance test scripts. The team read these carefully and thereby gained a much better understanding of what Jane expected the system to do. By the beginning of the second week, there was enough functionality to demonstrate to Jane. She watched eagerly as the demonstration passed test case after test case. "This is really cool," Jane said as the demonstration finally ended. "But this doesn't seem like one-third of the tasks. Is your velocity slower than anticipated?" You grimace. You'd been waiting for a good time to mention this to Jane but now she was forcing the issue. "Yes, unfortunately, we are going more slowly than we had expected. The new application server we are using is turning out to be a pain to configure. Also, it takes forever to reboot, and we have to reboot it whenever we make even the slightest change to its configuration." Jane eyes you with suspicion. The stress of last Monday's negotiations had still not entirely dissipated. She says, "And what does this mean to our schedule? We can't slip it again, we just can't. Russ will have a fit! He'll haul us all into the woodshed and ream us some new ones." You look Jane right in the eyes. There's no pleasant way to give someone news like this. So you just blurt out, "Look, if things keep going like they're going, we're not going to be done with everything by next Friday. Now it's possible that we'll figure out a way to go faster. But, frankly, I wouldn't depend on that. You should start thinking about one or two tasks that could be removed from the iteration without ruining the demonstration for Russ. Come hell or high water, we are going to give that demonstration on Friday, and I don't think you want us to choose which tasks to omit." "Aw forchrisakes!" Jane barely manages to stifle yelling that last word as she stalks away, shaking her head. Not for the first time, you say to yourself, "Nobody ever promised me project management would be easy." You are pretty sure it won't be the last time, either. Actually, things went a bit better than you had hoped. The team did, in fact, have to drop one task from the iteration, but Jane had chosen wisely, and the demonstration for Russ went without a hitch. Russ was not impressed with the progress, but neither was he dismayed. He simply said, "This is pretty good. But remember, we have to be able to demonstrate this system at the trade show in July, and at this rate, it doesn't look like you'll have all that much to show." Jane, whose attitude had improved dramatically with the completion of the iteration, responded to Russ by saying, "Russ, this team is working hard, and well. When July comes around, I am confident that we'll have something significant to demonstrate. It won't be everything, and some of it may be smoke and mirrors, but we'll have something." Painful though the last iteration was, it had calibrated your velocity numbers. The next iteration went much better. Not because your team got more done than in the last iteration but simply because the team didn't have to remove any tasks or stories in the middle of the iteration. By the start of the fourth iteration, a natural rhythm has been established. Jane, you, and the team know exactly what to expect from one another. The team is running hard, but the pace is sustainable. You are confident that the team can keep up this pace for a year or more. The number of surprises in the schedule diminishes to near zero; however, the number of surprises in the requirements does not. Jane and Russ frequently look over the growing system and make recommendations or changes to the existing functionality. But all parties realize that these changes take time and must be scheduled. So the changes do not cause anyone's expectations to be violated. In March, there is a major demonstration of the system to the board of directors. The system is very limited and is not yet in a form good enough to take to the trade show, but progress is steady, and the board is reasonably impressed. The second release goes even more smoothly than the first. By now, the team has figured out a way to automate Jane's acceptance test scripts. The team has also refactored the design of the system to the point that it is really easy to add new features and change old ones. The second release was done by the end of June and was taken to the trade show. It had less in it than Jane and Russ would have liked, but it did demonstrate the most important features of the system. Although customers at the trade show noticed that certain features were missing, they were very impressed overall. You, Russ, and Jane all returned from the trade show with smiles on your faces. You all felt as though this project was a winner. Indeed, many months later, you are contacted by Rufus Inc. That company had been working on a system like this for its internal operations. Rufus has canceled the development of that system after a death-march project and is negotiating to license your technology for its environment. Indeed, things are looking up!

Read the article

Java code optimization on matrix windowing computes in more time

- by rano

I have a matrix which represents an image and I need to cycle over each pixel and for each one of those I have to compute the sum of all its neighbors, ie the pixels that belong to a window of radius rad centered on the pixel. I came up with three alternatives: The simplest way, the one that recomputes the window for each pixel The more optimized way that uses a queue to store the sums of the window columns and cycling through the columns of the matrix updates this queue by adding a new element and removing the oldes The even more optimized way that does not need to recompute the queue for each row but incrementally adjusts a previously saved one I implemented them in c++ using a queue for the second method and a combination of deques for the third (I need to iterate through their elements without destructing them) and scored their times to see if there was an actual improvement. it appears that the third method is indeed faster. Then I tried to port the code to Java (and I must admit that I'm not very comfortable with it). I used ArrayDeque for the second method and LinkedLists for the third resulting in the third being inefficient in time. Here is the simplest method in C++ (I'm not posting the java version since it is almost identical): void normalWindowing(int mat[][MAX], int cols, int rows, int rad){ int i, j; int h = 0; for (i = 0; i < rows; ++i) { for (j = 0; j < cols; j++) { h = 0; for (int ry =- rad; ry <= rad; ry++) { int y = i + ry; if (y >= 0 && y < rows) { for (int rx =- rad; rx <= rad; rx++) { int x = j + rx; if (x >= 0 && x < cols) { h += mat[y][x]; } } } } } } } Here is the second method (the one optimized through columns) in C++: void opt1Windowing(int mat[][MAX], int cols, int rows, int rad){ int i, j, h, y, col; queue<int>* q = NULL; for (i = 0; i < rows; ++i) { if (q != NULL) delete(q); q = new queue<int>(); h = 0; for (int rx = 0; rx <= rad; rx++) { if (rx < cols) { int mem = 0; for (int ry =- rad; ry <= rad; ry++) { y = i + ry; if (y >= 0 && y < rows) { mem += mat[y][rx]; } } q->push(mem); h += mem; } } for (j = 1; j < cols; j++) { col = j + rad; if (j - rad > 0) { h -= q->front(); q->pop(); } if (j + rad < cols) { int mem = 0; for (int ry =- rad; ry <= rad; ry++) { y = i + ry; if (y >= 0 && y < rows) { mem += mat[y][col]; } } q->push(mem); h += mem; } } } } And here is the Java version: public static void opt1Windowing(int [][] mat, int rad){ int i, j = 0, h, y, col; int cols = mat[0].length; int rows = mat.length; ArrayDeque<Integer> q = null; for (i = 0; i < rows; ++i) { q = new ArrayDeque<Integer>(); h = 0; for (int rx = 0; rx <= rad; rx++) { if (rx < cols) { int mem = 0; for (int ry =- rad; ry <= rad; ry++) { y = i + ry; if (y >= 0 && y < rows) { mem += mat[y][rx]; } } q.addLast(mem); h += mem; } } j = 0; for (j = 1; j < cols; j++) { col = j + rad; if (j - rad > 0) { h -= q.peekFirst(); q.pop(); } if (j + rad < cols) { int mem = 0; for (int ry =- rad; ry <= rad; ry++) { y = i + ry; if (y >= 0 && y < rows) { mem += mat[y][col]; } } q.addLast(mem); h += mem; } } } } I recognize this post will be a wall of text. Here is the third method in C++: void opt2Windowing(int mat[][MAX], int cols, int rows, int rad){ int i = 0; int j = 0; int h = 0; int hh = 0; deque< deque<int> *> * M = new deque< deque<int> *>(); for (int ry = 0; ry <= rad; ry++) { if (ry < rows) { deque<int> * q = new deque<int>(); M->push_back(q); for (int rx = 0; rx <= rad; rx++) { if (rx < cols) { int val = mat[ry][rx]; q->push_back(val); h += val; } } } } deque<int> * C = new deque<int>(M->front()->size()); deque<int> * Q = new deque<int>(M->front()->size()); deque<int> * R = new deque<int>(M->size()); deque< deque<int> *>::iterator mit; deque< deque<int> *>::iterator mstart = M->begin(); deque< deque<int> *>::iterator mend = M->end(); deque<int>::iterator rit; deque<int>::iterator rstart = R->begin(); deque<int>::iterator rend = R->end(); deque<int>::iterator cit; deque<int>::iterator cstart = C->begin(); deque<int>::iterator cend = C->end(); for (mit = mstart, rit = rstart; mit != mend, rit != rend; ++mit, ++rit) { deque<int>::iterator pit; deque<int>::iterator pstart = (* mit)->begin(); deque<int>::iterator pend = (* mit)->end(); for(cit = cstart, pit = pstart; cit != cend && pit != pend; ++cit, ++pit) { (* cit) += (* pit); (* rit) += (* pit); } } for (i = 0; i < rows; ++i) { j = 0; if (i - rad > 0) { deque<int>::iterator cit; deque<int>::iterator cstart = C->begin(); deque<int>::iterator cend = C->end(); deque<int>::iterator pit; deque<int>::iterator pstart = (M->front())->begin(); deque<int>::iterator pend = (M->front())->end(); for(cit = cstart, pit = pstart; cit != cend; ++cit, ++pit) { (* cit) -= (* pit); } deque<int> * k = M->front(); M->pop_front(); delete k; h -= R->front(); R->pop_front(); } int row = i + rad; if (row < rows && i > 0) { deque<int> * newQ = new deque<int>(); M->push_back(newQ); deque<int>::iterator cit; deque<int>::iterator cstart = C->begin(); deque<int>::iterator cend = C->end(); int rx; int tot = 0; for (rx = 0, cit = cstart; rx <= rad; rx++, ++cit) { if (rx < cols) { int val = mat[row][rx]; newQ->push_back(val); (* cit) += val; tot += val; } } R->push_back(tot); h += tot; } hh = h; copy(C->begin(), C->end(), Q->begin()); for (j = 1; j < cols; j++) { int col = j + rad; if (j - rad > 0) { hh -= Q->front(); Q->pop_front(); } if (j + rad < cols) { int val = 0; for (int ry =- rad; ry <= rad; ry++) { int y = i + ry; if (y >= 0 && y < rows) { val += mat[y][col]; } } hh += val; Q->push_back(val); } } } } And finally its Java version: public static void opt2Windowing(int [][] mat, int rad){ int cols = mat[0].length; int rows = mat.length; int i = 0; int j = 0; int h = 0; int hh = 0; LinkedList<LinkedList<Integer>> M = new LinkedList<LinkedList<Integer>>(); for (int ry = 0; ry <= rad; ry++) { if (ry < rows) { LinkedList<Integer> q = new LinkedList<Integer>(); M.addLast(q); for (int rx = 0; rx <= rad; rx++) { if (rx < cols) { int val = mat[ry][rx]; q.addLast(val); h += val; } } } } int firstSize = M.getFirst().size(); int mSize = M.size(); LinkedList<Integer> C = new LinkedList<Integer>(); LinkedList<Integer> Q = null; LinkedList<Integer> R = new LinkedList<Integer>(); for (int k = 0; k < firstSize; k++) { C.add(0); } for (int k = 0; k < mSize; k++) { R.add(0); } ListIterator<LinkedList<Integer>> mit; ListIterator<Integer> rit; ListIterator<Integer> cit; ListIterator<Integer> pit; for (mit = M.listIterator(), rit = R.listIterator(); mit.hasNext();) { Integer r = rit.next(); int rsum = 0; for (cit = C.listIterator(), pit = (mit.next()).listIterator(); cit.hasNext();) { Integer c = cit.next(); Integer p = pit.next(); rsum += p; cit.set(c + p); } rit.set(r + rsum); } for (i = 0; i < rows; ++i) { j = 0; if (i - rad > 0) { for(cit = C.listIterator(), pit = M.getFirst().listIterator(); cit.hasNext();) { Integer c = cit.next(); Integer p = pit.next(); cit.set(c - p); } M.removeFirst(); h -= R.getFirst(); R.removeFirst(); } int row = i + rad; if (row < rows && i > 0) { LinkedList<Integer> newQ = new LinkedList<Integer>(); M.addLast(newQ); int rx; int tot = 0; for (rx = 0, cit = C.listIterator(); rx <= rad; rx++) { if (rx < cols) { Integer c = cit.next(); int val = mat[row][rx]; newQ.addLast(val); cit.set(c + val); tot += val; } } R.addLast(tot); h += tot; } hh = h; Q = new LinkedList<Integer>(); Q.addAll(C); for (j = 1; j < cols; j++) { int col = j + rad; if (j - rad > 0) { hh -= Q.getFirst(); Q.pop(); } if (j + rad < cols) { int val = 0; for (int ry =- rad; ry <= rad; ry++) { int y = i + ry; if (y >= 0 && y < rows) { val += mat[y][col]; } } hh += val; Q.addLast(val); } } } } I guess that most is due to the poor choice of the LinkedList in Java and to the lack of an efficient (not shallow) copy method between two LinkedList. How can I improve the third Java method? Am I doing some conceptual error? As always, any criticisms is welcome. UPDATE Even if it does not solve the issue, using ArrayLists, as being suggested, instead of LinkedList improves the third method. The second one performs still better (but when the number of rows and columns of the matrix is lower than 300 and the window radius is small the first unoptimized method is the fastest in Java)

Developer IT