Search Results

Search found 159 results on 7 pages for 'beast'.

Page 7/7 | < Previous Page | 3 4 5 6 7 

  • The Presentation Isn't Over Until It's Over

    - by Phil Factor
    The senior corporate dignitaries settled into their seats looking important in a blue-suited sort of way. The lights dimmed as I strode out in front to give my presentation.  I had ten vital minutes to make my pitch.  I was about to dazzle the top management of a large software company who were considering the purchase of my software product. I would present them with a dazzling synthesis of diagrams, graphs, followed by  a live demonstration of my software projected from my laptop.  My preparation had been meticulous: It had to be: A year’s hard work was at stake, so I’d prepared it to perfection.  I stood up and took them all in, with a gaze of sublime confidence. Then the laptop expired. There are several possible alternative plans of action when this happens     A. Stare at the smoking laptop vacuously, flapping ones mouth slowly up and down     B. Stand frozen like a statue, locked in indecision between fright and flight.     C. Run out of the room, weeping     D. Pretend that this was all planned     E. Abandon the presentation in favour of a stilted and tedious dissertation about the software     F. Shake your fist at the sky, and curse the sense of humour of your preferred deity I started for a few seconds on plan B, normally referred to as the ‘Rabbit in the headlamps of the car’ technique. Suddenly, a little voice inside my head spoke. It spoke the famous inane words of Yogi Berra; ‘The game isn't over until it's over.’ ‘Too right’, I thought. What to do? I ran through the alternatives A-F inclusive in my mind but none appealed to me. I was completely unprepared for this. Nowadays, longevity has since taught me more than I wanted to know about the wacky sense of humour of fate, and I would have taken two laptops. I hadn’t, but decided to do the presentation anyway as planned. I started out ignoring the dead laptop, but pretending, instead that it was still working. The audience looked startled. They were expecting plan B to be succeeded by plan C, I suspect. They weren’t used to denial on this scale. After my introductory talk, which didn’t require any visuals, I came to the diagram that described the application I’d written.  I’d taken ages over it and it was hot stuff. Well, it would have been had it been projected onto the screen. It wasn’t. Before I describe what happened then, I must explain that I have thespian tendencies.  My  triumph as Professor Higgins in My Fair Lady at the local operatic society is now long forgotten, but I remember at the time of my finest performance, the moment that, glancing up over the vast audience of  moist-eyed faces at the during the poignant  scene between Eliza and Higgins at the end, I  realised that I had a talent that one day could possibly  be harnessed for commercial use I just talked about the diagram as if it was there, but throwing in some extra description. The audience nodded helpfully when I’d done enough. Emboldened, I began a sort of mime, well, more of a ballet, to represent each slide as I came to it. Heaven knows I’d done my preparation and, in my mind’s eye, I could see every detail, but I had to somehow project the reality of that vision to the audience, much the same way any actor playing Macbeth should do the ghost of Banquo.  My desperation gave me a manic energy. If you’ve ever demonstrated a windows application entirely by mime, gesture and florid description, you’ll understand the scale of the challenge, but then I had nothing to lose. With a brief sentence of description here and there, and arms flailing whilst outlining the size and shape of  graphs and diagrams, I used the many tricks of mime, gesture and body-language  learned from playing Captain Hook, or the Sheriff of Nottingham in pantomime. I set out determinedly on my desperate venture. There wasn’t time to do anything but focus on the challenge of the task: the world around me narrowed down to ten faces and my presentation: ten souls who had to be hypnotized into seeing a Windows application:  one that was slick, well organized and functional I don’t remember the details. Eight minutes of my life are gone completely. I was a thespian berserker.  I know however that I followed the basic plan of building the presentation in a carefully controlled crescendo until the dazzling finale where the results were displayed on-screen.  ‘And here you see the results, neatly formatted and grouped carefully to enhance the significance of the figures, together with running trend-graphs!’ I waved a mime to signify an animated  window-opening, and looked up, in my first pause, to gaze defiantly  at the audience.  It was a sight I’ll never forget. Ten pairs of eyes were gazing in rapt attention at the imaginary window, and several pairs of eyes were glancing at the imaginary graphs and figures.  I hadn’t had an audience like that since my starring role in  Beauty and the Beast.  At that moment, I realized that my desperate ploy might work. I sat down, slightly winded, when my ten minutes were up.  For the first and last time in my life, the audience of a  ‘PowerPoint’ presentation burst into spontaneous applause. ‘Any questions?’ ‘Yes,  Have you got an agent?’ Yes, in case you’re wondering, I got the deal. They bought the software product from me there and then. However, it was a life-changing experience for me and I have never ever again trusted technology as part of a presentation.  Even if things can’t go wrong, they’ll go wrong and they’ll kill the flow of what you’re presenting.  if you can’t do something without the techno-props, then you shouldn’t do it.  The greatest lesson of all is that great presentations require preparation and  ‘stage-presence’ rather than fancy graphics. They’re a great supporting aid, but they should never dominate to the point that you’re lost without them.

    Read the article

  • Nokia Windows Phone 8 App Collection

    - by Tim Murphy
    I recently upgraded to a Nokia Lumia 920.  Along with it came the availability of a number of Nokia developed apps or apps that Nokia has made available from other developers.  Below is a summary of some of the ones that I have used to this point.  There are quite a few of them so I won’t be covering everything that is available. Nokia Maps I am quite pleased with the accuracy of Nokia Maps and not having to tap the screen for each turn any more.  The information on the screen is quite good as well.  The couple of improvements I would like to see are for the voice directions to include which street or exit you need to use and improve the search accuracy.  Bing maps had much better search results in my opinion. Nokia Drive This one really had me confused when I first setup the phone.  I was driving down the road and suddenly I am getting notification tones, but there were no visual notifications on the phone.  It seems that in their infinite wisdom Nokia thinks I don’t know when I am going over the speed limit and need to be told. ESPN I really liked my ESPN app on Windows Phone 7.5, but I am not getting the type of experience I was looking for out of this app.  While it allows me to pick my favorite teams, but there isn’t a pivot page or panorama page that shows a summary of my favorite teams.  I have also found that the live tile don’t update very often.  Over all I am rather disappointed compared app produced by ESPN. Smart Shoot I really need to get the kids to let me use this on.  I like the concept, but I need to spend more time with it.  The idea how running the camera through a continuous shooting mode and then picking the best is something that I have done with my DSLR and am glad to see it available here. Cinemagraph Here is a fun filter.  It doesn’t have the most accurate editing features, but it is fun to stop certain parts of a scene and let other parts move.  As a test I stopped the traffic on the highway and let the traffic on the frontage road flow.  It makes for a fun effect.  If nothing else it could be great for sending prank animations to your friends. YouSendIt I have only briefly touched this application.  What I don’t understand is why it is needed.  Most of the functionality seems to be similar to SkyDrive and it gives you less storage.  They only feature that seems to differentiate the app is the signature capability. Creative Studio This app has some nice quick edits, but it is not very comprehensive.  I am also not to thrilled with the user experience.  It puts you though an initial color cast series that I’m not sure why it is there.  Discovery of the remaining adjustments isn’t that great.  In the end I found myself wanting Thumbia back. Panorama This is one of the apps that I like.  I found it easy to use as it guides you with a target circle that you center for it to take the next pictures.  It also stitches the images with amazing speed.  The one thing I wish it had was the capability to turn the phone into portrait orientation and do a taller panorama.  Perhaps we will see this in the future. Nokia Music After getting over the missing album art I found that there were a number of missing features with this app as well.  I have a Zune HD and I am used to being able to go through my collection and adding songs, albums or artists to my now playing.  There also doesn’t seem to be a way to manage playlists that I have seen yet.  Other than that the UI is familiar and it give Nokia City Lens Augmented reality is a cool concept, but I still haven’t seen it implemented in a compelling fashion beyond a demo at TED a couple of years ago.  The app still leaves me wanting as well.  It does give an interesting toy.  It gives you the ability to look for general categories and see general direction and clusters of locations.  I think as this concept is better thought out it will become more compelling. Nokia Trailers I don’t know how often I will use this app, but I do like being able to see what movies are being promoted.  I can’t wait for The Hobbit to come out and the trailer was just what the doctor ordered.  I can see coming back to this app from time to time. PhotoBeamer PhotoBeamer is a strange beast that needs a better instruction manual.  It seems a lot like magic but very confusing.  I need some more testing, but I don’t think this is something that most people are going to understand quickly and may give up before getting it to work.  I may put an update here after playing with it further. Ringtone Maker The app was just published and it didn’t work very well for me. It couldn’t find 95% of the songs that Nokia Music was playing for me and crashed several times.  It also had songs named wrong that when I checked them in Nokia Music they were fine.  This app looks like it has a long way to go. Summary In all I think that Nokia is offering a well rounded set of initial applications that can get any new owner started.  There is definitely room for improvement in all of these apps.  The main need is usability upgrades.  I would guess that with feedback from users they will come up to acceptable levels.  Try them out and see if you agree. del.icio.us Tags: Windows Phone,Nokia,Lumia,Nokia Apps,ESPN,PhotoBeamer,City Lens,YouSendIt,Drive,Maps

    Read the article

  • How to Avoid Your Next 12-Month Science Project

    - by constant
    While most customers immediately understand how the magic of Oracle's Hybrid Columnar Compression, intelligent storage servers and flash memory make Exadata uniquely powerful against home-grown database systems, some people think that Exalogic is nothing more than a bunch of x86 servers, a storage appliance and an InfiniBand (IB) network, built into a single rack. After all, isn't this exactly what the High Performance Computing (HPC) world has been doing for decades? On the surface, this may be true. And some people tried exactly that: They tried to put together their own version of Exalogic, but then they discover there's a lot more to building a system than buying hardware and assembling it together. IT is not Ikea. Why is that so? Could it be there's more going on behind the scenes than merely putting together a bunch of servers, a storage array and an InfiniBand network into a rack? Let's explore some of the special sauce that makes Exalogic unique and un-copyable, so you can save yourself from your next 6- to 12-month science project that distracts you from doing real work that adds value to your company. Engineering Systems is Hard Work! The backbone of Exalogic is its InfiniBand network: 4 times better bandwidth than even 10 Gigabit Ethernet, and only about a tenth of its latency. What a potential for increased scalability and throughput across the middleware and database layers! But InfiniBand is a beast that needs to be tamed: It is true that Exalogic uses a standard, open-source Open Fabrics Enterprise Distribution (OFED) InfiniBand driver stack. Unfortunately, this software has been developed by the HPC community with fastest speed in mind (which is good) but, despite the name, not many other enterprise-class requirements are included (which is less good). Here are some of the improvements that Oracle's InfiniBand development team had to add to the OFED stack to make it enterprise-ready, simply because typical HPC users didn't have the need to implement them: More than 100 bug fixes in the pieces that were not related to the Message Passing Interface Protocol (MPI), which is the protocol that HPC users use most of the time, but which is less useful in the enterprise. Performance optimizations and tuning across the whole IB stack: From Switches, Host Channel Adapters (HCAs) and drivers to low-level protocols, middleware and applications. Yes, even the standard HPC IB stack could be improved in terms of performance. Ethernet over IB (EoIB): Exalogic uses InfiniBand internally to reach high performance, but it needs to play nicely with datacenters around it. That's why Oracle added Ethernet over InfiniBand technology to it that allows for creating many virtual 10GBE adapters inside Exalogic's nodes that are aggregated and connected to Exalogic's IB gateway switches. While this is an open standard, it's up to the vendor to implement it. In this case, Oracle integrated the EoIB stack with Oracle's own IB to 10GBE gateway switches, and made it fully virtualized from the beginning. This means that Exalogic customers can completely rewire their server infrastructure inside the rack without having to physically pull or plug a single cable - a must-have for every cloud deployment. Anybody who wants to match this level of integration would need to add an InfiniBand switch development team to their project. Or just buy Oracle's gateway switches, which are conveniently shipped with a whole server infrastructure attached! IPv6 support for InfiniBand's Sockets Direct Protocol (SDP), Reliable Datagram Sockets (RDS), TCP/IP over IB (IPoIB) and EoIB protocols. Because no IPv6 = not very enterprise-class. HA capability for SDP. High Availability is not a big requirement for HPC, but for enterprise-class application servers it is. Every node in Exalogic's InfiniBand network is connected twice for redundancy. If any cable or port or HCA fails, there's always a replacement link ready to take over. This requires extra magic at the protocol level to work. So in addition to Weblogic's failover capabilities, Oracle implemented IB automatic path migration at the SDP level to avoid unnecessary failover operations at the middleware level. Security, for example spoof-protection. Another feature that is less important for traditional users of InfiniBand, but very important for enterprise customers. InfiniBand Partitioning and Quality-of-Service (QoS): One of the first questions we get from customers about Exalogic is: “How can we implement multi-tenancy?” The answer is to partition your IB network, which effectively creates many networks that work independently and that are protected at the lowest networking layer possible. In addition to that, QoS allows administrators to prioritize traffic flow in multi-tenancy environments so they can keep their service levels where it matters most. Resilient IB Fabric Management: InfiniBand is a self-managing network, so a lot of the magic lies in coming up with the right topology and in teaching the subnet manager how to properly discover and manage the network. Oracle's Infiniband switches come with pre-integrated, highly available fabric management with seamless integration into Oracle Enterprise Manager Ops Center. In short: Oracle elevated the OFED InfiniBand stack into an enterprise-class networking infrastructure. Many years and multiple teams of manpower went into the above improvements - this is something you can only get from Oracle, because no other InfiniBand vendor can give you these features across the whole stack! Exabus: Because it's not About the Size of Your Network, it's How You Use it! So let's assume that you somehow were able to get your hands on an enterprise-class IB driver stack. Or maybe you don't care and are just happy with the standard OFED one? Anyway, the next step is to actually leverage that InfiniBand performance. Here are the choices: Use traditional TCP/IP on top of the InfiniBand stack, Develop your own integration between your middleware and the lower-level (but faster) InfiniBand protocols. While more bandwidth is always a good thing, it's actually the low latency that enables superior performance for your applications when running on any networking infrastructure: The lower the latency, the faster the response travels through the network and the more transactions you can close per second. The reason why InfiniBand is such a low latency technology is that it gets rid of most if not all of your traditional networking protocol stack: Data is literally beamed from one region of RAM in one server into another region of RAM in another server with no kernel/drivers/UDP/TCP or other networking stack overhead involved! Which makes option 1 a no-go: Adding TCP/IP on top of InfiniBand is like adding training wheels to your racing bike. It may be ok in the beginning and for development, but it's not quite the performance IB was meant to deliver. Which only leaves option 2: Integrating your middleware with fast, low-level InfiniBand protocols. And this is what Exalogic's "Exabus" technology is all about. Here are a few Exabus features that help applications leverage the performance of InfiniBand in Exalogic: RDMA and SDP integration at the JDBC driver level (SDP), for Oracle Weblogic (SDP), Oracle Coherence (RDMA), Oracle Tuxedo (RDMA) and the new Oracle Traffic Director (RDMA) on Exalogic. Using these protocols, middleware can communicate a lot faster with each other and the Oracle database than by using standard networking protocols, Seamless Integration of Ethernet over InfiniBand from Exalogic's Gateway switches into the OS, Oracle Weblogic optimizations for handling massive amounts of parallel transactions. Because if you have an 8-lane Autobahn, you also need to improve your ramps so you can feed it with many cars in parallel. Integration of Weblogic with Oracle Exadata for faster performance, optimized session management and failover. As you see, “Exabus” is Oracle's word for describing all the InfiniBand enhancements Oracle put into Exalogic: OFED stack enhancements, protocols for faster IB access, and InfiniBand support and optimizations at the virtualization and middleware level. All working together to deliver the full potential of InfiniBand performance. Who else has 100% control over their middleware so they can develop their own low-level protocol integration with InfiniBand? Even if you take an open source approach, you're looking at years of development work to create, test and support a whole new networking technology in your middleware! The Extras: Less Hassle, More Productivity, Faster Time to Market And then there are the other advantages of Engineered Systems that are true for Exalogic the same as they are for every other Engineered System: One simple purchasing process: No headaches due to endless RFPs and no “Will X work with Y?” uncertainties. Everything has been engineered together: All kinds of bugs and problems have been already fixed at the design level that would have only manifested themselves after you have built the system from scratch. Everything is built, tested and integrated at the factory level . Less integration pain for you, faster time to market. Every Exalogic machine world-wide is identical to Oracle's own machines in the lab: Instant replication of any problems you may encounter, faster time to resolution. Simplified patching, management and operations. One throat to choke: Imagine finger-pointing hell for systems that have been put together using several different vendors. Oracle's Engineered Systems have a single phone number that customers can call to get their problems solved. For more business-centric values, read The Business Value of Engineered Systems. Conclusion: Buy Exalogic, or get ready for a 6-12 Month Science Project And here's the reason why it's not easy to "build your own Exalogic": There's a lot of work required to make such a system fly. In fact, anybody who is starting to "just put together a bunch of servers and an InfiniBand network" is really looking at a 6-12 month science project. And the outcome is likely to not be very enterprise-class. And it won't have Exalogic's performance either. Because building an Engineered System is literally rocket science: It takes a lot of time, effort, resources and many iterations of design/test/analyze/fix to build such a system. That's why InfiniBand has been reserved for HPC scientists for such a long time. And only Oracle can bring the power of InfiniBand in an enterprise-class, ready-to use, pre-integrated version to customers, without the develop/integrate/support pain. For more details, check the new Exalogic overview white paper which was updated only recently. P.S.: Thanks to my colleagues Ola, Paul, Don and Andy for helping me put together this article! var flattr_uid = '26528'; var flattr_tle = 'How to Avoid Your Next 12-Month Science Project'; var flattr_dsc = 'While most customers immediately understand how the magic of Oracle's Hybrid Columnar Compression, intelligent storage servers and flash memory make Exadata uniquely powerful against home-grown database systems, some people think that Exalogic is nothing more than a bunch of x86 servers, a storage appliance and an InfiniBand (IB) network, built into a single rack.After all, isn't this exactly what the High Performance Computing (HPC) world has been doing for decades?On the surface, this may be true. And some people tried exactly that: They tried to put together their own version of Exalogic, but then they discover there's a lot more to building a system than buying hardware and assembling it together. IT is not Ikea.Why is that so? Could it be there's more going on behind the scenes than merely putting together a bunch of servers, a storage array and an InfiniBand network into a rack? Let's explore some of the special sauce that makes Exalogic unique and un-copyable, so you can save yourself from your next 6- to 12-month science project that distracts you from doing real work that adds value to your company.'; var flattr_tag = 'Engineered Systems,Engineered Systems,Infiniband,Integration,latency,Oracle,performance'; var flattr_cat = 'text'; var flattr_url = 'http://constantin.glez.de/blog/2012/04/how-avoid-your-next-12-month-science-project'; var flattr_lng = 'en_GB'

    Read the article

  • Investigation: Can different combinations of components effect Dataflow performance?

    - by jamiet
    Introduction The Dataflow task is one of the core components (if not the core component) of SQL Server Integration Services (SSIS) and often the most misunderstood. This is not surprising, its an incredibly complicated beast and we’re abstracted away from that complexity via some boxes that go yellow red or green and that have some lines drawn between them. Example dataflow In this blog post I intend to look under that facade and get into some of the nuts and bolts of the Dataflow Task by investigating how the decisions we make when building our packages can affect performance. I will do this by comparing the performance of three dataflows that all have the same input, all produce the same output, but which all operate slightly differently by way of having different transformation components. I also want to use this blog post to challenge a common held opinion that I see perpetuated over and over again on the SSIS forum. That is, that people assume adding components to a dataflow will be detrimental to overall performance. Its not surprising that people think this –it is intuitive to think that more components means more work- however this is not a view that I share. I have always been of the opinion that there are many factors affecting dataflow duration and the number of components is actually one of the less important ones; having said that I have never proven that assertion and that is one reason for this investigation. I have actually seen evidence that some people think dataflow duration is simply a function of number of rows and number of components. I’ll happily call that one out as a myth even without any investigation!  The Setup I have a 2GB datafile which is a list of 4731904 (~4.7million) customer records with various attributes against them and it contains 2 columns that I am going to use for categorisation: [YearlyIncome] [BirthDate] The data file is a SSIS raw format file which I chose to use because it is the quickest way of getting data into a dataflow and given that I am testing the transformations, not the source or destination adapters, I want to minimise external influences as much as possible. In the test I will split the customers according to month of birth (12 of those) and whether or not their yearly income is above or below 50000 (2 of those); in other words I will be splitting them into 24 discrete categories and in order to do it I shall be using different combinations of SSIS’ Conditional Split and Derived Column transformation components. The 24 datapaths that occur will each input to a rowcount component, again because this is the least resource intensive means of terminating a datapath. The test is being carried out on a Dell XPS Studio laptop with a quad core (8 logical Procs) Intel Core i7 at 1.73GHz and Samsung SSD hard drive. Its running SQL Server 2008 R2 on Windows 7. The Variables Here are the three combinations of components that I am going to test:     One Conditional Split - A single Conditional Split component CSPL Split by Month of Birth and income category that will use expressions on [YearlyIncome] & [BirthDate] to send each row to one of 24 outputs. This next screenshot displays the expression logic in use: Derived Column & Conditional Split - A Derived Column component DER Income Category that adds a new column [IncomeCategory] which will contain one of two possible text values {“LessThan50000”,”GreaterThan50000”} and uses [YearlyIncome] to determine which value each row should get. A Conditional Split component CSPL Split by Month of Birth and Income Category then uses that new column in conjunction with [BirthDate] to determine which of the same 24 outputs to send each row to. Put more simply, I am separating the Conditional Split of #1 into a Derived Column and a Conditional Split. The next screenshots display the expression logic in use: DER Income Category         CSPL Split by Month of Birth and Income Category       Three Conditional Splits - A Conditional Split component that produces two outputs based on [YearlyIncome], one for each Income Category. Each of those outputs will go to a further Conditional Split that splits the input into 12 outputs, one for each month of birth (identical logic in each). In this case then I am separating the single Conditional Split of #1 into three Conditional Split components. The next screenshots display the expression logic in use: CSPL Split by Income Category         CSPL Split by Month of Birth 1& 2       Each of these combinations will provide an input to one of the 24 rowcount components, just the same as before. For illustration here is a screenshot of the dataflow containing three Conditional Split components: As you can these dataflows have a fair bit of work to do and remember that they’re doing that work for 4.7million rows. I will execute each dataflow 10 times and use the average for comparison. I foresee three possible outcomes: The dataflow containing just one Conditional Split (i.e. #1) will be quicker There is no significant difference between any of them One of the two dataflows containing multiple transformation components will be quicker Regardless of which of those outcomes come to pass we will have learnt something and that makes this an interesting test to carry out. Note that I will be executing the dataflows using dtexec.exe rather than hitting F5 within BIDS. The Results and Analysis The table below shows all of the executions, 10 for each dataflow. It also shows the average for each along with a standard deviation. All durations are in seconds. I’m pasting a screenshot because I frankly can’t be bothered with the faffing about needed to make a presentable HTML table. It is plain to see from the average that the dataflow containing three conditional splits is significantly faster, the other two taking 43% and 52% longer respectively. This seems strange though, right? Why does the dataflow containing the most components outperform the other two by such a big margin? The answer is actually quite logical when you put some thought into it and I’ll explain that below. Before progressing, a side note. The standard deviation for the “Three Conditional Splits” dataflow is orders of magnitude smaller – indicating that performance for this dataflow can be predicted with much greater confidence too. The Explanation I refer you to the screenshot above that shows how CSPL Split by Month of Birth and salary category in the first dataflow is setup. Observe that there is a case for each combination of Month Of Date and Income Category – 24 in total. These expressions get evaluated in the order that they appear and hence if we assume that Month of Date and Income Category are uniformly distributed in the dataset we can deduce that the expected number of expression evaluations for each row is 12.5 i.e. 1 (the minimum) + 24 (the maximum) divided by 2 = 12.5. Now take a look at the screenshots for the second dataflow. We are doing one expression evaluation in DER Income Category and we have the same 24 cases in CSPL Split by Month of Birth and Income Category as we had before, only the expression differs slightly. In this case then we have 1 + 12.5 = 13.5 expected evaluations for each row – that would account for the slightly longer average execution time for this dataflow. Now onto the third dataflow, the quick one. CSPL Split by Income Category does a maximum of 2 expression evaluations thus the expected number of evaluations per row is 1.5. CSPL Split by Month of Birth 1 & CSPL Split by Month of Birth 2 both have less work to do than the previous Conditional Split components because they only have 12 cases to test for thus the expected number of expression evaluations is 6.5 There are two of them so total expected number of expression evaluations for this dataflow is 6.5 + 6.5 + 1.5 = 14.5. 14.5 is still more than 12.5 & 13.5 though so why is the third dataflow so much quicker? Simple, the conditional expressions in the first two dataflows have two boolean predicates to evaluate – one for Income Category and one for Month of Birth; the expressions in the Conditional Split in the third dataflow however only have one predicate thus they are doing a lot less work. To sum up, the difference in execution times can be attributed to the difference between: MONTH(BirthDate) == 1 && YearlyIncome <= 50000 and MONTH(BirthDate) == 1 In the first two dataflows YearlyIncome <= 50000 gets evaluated an average of 12.5 times for every row whereas in the third dataflow it is evaluated once and once only. Multiply those 11.5 extra operations by 4.7million rows and you get a significant amount of extra CPU cycles – that’s where our duration difference comes from. The Wrap-up The obvious point here is that adding new components to a dataflow isn’t necessarily going to make it go any slower, moreover you may be able to achieve significant improvements by splitting logic over multiple components rather than one. Performance tuning is all about reducing the amount of work that needs to be done and that doesn’t necessarily mean use less components, indeed sometimes you may be able to reduce workload in ways that aren’t immediately obvious as I think I have proven here. Of course there are many variables in play here and your mileage will most definitely vary. I encourage you to download the package and see if you get similar results – let me know in the comments. The package contains all three dataflows plus a fourth dataflow that will create the 2GB raw file for you (you will also need the [AdventureWorksDW2008] sample database from which to source the data); simply disable all dataflows except the one you want to test before executing the package and remember, execute using dtexec, not within BIDS. If you want to explore dataflow performance tuning in more detail then here are some links you might want to check out: Inequality joins, Asynchronous transformations and Lookups Destination Adapter Comparison Don’t turn the dataflow into a cursor SSIS Dataflow – Designing for performance (webinar) Any comments? Let me know! @Jamiet

    Read the article

  • Table Variables: an empirical approach.

    - by Phil Factor
    It isn’t entirely a pleasant experience to publish an article only to have it described on Twitter as ‘Horrible’, and to have it criticized on the MVP forum. When this happened to me in the aftermath of publishing my article on Temporary tables recently, I was taken aback, because these critics were experts whose views I respect. What was my crime? It was, I think, to suggest that, despite the obvious quirks, it was best to use Table Variables as a first choice, and to use local Temporary Tables if you hit problems due to these quirks, or if you were doing complex joins using a large number of rows. What are these quirks? Well, table variables have advantages if they are used sensibly, but this requires some awareness by the developer about the potential hazards and how to avoid them. You can be hit by a badly-performing join involving a table variable. Table Variables are a compromise, and this compromise doesn’t always work out well. Explicit indexes aren’t allowed on Table Variables, so one cannot use covering indexes or non-unique indexes. The query optimizer has to make assumptions about the data rather than using column distribution statistics when a table variable is involved in a join, because there aren’t any column-based distribution statistics on a table variable. It assumes a reasonably even distribution of data, and is likely to have little idea of the number of rows in the table variables that are involved in queries. However complex the heuristics that are used might be in determining the best way of executing a SQL query, and they most certainly are, the Query Optimizer is likely to fail occasionally with table variables, under certain circumstances, and produce a Query Execution Plan that is frightful. The experienced developer or DBA will be on the lookout for this sort of problem. In this blog, I’ll be expanding on some of the tests I used when writing my article to illustrate the quirks, and include a subsequent example supplied by Kevin Boles. A simplified example. We’ll start out by illustrating a simple example that shows some of these characteristics. We’ll create two tables filled with random numbers and then see how many matches we get between the two tables. We’ll forget indexes altogether for this example, and use heaps. We’ll try the same Join with two table variables, two table variables with OPTION (RECOMPILE) in the JOIN clause, and with two temporary tables. It is all a bit jerky because of the granularity of the timing that isn’t actually happening at the millisecond level (I used DATETIME). However, you’ll see that the table variable is outperforming the local temporary table up to 10,000 rows. Actually, even without a use of the OPTION (RECOMPILE) hint, it is doing well. What happens when your table size increases? The table variable is, from around 30,000 rows, locked into a very bad execution plan unless you use OPTION (RECOMPILE) to provide the Query Analyser with a decent estimation of the size of the table. However, if it has the OPTION (RECOMPILE), then it is smokin’. Well, up to 120,000 rows, at least. It is performing better than a Temporary table, and in a good linear fashion. What about mixed table joins, where you are joining a temporary table to a table variable? You’d probably expect that the query analyzer would throw up its hands and produce a bad execution plan as if it were a table variable. After all, it knows nothing about the statistics in one of the tables so how could it do any better? Well, it behaves as if it were doing a recompile. And an explicit recompile adds no value at all. (we just go up to 45000 rows since we know the bigger picture now)   Now, if you were new to this, you might be tempted to start drawing conclusions. Beware! We’re dealing with a very complex beast: the Query Optimizer. It can come up with surprises What if we change the query very slightly to insert the results into a Table Variable? We change nothing else and just measure the execution time of the statement as before. Suddenly, the table variable isn’t looking so much better, even taking into account the time involved in doing the table insert. OK, if you haven’t used OPTION (RECOMPILE) then you’re toast. Otherwise, there isn’t much in it between the Table variable and the temporary table. The table variable is faster up to 8000 rows and then not much in it up to 100,000 rows. Past the 8000 row mark, we’ve lost the advantage of the table variable’s speed. Any general rule you may be formulating has just gone for a walk. What we can conclude from this experiment is that if you join two table variables, and can’t use constraints, you’re going to need that Option (RECOMPILE) hint. Count Dracula and the Horror Join. These tables of integers provide a rather unreal example, so let’s try a rather different example, and get stuck into some implicit indexing, by using constraints. What unusual words are contained in the book ‘Dracula’ by Bram Stoker? Here we get a table of all the common words in the English language (60,387 of them) and put them in a table. We put them in a Table Variable with the word as a primary key, a Table Variable Heap and a Table Variable with a primary key. We then take all the distinct words used in the book ‘Dracula’ (7,558 of them). We then create a table variable and insert into it all those uncommon words that are in ‘Dracula’. i.e. all the words in Dracula that aren’t matched in the list of common words. To do this we use a left outer join, where the right-hand value is null. The results show a huge variation, between the sublime and the gorblimey. If both tables contain a Primary Key on the columns we join on, and both are Table Variables, it took 33 Ms. If one table contains a Primary Key, and the other is a heap, and both are Table Variables, it took 46 Ms. If both Table Variables use a unique constraint, then the query takes 36 Ms. If neither table contains a Primary Key and both are Table Variables, it took 116383 Ms. Yes, nearly two minutes!! If both tables contain a Primary Key, one is a Table Variables and the other is a temporary table, it took 113 Ms. If one table contains a Primary Key, and both are Temporary Tables, it took 56 Ms.If both tables are temporary tables and both have primary keys, it took 46 Ms. Here we see table variables which are joined on their primary key again enjoying a  slight performance advantage over temporary tables. Where both tables are table variables and both are heaps, the query suddenly takes nearly two minutes! So what if you have two heaps and you use option Recompile? If you take the rogue query and add the hint, then suddenly, the query drops its time down to 76 Ms. If you add unique indexes, then you've done even better, down to half that time. Here are the text execution plans.So where have we got to? Without drilling down into the minutiae of the execution plans we can begin to create a hypothesis. If you are using table variables, and your tables are relatively small, they are faster than temporary tables, but as the number of rows increases you need to do one of two things: either you need to have a primary key on the column you are using to join on, or else you need to use option (RECOMPILE) If you try to execute a query that is a join, and both tables are table variable heaps, you are asking for trouble, well- slow queries, unless you give the table hint once the number of rows has risen past a point (30,000 in our first example, but this varies considerably according to context). Kevin’s Skew In describing the table-size, I used the term ‘relatively small’. Kevin Boles produced an interesting case where a single-row table variable produces a very poor execution plan when joined to a very, very skewed table. In the original, pasted into my article as a comment, a column consisted of 100000 rows in which the key column was one number (1) . To this was added eight rows with sequential numbers up to 9. When this was joined to a single-tow Table Variable with a key of 2 it produced a bad plan. This problem is unlikely to occur in real usage, and the Query Optimiser team probably never set up a test for it. Actually, the skew can be slightly less extreme than Kevin made it. The following test showed that once the table had 54 sequential rows in the table, then it adopted exactly the same execution plan as for the temporary table and then all was well. Undeniably, real data does occasionally cause problems to the performance of joins in Table Variables due to the extreme skew of the distribution. We've all experienced Perfectly Poisonous Table Variables in real live data. As in Kevin’s example, indexes merely make matters worse, and the OPTION (RECOMPILE) trick does nothing to help. In this case, there is no option but to use a temporary table. However, one has to note that once the slight de-skew had taken place, then the plans were identical across a huge range. Conclusions Where you need to hold intermediate results as part of a process, Table Variables offer a good alternative to temporary tables when used wisely. They can perform faster than a temporary table when the number of rows is not great. For some processing with huge tables, they can perform well when only a clustered index is required, and when the nature of the processing makes an index seek very effective. Table Variables are scoped to the batch or procedure and are unlikely to hang about in the TempDB when they are no longer required. They require no explicit cleanup. Where the number of rows in the table is moderate, you can even use them in joins as ‘Heaps’, unindexed. Beware, however, since, as the number of rows increase, joins on Table Variable heaps can easily become saddled by very poor execution plans, and this must be cured either by adding constraints (UNIQUE or PRIMARY KEY) or by adding the OPTION (RECOMPILE) hint if this is impossible. Occasionally, the way that the data is distributed prevents the efficient use of Table Variables, and this will require using a temporary table instead. Tables Variables require some awareness by the developer about the potential hazards and how to avoid them. If you are not prepared to do any performance monitoring of your code or fine-tuning, and just want to pummel out stuff that ‘just runs’ without considering namby-pamby stuff such as indexes, then stick to Temporary tables. If you are likely to slosh about large numbers of rows in temporary tables without considering the niceties of processing just what is required and no more, then temporary tables provide a safer and less fragile means-to-an-end for you.

    Read the article

  • Stumbling Through: Visual Studio 2010 (Part IV)

    So finally we get to the fun part the fruits of all of our middle-tier/back end labors of generating classes to interface with an XML data source that the previous posts were about can now be presented quickly and easily to an end user.  I think.  Well see.  Well be using a WPF window to display all of our various MFL information that weve collected in the two XML files, and well provide a means of adding, updating and deleting each of these entities using as little code as possible.  Additionally, I would like to dig into the performance of this solution as well as the flexibility of it if were were to modify the underlying XML schema.  So first things first, lets create a WPF project and include our xml data in a data folder within.  On the main window, well drag out the following controls: A combo box to contain all of the teams A list box to show the players of the selected team, along with add/delete player buttons A text box tied to the selected players name, with a save button to save any changes made to the player name A combo box of all the available positions, tied to the currently selected players position A data grid tied to the statistics of the currently selected player, with add/delete statistic buttons This monstrosity of a form and its associated project will look like this (dont forget to reference the DataFoundation project from the Presentation project): To get to the visual data binding, as we learned in a previous post, you have to first make sure the project containing your bindable classes is compiled.  Do so, and then open the Data Sources pane to add a reference to the Teams and Positions classes in the DataFoundation project: Why only Team and Position?  Well, we will get to Players from Teams, and Statistics from Players so no need to make an interface for them as well see in a second.  As for Positions, well need a way to bind the dropdown to ALL positions they dont appear underneath any of the other classes so we need to reference it directly.  After adding these guys, expand every node in your Data Sources pane and see how the Team node allows you to drill into Players and then Statistics.  This is why there was no need to bring in a reference to those classes for the UI we are designing: Now for the seriously hard work of binding all of our controls to the correct data sources.  Drag the following items from the Data Sources pane to the specified control on the window design canvas: Team.Name > Teams combo box Team.Players.Name > Players list box Team.Players.Name > Player name text box Team.Players.Statistics > Statistics data grid Position.Name > Positions combo box That is it!  Really?  Well, no, not really there is one caveat here in that the Positions combo box is not bound the selected players position.  To do so, we will apply a binding to the position combo boxs SelectedValue to point to the current players PositionId value: That should do the trick now, all we need to worry about is loading the actual data.  Sadly, it appears as if we will need to drop to code in order to invoke our IO methods to load all teams and positions.  At least Visual Studio kindly created the stubs for us to do so, ultimately the code should look like this: Note the weirdness with the InitializeDataFiles call that is my current means of telling an IO where to load the data for each of the entities.  I havent thought of a more intuitive way than that yet, but do note that all data is loaded from Teams.xml besides for positions, which is loaded from Lookups.xml.   I think that may be all we need to do to at least load all of the data, lets run it and see: Yay!  All of our glorious data is being displayed!  Er, wait, whats up with the position dropdown?  Why is it red?  Lets select the RB and see if everything updates: Crap, the position didnt update to reflect the selected player, but everything else did.  Where did we go wrong in binding the position to the selected player?  Thinking about it a bit and comparing it to how traditional data binding works, I realize that we never set the value member (or some similar property) to tell the control to join the Id of the source (positions) to the position Id of the player.  I dont see a similar property to that on the combo box control, but I do see a property named SelectedValuePath that might be it, so I set it to Id and run the app again: Hey, all right!  No red box around the positions combo box.  Unfortunately, selecting the RB does not update the dropdown to point to Runningback.  Hmmm.  Now what could it be?  Maybe the problem is that we are loading teams before we are loading positions, so when it binds position Id, all of the positions arent loaded yet.  I went to the code behind and switched things so position loads first and no dice.  Same result when I run.  Why?  WHY?  Ok, ok, calm down, take a deep breath.  Get something with caffeine or sugar (preferably both) and think rationally. Ok, gigantic chocolate chip cookie and a mountain dew chaser have never let me down in the past, so dont fail me now!  Ah ha!  of course!  I didnt even have to finish the mountain dew and I think Ive got it:  Data Context.  By default, when setting on the selected value binding for the dropdown, the data context was list_team.  I dont even know what the heck list_team is, we want it to be bound to our team players view source resource instead, like this: Running it now and selecting the various players: Done and done.  Everything read and bound, thank you caffeine and sugar!  Oh, and thank you Visual Studio 2010.  Lets wire up some of those buttons now There has got to be a better way to do this, but it works for now.  What the add player button does is add a new player object to the currently selected team.  Unfortunately, I couldnt get the new object to automatically show up in the players list (something about not using an observable collection gotta look into this) so I just save the change immediately and reload the screen.  Terrible, but it works: Lets go after something easier:  The save button.  By default, as we type in new text for the players name, it is showing up in the list box as updated.  Cool!  Why couldnt my add new player logic do that?  Anyway, the save button should be as simple as invoking MFL.IO.Save for the selected player, like this: MFL.IO.Save((MFL.Player)lbTeamPlayers.SelectedItem, true); Surprisingly, that worked on the first try.  Lets see if we get as lucky with the Delete player button: MFL.IO.Delete((MFL.Player)lbTeamPlayers.SelectedItem); Refresh(); Note the use of the Refresh method again I cant seem to figure out why updates to the underlying data source are immediately reflected, but adds and deletes are not.  That is a problem for another day, and again my hunch is that I should be binding to something more complex than IEnumerable (like observable collection). Now that an example of the basic CRUD methods are wired up, I want to quickly investigate the performance of this beast.  Im going to make a special button to add 30 teams, each with 50 players and 10 seasons worth of stats.  If my math is right, that will end up with 15000 rows of data, a pretty hefty amount for an XML file.  The save of all this new data took a little over a minute, but that is acceptable because we wouldnt typically be saving batches of 15k records, and the resulting XML file size is a little over a megabyte.  Not huge, but big enough to see some read performance numbers or so I thought.  It reads this file and renders the first team in under a second.  That is unbelievable, but we are lazy loading and the file really wasnt that big.  I will increase it to 50 teams with 100 players and 20 seasons each - 100,000 rows.  It took a year and a half to save all of that data, and resulted in an 8 megabyte file.  Seriously, if you are loading XML files this large, get a freaking database!  Despite this, it STILL takes under a second to load and render the first team, which is interesting mostly because I thought that it was loading that entire 8 MB XML file behind the scenes.  I have to say that I am quite impressed with the performance of the LINQ to XML approach, particularly since I took no efforts to optimize any of this code and was fairly new to the concept from the start.  There might be some merit to this little project after all Look out SQL Server and Oracle, use XML files instead!  Next up, I am going to completely pull the rug out from under the UI and change a number of entities in our model.  How well will the code be regenerated?  How much effort will be required to tie things back together in the UI?Did you know that DotNetSlackers also publishes .net articles written by top known .net Authors? We already have over 80 articles in several categories including Silverlight. Take a look: here.

    Read the article

  • Oracle performance problem

    - by jreid42
    We are using an Oracle 11G machine that is very powerful; has redundant storage etc. It's a beast from what I have been told. We just got this DB for a tool that when I first came on as a coop had like 20 people using, now its upwards of 150 people. I am the only one working on it :( We currently have a system in place that distributes PERL scripts across our entire data center essentially giving us a sort of "grid" computing power. The Perl scripts run a sort of simulation and report back the results to the database. They do selects / inserts. The load is not very high for each script but it could be happening across 20-50 systems at the same time. We then have multiple data centers and users all hitting the same database with this same approach. Our main problem with this is that our database is getting overloaded with connections and having to drop some. We sometimes have upwards of 500 connections. These are old perl scripts and they do not handle this well. Essentially they fail and the results are lost. I would rather avoid having to rewrite a lot of these as they are poorly written, and are a headache to even look at. The database itself is not overloaded, just the connection overhead is too high. We open a connection, make a quick query and then drop the connection. Very short connections but many of them. The database team has basically said we need to lower the number of connections or they are going to ignore us. Because this is distributed across our farm we cant implement persistent connections. I do this with our webserver; but its on a fixed system. The other ones are perl scripts that get opened and closed by the distribution tool and thus arent always running. What would be my best approach to resolving this issue? The scripts themselves can wait for a connection to be open. They do not need to act immediately. Some sort of queing system? I've been suggested to set up a few instances of a tool called "SQL Relay". Maybe one in each data center. How reliable is this tool? How good is this approach? Would it work for what we need? We could have one for each data center and relay requests through it to our main database, keeping a pipeline of open persistent connections? Does this make sense? Is there any other suggestions you can make? Any ideas? Any help would be greatly appreciated. Sadly I am just a coop student working for a very big company and somehow all of this has landed all on my shoulders (there is literally nobody to ask for help; its a hardware company, everybody is hardware engineers, and the database team is useless and in India) and I am quite lost as what the best approach would be? I am extremely overworked and this problem is interfering with on going progress and basically needs to be resolved as quickly as possible; preferably without rewriting the whole system, purchasing hardware (not gonna happen), or shooting myself in the foot. HELP LOL!

    Read the article

  • python list mysteriously getting set to something within my django/piston handler

    - by Anverc
    To start, I'm very new to python, let alone Django and Piston. Anyway, I've created a new BaseHandler class "class BaseApiHandler(BaseHandler)" so that I can extend some of the stff that BaseHandler does. This has been working fine until I added a new filter that could limit results to the first or last result. Now I can refresh the api page over and over and sometimes it will limit the result even if I don't include /limit/whatever in my URL... I've added some debug info into my return value to see what is happening, and that's when it gets more weird. this return value will make more sense after you see the code, but here they are for reference: When the results are correct: "statusmsg": "2 hours_detail found with query: {'empid':'22','datestamp':'2009-03-02',}", when the results are incorrect (once you read the code you'll notice two things wrong. First, it doesn't have 'limit':'None', secondly it shouldn't even get this far to begin with. "statusmsg": "1 hours_detail found with query: {'empid':'22','datestamp':'2009-03-02',with limit[0,1](limit,None),}", It may be important to note that I'm the only person with access to the server running this right now, so even if it was a cache issue, it doesn't make sense that I can just refresh and get different results by hitting F5 while viewing: http://localhost/api/hours_detail/datestamp/2009-03-02/empid/22 Here's the code broken into urls.py and handlers.py so that you can see what i'm doing: URLS.PY urlpatterns = patterns('', #hours_detail/id/{id}/empid/{empid}/projid/{projid}/datestamp/{datestamp}/daterange/{fromdate}to{todate}/limit/{first|last}/exact #empid is required # id, empid, projid, datestamp, daterange can be in any order url(r'^api/hours_detail/(?:' + \ r'(?:[/]?id/(?P<id>\d+))?' + \ r'(?:[/]?empid/(?P<empid>\d+))?' + \ r'(?:[/]?projid/(?P<projid>\d+))?' + \ r'(?:[/]?datestamp/(?P<datestamp>\d{4,}[-/\.]\d{2,}[-/\.]\d{2,}))?' + \ r'(?:[/]?daterange/(?P<daterange>(?:\d{4,}[-/\.]\d{2,}[-/\.]\d{2,})(?:to|/-)(?:\d{4,}[-/\.]\d{2,}[-/\.]\d{2,})))?' + \ r')+' + \ r'(?:/limit/(?P<limit>(?:first|last)))?' + \ r'(?:/(?P<exact>exact))?$', hours_detail_resource), HANDLERS.PY # inherit from BaseHandler to add the extra functionality i need to process the possibly null URL params class BaseApiHandler(BaseHandler): # keep track of the handler so the data is represented back to me correctly post_name = 'base' # THIS IS THE LIST IN QUESTION - SOMETIMES IT IS GETTING SET TO [0,1] MYSTERIOUSLY # this gets set to a list when the results are to be limited limit = None def has_limit(self): return (isinstance(self.limit, list) and len(self.limit) == 2) def process_kwarg_read(self, key, value, d_post, b_exact): """ this should be overridden in the derived classes to process kwargs """ pass # override 'read' so we can better handle our api's searching capabilities def read(self, request, *args, **kwargs): d_post = {'status':0,'statusmsg':'Nothing Happened'} try: # setup the named response object # select all employees then filter - querysets are lazy in django # the actual query is only done once data is needed, so this may # seem like some memory hog slow beast, but it's actually not. d_post[self.post_name] = self.queryset(request) # this is a string that holds debug information... it's the string I mentioned before pasting this code s_query = '' b_exact = False if 'exact' in kwargs and kwargs['exact'] <> None: b_exact = True s_query = '\'exact\':True,' for key,value in kwargs.iteritems(): # the regex url possibilities will push None into the kwargs dictionary # if not specified, so just continue looping through if that's the case if value == None or key == 'exact': continue # write to the s_query string so we have a nice error message s_query = '%s\'%s\':\'%s\',' % (s_query, key, value) # now process this key/value kwarg self.process_kwarg_read(key=key, value=value, d_post=d_post, b_exact=b_exact) # end of the kwargs for loop else: if self.has_limit(): # THIS SEEMS TO GET HIT SOMETIMES IF YOU CONSTANTLY REFRESH THE API PAGE, EVEN THOUGH # THE LINE IN THE FOR LOOP WHICH UPDATES s_query DOESN'T GET HIS AND THUS self.process_kwarg_read ALSO # DOESN'T GET HIT SO NEITHER DOES limit = [0,1] s_query = '%swith limit[%s,%s](limit,%s),' % (s_query, self.limit[0], self.limit[1], kwargs['limit']) d_post[self.post_name] = d_post[self.post_name][self.limit[0]:self.limit[1]] if d_post[self.post_name].count() == 0: d_post['status'] = 0 d_post['statusmsg'] = '%s not found with query: {%s}' % (self.post_name, s_query) else: d_post['status'] = 1 d_post['statusmsg'] = '%s %s found with query: {%s}' % (d_post[self.post_name].count(), self.post_name, s_query) except: e = sys.exc_info()[1] d_post['status'] = 0 d_post['statusmsg'] = 'error: %s' % e d_post[self.post_name] = [] return d_post class HoursDetailHandler(BaseApiHandler): #allowed_methods = ('GET',) model = HoursDetail exclude = () post_name = 'hours_detail' def process_kwarg_read(self, key, value, d_post, b_exact): if ... # I have several if/elif statements here that check for other things... # 'self.limit =' only shows up in the following elif: elif key == 'limit': order_by = 'clock_time' if value == 'last': order_by = '-clock_time' d_post[self.post_name] = d_post[self.post_name].order_by(order_by) # TO GET HERE, THE ONLY PLACE IN CODE WHERE self.limit IS SET, YOU MUST HAVE GONE THROUGH # THE value == None CHECK???? self.limit = [0, 1] else: raise NameError def read(self, request, *args, **kwargs): # empid is required, so make sure it exists before running BaseApiHandler's read method if not('empid' in kwargs and kwargs['empid'] <> None and kwargs['empid'] >= 0): return {'status':0,'statusmsg':'empid cannot be empty'} else: return BaseApiHandler.read(self, request, *args, **kwargs) Does anyone have a clue how else self.limit might be getting set to [0, 1] ? Am I misunderstanding kwargs or loops or anything in Python?

    Read the article

  • The Incremental Architect&acute;s Napkin &ndash; #3 &ndash; Make Evolvability inevitable

    - by Ralf Westphal
    Originally posted on: http://geekswithblogs.net/theArchitectsNapkin/archive/2014/06/04/the-incremental-architectacutes-napkin-ndash-3-ndash-make-evolvability-inevitable.aspxThe easier something to measure the more likely it will be produced. Deviations between what is and what should be can be readily detected. That´s what automated acceptance tests are for. That´s what sprint reviews in Scrum are for. It´s no small wonder our software looks like it looks. It has all the traits whose conformance with requirements can easily be measured. And it´s lacking traits which cannot easily be measured. Evolvability (or Changeability) is such a trait. If an operation is correct, if an operation if fast enough, that can be checked very easily. But whether Evolvability is high or low, that cannot be checked by taking a measure or two. Evolvability might correlate with certain traits, e.g. number of lines of code (LOC) per function or Cyclomatic Complexity or test coverage. But there is no threshold value signalling “evolvability too low”; also Evolvability is hardly tangible for the customer. Nevertheless Evolvability is of great importance - at least in the long run. You can get away without much of it for a short time. Eventually, though, it´s needed like any other requirement. Or even more. Because without Evolvability no other requirement can be implemented. Evolvability is the foundation on which all else is build. Such fundamental importance is in stark contrast with its immeasurability. To compensate this, Evolvability must be put at the very center of software development. It must become the hub around everything else revolves. Since we cannot measure Evolvability, though, we cannot start watching it more. Instead we need to establish practices to keep it high (enough) at all times. Chefs have known that for long. That´s why everybody in a restaurant kitchen is constantly seeing after cleanliness. Hygiene is important as is to have clean tools at standardized locations. Only then the health of the patrons can be guaranteed and production efficiency is constantly high. Still a kitchen´s level of cleanliness is easier to measure than software Evolvability. That´s why important practices like reviews, pair programming, or TDD are not enough, I guess. What we need to keep Evolvability in focus and high is… to continually evolve. Change must not be something to avoid but too embrace. To me that means the whole change cycle from requirement analysis to delivery needs to be gone through more often. Scrum´s sprints of 4, 2 even 1 week are too long. Kanban´s flow of user stories across is too unreliable; it takes as long as it takes. Instead we should fix the cycle time at 2 days max. I call that Spinning. No increment must take longer than from this morning until tomorrow evening to finish. Then it should be acceptance checked by the customer (or his/her representative, e.g. a Product Owner). For me there are several resasons for such a fixed and short cycle time for each increment: Clear expectations Absolute estimates (“This will take X days to complete.”) are near impossible in software development as explained previously. Too much unplanned research and engineering work lurk in every feature. And then pervasive interruptions of work by peers and management. However, the smaller the scope the better our absolute estimates become. That´s because we understand better what really are the requirements and what the solution should look like. But maybe more importantly the shorter the timespan the more we can control how we use our time. So much can happen over the course of a week and longer timespans. But if push comes to shove I can block out all distractions and interruptions for a day or possibly two. That´s why I believe we can give rough absolute estimates on 3 levels: Noon Tonight Tomorrow Think of a meeting with a Product Owner at 8:30 in the morning. If she asks you, how long it will take you to implement a user story or bug fix, you can say, “It´ll be fixed by noon.”, or you can say, “I can manage to implement it until tonight before I leave.”, or you can say, “You´ll get it by tomorrow night at latest.” Yes, I believe all else would be naive. If you´re not confident to get something done by tomorrow night (some 34h from now) you just cannot reliably commit to any timeframe. That means you should not promise anything, you should not even start working on the issue. So when estimating use these four categories: Noon, Tonight, Tomorrow, NoClue - with NoClue meaning the requirement needs to be broken down further so each aspect can be assigned to one of the first three categories. If you like absolute estimates, here you go. But don´t do deep estimates. Don´t estimate dozens of issues; don´t think ahead (“Issue A is a Tonight, then B will be a Tomorrow, after that it´s C as a Noon, finally D is a Tonight - that´s what I´ll do this week.”). Just estimate so Work-in-Progress (WIP) is 1 for everybody - plus a small number of buffer issues. To be blunt: Yes, this makes promises impossible as to what a team will deliver in terms of scope at a certain date in the future. But it will give a Product Owner a clear picture of what to pull for acceptance feedback tonight and tomorrow. Trust through reliability Our trade is lacking trust. Customers don´t trust software companies/departments much. Managers don´t trust developers much. I find that perfectly understandable in the light of what we´re trying to accomplish: delivering software in the face of uncertainty by means of material good production. Customers as well as managers still expect software development to be close to production of houses or cars. But that´s a fundamental misunderstanding. Software development ist development. It´s basically research. As software developers we´re constantly executing experiments to find out what really provides value to users. We don´t know what they need, we just have mediated hypothesises. That´s why we cannot reliably deliver on preposterous demands. So trust is out of the window in no time. If we switch to delivering in short cycles, though, we can regain trust. Because estimates - explicit or implicit - up to 32 hours at most can be satisfied. I´d say: reliability over scope. It´s more important to reliably deliver what was promised then to cover a lot of requirement area. So when in doubt promise less - but deliver without delay. Deliver on scope (Functionality and Quality); but also deliver on Evolvability, i.e. on inner quality according to accepted principles. Always. Trust will be the reward. Less complexity of communication will follow. More goodwill buffer will follow. So don´t wait for some Kanban board to show you, that flow can be improved by scheduling smaller stories. You don´t need to learn that the hard way. Just start with small batch sizes of three different sizes. Fast feedback What has been finished can be checked for acceptance. Why wait for a sprint of several weeks to end? Why let the mental model of the issue and its solution dissipate? If you get final feedback after one or two weeks, you hardly remember what you did and why you did it. Resoning becomes hard. But more importantly youo probably are not in the mood anymore to go back to something you deemed done a long time ago. It´s boring, it´s frustrating to open up that mental box again. Learning is harder the longer it takes from event to feedback. Effort can be wasted between event (finishing an issue) and feedback, because other work might go in the wrong direction based on false premises. Checking finished issues for acceptance is the most important task of a Product Owner. It´s even more important than planning new issues. Because as long as work started is not released (accepted) it´s potential waste. So before starting new work better make sure work already done has value. By putting the emphasis on acceptance rather than planning true pull is established. As long as planning and starting work is more important, it´s a push process. Accept a Noon issue on the same day before leaving. Accept a Tonight issue before leaving today or first thing tomorrow morning. Accept a Tomorrow issue tomorrow night before leaving or early the day after tomorrow. After acceptance the developer(s) can start working on the next issue. Flexibility As if reliability/trust and fast feedback for less waste weren´t enough economic incentive, there is flexibility. After each issue the Product Owner can change course. If on Monday morning feature slices A, B, C, D, E were important and A, B, C were scheduled for acceptance by Monday evening and Tuesday evening, the Product Owner can change her mind at any time. Maybe after A got accepted she asks for continuation with D. But maybe, just maybe, she has gotten a completely different idea by then. Maybe she wants work to continue on F. And after B it´s neither D nor E, but G. And after G it´s D. With Spinning every 32 hours at latest priorities can be changed. And nothing is lost. Because what got accepted is of value. It provides an incremental value to the customer/user. Or it provides internal value to the Product Owner as increased knowledge/decreased uncertainty. I find such reactivity over commitment economically very benefical. Why commit a team to some workload for several weeks? It´s unnecessary at beast, and inflexible and wasteful at worst. If we cannot promise delivery of a certain scope on a certain date - which is what customers/management usually want -, we can at least provide them with unpredecented flexibility in the face of high uncertainty. Where the path is not clear, cannot be clear, make small steps so you´re able to change your course at any time. Premature completion Customers/management are used to premeditating budgets. They want to know exactly how much to pay for a certain amount of requirements. That´s understandable. But it does not match with the nature of software development. We should know that by now. Maybe there´s somewhere in the world some team who can consistently deliver on scope, quality, and time, and budget. Great! Congratulations! I, however, haven´t seen such a team yet. Which does not mean it´s impossible, but I think it´s nothing I can recommend to strive for. Rather I´d say: Don´t try this at home. It might hurt you one way or the other. However, what we can do, is allow customers/management stop work on features at any moment. With spinning every 32 hours a feature can be declared as finished - even though it might not be completed according to initial definition. I think, progress over completion is an important offer software development can make. Why think in terms of completion beyond a promise for the next 32 hours? Isn´t it more important to constantly move forward? Step by step. We´re not running sprints, we´re not running marathons, not even ultra-marathons. We´re in the sport of running forever. That makes it futile to stare at the finishing line. The very concept of a burn-down chart is misleading (in most cases). Whoever can only think in terms of completed requirements shuts out the chance for saving money. The requirements for a features mostly are uncertain. So how does a Product Owner know in the first place, how much is needed. Maybe more than specified is needed - which gets uncovered step by step with each finished increment. Maybe less than specified is needed. After each 4–32 hour increment the Product Owner can do an experient (or invite users to an experiment) if a particular trait of the software system is already good enough. And if so, she can switch the attention to a different aspect. In the end, requirements A, B, C then could be finished just 70%, 80%, and 50%. What the heck? It´s good enough - for now. 33% money saved. Wouldn´t that be splendid? Isn´t that a stunning argument for any budget-sensitive customer? You can save money and still get what you need? Pull on practices So far, in addition to more trust, more flexibility, less money spent, Spinning led to “doing less” which also means less code which of course means higher Evolvability per se. Last but not least, though, I think Spinning´s short acceptance cycles have one more effect. They excert pull-power on all sorts of practices known for increasing Evolvability. If, for example, you believe high automated test coverage helps Evolvability by lowering the fear of inadverted damage to a code base, why isn´t 90% of the developer community practicing automated tests consistently? I think, the answer is simple: Because they can do without. Somehow they manage to do enough manual checks before their rare releases/acceptance checks to ensure good enough correctness - at least in the short term. The same goes for other practices like component orientation, continuous build/integration, code reviews etc. None of that is compelling, urgent, imperative. Something else always seems more important. So Evolvability principles and practices fall through the cracks most of the time - until a project hits a wall. Then everybody becomes desperate; but by then (re)gaining Evolvability has become as very, very difficult and tedious undertaking. Sometimes up to the point where the existence of a project/company is in danger. With Spinning that´s different. If you´re practicing Spinning you cannot avoid all those practices. With Spinning you very quickly realize you cannot deliver reliably even on your 32 hour promises. Spinning thus is pulling on developers to adopt principles and practices for Evolvability. They will start actively looking for ways to keep their delivery rate high. And if not, management will soon tell them to do that. Because first the Product Owner then management will notice an increasing difficulty to deliver value within 32 hours. There, finally there emerges a way to measure Evolvability: The more frequent developers tell the Product Owner there is no way to deliver anything worth of feedback until tomorrow night, the poorer Evolvability is. Don´t count the “WTF!”, count the “No way!” utterances. In closing For sustainable software development we need to put Evolvability first. Functionality and Quality must not rule software development but be implemented within a framework ensuring (enough) Evolvability. Since Evolvability cannot be measured easily, I think we need to put software development “under pressure”. Software needs to be changed more often, in smaller increments. Each increment being relevant to the customer/user in some way. That does not mean each increment is worthy of shipment. It´s sufficient to gain further insight from it. Increments primarily serve the reduction of uncertainty, not sales. Sales even needs to be decoupled from this incremental progress. No more promises to sales. No more delivery au point. Rather sales should look at a stream of accepted increments (or incremental releases) and scoup from that whatever they find valuable. Sales and marketing need to realize they should work on what´s there, not what might be possible in the future. But I digress… In my view a Spinning cycle - which is not easy to reach, which requires practice - is the core practice to compensate the immeasurability of Evolvability. From start to finish of each issue in 32 hours max - that´s the challenge we need to accept if we´re serious increasing Evolvability. Fortunately higher Evolvability is not the only outcome of Spinning. Customer/management will like the increased flexibility and “getting more bang for the buck”.

    Read the article

< Previous Page | 3 4 5 6 7