Search Results

Search found 590 results on 24 pages for 'tony ennis'.

Page 2/24 | < Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12 | Next Page >

New Wine in New Bottles

- by Tony Davis

How many people, when their car shows signs of wear and tear, would consider upgrading the engine and keeping the shell? Even if you're cash-strapped, you'll soon work out the subtlety of the economics, the cost of sudden breakdowns, the precious time lost coping with the hassle, and the low 'book value'. You'll generally buy a new car. The same philosophy should apply to database systems. Mainstream support for SQL Server 2005 ends on April 12; many DBAS, if they haven't done so already, will be considering the migration to SQL Server 2008 R2. Hopefully, that upgrade plan will include a fresh install of the operating system on brand new hardware. SQL Server 2008 R2 and Windows Server 2008 R2 are designed to work together. The improved architecture, processing power, and hyper-threading capabilities of modern processors will dramatically improve the performance of many SQL Server workloads, and allow consolidation opportunities. Of course, there will be many DBAs smiling ruefully at the suggestion of such indulgence. This is nothing like the real world, this halcyon place where hardware and software budgets are limitless, development and testing resources are plentiful, and third party vendors immediately certify their applications for the latest-and-greatest platform! As with cars, or any other technology, the justification for a complete upgrade is complex. With Servers, the extra cost at time of upgrade will generally pay you back in terms of the increased performance of your business applications, reduced maintenance costs, training costs and downtime. Also, if you plan and design carefully, it's possible to offset hardware costs with reduced SQL Server licence costs. In his forthcoming SQL Server Hardware book, Glenn Berry describes a recent case where he was able to replace 4 single-socket database servers with one two-socket server, saving about $90K in hardware costs and $350K in SQL Server license costs. Of course, there are exceptions. If you do have a stable, reliable, secure SQL Server 6.5 system that still admirably meets the needs of a specific business requirement, and has no security vulnerabilities, then by all means leave it alone. Why upgrade just for the sake of it? However, as soon as a system shows sign of being unfit for purpose, or is moving out of mainstream support, the ruthless DBA will make the strongest possible case for a belts-and-braces upgrade. We'd love to hear what you think. What does your typical upgrade path look like? What are the major obstacles? Cheers, Tony.

Read the article
Hype and LINQ

- by Tony Davis

"Tired of querying in antiquated SQL?" I blinked in astonishment when I saw this headline on the LinqPad site. Warming to its theme, the site suggests that what we need is to "kiss goodbye to SSMS", and instead use LINQ, a modern query language! Elsewhere, there is an article entitled "Why LINQ beats SQL". The designers of LINQ, along with many DBAs, would, I'm sure, cringe with embarrassment at the suggestion that LINQ and SQL are, in any sense, competitive ways of doing the same thing. In fact what LINQ really is, at last, is an efficient, declarative language for C# and VB programmers to access or manipulate data in objects, local data stores, ORMs, web services, data repositories, and, yes, even relational databases. The fact is that LINQ is essentially declarative programming in a .NET language, and so in many ways encourages developers into a "SQL-like" mindset, even though they are not directly writing SQL. In place of imperative logic and loops, it uses various expressions, operators and declarative logic to build up an "expression tree" describing only what data is required, not the operations to be performed to get it. This expression tree is then parsed by the language compiler, and the result, when used against a relational database, is a SQL string that, while perhaps not always perfect, is often correctly parameterized and certainly no less "optimal" than what is achieved when a developer applies blunt, imperative logic to the SQL language. From a developer standpoint, it is a mistake to consider LINQ simply as a substitute means of querying SQL Server. The strength of LINQ is that that can be used to access any data source, for which a LINQ provider exists. Microsoft supplies built-in providers to access not just SQL Server, but also XML documents, .NET objects, ADO.NET datasets, and Entity Framework elements. LINQ-to-Objects is particularly interesting in that it allows a declarative means to access and manipulate arrays, collections and so on. Furthermore, as Michael Sorens points out in his excellent article on LINQ, there a whole host of third-party LINQ providers, that offers a simple way to get at data in Excel, Google, Flickr and much more, without having to learn a new interface or language. Of course, the need to be generic enough to deal with a range of data sources, from something as mundane as a text file to as esoteric as a relational database, means that LINQ is a compromise and so has inherent limitations. However, it is a powerful and beautifully compact language and one that, at least in its "query syntax" guise, is accessible to developers and DBAs alike. Perhaps there is still hope that LINQ can fulfill Phil Factor's lobster-induced fantasy of a language that will allow us to "treat all data objects, whether Word files, Excel files, XML, relational databases, text files, HTML files, registry files, LDAPs, Outlook and so on, in the same logical way, as linked databases, and extract the metadata, create the entities and relationships in the same way, and use the same SQL syntax to interrogate, create, read, write and update them." Cheers, Tony.

Read the article
A Community Cure for a String Splitting Headache

- by Tony Davis

A heartwarming tale of dogged perseverance and Community collaboration to solve some SQL Server string-related headaches. Michael J Swart posted a blog this week that had me smiling in recognition and agreement, describing how an inquisitive Developer or DBA deals with a problem. It's a three-step process, starting with discomfort and anxiety; a feeling that one doesn't know as much about one's chosen specialized subject as previously thought. It progresses through a phase of intense research and learning until finally one achieves breakthrough, blessed relief and renewed optimism. In this case, the discomfort was provoked by the mystery of massively high CPU when searching Unicode strings in SQL Server. Michael explored the problem via Stack Overflow, Google and Twitter #sqlhelp, finally leading to resolution and a blog post that shared what he learned. Perfect; except that sometimes you have to be prepared to share what you've learned so far, while still mired in the phase of nagging discomfort. A good recent example of this recently can be found on our own blogs. Despite being a loud advocate of the lightning fast T-SQL-based string splitting techniques, honed to near perfection over many years by Jeff Moden and others, Phil Factor retained a dogged conviction that, in theory, shredding element-based XML using XQuery ought to be even more efficient for splitting a string to create a table. After some careful testing, he found instead that the XML way performed and scaled miserably by comparison. Somewhat subdued, and with a nagging feeling that perhaps he was still missing "something", he posted his findings. What happened next was a joy to behold; the community jumped in to suggest subtle changes in approach, using an attribute-based rather than element-based XML list, and tweaking the XQuery shredding. The result was performance and scalability that surpassed all other techniques. I asked Phil how quickly he would have arrived at the real breakthrough on his own. His candid answer was "never". Both are great examples of the power of Community learning and the latter in particular the importance of being brave enough to parade one's ignorance. Perhaps Jeff Moden will accept the string-splitting gauntlet one more time. To quote the great man: you've just got to love this community! If you've an interesting tale to tell about being helped to a significant breakthrough for a problem by the community, I'd love to hear about it. Cheers, Tony.

Read the article
Aptronyms: fitting the profession to the name

- by Tony Davis

Writing a recent piece on the pains of index fragmentation, I found myself wondering why, in SQL Server, you can’t set the equivalent of a fill factor, on a heap table. I scratched my head…who might know? Phil Factor, of course! I approached him with a due sense of optimism only to find that not only did he not know, he also didn’t seem to care much either. I skulked off thinking how this may be the final nail in the coffin of nominative determinism. I’ve always wondered if there was anything in it, though. If your surname is Plumb or Leeks, is there even a tiny, extra percentage chance that you’ll end up fitting bathrooms? Some examples are quite common. I’m sure we’ve all met teachers called English or French, or lawyers called Judge or Laws. I’ve also known a Doctor called Coffin, a Urologist called Waterfall, and a Dentist called Dentith. Two personal favorites are Wolfgang Wolf who ended up managing the German Soccer team, Wolfsburg, and Edmund Akenhead, a Crossword Editor for The Times newspaper. Having forgiven Phil his earlier offhandedness, I asked him for if he knew of any notable examples. He had met the famous Dr. Batty and Dr. Nutter, both Psychiatrists, knew undertakers called Death and Stiff, had read a book by Frederick Page-Turner, and suppressed a giggle at the idea of a feminist called Gurley-Brown. He even managed to better my Urologist example, citing the article on incontinence in the British Journal of Urology (vol.49, pp.173-176, 1977) by A. J. Splatt and D. Weedon. What, however, if you were keen to gently nudge your child down the path to a career in IT? What name would you choose? Subtlety probably doesn’t really work, although in a recent interview, Rodney Landrum did congratulate PowerShell MVP Max Trinidad on being named after a SQL function. Grant “The Memory” Fritchey (OK, I made up that nickname) doesn’t do badly either. Some surnames, seem to offer a natural head start, although I know of no members of the Page-Reid clan in the profession. There are certainly families with the Table surname, although sadly, Little Bobby Tables was merely a legend by xkcd. A member of the well-known Key family would need to name their son Primary, or maybe live abroad, to make their mark. Nominate your examples of people seemingly destined, by name, for their chosen profession (extra points for IT). The best three will receive a prize. Cheers, Tony.

Read the article
Recovering an Ubuntu installation - Ubuntu eats itself after 'sudo apt-get install -f'

- by Tony Martin

Updater (I assume) put a no entry style alert icon on the panel which informed me that certain package dependencies were not up to snuff. Upgrades were thereafter only partial. The dialogue advised that I sudo apt-get install -f. I did this hoping that app-get would fulfil dependencies and replace corrupted files and watched it systematically remove every component of linux, both the stuff I had installed and the core ubuntu packages. I could only assume at this stage that this was in preparation for a fresh install but, of course, I know better now - if you find yourself with apt-get warning you that you are about to remove several hundred packages and asking you to type an involved confirmation string seek advice before proceeding. I digress. This was a 64 bit install of 12.04. All that is left is grub pointing to a couple of windows recovery partitions on the hard drive. Thankfully the Ext4 partition is reachable from a stick boot. EDIT: I've logged onto the machine with a 64 bit stick and can see the file structure left behind by apt-get after {ahem} fixing. My first instinct was to run install from the stick but it seemed to want to do another install rather than a repair. My question then: is there a way to recover the current installation so that if I reinstall the packages I had they will pick up the original settings? I'm particularly worried about losing email from evolution - the rest I could probably lash back together. As for the use of PPA I'm not sure what you're driving at. I generally use Ubuntu Software Centre to install software, though I have used terminal scripts to add new repositories and software successfully following guidance on various websites. The most recent change I made was a downgrade of Wine in an attempt to install and run excel2007 (a necessity, I think, as I have VBA work to do). The installer had stalled and had to be killed. I wonder if that corrupted whatever database holds a model of the package installation structure. I would also be interested to know how this disaster came about. I see people in the know recommending the sudo apt-get install -f as a fairly innocuous cure in similar circumstances. Thanks for your attention, Tony Martin p.s. Do please forgive the rant aspects of the original post. It's hard to write rationally with a large hole in the pit of your stomach.

Read the article
On the art of self-promotion

- by Tony Davis

I attended Brent Ozar's Building the Fastest SQL Servers session at Tech Ed last week, and found myself engulfed in a 'perfect storm' of excellent technical and presentational skills coupled with an astute awareness of the value of promoting one's work. I spend a lot of time at such events talking to developers and DBAs about the value of blogging and writing articles, and my impression is that some could benefit from a touch less modesty and a little more self-promotion. I sense a reticence in many would-be writers. Is what I have to say important enough? Haven't far more qualified and established commentators, MVPs and so on, already said it? While it's a good idea to pick reasonably fresh and interesting topics, it's more important not to let such fears lead to writer's block. In the eyes of any future employer, your published writing is an extension of your resume. They will not care that a certain MVP knows how to solve problem x, but they will be very interested to see that you have tackled that same problem, and solved it in your own way, and described the process in your own voice. In your current job, your writing is one of the ways you can express to your peers, and to the organization as a whole, the value of what you contribute. Many Developers and DBAs seem to rely on the idea that their work will speak for itself, and that their skill shines out from it. Unfortunately, this isn't always true. Many Development DBAs, for example, will be painfully aware of the massive effort involved in tuning and adding resilience to rapidly developed applications. However, others in the organization who are unaware of what's involved in getting an application that is 'done' ready for production may dismiss such efforts as fussiness or conservatism. At the dark end of the development cycle, chickens come home to roost, but their droppings tend to land on those trying to clear up the mess. My advice is this: next time you fix a bug or improve the resilience or performance of a database or application, make sure that you use team meetings, informal discussions and so on to ensure that people understand what the problem was and what you had to do to fix it. Use your blog to describe, generally, the process you adopted, the resources you used and the insights that came from your work. Encourage your colleagues to do the same. By spreading the art of self-promotion to everyone involved in an IT project, we get a better idea of the extent of the work and the value of the contribution of all the team members. As always, we'd love to hear what you think. This very week, Simple-talk launches its new blogging platform. If any of this has moved you to 'throw your hat into the ring', drop us a mail at [email protected]. Cheers, Tony.

Read the article
On the art of self-promotion

- by Tony Davis

I attended Brent Ozar’s Building the Fastest SQL Servers session at Tech Ed last week, and found myself engulfed in a ‘perfect storm’ of excellent technical and presentational skills coupled with an astute awareness of the value of promoting one’s work. I spend a lot of time at such events talking to developers and DBAs about the value of blogging and writing articles, and my impression is that some could benefit from a touch less modesty and a little more self-promotion. I sense a reticence in many would-be writers. Is what I have to say important enough? Haven’t far more qualified and established commentators, MVPs and so on, already said it? While it’s a good idea to pick reasonably fresh and interesting topics, it’s more important not to let such fears lead to writer’s block. In the eyes of any future employer, your published writing is an extension of your resume. They will not care that a certain MVP knows how to solve problem x, but they will be very interested to see that you have tackled that same problem, and solved it in your own way, and described the process in your own voice. In your current job, your writing is one of the ways you can express to your peers, and to the organization as a whole, the value of what you contribute. Many Developers and DBAs seem to rely on the idea that their work will speak for itself, and that their skill shines out from it. Unfortunately, this isn’t always true. Many Development DBAs, for example, will be painfully aware of the massive effort involved in tuning and adding resilience to rapidly developed applications. However, others in the organization who are unaware of what’s involved in getting an application that is ‘done’ ready for production may dismiss such efforts as fussiness or conservatism. At the dark end of the development cycle, chickens come home to roost, but their droppings tend to land on those trying to clear up the mess. My advice is this: next time you fix a bug or improve the resilience or performance of a database or application, make sure that you use team meetings, informal discussions and so on to ensure that people understand what the problem was and what you had to do to fix it. Use your blog to describe, generally, the process you adopted, the resources you used and the insights that came from your work. Encourage your colleagues to do the same. By spreading the art of self-promotion to everyone involved in an IT project, we get a better idea of the extent of the work and the value of the contribution of all the team members. As always, we’d love to hear what you think. This very week, Simple-talk launches its new blogging platform. If any of this has moved you to ‘throw your hat into the ring’, drop us a mail at [email protected]. Cheers, Tony.

Read the article
Cheating on Technical Debt

- by Tony Davis

One bad practice guaranteed to cause dismay amongst your colleagues is passing on technical debt without full disclosure. There could only be two reasons for this. Either the developer or DBA didn’t know the difference between good and bad practices, or concealed the debt. Neither reflects well on their professional competence. Technical debt, or code debt, is a convenient term to cover all the compromises between the ideal solution and the actual solution, reflecting the reality of the pressures of commercial coding. The one time you’re guaranteed to hear one developer, or DBA, pass judgment on another is when he or she inherits their project, and is surprised by the amount of technical debt left lying around in the form of inelegant architecture, incomplete tests, confusing interface design, no documentation, and so on. It is often expedient for a Project Manager to ignore the build-up of technical debt, the cut corners, not-quite-finished features and rushed designs that mean progress is satisfyingly rapid in the short term. It’s far less satisfying for the poor person who inherits the code. Nothing sends a colder chill down the spine than the dawning realization that you’ve inherited a system crippled with performance and functional issues that will take months of pain to fix before you can even begin to make progress on any of the planned new features. It’s often hard to justify this ‘debt paying’ time to the project owners and managers. It just looks as if you are making no progress, in marked contrast to your predecessor. There can be many good reasons for allowing technical debt to build up, at least in the short term. Often, rapid prototyping is essential, there is a temporary shortfall in test resources, or the domain knowledge is incomplete. It may be necessary to hit a specific deadline with a prototype, or proof-of-concept, to explore a possible market opportunity, with planned iterations and refactoring to follow later. However, it is a crime for a developer to build up technical debt without making this clear to the project participants. He or she needs to record it explicitly. A design compromise made in to order to hit a deadline, be it an outright hack, or a decision made without time for rigorous investigation and testing, needs to be documented with the same rigor that one tracks a bug. What’s the best way to do this? Ideally, we’d have some kind of objective assessment of the level of technical debt in a software project, although that smacks of Science Fiction even as I write it. I’d be interested of hear of any methods you’ve used, but I’m sure most teams have to rely simply on the integrity of their colleagues and the clear perceptions of the project manager… Cheers, Tony.

Read the article
On the art of self-promotion

- by Tony Davis

I attended Brent Ozar's Building the Fastest SQL Servers session at Tech Ed last week, and found myself engulfed in a 'perfect storm' of excellent technical and presentational skills coupled with an astute awareness of the value of promoting one's work. I spend a lot of time at such events talking to developers and DBAs about the value of blogging and writing articles, and my impression is that some could benefit from a touch less modesty and a little more self-promotion. I sense a reticence in many would-be writers. Is what I have to say important enough? Haven't far more qualified and established commentators, MVPs and so on, already said it? While it's a good idea to pick reasonably fresh and interesting topics, it's more important not to let such fears lead to writer's block. In the eyes of any future employer, your published writing is an extension of your resume. They will not care that a certain MVP knows how to solve problem x, but they will be very interested to see that you have tackled that same problem, and solved it in your own way, and described the process in your own voice. In your current job, your writing is one of the ways you can express to your peers, and to the organization as a whole, the value of what you contribute. Many Developers and DBAs seem to rely on the idea that their work will speak for itself, and that their skill shines out from it. Unfortunately, this isn't always true. Many Development DBAs, for example, will be painfully aware of the massive effort involved in tuning and adding resilience to rapidly developed applications. However, others in the organization who are unaware of what's involved in getting an application that is 'done' ready for production may dismiss such efforts as fussiness or conservatism. At the dark end of the development cycle, chickens come home to roost, but their droppings tend to land on those trying to clear up the mess. My advice is this: next time you fix a bug or improve the resilience or performance of a database or application, make sure that you use team meetings, informal discussions and so on to ensure that people understand what the problem was and what you had to do to fix it. Use your blog to describe, generally, the process you adopted, the resources you used and the insights that came from your work. Encourage your colleagues to do the same. By spreading the art of self-promotion to everyone involved in an IT project, we get a better idea of the extent of the work and the value of the contribution of all the team members. As always, we'd love to hear what you think. This very week, Simple-talk launches its new blogging platform. If any of this has moved you to 'throw your hat into the ring', drop us a mail at [email protected]. Cheers, Tony.

Read the article
Data Model Dissonance

- by Tony Davis

So often at the start of the development of database applications, there is a premature rush to the keyboard. Unless, before we get there, we’ve mapped out and agreed the three data models, the Conceptual, the Logical and the Physical, then the inevitable refactoring will dog development work. It pays to get the data models sorted out up-front, however ‘agile’ you profess to be. The hardest model to get right, the most misunderstood, and the one most neglected by the various modeling tools, is the conceptual data model, and yet it is critical to all that follows. The conceptual model distils what the business understands about itself, and the way it operates. It represents the business rules that govern the required data, its constraints and its properties. The conceptual model uses the terminology of the business and defines the most important entities and their inter-relationships. Don’t assume that the organization’s understanding of these business rules is consistent or accurate. Too often, one department has a subtly different understanding of what an entity means and what it stores, from another. If our conceptual data model fails to resolve such inconsistencies, it will reduce data quality. If we don’t collect and measure the raw data in a consistent way across the whole business, how can we hope to perform meaningful aggregation? The conceptual data model has more to do with business than technology, and as such, developers often regard it as a worthy but rather arcane ceremony like saluting the flag or only eating fish on Friday. However, the consequences of getting it wrong have a direct and painful impact on many aspects of the project. If you adopt a silo-based (a.k.a. Domain driven) approach to development), you are still likely to suffer by starting with an incomplete knowledge of the domain. Even when you have surmounted these problems so that the data entities accurately reflect the business domain that the application represents, there are likely to be dire consequences from abandoning the goal of a shared, enterprise-wide understanding of the business. In reading this, you may recall experiences of the consequence of getting the conceptual data model wrong. I believe that Phil Factor, for example, witnessed the abandonment of a multi-million dollar banking project due to an inadequate conceptual analysis of how the bank defined a ‘customer’. We’d love to hear of any examples you know of development projects poleaxed by errors in the conceptual data model. Cheers, Tony

Read the article
From NaN to Infinity...and Beyond!

- by Tony Davis

It is hard to believe that it was once possible to corrupt a SQL Server Database by storing perfectly normal data values into a table; but it is true. In SQL Server 2000 and before, one could inadvertently load invalid data values into certain data types via RPC calls or bulk insert methods rather than DML. In the particular case of the FLOAT data type, this meant that common 'special values' for this type, namely NaN (not-a-number) and +/- infinity, could be quite happily plugged into the database from an application and stored as 'out-of-range' values. This was like a time-bomb. When one then tried to query this data; the values were unsupported and so data pages containing them were flagged as being corrupt. Any query that needed to read a column containing the special value could fail or return unpredictable results. Microsoft even had to issue a hotfix to deal with failures in the automatic recovery process, caused by the presence of these NaN values, which rendered the whole database inaccessible! This problem is history for those of us on more current versions of SQL Server, but its ghost still haunts us. Recently, for example, a developer on Red Gate’s SQL Response team reported a strange problem when attempting to load historical monitoring data into a SQL Server 2005 database via the C# ADO.NET provider. The ratios used in some of their reporting calculations occasionally threw out NaN or infinity values, and the subsequent attempts to load these values resulted in a nasty error. It turns out to be a different manifestation of the same problem. SQL Server 2005 still does not fully support the IEEE 754 standard for floating point numbers, in that the FLOAT data type still cannot handle NaN or infinity values. Instead, they just added validation checks that prevent the 'invalid' values from being loaded in the first place. For people migrating from SQL Server 2000 databases that contained out-of-range FLOAT (or DATETIME etc.) data, to SQL Server 2005, Microsoft have added to the latter's version of the DBCC CHECKDB (or CHECKTABLE) command a DATA_PURITY clause. When enabled, this will seek out the corrupt data, but won’t fix it. You have to do this yourself in what can often be a slow, painful manual process. Our development team, after a quizzical shrug of the shoulders, simply decided to represent NaN and infinity values as NULL, and move on, accepting the minor inconvenience of not being able to tell them apart. However, what of scientific, engineering and other applications that really would like the luxury of being able to both store and access these perfectly-reasonable floating point data values? The sticking point seems to be the stipulation in the IEEE 754 standard that, when NaN is compared to any other value including itself, the answer is "unequal" (i.e. FALSE). This is clearly different from normal number comparisons and has repercussions for such things as indexing operations. Even so, this hardly applies to infinity values, which are single definite values. In fact, there is some encouraging talk in the Connect note on this issue that they might be supported 'in the SQL Server 2008 timeframe'. If didn't happen; SQL 2008 doesn't support NaN or infinity values, though one could be forgiven for thinking otherwise, based on the MSDN documentation for the FLOAT type, which states that "The behavior of float and real follows the IEEE 754 specification on approximate numeric data types". However, the truth is revealed in the XPath documentation, which states that "…float (53) is not exactly IEEE 754. For example, neither NaN (Not-a-Number) nor infinity is used…". Is it really so hard to fix this problem the right way, and properly support in SQL Server the IEEE 754 standard for the floating point data type, NaNs, infinities and all? Oracle seems to have managed it quite nicely with its BINARY_FLOAT and BINARY_DOUBLE types, so it is technically possible. We have an enterprise-class database that is marketed as being part of an 'integrated' Windows platform. Absurdly, we have .NET and XPath libraries that fully support the standard for floating point numbers, and we can't even properly store these values, let alone query them, in the SQL Server database! Cheers, Tony.

Read the article
It’s the thought that counts…

- by Tony Davis

I recently finished editing a book called Tribal SQL, and it was a fantastic experience. It’s a community-sourced book written by first-timers. Fifteen previously unpublished authors contributed one chapter each, with the seemingly simple remit to write about “what makes them passionate about working with SQL Server, something that all SQL Server DBAs and developers really need to know”. Sure, some of the writing skills were a bit rusty as one would expect from busy people, but the ideas and energy were sheer nectar. Any seasoned editor can deal easily with the problem of fixing the output of untrained writers. We can handle with the occasional technical error too, which is why we have technical reviewers. The editor’s real job is to hone the clarity and flow of ideas, making the author’s knowledge and experience accessible to as many others as possible. What the writer needs to bring, on the other hand, is enthusiasm, attention to detail, common sense, and a sense of the person behind the writing. If any of these are missing, no editor can fix it. We can see these essential characteristics in many of the more seasoned and widely-published writers about SQL. To illustrate what I mean by enthusiasm, or passion, take a look at the work of Laerte Junior or Fabiano Amorim. Both authors have English as a second language, but their energy, enthusiasm, sheer immersion in a technology and thirst to know more, drives them, with a little editorial help, to produce articles of far more practical value than one can find in the “manuals”. There’s the attention to detail of the likes of Jonathan Kehayias, or Paul Randal. Read their work and one begins to understand the knowledge coupled with incredible rigor, the willingness to bend and test every piece of advice offered to make sure it’s correct, that marks out the very best technical writing. There’s the common sense of someone like Louis Davidson. All writers, including Louis, like to stretch the grey matter of their readers, but some of the most valuable writing is that which takes a complicated idea, or distils years of experience, and expresses it in a way that sounds like simple common sense. There’s personality and humor. Contrary to what you may have been told, they can and do mix well with technical writing, as long as they don’t become a distraction. Read someone like Rodney Landrum, or Phil Factor, for numerous examples of articles that teach hard technical lessons but also make you smile at least twice along the way. Writing well is not easy and it takes a certain bravery to expose your ideas and knowledge for dissection by others, but it doesn’t mean that writing should be the preserve only of those trained in the art, or best left to the MVPs. I believe that Tribal SQL is testament to the fact that if you have passion for what you do, and really know your topic then, with a little editorial help, you can write, and people will learn from what you have to say. You can read a sample chapter, by Mark Rasmussen, in this issue of Simple-Talk and I hope you’ll consider checking out the book (if you needed any further encouragement, it’s also for a good cause, Computers4Africa). Cheers, Tony

Read the article
Monitoring the Application alongside SQL Server

- by Tony Davis

Sometimes, on Simple-Talk, it takes a while to spot strange and unexpected patterns of user activity, or small bugs. For example, one morning we spotted that an article’s comment count had leapt to 1485, but that only four were displayed. With some rooting around in Google Analytics, and the endlessly annoying Community Server admin-interface, we were able to work out that a few days previously the article had been subject to a spam attack and that the comment count was for some reason including both accepted and unaccepted comments (which in turn uncovered a bug in the SQL). This sort of incident made us a lot keener on monitoring Simple-talk website usage more effectively. However, the metrics we wanted are troublesome, because they are far too specific for Google Analytics to measure, and the SQL Server backend doesn’t keep sufficient information to enable us to plot trends. The latter could provide, for example, the total number of comments made on, or votes cast for, articles, over all time, but not the number that occur by hour over a set time. We lacked a baseline, in other words. We couldn’t alter the database, as it is a bought-in package. We had neither the resources nor inclination to build-in dedicated application monitoring. Possibly, we could investigate a third-party tool to do the job; but then it occurred to us that we were already using a monitoring tool (SQL Monitor) to keep an eye on the database. It stored data, made graphs and sent alerts. Could we get it to monitor some aspects of the application as well? Of course, SQL Monitor’s single purpose is to check and monitor SQL Server, over time, rather than to monitor applications that use SQL Server. However, how different is the business of gathering and plotting SQL Server Wait Stats, from gathering and plotting various aspects of user activity on the site? Not a lot, it turns out. The latest version allows us to write our own custom monitoring scripts, meaning that we could now monitor any metric in the application that returns an integer. It took little time to write a simple SQL Query that collects basic metrics of the total number of subscribers, votes cast, comments made, or views of articles, over time. The SQL Monitor database polls Simple-Talk every second or so in order to get the latest totals, and can then store and plot this information, or even correlate SQL Server usage to application usage. You can see the live data by visiting monitor.red-gate.com. Click the "Analysis" tab, and select one of the "Simple-talk:" entries in the "Show" box and an appropriate data range (e.g. last 30 days). It’s nascent, and we’re still working on it, but it’s already given us more confidence that we’ll spot quickly trends, bugs, or bursts of ‘abnormal’ activity. If there is a sudden rise in comments, we get an alert, and if it’s due to a spam attack, we can moderate or ban the perpetrator very quickly. We’ve often argued that a tool should perform a single job well rather than turn into a Swiss-army knife, but ironically we’ve rather appreciated being able to make best use of what’s there anyway for a slightly different purpose. Is this a good or common practice? What do you think? Cheers, Tony.

Read the article
A temporary disagreement

- by Tony Davis

Last month, Phil Factor caused a furore amongst some MVPs with an article that attempted to offer simple advice to developers regarding the use of table variables, versus local and global temporary tables, in their code. Phil makes clear that the table variables do come with some fairly major limitations.no distribution statistics, no parallel query plans for queries that modify table variables.but goes on to suggest that for reasonably small-scale strategic uses, and with a bit of due care and testing, table variables are a "good thing". Not everyone shares his opinion; in fact, I imagine he was rather aghast to learn that there were those felt his article was akin to pulling the pin out of a grenade and tossing it into the database; table variables should be avoided in almost all cases, according to their advice, in favour of temp tables. In other words, a fairly major feature of SQL Server should be more-or-less 'off limits' to developers. The problem with temp tables is that, because they are scoped either in the procedure or the connection, it is easy to allow them to hang around for too long, eating up precious memory and bulking up the shared tempdb database. Unless they are explicitly dropped, global temporary tables, and local temporary tables created within a connection rather than within a stored procedure, will persist until the connection is closed or, with connection pooling, until the connection is reused. It's also quite common with ASP.NET applications to have connection leaks, as Bill Vaughn explains in his chapter in the "SQL Server Deep Dives" book, meaning that the web page exits without closing the connection object, maybe due to an error condition. This will then hang around in the heap for what might be hours before picked up by the garbage collector. Table variables are much safer in this regard, since they are batch-scoped and so are cleaned up automatically once the batch is complete, which also means that they are intuitive to use for the developer because they conform to scoping rules that are closer to those in procedural code. On the surface then, an ideal way to deal with issues related to tempdb memory hogging. So why did Phil qualify his recommendation to use Table Variables? This is another of those cases where, like scalar UDFs and table-valued multi-statement UDFs, developers can sometimes get into trouble with a relatively benign-looking feature, due to way it's been implemented in SQL Server. Once again the biggest problem is how they are handled internally, by the SQL Server query optimizer, which can make very poor choices for JOIN orders and so on, in the absence of statistics, especially when joining to tables with highly-skewed data. The resulting execution plans can be horrible, as will be the resulting performance. If the JOIN is to a large table, that will hurt. Ideally, Microsoft would simply fix this issue so that developers can't get burned in this way; they've been around since SQL Server 2000, so Microsoft has had a bit of time to get it right. As I commented in regard to UDFs, when developers discover issues like with such standard features, the database becomes an alien planet to them, where death lurks around each corner, and they continue to avoid these "killer" features years after the problems have been eventually resolved. In the meantime, what is the right approach? Is it to say "hammers can kill, don't ever use hammers", or is it to try to explain, as Phil's article and follow-up blog post have tried to do, what the feature was intended for, why care must be applied in its use, and so enable developers to make properly-informed decisions, without requiring them to delve deep into the inner workings of SQL Server? Cheers, Tony.

Read the article
Inappropriate Updates?

- by Tony Davis

A recent Simple-talk article by Kathi Kellenberger dissected the fastest SQL solution, submitted by Peter Larsson as part of Phil Factor's SQL Speed Phreak challenge, to the classic "running total" problem. In its analysis of the code, the article re-ignited a heated debate regarding the techniques that should, and should not, be deemed acceptable in your search for fast SQL code. Peter's code for running total calculation uses a variation of a somewhat contentious technique, sometimes referred to as a "quirky update": SET @Subscribers = Subscribers = @Subscribers + PeopleJoined - PeopleLeft This form of the UPDATE statement, @variable = column = expression, is documented and it allows you to set a variable to the value returned by the expression. Microsoft does not guarantee the order in which rows are updated in this technique because, in relational theory, a table doesn’t have a natural order to its rows and the UPDATE statement has no means of specifying the order. Traditionally, in cases where a specific order is requires, such as for running aggregate calculations, programmers who used the technique have relied on the fact that the UPDATE statement, without the WHERE clause, is executed in the order imposed by the clustered index, or in heap order, if there isn’t one. Peter wasn’t satisfied with this, and so used the ingenious device of assuring the order of the UPDATE by the use of an "ordered CTE", based on an underlying temporary staging table (a heap). However, in either case, the ordering is still not guaranteed and, in addition, would be broken under conditions of parallelism, or partitioning. Many argue, with validity, that this reliance on a given order where none can ever be guaranteed is an abuse of basic relational principles, and so is a bad practice; perhaps even irresponsible. More importantly, Microsoft doesn't wish to support the technique and offers no guarantee that it will always work. If you put it into production and it breaks in a later version, you can't file a bug. As such, many believe that the technique should never be tolerated in a production system, under any circumstances. Is this attitude justified? After all, both forms of the technique, using a clustered index to guarantee the order or using an ordered CTE, have been tested rigorously and are proven to be robust; although not guaranteed by Microsoft, the ordering is reliable, provided none of the conditions that are known to break it are violated. In Peter's particular case, the technique is being applied to a temporary table, where the developer has full control of the data ordering, and indexing, and knows that the table will never be subject to parallelism or partitioning. It might be argued that, in such circumstances, the technique is not really "quirky" at all and to ban it from your systems would server no real purpose other than to deprive yourself of a reliable technique that has uses that extend well beyond the running total calculations. Of course, it is doubly important that such a technique, including its unsupported status and the assumptions that underpin its success, is fully and clearly documented, preferably even when posting it online in a competition or forum post. Ultimately, however, this technique has been available to programmers throughout the time Sybase and SQL Server has existed, and so cannot be lightly cast aside, even if one sympathises with Microsoft for the awkwardness of maintaining an archaic way of doing updates. After all, a Table hint could easily be devised that, if specified in the WITH (<Table_Hint_Limited>) clause, could be used to request the database engine to do the update in the conventional order. Then perhaps everyone would be satisfied. Cheers, Tony.

Read the article
Access Denied

- by Tony Davis

When Microsoft executives wake up in the night screaming, I suspect they are having a nightmare about their own version of Frankenstein's monster. Created with the best of intentions, without thinking too hard of the long-term strategy, and having long outlived its usefulness, the monster still lives on, occasionally wreaking vengeance on the innocent. Its name is Access; a living synthesis of disparate body parts that is resistant to all attempts at a mercy-killing. In 1986, Microsoft had no database products, and needed one for their new OS/2 operating system, the successor to MSDOS. In 1986, they bought exclusive rights to Sybase DataServer, and were also intent on developing a desktop database to capture Ashton-Tate's dominance of that market, with dbase. This project, first called 'Omega' and later 'Cirrus', eventually spawned two products: Visual Basic in 1991 and Access in late 1992. Whereas Visual Basic battled with PowerBuilder for dominance in the client-server market, Access easily won the desktop database battle, with Dbase III and DataEase falling away. Access did an excellent job of abstracting and simplifying the task of building small database applications in a short amount of time, for a small number of departmental users, and often for a transient requirement. There is an excellent front end and forms generator. We not only see it in Access but parts of it also reappear in SSMS. It's good. A business user can pull together useful reports, without relying on extensive technical support. A skilled Access programmer can deliver a fairly sophisticated application, whilst the traditional client-server programmer is still sharpening his pencil. Even for the SQL Server programmer, the forms generator of Access is useful for sketching out application designs. So far, so good, but here's where the problems start; Access ties together two different products and the backend of Access is the bugbear. The limitations of Jet/ACE are well-known and documented. They range from MDB files that are prone to corruption, especially as they grow in size, pathetic security, and "copy and paste" Backups. The biggest problem though, was an infamous lack of scalability. Because Microsoft never realized how long the product would last, they put little energy into improving the beast. Microsoft 'ate their own dog food' by using Access for Microsoft Exchange and Outlook. They choked on it. For years, scalability and performance problems with Exchange Server have been laid at the door of the Jet Blue engine on which it relies. Substantial development work in Exchange 2010 was required, just in order to improve the engine and storage schema so that it more efficiently handled the reading and writing of mails. The alternative of using SQL Server just never panned out. The Jet engine was designed to limit concurrent users to a small number (10-20). When Access applications outgrew this, bitter experience proved that there really is no easy upgrade path from Access to SQL Server, beyond rewriting the whole lot from scratch. The various initiatives to do this never quite bridged the cultural gulf between Access and a true relational database So, what are the obvious alternatives for small, strategic database applications? I know many users who, for simple 'list maintenance' requirements are very happy using Excel databases. Surely, now that PowerPivot has led the way, it is time for Microsoft to offer a new RAD package for database application development; namely an Excel-based front end for SQL Server Express. In that way, we'll have a powerful and familiar front end, to a scalable database, and a clear upgrade path when an app takes off and needs to go enterprise. Cheers, Tony.

Read the article
DAC pack up all your troubles

- by Tony Davis

Visual Studio 2010, or perhaps its apparently-forthcoming sister, "SQL Studio", is being geared up to become the natural way for developers to create databases. Central to this drive is the introduction of 'data-tier application components', or DACs. Applications are developed as normal but when it comes to deployment, instead of supplying the DBA with a bunch of scripts to create the required database objects, the developer creates a single DAC Package ("DAC Pack"); a zipped XML file containing all the database objects needed by the application, along with versioning information, policies for deployment, and so on. It's an intriguing prospect. Developers can work on their development database using their existing tools and source control, and then package up the changes into a single DACPAC for deployment and management. DBAs get an "application level view" of how their instances are being used and the ability to collectively, rather than individually, manage the objects. The DBA needing to manage a large number of relatively small databases can use "DAC snapshots" to get a quick overview of what has changed across all the databases they manage. The reason that DAC packs haven't caused more excitement is that they can only be pushed to SQL Server 2008 R2, and they must be developed or inspected using Visual Studio 2010. Furthermore, what we see right now in VS2010 is more of a 'work-in-progress' or 'vision of the future', with serious shortcomings and restrictions that render it unsuitable for anything but small 'non-critical' departmental databases. The first problem is that DAC packs support a limited set of schema objects (corresponding closely to the features available on 'Azure'). This means that Service Broker queues, CLR Objects, and perhaps most critically security (permissions, certificates etc.), are off-limits. Applications that require these objects will need to add them via a post-deployment TSQL script, rather defeating the whole idea. More worrying still is the process for altering a database with a DAC pack. The grand 'collective' philosophy, whereby a single XML file can be used for deploying and managing builds and changes, extends, unfortunately, to database upgrades. Any change to a database object will result in the creation of a new database, copying the data from the old version, nuking the previous one, and then renaming the new one. Simple eh? The problem is that even something as trivial as adding a comment to a stored procedure in a 5GB database will require the server to find at least twice as much space, as well sufficient elbow-room in the transaction log for copying the largest table. Of course, you'll need to take the database offline for the full course of the deployment, which is likely to take a long time if there is a lot of data. This upgrade/rename process breaks the log chain, makes any subsequent full restore operation highly complicated, and will also break log shipping. As with any grand vision, the devil is always in the detail. It's hard to fathom why Microsoft hasn't used a SQL Compare-style approach to the upgrade process, altering a database with a change script, and this will surely be adopted in the near future. Something had to be in place for VS2010, but right now DAC packs only make sense for Azure. For this, they're cute, but hardly compelling. Nevertheless, DBAs would do well to get familiar with VS 2010 and DAC packs. Like it or not, they're both coming. Cheers, Tony.

Read the article
Sweet and Sour Source Control

- by Tony Davis

Most database developers don't use Source Control. A recent anonymous poll on SQL Server Central asked its readers "Which Version Control system do you currently use to store you database scripts?" The winner, with almost 30% of the vote was...none: "We don't use source control for database scripts". In second place with almost 28% of the vote was Microsoft's VSS. VSS? Given its reputation for being buggy, unstable and lacking most of the basic features required of a proper source control system, answering VSS is really just another way of saying "I don't use Source Control". At first glance, it's a surprising thought. You wonder how database developers can work in a team and find out what changed, when the system worked before but is now broken; to work out what happened to their changes that now seem to have vanished; to roll-back a mistake quickly so that the rest of the team have a functioning build; to find instantly whether a suspect change has been deployed to production. Unfortunately, the survey didn't ask about the scale of the database development, and correlate the two questions. If there is only one database developer within a schema, who has an automated approach to regular generation of build scripts, then the need for a formal source control system is questionable. After all, a database stores far more about its metadata than a traditional compiled application. However, what is meat for a small development is poison for a team-based development. Here, we need a form of Source Control that can reconcile simultaneous changes, store the history of changes, derive versions and builds and that can cope with forks and merges. The problem comes when one borrows a solution that was designed for conventional programming. A database is not thought of as a "file", but a vast, interdependent and intricate matrix of tables, indexes, constraints, triggers, enumerations, static data and so on, all subtly interconnected. It is an awkward fit. Subversion with its support for merges and forks, and the tolerance of different work practices, can be made to work well, if used carefully. It has a standards-based architecture that allows it to be used on all platforms such as Windows Mac, and Linux. In the words of Erland Sommerskog, developers should "just do it". What's in a database is akin to a "binary file", and the developer must work only from the file. You check out the file, edit it, and save it to disk to compile it. Dependencies are validated at this point and if you've broken anything (e.g. you renamed a column and broke all the objects that reference the column), you'll find out about it right away, and you'll be forced to fix it. Nevertheless, for many this is an alien way of working with SQL Server. Subversion is the powerhouse, not the GUI. It doesn't work seamlessly with your existing IDE, and that usually means SSMS. So the question then becomes more subtle. Would developers be less reluctant to use a fully-featured source (revision) control system for a team database development if they had a turn-key, reliable system that fitted in with their existing work-practices? I'd love to hear what you think. Cheers, Tony.

Read the article
Cloud Backup: Getting the Users' Backs Up

- by Tony Davis

On Wednesday last week, Microsoft announced that as of July 1, all data transfers into its Microsoft Azure cloud will be free (though you have to pay for transferring data out). On Thursday last week, SQL Azure in Western Europe went down. It was a relatively short outage, but since SQL Azure currently provides no easy way to take a standard backup of a database and store it locally, many people had no recourse but to wait patiently for their cloud-based app to resume. It seems that Microsoft are very keen encourage developers to move their data onto their cloud, but are developers ready to do it, given that such basic backup capabilities are lacking? Recently on Simple-Talk, Mike Mooney described a perfect use case for the Microsoft Cloud. They had a simple web-based application with a SQL Server backend; they could move the application to Windows Azure, and the data into SQL Azure and in the process free themselves from much of the hassle surrounding management and scaling of the hardware, network and so on. It was a great fit and yet it nearly didn't happen; lack of support for the BACKUP command almost proved a show-stopper. Of course, backups of Azure databases are always and have always been taken automatically, for disaster recovery purposes, but these are strictly on-cloud copies and as of now it is not possible to use them to them to restore a database to a particular point in time. It seems that none of those clever Microsoft people managed to predict the need to perform basic backups of Azure databases so that copies could be stored locally, outside the Azure universe. At the very least, as Mike points out, performing a local backup before a new deployment is more or less mandatory. Microsoft did at least note the sound of gnashing teeth and, as a stop-gap measure, offered SQL Azure Database Copy which basically allows you to create an online clone of your database, but this doesn't allow for storing local archives of the data. To that end MS has provided SQL Azure Import/Export, to package up and export a database and its data, using BACPACs. These BACPACs do not guarantee transactional consistency; for example, if a child table is modified after the parent is copied, then the copied database will be in inconsistent state (meaning, to add to the fun, BACPACs need to be created from a database copy). In any event, widespread problems with BACPAC's evil cousin, the DACPAC have been well-documented, and it seems likely that many will also give BACPAC the bum's rush. Finally, in a TechEd 2011 presentation tagged "SQL Azure Advanced Administration", it was announced that "backup and restore" were coming in the next SQL Azure CTP. And yet this still doesn't mean that we'll get simple backups as DBAs know and love them. What it does mean, at least, is the ability to restore any given database to a point in time within a 2-week window. For the time being, if you want a local copy of your data and don't want to brave the BACPAC, one is left with SSIS or BCP, creative use of schema and data comparison tools, or use of SQL Azure Backup (currently in beta) in order to perform this simple but vital task. Cheers, Tony.

Read the article
Going by the eBook

- by Tony Davis

The book and magazine publishing world is rapidly going digital, and the industry is faced with making drastic changes to their ways of doing business. The sudden take-up of digital readers by the book-buying public has surprised even the most technological-savvy of the industry. Printed books just aren't selling like they did. In contrast, eBooks are doing well. The ePub file format is the standard around which all publishers are converging. ePub is a standard for formatting book content, so that it can be reflowed for various devices, with their widely differing screen-sizes, and can be read offline. If you unzip an ePub file, you'll find familiar formats such as XML, XHTML and CSS. This is both a blessing and a curse. Whilst it is good to be able to use familiar technologies that have been developed to a level of considerable sophistication, it doesn't get us all the way to producing a viable publication. XHTML is a page-description language, not a book-description language, as we soon found out during our initial experiments, when trying to specify headers, footers, indexes and chaptering. As a result, it is difficult to predict how any particular eBook application will decide to render a book. There isn't even a consensus as to how the cover image is specified. All of this is awkward for the publisher. Each book must be created and revised in a form from which can be generated a whole range of 'printed media', from print books, to Mobi for kindles, ePub for most Tablets and SmartPhones, HTML for excerpted chapters on websites, and a plethora of other formats for other eBook readers, each with its own idiosyncrasies. In theory, if we can get our content into a clean, semantic XML form, such as DOCBOOKS, we can, from there, after every revision, perform a series of relatively simple XSLT transformations to output anything from a HTML article, to an ePub file for reading on an iPad, to an ICML file (an XML-based file format supported by the InDesign tool), ready for print publication. As always, however, the task looks bigger the closer you get to the detail. On the way to the utopian world of an XML-based book format that encompasses all the diverse requirements of the different publication media, ePub looks like a reasonable format to adopt. Its forthcoming support for HTML 5 and CSS 3, with ePub 3.0, means that features, such as widow-and-orphan controls, multi-column flow and multi-media graphics can be incorporated into eBooks. This starts to make it possible to build an "app-like" experience into the eBook and to free publishers to think of putting context before container; to think of what content is required, be it graphical, textual or audio, from the point of view of the user, rather than what's possible in a given, traditional book "Container". In the meantime, there is a gap between what publishers require and what current technology can provide and, of course building this app-like experience is far from plain sailing. Real portability between devices is still a big challenge, and achieving the sort of wizardry seen in the likes of Theodore Grey's "Elements" eBook will require some serious device-specific programming skills. Cheers, Tony.

Read the article
SQL Server Optimizer Malfunction?

- by Tony Davis

There was a sharp intake of breath from the audience when Adam Machanic declared the SQL Server optimizer to be essentially "stuck in 1997". It was during his fascinating "Query Tuning Mastery: Manhandling Parallelism" session at the recent PASS SQL Summit. Paraphrasing somewhat, Adam (blog | @AdamMachanic) offered a convincing argument that the optimizer often delivers flawed plans based on assumptions that are no longer valid with today’s hardware. In 1997, when Microsoft engineers re-designed the database engine for SQL Server 7.0, SQL Server got its initial implementation of a cost-based optimizer. Up to SQL Server 2000, the developer often had to deploy a steady stream of hints in SQL statements to combat the occasionally wilful plan choices made by the optimizer. However, with each successive release, the optimizer has evolved and improved in its decision-making. It is still prone to the occasional stumble when we tackle difficult problems, join large numbers of tables, perform complex aggregations, and so on, but for most of us, most of the time, the optimizer purrs along efficiently in the background. Adam, however, challenged further any assumption that the current optimizer is competent at providing the most efficient plans for our more complex analytical queries, and in particular of offering up correctly parallelized plans. He painted a picture of a present where complex analytical queries have become ever more prevalent; where disk IO is ever faster so that reads from disk come into buffer cache faster than ever; where the improving RAM-to-data ratio means that we have a better chance of finding our data in cache. Most importantly, we have more CPUs at our disposal than ever before. To get these queries to perform, we not only need to have the right indexes, but also to be able to split the data up into subsets and spread its processing evenly across all these available CPUs. Improvements such as support for ColumnStore indexes are taking things in the right direction, but, unfortunately, deficiencies in the current Optimizer mean that SQL Server is yet to be able to exploit properly all those extra CPUs. Adam’s contention was that the current optimizer uses essentially the same costing model for many of its core operations as it did back in the days of SQL Server 7, based on assumptions that are no longer valid. One example he gave was a "slow disk" bias that may have been valid back in 1997 but certainly is not on modern disk systems. Essentially, the optimizer assesses the relative cost of serial versus parallel plans based on the assumption that there is no IO cost benefit from parallelization, only CPU. It assumes that a single request will saturate the IO channel, and so a query would not run any faster if we parallelized IO because the disk system simply wouldn’t be able to handle the extra pressure. As such, the optimizer often decides that a serial plan is lower cost, often in cases where a parallel plan would improve performance dramatically. It was challenging and thought provoking stuff, as were his techniques for driving parallelism through query logic based on subsets of rows that define the "grain" of the query. I highly recommend you catch the session if you missed it. I’m interested to hear though, when and how often people feel the force of the optimizer’s shortcomings. Barring mistakes, such as stale statistics, how often do you feel the Optimizer fails to find the plan you think it should, and what are the most common causes? Is it fighting to induce it toward parallelism? Combating unexpected plans, arising from table partitioning? Something altogether more prosaic? Cheers, Tony.

Read the article
Head in the Clouds

- by Tony Davis

We're just past the second anniversary of the launch of Windows Azure. A couple of years' experience with Azure in the industry has provided some obvious success stories, but has deflated some of the initial marketing hyperbole. As a general principle, Azure seems to work well in providing a Service-Oriented Architecture for services in enterprises that suffer wide fluctuations in demand. Instead of being obliged to provide hardware sufficient for the occasional peaks in demand, one can hire capacity only when it is needed, and the cost of hosting an application is no longer a capital cost. It enables companies to avoid having to scale out hardware for peak periods only to see it underused for the rest of the time. A customer-facing application such as a concert ticketing system, which suffers high demand in short, predictable bursts of activity, is a great example of an application that would work well in Azure. However, moving existing applications to Azure isn't something to be done on impulse. Unless your application is .NET-based, and consists of 'stateless' components that communicate via queues, you are probably in for a lot of redevelopment work. It makes most sense for IT departments who are already deep in this .NET mindset, and who also want 'grown-up' methods of staging, testing, and deployment. Azure fits well with this culture and offers, as a bonus, good Visual Studio integration. The most-commonly stated barrier to porting these applications to Azure is the problem of reconciling the use of the cloud with legislation for data privacy and security. Putting databases in the cloud is a sticky issue for many and impossible for some due to compliance and security issues, the need for direct control over data, and so on. In the face of feedback from the early adopters of Azure, Microsoft has broadened the architectural choices to cater for a wide range of requirements. As well as SQL Azure Database (SAD) and Azure storage, the unstructured 'BLOB and Entity-Attribute-Value' NoSQL storage alternative (which equates more closely with folders and files than a database), Windows Azure offers a wide range of storage options including use of services such as oData: developers who are programming for Windows Azure can simply choose the one most appropriate for their needs. Secondly, and crucially, the Windows Azure architecture allows you the freedom to produce hybrid applications, where only those parts that need cloud-based hosting are deployed to Azure, whereas those parts that must unavoidably be hosted in a corporate datacenter can stay there. By using a hybrid architecture, it will seldom, if ever, be necessary to move an entire application to the cloud, along with personal and financial data. For example that we could port to Azure only put those parts of our ticketing application that capture and process tickets orders. Once an order is captured, the financial side can be processed in our own data center. In short, Windows Azure seems to be a very effective way of providing services that are subject to wide but predictable fluctuations in demand. Have you come to the same conclusions, or do you think I've got it wrong? If you've had experience with Azure, would you recommend it? It would be great to hear from you. Cheers, Tony.

Read the article
The long road to bug-free software

- by Tony Davis

The past decade has seen a burgeoning interest in functional programming languages such as Haskell or, in the Microsoft world, F#. Though still on the periphery of mainstream programming, functional programming concepts are gradually seeping into the imperative C# language (for example, Lambda expressions have their root in functional programming). One of the more interesting concepts from functional programming languages is the use of formal methods, the lofty ideal behind which is bug-free software. The idea is that we write a specification that describes exactly how our function (say) should behave. We then prove that our function conforms to it, and in doing so have proved beyond any doubt that it is free from bugs. All programmers already use one form of specification, specifically their programming language's type system. If a value has a specific type then, in a type-safe language, the compiler guarantees that value cannot be an instance of a different type. Many extensions to existing type systems, such as generics in Java and .NET, extend the range of programs that can be type-checked. Unfortunately, type systems can only prevent some bugs. To take a classic problem of retrieving an index value from an array, since the type system doesn't specify the length of the array, the compiler has no way of knowing that a request for the "value of index 4" from an array of only two elements is "unsafe". We restore safety via exception handling, but the ideal type system will prevent us from doing anything that is unsafe in the first place and this is where we start to borrow ideas from a language such as Haskell, with its concept of "dependent types". If the type of an array includes its length, we can ensure that any index accesses into the array are valid. The problem is that we now need to carry around the length of arrays and the values of indices throughout our code so that it can be type-checked. In general, writing the specification to prove a positive property, even for a problem very amenable to specification, such as a simple sorting algorithm, turns out to be very hard and the specification will be different for every program. Extend this to writing a specification for, say, Microsoft Word and we can see that the specification would end up being no simpler, and therefore no less buggy, than the implementation. Fortunately, it is easier to write a specification that proves that a program doesn't have certain, specific and undesirable properties, such as infinite loops or accesses to the wrong bit of memory. If we can write the specifications to prove that a program is immune to such problems, we could reuse them in many places. The problem is the lack of specification "provers" that can do this without a lot of manual intervention (i.e. hints from the programmer). All this might feel a very long way off, but computing power and our understanding of the theory of "provers" advances quickly, and Microsoft is doing some of it already. Via their Terminator research project they have started to prove that their device drivers will always terminate, and in so doing have suddenly eliminated a vast range of possible bugs. This is a huge step forward from saying, "we've tested it lots and it seems fine". What do you think? What might be good targets for specification and verification? SQL could be one: the cost of a bug in SQL Server is quite high given how many important systems rely on it, so there's a good incentive to eliminate bugs, even at high initial cost. [Many thanks to Mike Williamson for guidance and useful conversations during the writing of this piece] Cheers, Tony.

Read the article
The long road to bug-free software

- by Tony Davis

The past decade has seen a burgeoning interest in functional programming languages such as Haskell or, in the Microsoft world, F#. Though still on the periphery of mainstream programming, functional programming concepts are gradually seeping into the imperative C# language (for example, Lambda expressions have their root in functional programming). One of the more interesting concepts from functional programming languages is the use of formal methods, the lofty ideal behind which is bug-free software. The idea is that we write a specification that describes exactly how our function (say) should behave. We then prove that our function conforms to it, and in doing so have proved beyond any doubt that it is free from bugs. All programmers already use one form of specification, specifically their programming language's type system. If a value has a specific type then, in a type-safe language, the compiler guarantees that value cannot be an instance of a different type. Many extensions to existing type systems, such as generics in Java and .NET, extend the range of programs that can be type-checked. Unfortunately, type systems can only prevent some bugs. To take a classic problem of retrieving an index value from an array, since the type system doesn't specify the length of the array, the compiler has no way of knowing that a request for the "value of index 4" from an array of only two elements is "unsafe". We restore safety via exception handling, but the ideal type system will prevent us from doing anything that is unsafe in the first place and this is where we start to borrow ideas from a language such as Haskell, with its concept of "dependent types". If the type of an array includes its length, we can ensure that any index accesses into the array are valid. The problem is that we now need to carry around the length of arrays and the values of indices throughout our code so that it can be type-checked. In general, writing the specification to prove a positive property, even for a problem very amenable to specification, such as a simple sorting algorithm, turns out to be very hard and the specification will be different for every program. Extend this to writing a specification for, say, Microsoft Word and we can see that the specification would end up being no simpler, and therefore no less buggy, than the implementation. Fortunately, it is easier to write a specification that proves that a program doesn't have certain, specific and undesirable properties, such as infinite loops or accesses to the wrong bit of memory. If we can write the specifications to prove that a program is immune to such problems, we could reuse them in many places. The problem is the lack of specification "provers" that can do this without a lot of manual intervention (i.e. hints from the programmer). All this might feel a very long way off, but computing power and our understanding of the theory of "provers" advances quickly, and Microsoft is doing some of it already. Via their Terminator research project they have started to prove that their device drivers will always terminate, and in so doing have suddenly eliminated a vast range of possible bugs. This is a huge step forward from saying, "we've tested it lots and it seems fine". What do you think? What might be good targets for specification and verification? SQL could be one: the cost of a bug in SQL Server is quite high given how many important systems rely on it, so there's a good incentive to eliminate bugs, even at high initial cost. [Many thanks to Mike Williamson for guidance and useful conversations during the writing of this piece] Cheers, Tony.

Read the article
Concurrent Affairs

- by Tony Davis

I once wrote an editorial, multi-core mania, on the conundrum of ever-increasing numbers of processor cores, but without the concurrent programming techniques to get anywhere near exploiting their performance potential. I came to the.controversial.conclusion that, while the problem loomed for all procedural languages, it was not a big issue for the vast majority of programmers. Two years later, I still think most programmers don't concern themselves overly with this issue, but I do think that's a bigger problem than I originally implied. Firstly, is the performance boost from writing code that can fully exploit all available cores worth the cost of the additional programming complexity? Right now, with quad-core processors that, at best, can make our programs four times faster, the answer is still no for many applications. But what happens in a few years, as the number of cores grows to 100 or even 1000? At this point, it becomes very hard to ignore the potential gains from exploiting concurrency. Possibly, I was optimistic to assume that, by the time we have 100-core processors, and most applications really needed to exploit them, some technology would be around to allow us to do so with relative ease. The ideal solution would be one that allows programmers to forget about the problem, in much the same way that garbage collection removed the need to worry too much about memory allocation. From all I can find on the topic, though, there is only a remote likelihood that we'll ever have a compiler that takes a program written in a single-threaded style and "auto-magically" converts it into an efficient, correct, multi-threaded program. At the same time, it seems clear that what is currently the most common solution, multi-threaded programming with shared memory, is unsustainable. As soon as a piece of state can be changed by a different thread of execution, the potential number of execution paths through your program grows exponentially with the number of threads. If you have two threads, each executing n instructions, then there are 2^n possible "interleavings" of those instructions. Of course, many of those interleavings will have identical behavior, but several won't. Not only does this make understanding how a program works an order of magnitude harder, but it will also result in irreproducible, non-deterministic, bugs. And of course, the problem will be many times worse when you have a hundred or a thousand threads. So what is the answer? All of the possible alternatives require a change in the way we write programs and, currently, seem to be plagued by performance issues. Software transactional memory (STM) applies the ideas of database transactions, and optimistic concurrency control, to memory. However, working out how to break down your program into sufficiently small transactions, so as to avoid contention issues, isn't easy. Another approach is concurrency with actors, where instead of having threads share memory, each thread runs in complete isolation, and communicates with others by passing messages. It simplifies concurrent programs but still has performance issues, if the threads need to operate on the same large piece of data. There are doubtless other possible solutions that I haven't mentioned, and I would love to know to what extent you, as a developer, are considering the problem of multi-core concurrency, what solution you currently favor, and why. Cheers, Tony.

Read the article

Search Results

Search found 590 results on 24 pages for 'tony ennis'.

Page 2/24 | < Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12 | Next Page >

- by Tony Davis

- by Tony Davis

- by Tony Davis

- by Tony Davis

- by Tony Martin

- by Tony Davis

- by Tony Davis

- by Tony Davis

- by Tony Davis

- by Tony Davis

- by Tony Davis

- by Tony Davis

- by Tony Davis

- by Tony Davis

- by Tony Davis

- by Tony Davis

- by Tony Davis

- by Tony Davis

- by Tony Davis

- by Tony Davis

- by Tony Davis

- by Tony Davis

- by Tony Davis

- by Tony Davis

- by Tony Davis

< Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12 | Next Page >