Search Results

Search found 615 results on 25 pages for 'sean farley'.

Page 2/25 | < Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12 | Next Page >

When is a SQL function not a function?

- by Rob Farley

Should SQL Server even have functions? (Oh yeah – this is a T-SQL Tuesday post, hosted this month by Brad Schulz) Functions serve an important part of programming, in almost any language. A function is a piece of code that is designed to return something, as opposed to a piece of code which isn’t designed to return anything (which is known as a procedure). SQL Server is no different. You can call stored procedures, even from within other stored procedures, and you can call functions and use these in other queries. Stored procedures might query something, and therefore ‘return data’, but a function in SQL is considered to have the type of the thing returned, and can be used accordingly in queries. Consider the internal GETDATE() function. SELECT GETDATE(), SomeDatetimeColumn FROM dbo.SomeTable; There’s no logical difference between the field that is being returned by the function and the field that’s being returned by the table column. Both are the datetime field – if you didn’t have inside knowledge, you wouldn’t necessarily be able to tell which was which. And so as developers, we find ourselves wanting to create functions that return all kinds of things – functions which look up values based on codes, functions which do string manipulation, and so on. But it’s rubbish. Ok, it’s not all rubbish, but it mostly is. And this isn’t even considering the SARGability impact. It’s far more significant than that. (When I say the SARGability aspect, I mean “because you’re unlikely to have an index on the result of some function that’s applied to a column, so try to invert the function and query the column in an unchanged manner”) I’m going to consider the three main types of user-defined functions in SQL Server: Scalar Inline Table-Valued Multi-statement Table-Valued I could also look at user-defined CLR functions, including aggregate functions, but not today. I figure that most people don’t tend to get around to doing CLR functions, and I’m going to focus on the T-SQL-based user-defined functions. Most people split these types of function up into two types. So do I. Except that most people pick them based on ‘scalar or table-valued’. I’d rather go with ‘inline or not’. If it’s not inline, it’s rubbish. It really is. Let’s start by considering the two kinds of table-valued function, and compare them. These functions are going to return the sales for a particular salesperson in a particular year, from the AdventureWorks database. CREATE FUNCTION dbo.FetchSales_inline(@salespersonid int, @orderyear int) RETURNS TABLE AS RETURN ( SELECT e.LoginID as EmployeeLogin, o.OrderDate, o.SalesOrderID FROM Sales.SalesOrderHeader AS o LEFT JOIN HumanResources.Employee AS e ON e.EmployeeID = o.SalesPersonID WHERE o.SalesPersonID = @salespersonid AND o.OrderDate >= DATEADD(year,@orderyear-2000,'20000101') AND o.OrderDate < DATEADD(year,@orderyear-2000+1,'20000101') ) ; GO CREATE FUNCTION dbo.FetchSales_multi(@salespersonid int, @orderyear int) RETURNS @results TABLE ( EmployeeLogin nvarchar(512), OrderDate datetime, SalesOrderID int ) AS BEGIN INSERT @results (EmployeeLogin, OrderDate, SalesOrderID) SELECT e.LoginID, o.OrderDate, o.SalesOrderID FROM Sales.SalesOrderHeader AS o LEFT JOIN HumanResources.Employee AS e ON e.EmployeeID = o.SalesPersonID WHERE o.SalesPersonID = @salespersonid AND o.OrderDate >= DATEADD(year,@orderyear-2000,'20000101') AND o.OrderDate < DATEADD(year,@orderyear-2000+1,'20000101') ; RETURN END ; GO You’ll notice that I’m being nice and responsible with the use of the DATEADD function, so that I have SARGability on the OrderDate filter. Regular readers will be hoping I’ll show what’s going on in the execution plans here. Here I’ve run two SELECT * queries with the “Show Actual Execution Plan” option turned on. Notice that the ‘Query cost’ of the multi-statement version is just 2% of the ‘Batch cost’. But also notice there’s trickery going on. And it’s nothing to do with that extra index that I have on the OrderDate column. Trickery. Look at it – clearly, the first plan is showing us what’s going on inside the function, but the second one isn’t. The second one is blindly running the function, and then scanning the results. There’s a Sequence operator which is calling the TVF operator, and then calling a Table Scan to get the results of that function for the SELECT operator. But surely it still has to do all the work that the first one is doing... To see what’s actually going on, let’s look at the Estimated plan. Now, we see the same plans (almost) that we saw in the Actuals, but we have an extra one – the one that was used for the TVF. Here’s where we see the inner workings of it. You’ll probably recognise the right-hand side of the TVF’s plan as looking very similar to the first plan – but it’s now being called by a stack of other operators, including an INSERT statement to be able to populate the table variable that the multi-statement TVF requires. And the cost of the TVF is 57% of the batch! But it gets worse. Let’s consider what happens if we don’t need all the columns. We’ll leave out the EmployeeLogin column. Here, we see that the inline function call has been simplified down. It doesn’t need the Employee table. The join is redundant and has been eliminated from the plan, making it even cheaper. But the multi-statement plan runs the whole thing as before, only removing the extra column when the Table Scan is performed. A multi-statement function is a lot more powerful than an inline one. An inline function can only be the result of a single sub-query. It’s essentially the same as a parameterised view, because views demonstrate this same behaviour of extracting the definition of the view and using it in the outer query. A multi-statement function is clearly more powerful because it can contain far more complex logic. But a multi-statement function isn’t really a function at all. It’s a stored procedure. It’s wrapped up like a function, but behaves like a stored procedure. It would be completely unreasonable to expect that a stored procedure could be simplified down to recognise that not all the columns might be needed, but yet this is part of the pain associated with this procedural function situation. The biggest clue that a multi-statement function is more like a stored procedure than a function is the “BEGIN” and “END” statements that surround the code. If you try to create a multi-statement function without these statements, you’ll get an error – they are very much required. When I used to present on this kind of thing, I even used to call it “The Dangers of BEGIN and END”, and yes, I’ve written about this type of thing before in a similarly-named post over at my old blog. Now how about scalar functions... Suppose we wanted a scalar function to return the count of these. CREATE FUNCTION dbo.FetchSales_scalar(@salespersonid int, @orderyear int) RETURNS int AS BEGIN RETURN ( SELECT COUNT(*) FROM Sales.SalesOrderHeader AS o LEFT JOIN HumanResources.Employee AS e ON e.EmployeeID = o.SalesPersonID WHERE o.SalesPersonID = @salespersonid AND o.OrderDate >= DATEADD(year,@orderyear-2000,'20000101') AND o.OrderDate < DATEADD(year,@orderyear-2000+1,'20000101') ); END ; GO Notice the evil words? They’re required. Try to remove them, you just get an error. That’s right – any scalar function is procedural, despite the fact that you wrap up a sub-query inside that RETURN statement. It’s as ugly as anything. Hopefully this will change in future versions. Let’s have a look at how this is reflected in an execution plan. Here’s a query, its Actual plan, and its Estimated plan: SELECT e.LoginID, y.year, dbo.FetchSales_scalar(p.SalesPersonID, y.year) AS NumSales FROM (VALUES (2001),(2002),(2003),(2004)) AS y (year) CROSS JOIN Sales.SalesPerson AS p LEFT JOIN HumanResources.Employee AS e ON e.EmployeeID = p.SalesPersonID; We see here that the cost of the scalar function is about twice that of the outer query. Nicely, the query optimizer has worked out that it doesn’t need the Employee table, but that’s a bit of a red herring here. There’s actually something way more significant going on. If I look at the properties of that UDF operator, it tells me that the Estimated Subtree Cost is 0.337999. If I just run the query SELECT dbo.FetchSales_scalar(281,2003); we see that the UDF cost is still unchanged. You see, this 0.0337999 is the cost of running the scalar function ONCE. But when we ran that query with the CROSS JOIN in it, we returned quite a few rows. 68 in fact. Could’ve been a lot more, if we’d had more salespeople or more years. And so we come to the biggest problem. This procedure (I don’t want to call it a function) is getting called 68 times – each one between twice as expensive as the outer query. And because it’s calling it in a separate context, there is even more overhead that I haven’t considered here. The cheek of it, to say that the Compute Scalar operator here costs 0%! I know a number of IT projects that could’ve used that kind of costing method, but that’s another story that I’m not going to go into here. Let’s look at a better way. Suppose our scalar function had been implemented as an inline one. Then it could have been expanded out like a sub-query. It could’ve run something like this: SELECT e.LoginID, y.year, (SELECT COUNT(*) FROM Sales.SalesOrderHeader AS o LEFT JOIN HumanResources.Employee AS e ON e.EmployeeID = o.SalesPersonID WHERE o.SalesPersonID = p.SalesPersonID AND o.OrderDate >= DATEADD(year,y.year-2000,'20000101') AND o.OrderDate < DATEADD(year,y.year-2000+1,'20000101') ) AS NumSales FROM (VALUES (2001),(2002),(2003),(2004)) AS y (year) CROSS JOIN Sales.SalesPerson AS p LEFT JOIN HumanResources.Employee AS e ON e.EmployeeID = p.SalesPersonID; Don’t worry too much about the Scan of the SalesOrderHeader underneath a Nested Loop. If you remember from plenty of other posts on the matter, execution plans don’t push the data through. That Scan only runs once. The Index Spool sucks the data out of it and populates a structure that is used to feed the Stream Aggregate. The Index Spool operator gets called 68 times, but the Scan only once (the Number of Executions property demonstrates this). Here, the Query Optimizer has a full picture of what’s being asked, and can make the appropriate decision about how it accesses the data. It can simplify it down properly. To get this kind of behaviour from a function, we need it to be inline. But without inline scalar functions, we need to make our function be table-valued. Luckily, that’s ok. CREATE FUNCTION dbo.FetchSales_inline2(@salespersonid int, @orderyear int) RETURNS table AS RETURN (SELECT COUNT(*) as NumSales FROM Sales.SalesOrderHeader AS o LEFT JOIN HumanResources.Employee AS e ON e.EmployeeID = o.SalesPersonID WHERE o.SalesPersonID = @salespersonid AND o.OrderDate >= DATEADD(year,@orderyear-2000,'20000101') AND o.OrderDate < DATEADD(year,@orderyear-2000+1,'20000101') ); GO But we can’t use this as a scalar. Instead, we need to use it with the APPLY operator. SELECT e.LoginID, y.year, n.NumSales FROM (VALUES (2001),(2002),(2003),(2004)) AS y (year) CROSS JOIN Sales.SalesPerson AS p LEFT JOIN HumanResources.Employee AS e ON e.EmployeeID = p.SalesPersonID OUTER APPLY dbo.FetchSales_inline2(p.SalesPersonID, y.year) AS n; And now, we get the plan that we want for this query. All we’ve done is tell the function that it’s returning a table instead of a single value, and removed the BEGIN and END statements. We’ve had to name the column being returned, but what we’ve gained is an actual inline simplifiable function. And if we wanted it to return multiple columns, it could do that too. I really consider this function to be superior to the scalar function in every way. It does need to be handled differently in the outer query, but in many ways it’s a more elegant method there too. The function calls can be put amongst the FROM clause, where they can then be used in the WHERE or GROUP BY clauses without fear of calling the function multiple times (another horrible side effect of functions). So please. If you see BEGIN and END in a function, remember it’s not really a function, it’s a procedure. And then fix it. @rob_farley

Read the article
My new laptop - with a really nice battery option

- by Rob Farley

It was about time I got a new laptop, and so I made a phone-call to Dell to discuss my options. I decided not to get an SSD from them, because I’d rather choose one myself – the sales guy tells me that changing the HD doesn’t void my warranty, so that’s good (incidentally, I’d love to hear people’s recommendations for which SSD to get for my laptop). Unfortunately this machine only has one HD slot, but I figure that I’ll put lots of stuff onto external disks anyway. The machine I got was a Dell Studio XPS 16. It’s red (which suits my company), but also has the Intel® Core™ i7-820QM Processor, which is 4 Cores/8 Threads. Makes for a pretty Task Manager, but nothing like the one I saw at SQLBits last year (at 96 cores), or the one that my good friend James Rowland-Jones writes about here. But the reason for this post is actually something in the software that comes with the machine – you know, the stuff that most people uninstall at the earliest opportunity. I had just reinstalled the operating system, and was going through the utilities to get the drivers up-to-date, when I noticed that one of Dell applications included an option to disable battery charging. So I installed it. And sure enough, I can tell the battery not to charge now. Clearly Dell see it as a temporary option, and one that’s designed for when you’re on a plane. But for me, I most often use my laptop with the power plugged in, which means I don’t need to have my battery continually topping itself up. So I really love this option, but I feel like it could go a little further. I’d like “Not Charging” to be the default option, and let me set it when I want to charge it (which should theoretically make my battery last longer). I also intend to work out how this option works, so that I can script it and put it into my StartUp options (so it can be the Default setting). Actually – if someone has already worked this out and can tell me what it does, then please feel free to let me know. Even better would be an external switch. I had a switch on my old laptop (a Dell Latitude) for WiFi, so that I could turn that off before I turned on the computer (this laptop doesn’t give me that option – no physical switch for flight mode). I guess it just means I’ll get used to leaving the WiFi off by default, and turning it on when I want it – might save myself some battery power that way too. Soon I’ll need to take the plunge and sync my iPhone with the new laptop. I’m a little worried that I might lose something – Apple’s messages about how my stuff will be wiped and replaced with what’s on the PC doesn’t fill me with confidence, as it’s a new PC that doesn’t have stuff on it. But having a new machine is definitely a nice experience, and one that I can recommend. I’m sure when I get around to buying an SSD I’ll feel like it’s shiny and new all over again! Share this post: email it! | bookmark it! | digg it! | reddit! | kick it! | live it!

Read the article
Collation errors in business

- by Rob Farley

At the PASS Summit last month, I did a set (Lightning Talk) about collation, and in particular, the difference between the “English” spoken by people from the US, Australia and the UK. One of the examples I gave was that in the US drivers might stop for gas, whereas in Australia, they just open the window a little. This is what’s known as a paraprosdokian, where you suddenly realise you misunderstood the first part of the sentence, based on what was said in the second. My current favourite is Emo Phillip’s line “I like to play chess with old men in the park, but it can be hard to find thirty-two of them.” Essentially, this a collation error, one that good comedians can get mileage from. Unfortunately, collation is at its worst when we have a computer comparing two things in different collations. They might look the same, and sound the same, but if one of the things is in SQL English, and the other one is in Windows English, the poor database server (with no sense of humour) will get suspicious of developers (who all have senses of humour, obviously), and declare a collation error, worried that it might not realise some nuance of the language. One example is the common scenario of a case-sensitive collation and a case-insensitive one. One may think that “Rob” and “rob” are the same, but the other might not. Clearly one of them is my name, and the other is a verb which means to steal (people called “Nick” have the same problem, of course), but I have no idea whether “Rob” and “rob” should be considered the same or not – it depends on the collation. I told a lie before – collation isn’t at its worst in the computer world, because the computer has the sense to complain about the collation issue. People don’t. People will say something, with their own understanding of what they mean. Other people will listen, and apply their own collation to it. I remember when someone was asking me about a situation which had annoyed me. They asked if I was ‘pissed’, and I said yes. I meant that I was annoyed, but they were asking if I’d been drinking. It took a moment for us to realise the misunderstanding. In business, the problem is escalated. A business user may explain something in a particular way, using terminology that they understand, but using words that mean something else to a technical person. I remember a situation with a checkbox on a form (back in VB6 days from memory). It was used to indicate that something was approved, and indicated whether a particular database field should store True or False – nothing more. However, the client understood it to mean that an entire workflow system would be implemented, with different users have permission to approve items and more. The project manager I’d just taken over from clearly hadn’t appreciated that, and I faced a situation of explaining the misunderstanding to the client. Lots of fun... Collation errors aren’t just a database setting that you can ignore. You need to remember that Americans speak a different type of English to Aussies and Poms, and techies speak a different language to their clients.

Read the article
Slide-decks from recent Adelaide SQL Server UG meetings

- by Rob Farley

The UK has been well represented this summer at the Adelaide SQL Server User Group, with presentations from Chris Testa-O’Neill (isn’t that the right link? Maybe try this one) and Martin Cairney. The slides are available here and here. I thought I’d particularly mention Martin’s, and how it’s relevant to this month’s T-SQL Tuesday. Martin spoke about Policy-Based Management and the Enterprise Policy Management Framework – something which is remarkably under-used, and yet which can really impact your ability to look after environments. If you have policies set up, then you can easily test each of your SQL instances to see if they are still satisfying a set of policies as defined. Automation (the topic of this month’s T-SQL Tuesday) should mean that your life is made easier, thereby enabling to you to do more. It shouldn’t remove the human element, but should remove (most of) the human errors. People still need to manage the situation, and work out what needs to be done, etc. We haven’t reached a point where computers can replace people, but they are very good at replace the mundaneness and monotony of our jobs. They’ve made our lives more interesting (although many would rightly argue that they have also made our lives more complex) by letting us focus on the stuff that changes. Martin named his talk Put Your Feet Up, which nicely expresses the fact that managing systems shouldn’t be about running around checking things all the time. It must be about having systems in place which tell you when things aren’t going well. It’s never quite as simple as being able to actually put your feet up, but certainly no system should require constant attention. It’s definitely a policy we at LobsterPot adhere to, whether it’s an alert to let us know that an ETL package has run successfully, or a script that generates some code for a report. If things can be automated, it reduces the chance of error, reduces the repetitive nature of work, and in general, keeps both consultants and clients much happier.

Read the article
PASS Summit Feedback

- by Rob Farley

PASS Feedback came in last week. I also saw my dentist for some fillings... At the PASS Summit this year, I delivered a couple of regular sessions and a Lightning Talk. People told me they enjoyed it, but when the rankings came out, they showed that I didn’t score particularly well. Brent Ozar was keen to discuss it with me. Brent: PASS speaker feedback is out. You did two sessions and a Lightning Talk. How did you go? Rob: Not so well actually, thanks for asking. Brent: Ha! Sorry. Of course you know that's why I wanted to discuss this with you. I was in one of your sessions at SQLBits in the UK a month before PASS, and I thought you rocked. You've got a really good and distinctive delivery style. Then I noticed your talks were ranked in the bottom quarter of the Summit ratings and wanted to discuss it. Rob: Yeah, I know. You did ask me if we could do this... I should explain – my presentation style is not the stereotypical IT conference one. I throw in jokes, and try to engage the audience thoroughly. I find many talks amazingly dry, and I guess I try to buck that trend. I also run training courses, and find that I get a lot of feedback from people thanking me for keeping things interesting. That said, I also get feedback criticising me for my style, and that’s basically what’s happened here. For the rest of this discussion, let’s focus on my talk about the Incredible Shrinking Execution Plan, which I considered to be my main talk. Brent: I thought that session title was the very best one at the entire Summit, and I had it on my recommended sessions list. In four words, you managed to sum up the topic and your sense of humor. I read that and immediately thought, "People need to be in this session," and then it didn't score well. Tell me about your scores. Rob: The questions on the feedback form covered the usefulness of the information, the speaker’s presentation skills, their knowledge of the subject, how well the session was described, the amount of time allocated, and the quality of the presentation materials. Brent: Presentation materials? But you don’t do slides. Did they rate your thong? Rob: No-one saw my flip-flops in this talk, Brent. I created a script in Management Studio, and published that afterwards, but I think people will have scored that question based on the lack of slides. I wasn’t expecting to do particularly well on that one. That was the only section that didn’t have 5/5 as the most popular score. Brent: See, that sucks, because cookbook-style scripts are often some of my favorites. Adam Machanic's Service Broker workbench series helped me immensely when I was prepping for the MCM. As an attendee, I'd rather have a commented script than a slide deck. So how did you rank so low? Rob: When I look at the scores that you got (based on your blog post), you got very few scores below 3 – people that felt strong enough about your talk to post a negative score. In my scores, between 5% and 10% were below 3 (except on the question about whether I knew my stuff – I guess I came as knowledgeable). Brent: Wow – so quite a few people really didn’t like your talk then? Rob: Yeah. Mind you, based on the comments, some people really loved it. I’d like to think that there would be a certain portion of the room who may have rated the talk as one of the best of the conference. Some of my comments included “amazing!”, “Best presentation so far!”, “Wow, best session yet”, “fantastic” and “Outstanding!”. I think lots of talks can be “Great”, but not so many talks can be “Outstanding” without the word losing its meaning. One wrote “Pretty amazing presentation, considering it was completely extemporaneous.” Brent: Extemporaneous, eh? Rob: Yeah. I guess they don’t realise how much preparation goes into coming across as unprepared. In many ways it’s much easier to give a written speech than to deliver a presentation without slides as a prompt. Brent: That delivery style, the really relaxed, casual, college-professor approach was one of the things I really liked about your presentation at SQLbits. As somebody who presents a lot, I "get" it - I know how hard it is to come off as relaxed and comfortable with your own material. It's like improv done by jazz players and comedians - if you've never tried it, you don't realize how hard it is. People also don't realize how hard it is to make a tough subject fun. Rob: Yeah well... There will be people writing comments on this post that say I wasn't trying to make the subject fun, and that I was making it all about me. Sometimes the style works, sometimes it doesn't. Most of the comments mentioned the fact that I tell jokes, some in a nice way, but some not so much (and it wasn't just a PASS thing - that's the mix of feedback I generally get). One comment at PASS was: “great stand up comedian - not what I'm looking for at pass”, and there were certainly a few that said “too many jokes”. I’m not trying to do stand-up – jokes are my way of engaging with the audience while I demonstrate some of the amazing things that the Query Optimizer can do if you write your queries the right way. Some people didn’t think it was technical enough, but I’ve also had some people tell me that the concepts I’m explaining are deep and profound. Brent: To me, that's a hallmark of a great explanation - when someone says, "But of course it has to work that way - how could it work any other way? It seems so simple and logical." Well, sure it does when it's explained correctly, but now pick up any number of thick SQL Server books and try to understand the Redundant Joins concept. I guarantee it'll take more than 45 minutes. Rob: Some people in my audiences realise that, but definitely not everyone. There's only so much you can tell someone that something is profound. Generally it's something that they either have an epiphany on or not. I like to lull my audience into knowing what's going on, and do something that surprises them. Gain their trust, build a rapport, and then show them the deeper truth of what just happened. Brent: So you've learned your lesson about presentation scores, right? From here on out, you're going to be dry, humorless, and all your presentations will consist of you reading bullet points off the screen. Rob: No Brent, I’m not. I'm also not going to suggest that most presentations at PASS are like that. No-one tries to present like that. There's a big space to occupy between what "dry and humourless" and me. My difference is to focus on the relationship I have with the crowd, rather than focussing on delivering the perfect session. I want to see people smiling and know they're relaxed. I think most presenters focus on the material, which is completely reasonable and safe. I remember once hearing someone talking about product creation. They talked about mediocrity. They said that one of the worst things that people can ever say about your product is that it’s “good”. What you want is for 10% of the world to love it enough to want to buy it. If 10% the world gave me a dollar, I’d have more money than I could ever use (assuming it wasn’t the SAME dollar they were giving me I guess). Brent: It's the Raving Fans theory. It's better to have a small number of raving customers than a large number of almost-but-not-really customers who don't care that much about your product or service. I know exactly how you feel - when I got survey feedback from my Quest video presentation when I was dressed up in a Richard Simmons costume, some of the attendees said I was unprofessional and distracting. Some of the attendees couldn't get enough and Photoshopped all kinds of stuff into the screen captures. On a whole, I probably didn't score that well, and I'm fine with that. It sucks to look at the scores though - do those lower scores bother you? Rob: Of course they do. It hurts deeply. I open myself up and give presentations in a very personal way. All presenters do that, and we all feel the pain of negative feedback. I hate coming 146th & 162nd out of 185, but have to acknowledge that many sessions did worse still. Plus, once I feel the wounds have healed, I’ll be able to remember that there are people in the world that rave about my presentation style, and figure that people will hopefully talk about me. One day maybe those people that don’t like my presentation style will stay away and I might be able to score better. You don’t pay to hear country music if you prefer western... Lots of people find chili too spicy, but it’s still a popular food. Brent: But don’t you want to appeal to everyone? Rob: I do, but I don’t want to be lukewarm as in Revelation 3:16. I’d rather disgust and be discussed. Well, maybe not ‘disgust’, but I don’t want to conform. Conformity just isn’t the same any more. I’m not sure I’ve ever been one to do that. I try not to offend, but definitely like to be different. Brent: Count me among your raving fans, sir. Where can we see you next? Rob: Considering I live in Adelaide in Australia, I’m not about to appear at anyone’s local SQL Saturday. I’m still trying to plan which events I’ll get to in 2011. I’ve submitted abstracts for TechEd North America, but won’t hold my breath. I’m also considering the SQLBits conferences in the UK in April, PASS in October, and I’m sure I’ll do some LiveMeeting presentations for user groups. Online, people download some of my recent SQLBits presentations at http://bit.ly/RFSarg and http://bit.ly/Simplification though. And they can download a 5-minute MP3 of my Lightning Talk at http://www.lobsterpot.com.au/files/Collation.mp3, in which I try to explain the idea behind collation, using thongs as an example. Brent: I was in the audience for http://bit.ly/RFSarg. That was a great presentation. Rob: Thanks, Brent. Now where’s my dollar?

Read the article
SQLBits VI – The sixth sets

- by Rob Farley

My involvement stopped with the tagline, but SQLBits VI is on tomorrow. The theme of the event is Performance Tuning, which has nothing to do with Bruce Willis or dead people – unless Bruce Willis has just become a database expert and been shot for doing a dropping an index (some would say that’s a crime worthy of the death penalty). It’s a shame my involvement hasn’t been more, because it’s such a terrific event, and it would’ve been good to have been there for a second time. It’s a long way to...(read more)

Read the article
SQL Spatial: Getting “nearest” calculations working properly

- by Rob Farley

If you’ve ever done spatial work with SQL Server, I hope you’ve come across the ‘nearest’ problem. You have five thousand stores around the world, and you want to identify the one that’s closest to a particular place. Maybe you want the store closest to the LobsterPot office in Adelaide, at -34.925806, 138.605073. Or our new US office, at 42.524929, -87.858244. Or maybe both! You know how to do this. You don’t want to use an aggregate MIN or MAX, because you want the whole row, telling you which store it is. You want to use TOP, and if you want to find the closest store for multiple locations, you use APPLY. Let’s do this (but I’m going to use addresses in AdventureWorks2012, as I don’t have a list of stores). Oh, and before I do, let’s make sure we have a spatial index in place. I’m going to use the default options. CREATE SPATIAL INDEX spin_Address ON Person.Address(SpatialLocation); And my actual query: WITH MyLocations AS (SELECT * FROM (VALUES ('LobsterPot Adelaide', geography::Point(-34.925806, 138.605073, 4326)), ('LobsterPot USA', geography::Point(42.524929, -87.858244, 4326)) ) t (Name, Geo)) SELECT l.Name, a.AddressLine1, a.City, s.Name AS [State], c.Name AS Country FROM MyLocations AS l CROSS APPLY ( SELECT TOP (1) * FROM Person.Address AS ad ORDER BY l.Geo.STDistance(ad.SpatialLocation) ) AS a JOIN Person.StateProvince AS s ON s.StateProvinceID = a.StateProvinceID JOIN Person.CountryRegion AS c ON c.CountryRegionCode = s.CountryRegionCode ; Great! This is definitely working. I know both those City locations, even if the AddressLine1s don’t quite ring a bell. I’m sure I’ll be able to find them next time I’m in the area. But of course what I’m concerned about from a querying perspective is what’s happened behind the scenes – the execution plan. This isn’t pretty. It’s not using my index. It’s sucking every row out of the Address table TWICE (which sucks), and then it’s sorting them by the distance to find the smallest one. It’s not pretty, and it takes a while. Mind you, I do like the fact that it saw an indexed view it could use for the State and Country details – that’s pretty neat. But yeah – users of my nifty website aren’t going to like how long that query takes. The frustrating thing is that I know that I can use the index to find locations that are within a particular distance of my locations quite easily, and Microsoft recommends this for solving the ‘nearest’ problem, as described at http://msdn.microsoft.com/en-au/library/ff929109.aspx. Now, in the first example on this page, it says that the query there will use the spatial index. But when I run it on my machine, it does nothing of the sort. I’m not particularly impressed. But what we see here is that parallelism has kicked in. In my scenario, it’s split the data up into 4 threads, but it’s still slow, and not using my index. It’s disappointing. But I can persuade it with hints! If I tell it to FORCESEEK, or use my index, or even turn off the parallelism with MAXDOP 1, then I get the index being used, and it’s a thing of beauty! Part of the plan is here: It’s massive, and it’s ugly, and it uses a TVF… but it’s quick. The way it works is to hook into the GeodeticTessellation function, which is essentially finds where the point is, and works out through the spatial index cells that surround it. This then provides a framework to be able to see into the spatial index for the items we want. You can read more about it at http://msdn.microsoft.com/en-us/library/bb895265.aspx#tessellation – including a bunch of pretty diagrams. One of those times when we have a much more complex-looking plan, but just because of the good that’s going on. This tessellation stuff was introduced in SQL Server 2012. But my query isn’t using it. When I try to use the FORCESEEK hint on the Person.Address table, I get the friendly error: Msg 8622, Level 16, State 1, Line 1 Query processor could not produce a query plan because of the hints defined in this query. Resubmit the query without specifying any hints and without using SET FORCEPLAN. And I’m almost tempted to just give up and move back to the old method of checking increasingly large circles around my location. After all, I can even leverage multiple OUTER APPLY clauses just like I did in my recent Lookup post. WITH MyLocations AS (SELECT * FROM (VALUES ('LobsterPot Adelaide', geography::Point(-34.925806, 138.605073, 4326)), ('LobsterPot USA', geography::Point(42.524929, -87.858244, 4326)) ) t (Name, Geo)) SELECT l.Name, COALESCE(a1.AddressLine1,a2.AddressLine1,a3.AddressLine1), COALESCE(a1.City,a2.City,a3.City), s.Name AS [State], c.Name AS Country FROM MyLocations AS l OUTER APPLY ( SELECT TOP (1) * FROM Person.Address AS ad WHERE l.Geo.STDistance(ad.SpatialLocation) < 1000 ORDER BY l.Geo.STDistance(ad.SpatialLocation) ) AS a1 OUTER APPLY ( SELECT TOP (1) * FROM Person.Address AS ad WHERE l.Geo.STDistance(ad.SpatialLocation) < 5000 AND a1.AddressID IS NULL ORDER BY l.Geo.STDistance(ad.SpatialLocation) ) AS a2 OUTER APPLY ( SELECT TOP (1) * FROM Person.Address AS ad WHERE l.Geo.STDistance(ad.SpatialLocation) < 20000 AND a2.AddressID IS NULL ORDER BY l.Geo.STDistance(ad.SpatialLocation) ) AS a3 JOIN Person.StateProvince AS s ON s.StateProvinceID = COALESCE(a1.StateProvinceID,a2.StateProvinceID,a3.StateProvinceID) JOIN Person.CountryRegion AS c ON c.CountryRegionCode = s.CountryRegionCode ; But this isn’t friendly-looking at all, and I’d use the method recommended by Isaac Kunen, who uses a table of numbers for the expanding circles. It feels old-school though, when I’m dealing with SQL 2012 (and later) versions. So why isn’t my query doing what it’s supposed to? Remember the query... WITH MyLocations AS (SELECT * FROM (VALUES ('LobsterPot Adelaide', geography::Point(-34.925806, 138.605073, 4326)), ('LobsterPot USA', geography::Point(42.524929, -87.858244, 4326)) ) t (Name, Geo)) SELECT l.Name, a.AddressLine1, a.City, s.Name AS [State], c.Name AS Country FROM MyLocations AS l CROSS APPLY ( SELECT TOP (1) * FROM Person.Address AS ad ORDER BY l.Geo.STDistance(ad.SpatialLocation) ) AS a JOIN Person.StateProvince AS s ON s.StateProvinceID = a.StateProvinceID JOIN Person.CountryRegion AS c ON c.CountryRegionCode = s.CountryRegionCode ; Well, I just wasn’t reading http://msdn.microsoft.com/en-us/library/ff929109.aspx properly. The following requirements must be met for a Nearest Neighbor query to use a spatial index: A spatial index must be present on one of the spatial columns and the STDistance() method must use that column in the WHERE and ORDER BY clauses. The TOP clause cannot contain a PERCENT statement. The WHERE clause must contain a STDistance() method. If there are multiple predicates in the WHERE clause then the predicate containing STDistance() method must be connected by an AND conjunction to the other predicates. The STDistance() method cannot be in an optional part of the WHERE clause. The first expression in the ORDER BY clause must use the STDistance() method. Sort order for the first STDistance() expression in the ORDER BY clause must be ASC. All the rows for which STDistance returns NULL must be filtered out. Let’s start from the top. 1. Needs a spatial index on one of the columns that’s in the STDistance call. Yup, got the index. 2. No ‘PERCENT’. Yeah, I don’t have that. 3. The WHERE clause needs to use STDistance(). Ok, but I’m not filtering, so that should be fine. 4. Yeah, I don’t have multiple predicates. 5. The first expression in the ORDER BY is my distance, that’s fine. 6. Sort order is ASC, because otherwise we’d be starting with the ones that are furthest away, and that’s tricky. 7. All the rows for which STDistance returns NULL must be filtered out. But I don’t have any NULL values, so that shouldn’t affect me either. ...but something’s wrong. I do actually need to satisfy #3. And I do need to make sure #7 is being handled properly, because there are some situations (eg, differing SRIDs) where STDistance can return NULL. It says so at http://msdn.microsoft.com/en-us/library/bb933808.aspx – “STDistance() always returns null if the spatial reference IDs (SRIDs) of the geography instances do not match.” So if I simply make sure that I’m filtering out the rows that return NULL… …then it’s blindingly fast, I get the right results, and I’ve got the complex-but-brilliant plan that I wanted. It just wasn’t overly intuitive, despite being documented. @rob_farley

Read the article
An MCM exam, Rob? Really?

- by Rob Farley

I took the SQL 2008 MCM Knowledge exam while in Seattle for the PASS Summit ten days ago. I wasn’t planning to do it, but I got persuaded to try. I was meaning to write this post to explain myself before the result came out, but it seems I didn’t get typing quickly enough. Those of you who know me will know I’m a big fan of certification, to a point. I’ve been involved with Microsoft Learning to help create exams. I’ve kept my certifications current since I first took an exam back in 1998, sitting many in beta, across quite a variety of topics. I’ve probably become quite good at them – I know I’ve definitely passed some that I really should’ve failed. I’ve also written that I don’t think exams are worth studying for. (That’s probably not entirely true, but it depends on your motivation. If you’re doing learning, I would encourage you to focus on what you need to know to do your job better. That will help you pass an exam – but the two skills are very different. I can coach someone on how to pass an exam, but that’s a different kind of teaching when compared to coaching someone about how to do a job. For example, the real world includes a lot of “it depends”, where you develop a feel for what the influencing factors might be. In an exam, its better to be able to know some of the “Don’t use this technology if XYZ is true” concepts better.) As for the Microsoft Certified Master certification… I’m not opposed to the idea of having the MCM (or in the future, MCSM) cert. But the barrier to entry feels quite high for me. When it was first introduced, the nearest testing centres to me were in Kuala Lumpur and Manila. Now there’s one in Perth, but that’s still a big effort. I know there are options in the US – such as one about an hour’s drive away from downtown Seattle, but it all just seems too hard. Plus, these exams are more expensive, and all up – I wasn’t sure I wanted to try them, particularly with the fact that I don’t like to study. I used to study for exams. It would drive my wife crazy. I’d have some exam scheduled for some time in the future (like the time I had two booked for two consecutive days at TechEd Australia 2005), and I’d make sure I was ready. Every waking moment would be spent pouring over exam material, and it wasn’t healthy. I got shaken out of that, though, when I ended up taking four exams in those two days in 2005 and passed them all. I also worked out that if I had a Second Shot available, then failing wasn’t a bad thing at all. Even without Second Shot, I’m much more okay about failing. But even just trying an MCM exam is a big effort. I wouldn’t want to fail one of them. Plus there’s the illusion to maintain. People have told me for a long time that I should just take the MCM exams – that I’d pass no problem. I’ve never been so sure. It was almost becoming a pride-point. Perhaps I should fail just to demonstrate that I can fail these things. Anyway – boB Taylor (@sqlboBT) persuaded me to try the SQL 2008 MCM Knowledge exam at the PASS Summit. They set up a testing centre in one of the room there, so it wasn’t out of my way at all. I had to squeeze it in between other commitments, and I certainly didn’t have time to even see what was on the syllabus, let alone study. In fact, I was so exhausted from the week that I fell asleep at least once (just for a moment though) during the actual exam. Perhaps the questions need more jokes, I’m not sure. I knew if I failed, then I might disappoint some people, but that I wouldn’t’ve spent a great deal of effort in trying to pass. On the other hand, if I did pass I’d then be under pressure to investigate the MCM Lab exam, which can be taken remotely (therefore, a much smaller amount of effort to make happen). In some ways, passing could end up just putting a bunch more pressure on me. Oh, and I did.

Read the article
Stored Procedures with SSRS? Hmm… not so much

- by Rob Farley

Little Bobby Tables’ mother says you should always sanitise your data input. Except that I think she’s wrong. The SQL Injection aspect is for another post, where I’ll show you why I think SQL Injection is the same kind of attack as many other attacks, such as the old buffer overflow, but here I want to have a bit of a whinge about the way that some people sanitise data input, and even have a whinge about people who insist on using stored procedures for SSRS reports. Let me say that again, in case you missed it the first time: I want to have a whinge about people who insist on using stored procedures for SSRS reports. Let’s look at the data input sanitisation aspect – except that I’m going to call it ‘parameter validation’. I’m talking about code that looks like this: create procedure dbo.GetMonthSummaryPerSalesPerson(@eomdate datetime) as begin /* First check that @eomdate is a valid date */ if isdate(@eomdate) != 1 begin select 'Please enter a valid date' as ErrorMessage; return; end /* Then check that time has passed since @eomdate */ if datediff(day,@eomdate,sysdatetime()) < 5 begin select 'Sorry - EOM is not complete yet' as ErrorMessage; return; end /* If those checks have succeeded, return the data */ select SalesPersonID, count(*) as NumSales, sum(TotalDue) as TotalSales from Sales.SalesOrderHeader where OrderDate >= dateadd(month,-1,@eomdate) and OrderDate < @eomdate group by SalesPersonID order by SalesPersonID; end Notice that the code checks that a date has been entered. Seriously??!! This must only be to check for NULL values being passed in, because anything else would have to be a valid datetime to avoid an error. The other check is maybe fair enough, but I still don’t like it. The two problems I have with this stored procedure are the result sets and the small fact that the stored procedure even exists in the first place. But let’s consider the first one of these problems for starters. I’ll get to the second one in a moment. If you read Jes Borland (@grrl_geek)’s recent post about returning multiple result sets in Reporting Services, you’ll be aware that Reporting Services doesn’t support multiple results sets from a single query. And when it says ‘single query’, it includes ‘stored procedure call’. It’ll only handle the first result set that comes back. But that’s okay – we have RETURN statements, so our stored procedure will only ever return a single result set. Sometimes that result set might contain a single field called ErrorMessage, but it’s still only one result set. Except that it’s not okay, because Reporting Services needs to know what fields to expect. Your report needs to hook into your fields, so SSRS needs to have a way to get that information. For stored procs, it uses an option called FMTONLY. When Reporting Services tries to figure out what fields are going to be returned by a query (or stored procedure call), it doesn’t want to have to run the whole thing. That could take ages. (Maybe it’s seen some of the stored procedures I’ve had to deal with over the years!) So it turns on FMTONLY before it makes the call (and turns it off again afterwards). FMTONLY is designed to be able to figure out the shape of the output, without actually running the contents. It’s very useful, you might think. set fmtonly on exec dbo.GetMonthSummaryPerSalesPerson '20030401'; set fmtonly off Without the FMTONLY lines, this stored procedure returns a result set that has three columns and fourteen rows. But with FMTONLY turned on, those rows don’t come back. But what I do get back hurts Reporting Services. It doesn’t run the stored procedure at all. It just looks for anything that could be returned and pushes out a result set in that shape. Despite the fact that I’ve made sure that the logic will only ever return a single result set, the FMTONLY option kills me by returning three of them. It would have been much better to push these checks down into the query itself. alter procedure dbo.GetMonthSummaryPerSalesPerson(@eomdate datetime) as begin select SalesPersonID, count(*) as NumSales, sum(TotalDue) as TotalSales from Sales.SalesOrderHeader where /* Make sure that @eomdate is valid */ isdate(@eomdate) = 1 /* And that it's sufficiently past */ and datediff(day,@eomdate,sysdatetime()) >= 5 /* And now use it in the filter as appropriate */ and OrderDate >= dateadd(month,-1,@eomdate) and OrderDate < @eomdate group by SalesPersonID order by SalesPersonID; end Now if we run it with FMTONLY turned on, we get the single result set back. But let’s consider the execution plan when we pass in an invalid date. First let’s look at one that returns data. I’ve got a semi-useful index in place on OrderDate, which includes the SalesPersonID and TotalDue fields. It does the job, despite a hefty Sort operation. …compared to one that uses a future date: You might notice that the estimated costs are similar – the Index Seek is still 28%, the Sort is still 71%. But the size of that arrow coming out of the Index Seek is a whole bunch smaller. The coolest thing here is what’s going on with that Index Seek. Let’s look at some of the properties of it. Glance down it with me… Estimated CPU cost of 0.0005728, 387 estimated rows, estimated subtree cost of 0.0044385, ForceSeek false, Number of Executions 0. That’s right – it doesn’t run. So much for reading plans right-to-left... The key is the Filter on the left of it. It has a Startup Expression Predicate in it, which means that it doesn’t call anything further down the plan (to the right) if the predicate evaluates to false. Using this method, we can make sure that our stored procedure contains a single query, and therefore avoid any problems with multiple result sets. If we wanted, we could always use UNION ALL to make sure that we can return an appropriate error message. alter procedure dbo.GetMonthSummaryPerSalesPerson(@eomdate datetime) as begin select SalesPersonID, count(*) as NumSales, sum(TotalDue) as TotalSales, /*Placeholder: */ '' as ErrorMessage from Sales.SalesOrderHeader where /* Make sure that @eomdate is valid */ isdate(@eomdate) = 1 /* And that it's sufficiently past */ and datediff(day,@eomdate,sysdatetime()) >= 5 /* And now use it in the filter as appropriate */ and OrderDate >= dateadd(month,-1,@eomdate) and OrderDate < @eomdate group by SalesPersonID /* Now include the error messages */ union all select 0, 0, 0, 'Please enter a valid date' as ErrorMessage where isdate(@eomdate) != 1 union all select 0, 0, 0, 'Sorry - EOM is not complete yet' as ErrorMessage where datediff(day,@eomdate,sysdatetime()) < 5 order by SalesPersonID; end But still I don’t like it, because it’s now a stored procedure with a single query. And I don’t like stored procedures that should be functions. That’s right – I think this should be a function, and SSRS should call the function. And I apologise to those of you who are now planning a bonfire for me. Guy Fawkes’ night has already passed this year, so I think you miss out. (And I’m not going to remind you about when the PASS Summit is in 2012.) create function dbo.GetMonthSummaryPerSalesPerson(@eomdate datetime) returns table as return ( select SalesPersonID, count(*) as NumSales, sum(TotalDue) as TotalSales, '' as ErrorMessage from Sales.SalesOrderHeader where /* Make sure that @eomdate is valid */ isdate(@eomdate) = 1 /* And that it's sufficiently past */ and datediff(day,@eomdate,sysdatetime()) >= 5 /* And now use it in the filter as appropriate */ and OrderDate >= dateadd(month,-1,@eomdate) and OrderDate < @eomdate group by SalesPersonID union all select 0, 0, 0, 'Please enter a valid date' as ErrorMessage where isdate(@eomdate) != 1 union all select 0, 0, 0, 'Sorry - EOM is not complete yet' as ErrorMessage where datediff(day,@eomdate,sysdatetime()) < 5 ); We’ve had to lose the ORDER BY – but that’s fine, as that’s a client thing anyway. We can have our reports leverage this stored query still, but we’re recognising that it’s a query, not a procedure. A procedure is designed to DO stuff, not just return data. We even get entries in sys.columns that confirm what the shape of the result set actually is, which makes sense, because a table-valued function is the right mechanism to return data. And we get so much more flexibility with this. If you haven’t seen the simplification stuff that I’ve preached on before, jump over to http://bit.ly/SimpleRob and watch the video of when I broke a microphone and nearly fell off the stage in Wales. You’ll see the impact of being able to have a simplifiable query. You can also read the procedural functions post I wrote recently, if you didn’t follow the link from a few paragraphs ago. So if we want the list of SalesPeople that made any kind of sales in a given month, we can do something like: select SalesPersonID from dbo.GetMonthSummaryPerSalesPerson(@eomonth) order by SalesPersonID; This doesn’t need to look up the TotalDue field, which makes a simpler plan. select * from dbo.GetMonthSummaryPerSalesPerson(@eomonth) where SalesPersonID is not null order by SalesPersonID; This one can avoid having to do the work on the rows that don’t have a SalesPersonID value, pushing the predicate into the Index Seek rather than filtering the results that come back to the report. If we had joins involved, we might see some of those being simplified out. We also get the ability to include query hints in individual reports. We shift from having a single-use stored procedure to having a reusable stored query – and isn’t that one of the main points of modularisation? Stored procedures in Reporting Services are just a bit limited for my liking. They’re useful in plenty of ways, but if you insist on using stored procedures all the time rather that queries that use functions – that’s rubbish. @rob_farley

Read the article
How many people will be with you during 24HOP?

- by Rob Farley

In less than a week, SQLPASS hosts another 24 Hours of PASS event, this time with an array of 24 female speakers (in honour of this month being Women’s History Month). Interestingly, the committee has had a few people ask if there are rules about how the event can be viewed, such as “How many people from any one organisation can watch it?” or “Does it matter if a few people are crowded around the same screen?” From a licensing and marketing perspective, there is value in knowing how many people are watching the event, but there are no restrictions about how the thing is viewed. In fact – if you’re planning to watch any of these events, I want to suggest an idea: Book a meeting room in your office with a projector, and watch 24HOP in there. If you’re planning to have it streaming in the background while you work, obviously this makes life a bit trickier. But if you’re planning to treat it as a training event (a 2-day conference if you like) and block out a bit of time for it (as well you should – there’s going to be some great stuff in there), then why not do it in a way that makes it so that other people can see that you’re watching it, and potentially join you. When an event like this runs, we can see how many different ‘people’ are attending each LiveMeeting session. What we can’t tell is how many actual people there are represented. Jessica Moss spoke to the Adelaide SQL Server User Group a few weeks ago via LiveMeeting, and LiveMeeting told us there were less than a dozen people attending. Really there were at least three times that number, because all the people in the room with me weren’t included. I’d love to imagine that every LiveMeeting attendee represented a crowd in a room, watching a shared screen. So there’s my challenge – don’t let your LiveMeeting session represent just you. Find a way of involving other people. At the very least, you’ll be able to discuss it with them afterwards. Now stick a comment on this post to let me know how many people are going to be joining you. :) If you’re not registered for the event yet, get yourself over to the SQLPASS site and make it happen.

Read the article
Re-running SSRS subscription jobs that have failed

- by Rob Farley

Sometimes, an SSRS subscription for some reason. It can be annoying, particularly as the appropriate response can be hard to see immediately. There may be a long list of jobs that failed one morning if a Mail Server is down, and trying to work out a way of running each one again can be painful. It’s almost an argument for using shared schedules a lot, but the problem with this is that there are bound to be other things on that shared schedule that you wouldn’t want to be re-run. Luckily, there’s a table in the ReportServer database called dbo.Subscriptions, which is where LastStatus of the Subscription is stored. Having found the subscriptions that you’re interested in, finding the SQL Agent Jobs that correspond to them can be frustrating. Luckily, the jobstep command contains the subscriptionid, so it’s possible to look them up based on that. And of course, once the jobs have been found, they can be executed easily enough. In this example, I produce a list of the commands to run the jobs. I can copy the results out and execute them. select 'exec sp_start_job @job_name = ''' + cast(j.name as varchar(40)) + '''' from msdb.dbo.sysjobs j join msdb.dbo.sysjobsteps js on js.job_id = j.job_id join [ReportServer].[dbo].[Subscriptions] s on js.command like '%' + cast(s.subscriptionid as varchar(40)) + '%' where s.LastStatus like 'Failure sending mail%'; Another option could be to return the job step commands directly (js.command in this query), but my preference is to run the job that contains the step. Share this post: email it! | bookmark it! | digg it! | reddit! | kick it! | live it!

Read the article
Visualising data a different way with Pivot collections

- by Rob Farley

Roger’s been doing a great job extending PivotViewer recently, and you can find the list of LobsterPot pivots at http://pivot.lobsterpot.com.au Many months back, the TED Talk that Gary Flake did about Pivot caught my imagination, and I did some research into it. At the time, most of what we did with Pivot was geared towards what we could do for clients, including making Pivot collections based on students at a school, and using it to browse PDF invoices by their various properties. We had actual commercial work based on Pivot collections back then, and it was all kinds of fun. Later, we made some collections for events that were happening, and even got featured in the TechEd Australia keynote. But I’m getting ahead of myself... let me explain the concept. A Pivot collection is an XML file (with .cxml extension) which lists Items, each linking to an image that’s stored in a Deep Zoom format (this means that it contains tiles like Bing Maps, so that the browser can request only the ones of interest according to the zoom level). This collection can be shown in a Silverlight application that uses the PivotViewer control, or in the Pivot Browser that’s available from getpivot.com. Filtering and sorting the items according to their facets (attributes, such as size, age, category, etc), the PivotViewer rearranges the way that these are shown in a very dynamic way. To quote Gary Flake, this lets us “see patterns which are otherwise hidden”. This browsing mechanism is very suited to a number of different methods, because it’s just that – browsing. It’s not searching, it’s more akin to window-shopping than doing an internet search. When we decided to put something together for the conferences such as TechEd Australia 2010 and the PASS Summit 2010, we did some screen-scraping to provide a different view of data that was already available online. Nick Hodge and Michael Kordahi from Microsoft liked the idea a lot, and after a bit of tweaking, we produced one that Michael used in the TechEd Australia keynote to show the variety of talks on offer. It’s interesting to see a pattern in this data: The Office track has the most sessions, but if the Interactive Sessions and Instructor-Led Labs are removed, it drops down to only the sixth most popular track, with Cloud Computing taking over. This is something which just isn’t obvious when you look an ordinary search tool. You get a much better feel for the data when moving around it like this. The more observant amongst you will have noticed some difference in the collection that Michael is demonstrating in the picture above with the screenshots I’ve shown. That’s because it’s been extended some more. At the SQLBits conference in the UK this year, I had some interesting discussions with the guys from Xpert360, particularly Phil Carter, who I’d met in 2009 at an earlier SQLBits conference. They had got around to producing a Pivot collection based on the SQLBits data, which we had been planning to do but ran out of time. We discussed some of ways that Pivot could be used, including the ways that my old friend Howard Dierking had extended it for the MSDN Magazine. I’m not suggesting I influenced Xpert360 at all, but they certainly inspired us with some of their posts on the matter So with LobsterPot guys David Gardiner and Roger Noble both having dabbled in Pivot collections (and Dave doing some for clients), I set Roger to work on extending it some more. He’s used various events and so on to be able to make an environment that allows us to do quick deployment of new collections, as well as showing the data in a grid view which behaves as if it were simply a third view of the data (the other two being the array of images and the ‘histogram’ view). I see PivotViewer as being a significant step in data visualisation – so much so that I feature it when I deliver talks on Spatial Data Visualisation methods. Any time when there is information that can be conveyed through an image, you have to ask yourself how best to show that image, and whether that image is the focal point. For Spatial data, the image is most often a map, and the map becomes the central mode for navigation. I show Pivot with postcode areas, since I can browse the postcodes based on their data, and many of the images are recognisable (to locals of South Australia). Naturally, the images could link through to the map itself, and so on, but generally people think of Spatial data in terms of navigating a map, which doesn’t always gel with the information you’re trying to extract. Roger’s even looking into ways to hook PivotViewer into the Bing Maps API, in a similar way to the Deep Earth project, displaying different levels of map detail according to how ‘zoomed in’ the images are. Some of the work that Dave did with one of the schools was generating the Deep Zoom tiles “on the fly”, based on images stored in a database, and Roger has produced a collection which uses images from flickr, that lets you move from one search term to another. Pulling the images down from flickr.com isn’t particularly ideal from a performance aspect, and flickr doesn’t store images in a small-enough format to really lend itself to this use, but you might agree that it’s an interesting concept which compares nicely to using Maps. I’m looking forward to future versions of the PivotViewer control, and hope they provide many more events that can be used, and even more hooks into it. Naturally, LobsterPot could help provide your business with a PivotViewer experience, but you can probably do a lot of it yourself too. There’s a thorough guide at getpivot.com, which is how we got into it. For some examples of what we’ve done, have a look at http://pivot.lobsterpot.com.au. I’d like to see PivotViewer really catch on a data visualisation tool.

Read the article
Slide-decks from recent Adelaide SQL Server UG meetings

- by Rob Farley

The UK has been well represented this summer at the Adelaide SQL Server User Group, with presentations from Chris Testa-O’Neill (isn’t that the right link? Maybe try this one) and Martin Cairney. The slides are available here and here. I thought I’d particularly mention Martin’s, and how it’s relevant to this month’s T-SQL Tuesday. Martin spoke about Policy-Based Management and the Enterprise Policy Management Framework – something which is remarkably under-used, and yet which can really impact your ability to look after environments. If you have policies set up, then you can easily test each of your SQL instances to see if they are still satisfying a set of policies as defined. Automation (the topic of this month’s T-SQL Tuesday) should mean that your life is made easier, thereby enabling to you to do more. It shouldn’t remove the human element, but should remove (most of) the human errors. People still need to manage the situation, and work out what needs to be done, etc. We haven’t reached a point where computers can replace people, but they are very good at replace the mundaneness and monotony of our jobs. They’ve made our lives more interesting (although many would rightly argue that they have also made our lives more complex) by letting us focus on the stuff that changes. Martin named his talk Put Your Feet Up, which nicely expresses the fact that managing systems shouldn’t be about running around checking things all the time. It must be about having systems in place which tell you when things aren’t going well. It’s never quite as simple as being able to actually put your feet up, but certainly no system should require constant attention. It’s definitely a policy we at LobsterPot adhere to, whether it’s an alert to let us know that an ETL package has run successfully, or a script that generates some code for a report. If things can be automated, it reduces the chance of error, reduces the repetitive nature of work, and in general, keeps both consultants and clients much happier.

Read the article
24 hours to pass until 24 Hours of PASS

- by Rob Farley

There’s a bunch of stuff going on at the moment in the SQL world, so if you’ve missed this particular piece of news, let me tell you a bit about it. Twice a year, the SQL community puts on its biggest virtual event – 24 Hours of PASS. And the next one is tomorrow – March 21st, 2012. Twenty-four sessions, back-to-back, featuring a selection of some of the best presenters in the SQL world, speakers from all over the world, coming together in an online collaboration that so far has well over thirty thousand registrations across the presentations. Some people are signed up for all 24 sessions, some only one. Traditionally, LiveMeeting has been used as the platform for this event, but this year we’re going with a new platform – IBTalk. It promises big, and we’re hoping it won’t let us down. LiveMeeting has been great, and we thank Microsoft for providing it as a platform for the past few years. However, as the event has grown, we’ve found that a new idea is necessary. Last year a search was done for a new platform, and IBTalk ticked the right boxes. The feedback from the presenters and moderators so far has been overwhelmingly positive, and we’re hoping that this is going to really enhance the user experience. One of my favourite features of the platform is the language side. It provides a pretty good translation service. Users who join a session will see a flag on the left of the screen. If they click it, they can change the language to one of 15 on offer. Picking this changes all the labels on everything. It even translates the text in the Q&A window. What this means is that someone from Brazil can ask their question in Portuguese, and the presenter will see it in English. Then if the answer is typed in English, the questioner will be able to see the answer, also in Portuguese. Or they can switch to English to see it as the answerer typed it. I know there’s always the risk of bad translations going on, but I’ve heard good things about this translation service. But there’s more – IBTalk are providing staff to type up closed captioning live during the event. So if English isn’t your first language, don’t worry! Picking your language will also let you see subtitles in your chosen language. I’m hoping that this event is the start of PASS being able to reach people from all corners of the world. Wouldn’t it be great to find that this event is successful, and that the next 24HOP (later in the year, our Summit Preview event) has just as many non-English speakers tuning in as English speakers? If you haven’t been planning which sessions you’re going to attend, you really should get over to sqlpass.org/24hours and have a look through what’s on offer. There’s some amazing material from some of the industry’s brightest, covering a wide range of topics, from classic SQL areas to the brand new SQL 2012 features. There really should be something for every SQL professional. Check the time zones though – if you’re in the US you might be on Summer time, and an hour closer to GMT than normal. Massive thanks must go to Microsoft, SQL Sentry and Idera for sponsoring this event. Without sponsors we wouldn’t be able to put any of this on. These companies are helping 24HOP continue to grow into an event for the whole world. See you tomorrow! @rob_farley | #24hop | #sqlpass

Read the article
24 hours to pass until 24 Hours of PASS

- by Rob Farley

There’s a bunch of stuff going on at the moment in the SQL world, so if you’ve missed this particular piece of news, let me tell you a bit about it. Twice a year, the SQL community puts on its biggest virtual event – 24 Hours of PASS. And the next one is tomorrow – March 21st, 2012. Twenty-four sessions, back-to-back, featuring a selection of some of the best presenters in the SQL world, speakers from all over the world, coming together in an online collaboration that so far has well over thirty thousand registrations across the presentations. Some people are signed up for all 24 sessions, some only one. Traditionally, LiveMeeting has been used as the platform for this event, but this year we’re going with a new platform – IBTalk. It promises big, and we’re hoping it won’t let us down. LiveMeeting has been great, and we thank Microsoft for providing it as a platform for the past few years. However, as the event has grown, we’ve found that a new idea is necessary. Last year a search was done for a new platform, and IBTalk ticked the right boxes. The feedback from the presenters and moderators so far has been overwhelmingly positive, and we’re hoping that this is going to really enhance the user experience. One of my favourite features of the platform is the language side. It provides a pretty good translation service. Users who join a session will see a flag on the left of the screen. If they click it, they can change the language to one of 15 on offer. Picking this changes all the labels on everything. It even translates the text in the Q&A window. What this means is that someone from Brazil can ask their question in Portuguese, and the presenter will see it in English. Then if the answer is typed in English, the questioner will be able to see the answer, also in Portuguese. Or they can switch to English to see it as the answerer typed it. I know there’s always the risk of bad translations going on, but I’ve heard good things about this translation service. But there’s more – IBTalk are providing staff to type up closed captioning live during the event. So if English isn’t your first language, don’t worry! Picking your language will also let you see subtitles in your chosen language. I’m hoping that this event is the start of PASS being able to reach people from all corners of the world. Wouldn’t it be great to find that this event is successful, and that the next 24HOP (later in the year, our Summit Preview event) has just as many non-English speakers tuning in as English speakers? If you haven’t been planning which sessions you’re going to attend, you really should get over to sqlpass.org/24hours and have a look through what’s on offer. There’s some amazing material from some of the industry’s brightest, covering a wide range of topics, from classic SQL areas to the brand new SQL 2012 features. There really should be something for every SQL professional. Check the time zones though – if you’re in the US you might be on Summer time, and an hour closer to GMT than normal. Massive thanks must go to Microsoft, SQL Sentry and Idera for sponsoring this event. Without sponsors we wouldn’t be able to put any of this on. These companies are helping 24HOP continue to grow into an event for the whole world. See you tomorrow! @rob_farley | #24hop | #sqlpass

Read the article
LobsterPot Solutions in the USA

- by Rob Farley

We’re expanding! I’m thrilled to announce that Microsoft Gold Partner LobsterPot Solutions has started another branch appointing the amazing Ted Krueger (5-time SQL MVP awardee) as the US lead. Ted is well-known in the SQL Server world, having written books on indexing, consulting and on being a DBA (not to mention contributing chapters to both MVP Deep Dives books). He is an expert on replication and high availability, and strong in the Business Intelligence space – vast experience which is both broad and deep. Ted is based in the south east corner of Wisconsin, just north of Chicago. He has been a consultant for eons and has helped many clients with their projects and problems, taking the role as both technical lead and consulting lead. He is also tireless in supporting and developing the SQL Server community, presenting at conferences across America, and helping people through his blog, Twitter and more. Despite all this – it’s neither his technical excellence with SQL Server nor his consulting skill that made me want him to lead LobsterPot’s US venture. I wanted Ted because of his values. In the time I’ve known Ted, I’ve found his integrity to be excellent, and found him to be morally beyond reproach. This is the biggest priority I have when finding people to represent the LobsterPot brand. I have no qualms in recommending Ted’s character or work ethic. It’s not just my thoughts on him – all my trusted friends that know Ted agree about this. So last week, LobsterPot Solutions LLC was formed in the United States, and in a couple of weeks, we will be open for business! LobsterPot Solutions can be contacted via email at [email protected], on the web at either www.lobsterpot.com.au or www.lobsterpotsolutions.com, and on Twitter as @lobsterpot_au and @lobsterpot_us. Ted Kruger blogs at LessThanDot, and can also be found on Twitter and LinkedIn. This post is cross-posted from http://lobsterpotsolutions.com/lobsterpot-solutions-in-the-usa

Read the article
An MCM exam, Rob? Really?

- by Rob Farley

I took the SQL 2008 MCM Knowledge exam while in Seattle for the PASS Summit ten days ago. I wasn’t planning to do it, but I got persuaded to try. I was meaning to write this post to explain myself before the result came out, but it seems I didn’t get typing quickly enough. Those of you who know me will know I’m a big fan of certification, to a point. I’ve been involved with Microsoft Learning to help create exams. I’ve kept my certifications current since I first took an exam back in 1998, sitting many in beta, across quite a variety of topics. I’ve probably become quite good at them – I know I’ve definitely passed some that I really should’ve failed. I’ve also written that I don’t think exams are worth studying for. (That’s probably not entirely true, but it depends on your motivation. If you’re doing learning, I would encourage you to focus on what you need to know to do your job better. That will help you pass an exam – but the two skills are very different. I can coach someone on how to pass an exam, but that’s a different kind of teaching when compared to coaching someone about how to do a job. For example, the real world includes a lot of “it depends”, where you develop a feel for what the influencing factors might be. In an exam, its better to be able to know some of the “Don’t use this technology if XYZ is true” concepts better.) As for the Microsoft Certified Master certification… I’m not opposed to the idea of having the MCM (or in the future, MCSM) cert. But the barrier to entry feels quite high for me. When it was first introduced, the nearest testing centres to me were in Kuala Lumpur and Manila. Now there’s one in Perth, but that’s still a big effort. I know there are options in the US – such as one about an hour’s drive away from downtown Seattle, but it all just seems too hard. Plus, these exams are more expensive, and all up – I wasn’t sure I wanted to try them, particularly with the fact that I don’t like to study. I used to study for exams. It would drive my wife crazy. I’d have some exam scheduled for some time in the future (like the time I had two booked for two consecutive days at TechEd Australia 2005), and I’d make sure I was ready. Every waking moment would be spent pouring over exam material, and it wasn’t healthy. I got shaken out of that, though, when I ended up taking four exams in those two days in 2005 and passed them all. I also worked out that if I had a Second Shot available, then failing wasn’t a bad thing at all. Even without Second Shot, I’m much more okay about failing. But even just trying an MCM exam is a big effort. I wouldn’t want to fail one of them. Plus there’s the illusion to maintain. People have told me for a long time that I should just take the MCM exams – that I’d pass no problem. I’ve never been so sure. It was almost becoming a pride-point. Perhaps I should fail just to demonstrate that I can fail these things. Anyway – boB Taylor (@sqlboBT) persuaded me to try the SQL 2008 MCM Knowledge exam at the PASS Summit. They set up a testing centre in one of the room there, so it wasn’t out of my way at all. I had to squeeze it in between other commitments, and I certainly didn’t have time to even see what was on the syllabus, let alone study. In fact, I was so exhausted from the week that I fell asleep at least once (just for a moment though) during the actual exam. Perhaps the questions need more jokes, I’m not sure. I knew if I failed, then I might disappoint some people, but that I wouldn’t’ve spent a great deal of effort in trying to pass. On the other hand, if I did pass I’d then be under pressure to investigate the MCM Lab exam, which can be taken remotely (therefore, a much smaller amount of effort to make happen). In some ways, passing could end up just putting a bunch more pressure on me. Oh, and I did.

Read the article
Plan Operator Tuesday round-up

- by Rob Farley

Eighteen posts for T-SQL Tuesday #43 this month, discussing Plan Operators. I put them together and made the following clickable plan. It’s 1000px wide, so I hope you have a monitor wide enough. Let me explain this plan for you (people’s names are the links to the articles on their blogs – the same links as in the plan above). It was clearly a SELECT statement. Wayne Sheffield (@dbawayne) wrote about that, so we start with a SELECT physical operator, leveraging the logical operator Wayne Sheffield. The SELECT operator calls the Paul White operator, discussed by Jason Brimhall (@sqlrnnr) in his post. The Paul White operator is quite remarkable, and can consume three streams of data. Let’s look at those streams. The first pulls data from a Table Scan – Boris Hristov (@borishristov)’s post – using parallel threads (Bradley Ball – @sqlballs) that pull the data eagerly through a Table Spool (Oliver Asmus – @oliverasmus). A scalar operation is also performed on it, thanks to Jeffrey Verheul (@devjef)’s Compute Scalar operator. The second stream of data applies Evil (I figured that must mean a procedural TVF, but could’ve been anything), courtesy of Jason Strate (@stratesql). It performs this Evil on the merging of parallel streams (Steve Jones – @way0utwest), which suck data out of a Switch (Paul White – @sql_kiwi). This Switch operator is consuming data from up to four lookups, thanks to Kalen Delaney (@sqlqueen), Rick Krueger (@dataogre), Mickey Stuewe (@sqlmickey) and Kathi Kellenberger (@auntkathi). Unfortunately Kathi’s name is a bit long and has been truncated, just like in real plans. The last stream performs a join of two others via a Nested Loop (Matan Yungman – @matanyungman). One pulls data from a Spool (my post – @rob_farley) populated from a Table Scan (Jon Morisi). The other applies a catchall operator (the catchall is because Tamera Clark (@tameraclark) didn’t specify any particular operator, and a catchall is what gets shown when SSMS doesn’t know what to show. Surprisingly, it’s showing the yellow one, which is about cursors. Hopefully that’s not what Tamera planned, but anyway...) to the output from an Index Seek operator (Sebastian Meine – @sqlity). Lastly, I think everyone put in 110% effort, so that’s what all the operators cost. That didn’t leave anything for me, unfortunately, but that’s okay. Also, because he decided to use the Paul White operator, Jason Brimhall gets 0%, and his 110% was given to Paul’s Switch operator post. I hope you’ve enjoyed this T-SQL Tuesday, and have learned something extra about Plan Operators. Keep your eye out for next month’s one by watching the Twitter Hashtag #tsql2sday, and why not contribute a post to the party? Big thanks to Adam Machanic as usual for starting all this. @rob_farley

Read the article
The blocking nature of aggregates

- by Rob Farley

I wrote a post recently about how query tuning isn’t just about how quickly the query runs – that if you have something (such as SSIS) that is consuming your data (and probably introducing a bottleneck), then it might be more important to have a query which focuses on getting the first bit of data out. You can read that post here. In particular, we looked at two operators that could be used to ensure that a query returns only Distinct rows. and The Sort operator pulls in all the data, sorts it (discarding duplicates), and then pushes out the remaining rows. The Hash Match operator performs a Hashing function on each row as it comes in, and then looks to see if it’s created a Hash it’s seen before. If not, it pushes the row out. The Sort method is quicker, but has to wait until it’s gathered all the data before it can do the sort, and therefore blocks the data flow. But that was my last post. This one’s a bit different. This post is going to look at how Aggregate functions work, which ties nicely into this month’s T-SQL Tuesday. I’ve frequently explained about the fact that DISTINCT and GROUP BY are essentially the same function, although DISTINCT is the poorer cousin because you have less control over it, and you can’t apply aggregate functions. Just like the operators used for Distinct, there are different flavours of Aggregate operators – coming in blocking and non-blocking varieties. The example I like to use to explain this is a pile of playing cards. If I’m handed a pile of cards and asked to count how many cards there are in each suit, it’s going to help if the cards are already ordered. Suppose I’m playing a game of Bridge, I can easily glance at my hand and count how many there are in each suit, because I keep the pile of cards in order. Moving from left to right, I could tell you I have four Hearts in my hand, even before I’ve got to the end. By telling you that I have four Hearts as soon as I know, I demonstrate the principle of a non-blocking operation. This is known as a Stream Aggregate operation. It requires input which is sorted by whichever columns the grouping is on, and it will release a row as soon as the group changes – when I encounter a Spade, I know I don’t have any more Hearts in my hand. Alternatively, if the pile of cards are not sorted, I won’t know how many Hearts I have until I’ve looked through all the cards. In fact, to count them, I basically need to put them into little piles, and when I’ve finished making all those piles, I can count how many there are in each. Because I don’t know any of the final numbers until I’ve seen all the cards, this is blocking. This performs the aggregate function using a Hash Match. Observant readers will remember this from my Distinct example. You might remember that my earlier Hash Match operation – used for Distinct Flow – wasn’t blocking. But this one is. They’re essentially doing a similar operation, applying a Hash function to some data and seeing if the set of values have been seen before, but before, it needs more information than the mere existence of a new set of values, it needs to consider how many of them there are. A lot is dependent here on whether the data coming out of the source is sorted or not, and this is largely determined by the indexes that are being used. If you look in the Properties of an Index Scan, you’ll be able to see whether the order of the data is required by the plan. A property called Ordered will demonstrate this. In this particular example, the second plan is significantly faster, but is dependent on having ordered data. In fact, if I force a Stream Aggregate on unordered data (which I’m doing by telling it to use a different index), a Sort operation is needed, which makes my plan a lot slower. This is all very straight-forward stuff, and information that most people are fully aware of. I’m sure you’ve all read my good friend Paul White (@sql_kiwi)’s post on how the Query Optimizer chooses which type of aggregate function to apply. But let’s take a look at SQL Server Integration Services. SSIS gives us a Aggregate transformation for use in Data Flow Tasks, but it’s described as Blocking. The definitive article on Performance Tuning SSIS uses Sort and Aggregate as examples of Blocking Transformations. I’ve just shown you that Aggregate operations used by the Query Optimizer are not always blocking, but that the SSIS Aggregate component is an example of a blocking transformation. But is it always the case? After all, there are plenty of SSIS Performance Tuning talks out there that describe the value of sorted data in Data Flow Tasks, describing the IsSorted property that can be set through the Advanced Editor of your Source component. And so I set about testing the Aggregate transformation in SSIS, to prove for sure whether providing Sorted data would let the Aggregate transform behave like a Stream Aggregate. (Of course, I knew the answer already, but it helps to be able to demonstrate these things). A query that will produce a million rows in order was in order. Let me rephrase. I used a query which produced the numbers from 1 to 1000000, in a single field, ordered. The IsSorted flag was set on the source output, with the only column as SortKey 1. Performing an Aggregate function over this (counting the number of rows per distinct number) should produce an additional column with 1 in it. If this were being done in T-SQL, the ordered data would allow a Stream Aggregate to be used. In fact, if the Query Optimizer saw that the field had a Unique Index on it, it would be able to skip the Aggregate function completely, and just insert the value 1. This is a shortcut I wouldn’t be expecting from SSIS, but certainly the Stream behaviour would be nice. Unfortunately, it’s not the case. As you can see from the screenshots above, the data is pouring into the Aggregate function, and not being released until all million rows have been seen. It’s not doing a Stream Aggregate at all. This is expected behaviour. (I put that in bold, because I want you to realise this.) An SSIS transformation is a piece of code that runs. It’s a physical operation. When you write T-SQL and ask for an aggregation to be done, it’s a logical operation. The physical operation is either a Stream Aggregate or a Hash Match. In SSIS, you’re telling the system that you want a generic Aggregation, that will have to work with whatever data is passed in. I’m not saying that it wouldn’t be possible to make a sometimes-blocking aggregation component in SSIS. A Custom Component could be created which could detect whether the SortKeys columns of the input matched the Grouping columns of the Aggregation, and either call the blocking code or the non-blocking code as appropriate. One day I’ll make one of those, and publish it on my blog. I’ve done it before with a Script Component, but as Script components are single-use, I was able to handle the data knowing everything about my data flow already. As per my previous post – there are a lot of aspects in which tuning SSIS and tuning execution plans use similar concepts. In both situations, it really helps to have a feel for what’s going on behind the scenes. Considering whether an operation is blocking or not is extremely relevant to performance, and that it’s not always obvious from the surface. In a future post, I’ll show the impact of blocking v non-blocking and synchronous v asynchronous components in SSIS, using some of LobsterPot’s Script Components and Custom Components as examples. When I get that sorted, I’ll make a Stream Aggregate component available for download.

Read the article
The blocking nature of aggregates

- by Rob Farley

I wrote a post recently about how query tuning isn’t just about how quickly the query runs – that if you have something (such as SSIS) that is consuming your data (and probably introducing a bottleneck), then it might be more important to have a query which focuses on getting the first bit of data out. You can read that post here. In particular, we looked at two operators that could be used to ensure that a query returns only Distinct rows. and The Sort operator pulls in all the data, sorts it (discarding duplicates), and then pushes out the remaining rows. The Hash Match operator performs a Hashing function on each row as it comes in, and then looks to see if it’s created a Hash it’s seen before. If not, it pushes the row out. The Sort method is quicker, but has to wait until it’s gathered all the data before it can do the sort, and therefore blocks the data flow. But that was my last post. This one’s a bit different. This post is going to look at how Aggregate functions work, which ties nicely into this month’s T-SQL Tuesday. I’ve frequently explained about the fact that DISTINCT and GROUP BY are essentially the same function, although DISTINCT is the poorer cousin because you have less control over it, and you can’t apply aggregate functions. Just like the operators used for Distinct, there are different flavours of Aggregate operators – coming in blocking and non-blocking varieties. The example I like to use to explain this is a pile of playing cards. If I’m handed a pile of cards and asked to count how many cards there are in each suit, it’s going to help if the cards are already ordered. Suppose I’m playing a game of Bridge, I can easily glance at my hand and count how many there are in each suit, because I keep the pile of cards in order. Moving from left to right, I could tell you I have four Hearts in my hand, even before I’ve got to the end. By telling you that I have four Hearts as soon as I know, I demonstrate the principle of a non-blocking operation. This is known as a Stream Aggregate operation. It requires input which is sorted by whichever columns the grouping is on, and it will release a row as soon as the group changes – when I encounter a Spade, I know I don’t have any more Hearts in my hand. Alternatively, if the pile of cards are not sorted, I won’t know how many Hearts I have until I’ve looked through all the cards. In fact, to count them, I basically need to put them into little piles, and when I’ve finished making all those piles, I can count how many there are in each. Because I don’t know any of the final numbers until I’ve seen all the cards, this is blocking. This performs the aggregate function using a Hash Match. Observant readers will remember this from my Distinct example. You might remember that my earlier Hash Match operation – used for Distinct Flow – wasn’t blocking. But this one is. They’re essentially doing a similar operation, applying a Hash function to some data and seeing if the set of values have been seen before, but before, it needs more information than the mere existence of a new set of values, it needs to consider how many of them there are. A lot is dependent here on whether the data coming out of the source is sorted or not, and this is largely determined by the indexes that are being used. If you look in the Properties of an Index Scan, you’ll be able to see whether the order of the data is required by the plan. A property called Ordered will demonstrate this. In this particular example, the second plan is significantly faster, but is dependent on having ordered data. In fact, if I force a Stream Aggregate on unordered data (which I’m doing by telling it to use a different index), a Sort operation is needed, which makes my plan a lot slower. This is all very straight-forward stuff, and information that most people are fully aware of. I’m sure you’ve all read my good friend Paul White (@sql_kiwi)’s post on how the Query Optimizer chooses which type of aggregate function to apply. But let’s take a look at SQL Server Integration Services. SSIS gives us a Aggregate transformation for use in Data Flow Tasks, but it’s described as Blocking. The definitive article on Performance Tuning SSIS uses Sort and Aggregate as examples of Blocking Transformations. I’ve just shown you that Aggregate operations used by the Query Optimizer are not always blocking, but that the SSIS Aggregate component is an example of a blocking transformation. But is it always the case? After all, there are plenty of SSIS Performance Tuning talks out there that describe the value of sorted data in Data Flow Tasks, describing the IsSorted property that can be set through the Advanced Editor of your Source component. And so I set about testing the Aggregate transformation in SSIS, to prove for sure whether providing Sorted data would let the Aggregate transform behave like a Stream Aggregate. (Of course, I knew the answer already, but it helps to be able to demonstrate these things). A query that will produce a million rows in order was in order. Let me rephrase. I used a query which produced the numbers from 1 to 1000000, in a single field, ordered. The IsSorted flag was set on the source output, with the only column as SortKey 1. Performing an Aggregate function over this (counting the number of rows per distinct number) should produce an additional column with 1 in it. If this were being done in T-SQL, the ordered data would allow a Stream Aggregate to be used. In fact, if the Query Optimizer saw that the field had a Unique Index on it, it would be able to skip the Aggregate function completely, and just insert the value 1. This is a shortcut I wouldn’t be expecting from SSIS, but certainly the Stream behaviour would be nice. Unfortunately, it’s not the case. As you can see from the screenshots above, the data is pouring into the Aggregate function, and not being released until all million rows have been seen. It’s not doing a Stream Aggregate at all. This is expected behaviour. (I put that in bold, because I want you to realise this.) An SSIS transformation is a piece of code that runs. It’s a physical operation. When you write T-SQL and ask for an aggregation to be done, it’s a logical operation. The physical operation is either a Stream Aggregate or a Hash Match. In SSIS, you’re telling the system that you want a generic Aggregation, that will have to work with whatever data is passed in. I’m not saying that it wouldn’t be possible to make a sometimes-blocking aggregation component in SSIS. A Custom Component could be created which could detect whether the SortKeys columns of the input matched the Grouping columns of the Aggregation, and either call the blocking code or the non-blocking code as appropriate. One day I’ll make one of those, and publish it on my blog. I’ve done it before with a Script Component, but as Script components are single-use, I was able to handle the data knowing everything about my data flow already. As per my previous post – there are a lot of aspects in which tuning SSIS and tuning execution plans use similar concepts. In both situations, it really helps to have a feel for what’s going on behind the scenes. Considering whether an operation is blocking or not is extremely relevant to performance, and that it’s not always obvious from the surface. In a future post, I’ll show the impact of blocking v non-blocking and synchronous v asynchronous components in SSIS, using some of LobsterPot’s Script Components and Custom Components as examples. When I get that sorted, I’ll make a Stream Aggregate component available for download.

Read the article
Exploding maps in Reporting Services 2008 R2

- by Rob Farley

Kaboom! Well, that was the imagery that secretly appeared in my mind when I saw “USA By State Exploded” in the list of installed maps in Report Builder 3.0 – part of the spatial offering of SQL Server Reporting Server 2008 R2. Alas, it just means that the borders are bigger. Clicking on it showed me. Unfortunately, I’m not interested in maps of the US. None of my clients are there (at least, not yet – feel free to get in touch if you want to change this ‘feature’ of my company). So instead, I’ve recently been getting hold of some data for Australian areas. I’ve just bought some PostCode shapes for South Australia, and will use this in demos for conferences and for showing clients how this kind of report can really impact their reporting. One of the companies I was talking about getting shape files sent me a sample. So I chose the “ESRI shapefile” option you see above, and browsed to my file. It appeared in the window like this: Australians will immediately recognise this as the area around Wollongong, just south of Sydney. Well, apart from me. I didn’t. I had to put a Bing Maps layer behind it to work that out, but that’s not for this post. The thing that I discovered was that if I selected the Exploded USA option (but without clicking Next), and then chose my shape file, then my area around Wollongong would be exploded too! Huh! I think this is actually a bug, but a potentially useful one! Some further investigation (involving creating two identical reports, one with this exploded view, one without), showed that the Exploded View is done by reducing the ScaleFactor property of the PolygonLayer in the map control. The Exploded version has it below 1. If you set to above one, your shapes overlap. I discovered this by accident… I guess I hadn’t looked through all the PolygonLayer options to work out what they all do. And because this post is about Reporting, it can qualify for this month’s T-SQL Tuesday, hosted by Aaron Nelson (@sqlvariant). Share this post: email it! | bookmark it! | digg it! | reddit! | kick it! | live it!

Read the article
Summit Old, Summit New, Summit Borrowed...

- by Rob Farley

PASS Summit is coming up, and I thought I’d post a few things. Summit Old... At the PASS Summit, you will get the chance to hear presentations by the SQL Server establishment. Just about every big name in the SQL Server world is a regular at the PASS Summit, so you will get to hear and meet people like Kalen Delaney (@sqlqueen) (who just recently got awarded MVP status for the 20th year running), and from all around the world such as the UK’s Chris Webb (@technitrain) or Pinal Dave (@pinaldave) from India. Almost all the household names in SQL Server will be there, including a large contingent from Microsoft. The PASS Summit is by far the best place to meet the legends of SQL Server. And they’re not all old. Some are, but most of them are younger than you might think. ...Summit New... The hottest topics are often about the newest technologies (such as SQL Server 2012). But you will almost certainly learn new stuff about older versions too. But that’s not what I wanted to pick on for this point. There are many new speakers at every PASS Summit, and content that has not been covered in other places. This year, for example, LobsterPot’s Roger Noble (@roger_noble) is giving a presentation for the first time. He’s a regular around the Australian circuit, but this is his first time presenting to a US audience. New Zealand’s Paul White (@sql_kiwi) is attending his first PASS Summit, and will be giving over four hours of incredibly deep stuff that has never been presented anywhere in the US before (I can’t say the world, because he did present similar material in Adelaide earlier in the year). ...Summit Borrowed... No, I’m not talking about plagiarism – the talks you’ll hear are all their own work. But you will get a lot of stuff you’ll be able to take back and apply at work. The PASS Summit sessions are not full of sales-pitches, telling you about how great things could be if only you’d buy some third-party vendor product. It’s simply not that kind of conference, and PASS doesn’t allow that kind of talk to take place. Instead, you’ll be taught techniques, and be able to download scripts and slides to let you perform that magic back at work when you get home. You will definitely find plenty of ideas to borrow at the PASS Summit. ...Summit Blue Yeah – and there’s karaoke. Blue - Jason - SQL Karaoke - YouTube

Read the article
Visually stunning maps and PivotViewer

- by Rob Farley

One of the things about PivotViewer is that it runs in the Silverlight platform and can be extended recently. One of my guys at LobsterPot, Roger Noble, has used this to incorporate a Bing Maps layer, showing items which have Latitude and Longitude values there. We’re already talking to a hospital about using this to allow them to browse their patient data, including showing the patients on a map according to which bed they’re in. Interesting times – this will involve having custom tiles instead of the ones from Bing Maps, but the idea is similar. Of course, we’ll be using Bing Maps to show where the patients live. I should also mention that this is a work-in-progress still. Figuring out how to use PivotViewer isn’t trivial, and we’ve done quite a lot of experimenting to see how to get things working. If you find bugs, please feel free to let me know (rob_farley at hotmail will usually reach me), and we’ll add them to our list. Here are some screenshots that I made recently using the collection at http://pivot.lobsterpot.com.au/flickr – by selecting a tag, you can get a new bunch of images. A couple of images that were taken in Iceland. Some from St Mary’s Lighthouse near Newcastle, UK. And some from around Big Ben in London. I’d recommend using either Firefox or Internet Explorer if you choose to browse this yourself. It seems the Chrome browser support for Silverlight doesn’t quite handle things as nicely as we’d all like. I imagine that at some point, we may enhance the Flickr collection, to be able to search on more than tags, but as a sample collection, it seems to work quite well.

Read the article
Spooling in SQL execution plans

- by Rob Farley

Sewing has never been my thing. I barely even know the terminology, and when discussing this with American friends, I even found out that half the words that Americans use are different to the words that English and Australian people use. That said – let’s talk about spools! In particular, the Spool operators that you find in some SQL execution plans. This post is for T-SQL Tuesday, hosted this month by me! I’ve chosen to write about spools because they seem to get a bad rap (even in my song I used the line “There’s spooling from a CTE, they’ve got recursion needlessly”). I figured it was worth covering some of what spools are about, and hopefully explain why they are remarkably necessary, and generally very useful. If you have a look at the Books Online page about Plan Operators, at http://msdn.microsoft.com/en-us/library/ms191158.aspx, and do a search for the word ‘spool’, you’ll notice it says there are 46 matches. 46! Yeah, that’s what I thought too... Spooling is mentioned in several operators: Eager Spool, Lazy Spool, Index Spool (sometimes called a Nonclustered Index Spool), Row Count Spool, Spool, Table Spool, and Window Spool (oh, and Cache, which is a special kind of spool for a single row, but as it isn’t used in SQL 2012, I won’t describe it any further here). Spool, Table Spool, Index Spool, Window Spool and Row Count Spool are all physical operators, whereas Eager Spool and Lazy Spool are logical operators, describing the way that the other spools work. For example, you might see a Table Spool which is either Eager or Lazy. A Window Spool can actually act as both, as I’ll mention in a moment. In sewing, cotton is put onto a spool to make it more useful. You might buy it in bulk on a cone, but if you’re going to be using a sewing machine, then you quite probably want to have it on a spool or bobbin, which allows it to be used in a more effective way. This is the picture that I want you to think about in relation to your data. I’m sure you use spools every time you use your sewing machine. I know I do. I can’t think of a time when I’ve got out my sewing machine to do some sewing and haven’t used a spool. However, I often run SQL queries that don’t use spools. You see, the data that is consumed by my query is typically in a useful state without a spool. It’s like I can just sew with my cotton despite it not being on a spool! Many of my favourite features in T-SQL do like to use spools though. This looks like a very similar query to before, but includes an OVER clause to return a column telling me the number of rows in my data set. I’ll describe what’s going on in a few paragraphs’ time. So what does a Spool operator actually do? The spool operator consumes a set of data, and stores it in a temporary structure, in the tempdb database. This structure is typically either a Table (ie, a heap), or an Index (ie, a b-tree). If no data is actually needed from it, then it could also be a Row Count spool, which only stores the number of rows that the spool operator consumes. A Window Spool is another option if the data being consumed is tightly linked to windows of data, such as when the ROWS/RANGE clause of the OVER clause is being used. You could maybe think about the type of spool being like whether the cotton is going onto a small bobbin to fit in the base of the sewing machine, or whether it’s a larger spool for the top. A Table or Index Spool is either Eager or Lazy in nature. Eager and Lazy are Logical operators, which talk more about the behaviour, rather than the physical operation. If I’m sewing, I can either be all enthusiastic and get all my cotton onto the spool before I start, or I can do it as I need it. “Lazy” might not the be the best word to describe a person – in the SQL world it describes the idea of either fetching all the rows to build up the whole spool when the operator is called (Eager), or populating the spool only as it’s needed (Lazy). Window Spools are both physical and logical. They’re eager on a per-window basis, but lazy between windows. And when is it needed? The way I see it, spools are needed for two reasons. 1 – When data is going to be needed AGAIN. 2 – When data needs to be kept away from the original source. If you’re someone that writes long stored procedures, you are probably quite aware of the second scenario. I see plenty of stored procedures being written this way – where the query writer populates a temporary table, so that they can make updates to it without risking the original table. SQL does this too. Imagine I’m updating my contact list, and some of my changes move data to later in the book. If I’m not careful, I might update the same row a second time (or even enter an infinite loop, updating it over and over). A spool can make sure that I don’t, by using a copy of the data. This problem is known as the Halloween Effect (not because it’s spooky, but because it was discovered in late October one year). As I’m sure you can imagine, the kind of spool you’d need to protect against the Halloween Effect would be eager, because if you’re only handling one row at a time, then you’re not providing the protection... An eager spool will block the flow of data, waiting until it has fetched all the data before serving it up to the operator that called it. In the query below I’m forcing the Query Optimizer to use an index which would be upset if the Name column values got changed, and we see that before any data is fetched, a spool is created to load the data into. This doesn’t stop the index being maintained, but it does mean that the index is protected from the changes that are being done. There are plenty of times, though, when you need data repeatedly. Consider the query I put above. A simple join, but then counting the number of rows that came through. The way that this has executed (be it ideal or not), is to ask that a Table Spool be populated. That’s the Table Spool operator on the top row. That spool can produce the same set of rows repeatedly. This is the behaviour that we see in the bottom half of the plan. In the bottom half of the plan, we see that the a join is being done between the rows that are being sourced from the spool – one being aggregated and one not – producing the columns that we need for the query. Table v Index When considering whether to use a Table Spool or an Index Spool, the question that the Query Optimizer needs to answer is whether there is sufficient benefit to storing the data in a b-tree. The idea of having data in indexes is great, but of course there is a cost to maintaining them. Here we’re creating a temporary structure for data, and there is a cost associated with populating each row into its correct position according to a b-tree, as opposed to simply adding it to the end of the list of rows in a heap. Using a b-tree could even result in page-splits as the b-tree is populated, so there had better be a reason to use that kind of structure. That all depends on how the data is going to be used in other parts of the plan. If you’ve ever thought that you could use a temporary index for a particular query, well this is it – and the Query Optimizer can do that if it thinks it’s worthwhile. It’s worth noting that just because a Spool is populated using an Index Spool, it can still be fetched using a Table Spool. The details about whether or not a Spool used as a source shows as a Table Spool or an Index Spool is more about whether a Seek predicate is used, rather than on the underlying structure. Recursive CTE I’ve already shown you an example of spooling when the OVER clause is used. You might see them being used whenever you have data that is needed multiple times, and CTEs are quite common here. With the definition of a set of data described in a CTE, if the query writer is leveraging this by referring to the CTE multiple times, and there’s no simplification to be leveraged, a spool could theoretically be used to avoid reapplying the CTE’s logic. Annoyingly, this doesn’t happen. Consider this query, which really looks like it’s using the same data twice. I’m creating a set of data (which is completely deterministic, by the way), and then joining it back to itself. There seems to be no reason why it shouldn’t use a spool for the set described by the CTE, but it doesn’t. On the other hand, if we don’t pull as many columns back, we might see a very different plan. You see, CTEs, like all sub-queries, are simplified out to figure out the best way of executing the whole query. My example is somewhat contrived, and although there are plenty of cases when it’s nice to give the Query Optimizer hints about how to execute queries, it usually doesn’t do a bad job, even without spooling (and you can always use a temporary table). When recursion is used, though, spooling should be expected. Consider what we’re asking for in a recursive CTE. We’re telling the system to construct a set of data using an initial query, and then use set as a source for another query, piping this back into the same set and back around. It’s very much a spool. The analogy of cotton is long gone here, as the idea of having a continual loop of cotton feeding onto a spool and off again doesn’t quite fit, but that’s what we have here. Data is being fed onto the spool, and getting pulled out a second time when the spool is used as a source. (This query is running on AdventureWorks, which has a ManagerID column in HumanResources.Employee, not AdventureWorks2012) The Index Spool operator is sucking rows into it – lazily. It has to be lazy, because at the start, there’s only one row to be had. However, as rows get populated onto the spool, the Table Spool operator on the right can return rows when asked, ending up with more rows (potentially) getting back onto the spool, ready for the next round. (The Assert operator is merely checking to see if we’ve reached the MAXRECURSION point – it vanishes if you use OPTION (MAXRECURSION 0), which you can try yourself if you like). Spools are useful. Don’t lose sight of that. Every time you use temporary tables or table variables in a stored procedure, you’re essentially doing the same – don’t get upset at the Query Optimizer for doing so, even if you think the spool looks like an expensive part of the query. I hope you’re enjoying this T-SQL Tuesday. Why not head over to my post that is hosting it this month to read about some other plan operators? At some point I’ll write a summary post – once I have you should find a comment below pointing at it. @rob_farley

Read the article
T-SQL in Chicago – the LobsterPot teams with DataEducation

- by Rob Farley

In May, I’ll be in the US. I have board meetings for PASS at the SQLRally event in Dallas, and then I’m going to be spending a bit of time in Chicago. The big news is that while I’m in Chicago (May 14-16), I’m going to teach my “Advanced T-SQL Querying and Reporting: Building Effectiveness” course. This is a course that I’ve been teaching since the 2005 days, and have modified over time for 2008 and 2012. It’s very much my most popular course, and I love teaching it. Let me tell you why. For years, I wrote queries and thought I was good at it. I was a developer. I’d written a lot of C (and other, more fun languages like Prolog and Lisp) at university, and then got into the ‘real world’ and coded in VB, PL/SQL, and so on through to C#, and saw SQL (whichever database system it was) as just a way of getting the data back. I could write a query to return just about whatever data I wanted, and that was good. I was better at it than the people around me, and that helped. (It didn’t help my progression into management, then it just became a frustration, but for the most part, it was good to know that I was good at this particular thing.) But then I discovered the other side of querying – the execution plan. I started to learn about the translation from what I’d written into the plan, and this impacted my query-writing significantly. I look back at the queries I wrote before I understood this, and shudder. I wrote queries that were correct, but often a long way from effective. I’d done query tuning, but had largely done it without considering the plan, just inferring what indexes would help. This is not a performance-tuning course. It’s focused on the T-SQL that you read and write. But performance is a significant and recurring theme. Effective T-SQL has to be about performance – it’s the biggest way that a query becomes effective. There are other aspects too though – such as using constructs better. For example – I can write code that modifies data nicely, but if I haven’t learned about the MERGE statement and the way that it can impact things, I’m missing a few tricks. If you’re going to do this course, a good place to be is the situation I was in a few years before I wrote this course. You’re probably comfortable with writing T-SQL queries. You know how to make a SELECT statement do what you need it to, but feel there has to be a better way. You can write JOINs easily, and understand how to use LEFT JOIN to make sure you don’t filter out rows from the first table, but you’re coding blind. The first module I cover is on Query Execution. Take a look at the Course Outline at Data Education’s website. The first part of the first module is on the components of a SELECT statement (where I make you think harder about GROUP BY than you probably have before), but then we jump straight into Execution Plans. Some stuff on indexes is in there too, as is simplification and SARGability. Some of this is stuff that you may have heard me present on at conferences, but here you have me for three days straight. I’m sure you can imagine that we revisit these topics throughout the rest of the course as well, and you’d be right. In the second and third modules we look at a bunch of other aspects, including some of the T-SQL constructs that lots of people don’t know, and various other things that can help your T-SQL be, well, more effective. I’ve had quite a lot of people do this course and be itching to get back to work even on the first day. That’s not a comment about the jokes I tell, but because people want to look at the queries they run. LobsterPot Solutions is thrilled to be partnering with Data Education to bring this training to Chicago. Visit their website to register for the course. @rob_farley

Read the article

Search Results

Search found 615 results on 25 pages for 'sean farley'.

Page 2/25 | < Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12 | Next Page >

- by Rob Farley

- by Rob Farley

- by Rob Farley

- by Rob Farley

- by Rob Farley

- by Rob Farley

- by Rob Farley

- by Rob Farley

- by Rob Farley

- by Rob Farley

- by Rob Farley

- by Rob Farley

- by Rob Farley

- by Rob Farley

- by Rob Farley

- by Rob Farley

- by Rob Farley

- by Rob Farley

- by Rob Farley

- by Rob Farley

- by Rob Farley

- by Rob Farley

- by Rob Farley

- by Rob Farley

- by Rob Farley

< Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12 | Next Page >