Search Results

Search found 1339 results on 54 pages for 'rob farley'.


Script to establish SSH tunnel and then run another program that uses the tunnel

- by Rob Hills

I am running a GUI app (Gnucash) that connects to a remote Postgres database via a secure shell session. I can use the SSH -L command to tunnel a local port and then separately run Gnucash and this works fine. What I'd like to do is use a single shell script that sets up the tunnel and then calls Gnucash. Is that possible? If so, how do I do it?

Currently, I run commands like the following in 2 separate terminal windows:

    ssh -L 5433:127.0.0.1:19097 [email protected]
    gnucash postgres://gnucash@localhost:5433/gnucash_db

If I simply put both lines in a shell script, the first line drops me into the remote shell and the second line doesn't execute until I exit the remote shell.

TIA, Rob Hills

SQLBits - Unicode Porn

- by Most Valuable Yak (Rob Volk)

We've just finished up a fantastic event at SQLBits X in London! If you've never been to SQLBits and you can make it to the UK, I highly recommend it. If you didn't attend, here's what you missed. Meanwhile, for those who attended the Lightning Talk sessions and were disappointed that I ran out of time, here's the last part that you would have seen:

    /*
    How to Lose Friends and Irritate People...With Unicode!
    Rob Volk
    SQLBits X - London - March 31, 2012
    */

    -- some sexy SQL
    DECLARE @oohbaby TABLE(i INT NOT NULL UNIQUE, uni_char AS NCHAR(i), hex AS CAST(i AS BINARY(2)))
    INSERT @oohbaby VALUES(664),(1022),(1023),(1120),(1150),(8857),(11609),(42420),(42427)

    -- change results font to larger size, some only work in grid font
    SELECT * FROM @oohbaby
    SELECT NCHAR(1022) + NCHAR(1023) AS Page3Girl

It's probably better that you run this yourself, in the privacy of your own home/office, you know *wink* *wink* *nudge* *nudge* *say no more*

Invisible mouse cursor

- by Rob

There have been some similar posts, but nothing specific to my situation. Sometimes I boot my laptop and all is well. Other times I boot up and, after the login screen, my mouse cursor disappears. I can still use it; it's just invisible. If I start up Firefox, the cursor is visible, but only over the application window. My system is a Samsung R60 Plus with 4 GB RAM and a T7500, using the ATI Xpress 1250 graphics, running Ubuntu 11.10. Does anyone know of a workaround? Many thanks, Rob

delete pointer to 2d array c ++

- by user1848054

I have this pointer to a 2D array of the Robot class:

    Robot ***rob;

Here is the code for the constructor, and the program works fine. But now I am trying to write a destructor to delete this pointer, and it keeps crashing the program. My question is: how do I delete this pointer to a 2D array of robots?

    RobotsWorld::RobotsWorld(int x, int y)
    {
        X = x; Y = y; // the limits of the matrix
        rob = new Robot**[x];
        for (int i = 0; i < x; i++)
        {
            rob[i] = new Robot*[y];
            for (int j = 0; j < y; j++)
            {
                rob[i][j] = NULL;
            }
        }
    }

JavaFX, Google Maps, and NetBeans Platform

- by Geertjan

Thanks to a great new article by Rob Terpilowski, and other work and research he describes in that article, it's now trivial to introduce a map component to a NetBeans Platform application. Making use of the GMapsFX library, as described in Rob's article, which provides a JavaFX API for Google Maps, you can very quickly knock this application together. Click to enlarge the image. Here's all the code (from Rob's article): @TopComponent.Description( preferredID = "MapTopComponent", persistenceType = TopComponent.PERSISTENCE_ALWAYS ) @TopComponent.Registration(mode = "editor", openAtStartup = true) @ActionID(category = "Window", id = "org.map.MapTopComponent") @ActionReference(path = "Menu/Window" /*, position = 333 */) @TopComponent.OpenActionRegistration( displayName = "#CTL_MapWindowAction", preferredID = "MapTopComponent" ) @NbBundle.Messages({ "CTL_MapWindowAction=Map", "CTL_MapTopComponent=Map Window", "HINT_MapTopComponent=This is a Map window" }) public class MapWindow extends TopComponent implements MapComponentInitializedListener { protected GoogleMapView mapComponent; protected GoogleMap map; private static final double latitude = 52.3667; private static final double longitude = 4.9000; public MapWindow() { setName(Bundle.CTL_MapTopComponent()); setToolTipText(Bundle.HINT_MapTopComponent()); setLayout(new BorderLayout()); JFXPanel panel = new JFXPanel(); Platform.setImplicitExit(false); Platform.runLater(() -> { mapComponent = new GoogleMapView(); mapComponent.addMapInializedListener(this); BorderPane root = new BorderPane(mapComponent); Scene scene = new Scene(root); panel.setScene(scene); }); add(panel, BorderLayout.CENTER); } @Override public void mapInitialized() { //Once the map has been loaded by the Webview, initialize the map details. 
LatLong center = new LatLong(latitude, longitude); MapOptions options = new MapOptions(); options.center(center) .mapMarker(true) .zoom(9) .overviewMapControl(false) .panControl(false) .rotateControl(false) .scaleControl(false) .streetViewControl(false) .zoomControl(false) .mapType(MapTypeIdEnum.ROADMAP); map = mapComponent.createMap(options); //Add a couple of markers to the map. MarkerOptions markerOptions = new MarkerOptions(); LatLong markerLatLong = new LatLong(latitude, longitude); markerOptions.position(markerLatLong) .title("My new Marker") .animation(Animation.DROP) .visible(true); Marker myMarker = new Marker(markerOptions); MarkerOptions markerOptions2 = new MarkerOptions(); LatLong markerLatLong2 = new LatLong(latitude, longitude); markerOptions2.position(markerLatLong2) .title("My new Marker") .visible(true); Marker myMarker2 = new Marker(markerOptions2); map.addMarker(myMarker); map.addMarker(myMarker2); //Add an info window to the Map. InfoWindowOptions infoOptions = new InfoWindowOptions(); infoOptions.content("<h2>Center of the Universe</h2>") .position(center); InfoWindow window = new InfoWindow(infoOptions); window.open(map, myMarker); } } Awesome work Rob, will be useful for many developers out there.

Nepotism In The SQL Family

- by Rob Farley

There’s a bunch of sayings about nepotism. It’s unpopular, unless you’re the family member who is getting the opportunity. But of course, so much in life (and career) is about who you know. From the perspective of the person who doesn’t get promoted (when the family member is), nepotism is simply unfair; even more so when the promoted one seems less than qualified, or incompetent in some way. We definitely get a bit miffed about that. But let’s also look at it from the other side of the fence – the person who did the promoting. To them, their son/daughter/nephew/whoever is just another candidate, but one in whom they have more faith. They’ve spent longer getting to know that person. They know their weaknesses and their strengths, and have seen them in all kinds of situations. They expect them to stay around in the company longer. And yes, they may have plans for that person to inherit one day. Sure, they have a vested interest, because they’d like their family members to have strong careers, but it’s not just about that – it’s often best for the company as well. I’m not announcing that the next LobsterPot employee is one of my sons (although I wouldn’t be opposed to the idea of getting them involved), but actually, admitting that almost all the LobsterPot employees are SQLFamily members… …which makes this post good for T-SQL Tuesday, this month hosted by Jeffrey Verheul (@DevJef). You see, SQLFamily is the concept that the people in the SQL Server community are close. We have something in common that goes beyond ordinary friendship. We might only see each other a few times a year, at events like the PASS Summit and SQLSaturdays, but the bonds that are formed are strong, going far beyond typical professional relationships. And these are the people that I am prepared to hire. People that I have got to know. I get to know their skill level, how well they explain things, how confident people are in their expertise, and what their values are. 
Of course there are people that I wouldn’t hire, but I’m a lot more comfortable hiring someone that I’ve already developed a feel for. I need to trust the LobsterPot brand to people, and that means they need to have a similar value system to me. They need to have a passion for helping people and doing what they can to make a difference. Above all, they need to have integrity. Therefore, I believe in nepotism. All the people I’ve hired so far are people from the SQL community. I don’t know whether I’ll always be able to hire that way, but I have no qualms admitting that the things I look for in an employee are things that I can recognise best in those that are referred to as SQLFamily. …like Ted Krueger (@onpnt), LobsterPot’s newest employee and the guy who is representing our brand in America. I’m completely proud of this guy. He’s everything I want in an employee. He’s an experienced consultant (even wrote a book on it!), loving husband and father, genuine expert, and incredibly respected by his peers. It’s not favouritism, it’s just choosing someone I’ve been interviewing for years. @rob_farley

Analytic functions – they’re not aggregates

- by Rob Farley

SQL 2012 brings us a bunch of new analytic functions, together with enhancements to the OVER clause. People who have known me over the years will remember that I’m a big fan of the OVER clause and the types of things that it brings us when applied to aggregate functions, as well as the ranking functions that it enables. The OVER clause was introduced in SQL Server 2005, and remained frustratingly unchanged until SQL Server 2012. This post is going to look at a particular aspect of the analytic functions though (not the enhancements to the OVER clause). When I give presentations about the analytic functions around Australia as part of the tour of SQL Saturdays (starting in Brisbane this Thursday), and in Chicago next month, I’ll make sure it’s sufficiently well described. But for this post – I’m going to skip that and assume you get it. The analytic functions introduced in SQL 2012 seem to come in pairs – FIRST_VALUE and LAST_VALUE, LAG and LEAD, CUME_DIST and PERCENT_RANK, PERCENTILE_CONT and PERCENTILE_DISC. Perhaps frustratingly, they take slightly different forms as well. The ones I want to look at now are FIRST_VALUE and LAST_VALUE, and PERCENTILE_CONT and PERCENTILE_DISC. The reason I’m pulling these ones out is that they always produce the same result within their partitions (if you’re applying them to the whole partition). 
Consider the following query: SELECT YEAR(OrderDate), FIRST_VALUE(TotalDue) OVER (PARTITION BY YEAR(OrderDate) ORDER BY OrderDate, SalesOrderID RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING), LAST_VALUE(TotalDue) OVER (PARTITION BY YEAR(OrderDate) ORDER BY OrderDate, SalesOrderID RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING), PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY TotalDue) OVER (PARTITION BY YEAR(OrderDate)), PERCENTILE_DISC(0.95) WITHIN GROUP (ORDER BY TotalDue) OVER (PARTITION BY YEAR(OrderDate)) FROM Sales.SalesOrderHeader ; This is designed to get the TotalDue for the first order of the year, the last order of the year, and also the 95th percentile, using both the continuous and discrete methods (‘discrete’ means it picks the closest one from the values available – ‘continuous’ means it will happily use something between, similar to what you would do for a traditional median of four values). I’m sure you can imagine the results – a different value for each field, but within each year, all the rows the same. Notice that I’m not grouping by the year. Nor am I filtering. This query gives us a result for every row in the SalesOrderHeader table – 31465 in this case (using the original AdventureWorks that dates back to the SQL 2005 days). The RANGE BETWEEN bit in FIRST_VALUE and LAST_VALUE is needed to make sure that we’re considering all the rows available. If we don’t specify that, it assumes we only mean “RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW”, which means that LAST_VALUE ends up being the row we’re looking at. At this point you might think about other environments such as Access or Reporting Services, and remember aggregate functions like FIRST. 
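To make the framing and percentile behaviour described above concrete, here is a minimal sketch on a throwaway table variable. The table @orders and its values are made up for illustration and are not part of AdventureWorks. It compares LAST_VALUE under the default frame with the whole-partition frame, and puts the continuous and discrete percentile methods side by side on a tiny data set.

```sql
-- Hypothetical sample data for illustration only (not from AdventureWorks).
DECLARE @orders TABLE (OrderID int PRIMARY KEY, OrderYear int, TotalDue money);

INSERT @orders (OrderID, OrderYear, TotalDue)
VALUES (1, 2011, 10), (2, 2011, 20), (3, 2011, 30),
       (4, 2012, 40), (5, 2012, 50);

SELECT OrderID, OrderYear, TotalDue,
    -- Default frame stops at the current row, so this simply echoes TotalDue.
    LAST_VALUE(TotalDue) OVER (PARTITION BY OrderYear ORDER BY OrderID)
        AS LastDefaultFrame,
    -- Explicit frame covers the whole partition: 30 for 2011 rows, 50 for 2012 rows.
    LAST_VALUE(TotalDue) OVER (PARTITION BY OrderYear ORDER BY OrderID
        RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
        AS LastWholePartition,
    -- Continuous median interpolates between values: 20 for 2011, 45 for 2012.
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY TotalDue)
        OVER (PARTITION BY OrderYear) AS MedianCont,
    -- Discrete median returns a value that exists in the data: 20 for 2011, 40 for 2012.
    PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY TotalDue)
        OVER (PARTITION BY OrderYear) AS MedianDisc
FROM @orders
ORDER BY OrderID;
```

The median (0.5) is used here rather than 0.95 only because the difference between the two methods is easier to see on five rows.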
We really should be able to do something like: SELECT YEAR(OrderDate), FIRST_VALUE(TotalDue) OVER (PARTITION BY YEAR(OrderDate) ORDER BY OrderDate, SalesOrderID RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) FROM Sales.SalesOrderHeader GROUP BY YEAR(OrderDate) ; But you can’t. You get that age-old error: Msg 8120, Level 16, State 1, Line 5 Column 'Sales.SalesOrderHeader.OrderDate' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause. Msg 8120, Level 16, State 1, Line 5 Column 'Sales.SalesOrderHeader.SalesOrderID' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause. Hmm. You see, FIRST_VALUE isn’t an aggregate function. None of these analytic functions are. There are too many things involved for SQL to realise that the values produced might be identical within the group. Furthermore, you can’t even surround it in a MAX. Then you get a different error, telling you that you can’t use windowed functions in the context of an aggregate. And so we end up grouping by doing a DISTINCT. SELECT DISTINCT YEAR(OrderDate), FIRST_VALUE(TotalDue) OVER (PARTITION BY YEAR(OrderDate) ORDER BY OrderDate, SalesOrderID RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING), LAST_VALUE(TotalDue) OVER (PARTITION BY YEAR(OrderDate) ORDER BY OrderDate, SalesOrderID RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING), PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY TotalDue) OVER (PARTITION BY YEAR(OrderDate)), PERCENTILE_DISC(0.95) WITHIN GROUP (ORDER BY TotalDue) OVER (PARTITION BY YEAR(OrderDate)) FROM Sales.SalesOrderHeader ; I’m sorry. It’s just the way it goes. Hopefully it’ll change in the future, but for now, it’s what you’ll have to do. 
If we look in the execution plan, we see that it’s incredibly ugly, and actually works out the results of these analytic functions for all 31465 rows, finally performing the distinct operation to convert it into the four rows we get in the results. You might be able to achieve a better plan using things like TOP, or the kind of calculation that I used in http://sqlblog.com/blogs/rob_farley/archive/2011/08/23/t-sql-thoughts-about-the-95th-percentile.aspx (which is how PERCENTILE_CONT works), but it’s definitely convenient to use these functions, and in time, I’m sure we’ll see good improvements in the way that they are implemented. Oh, and this post should be good for fellow SQL Server MVP Nigel Sammy’s T-SQL Tuesday this month.
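As a rough illustration of the "things like TOP" idea mentioned above, the first and last order of each year could be fetched with TOP (1) inside APPLY rather than with windowed functions plus DISTINCT. This is only a sketch, not the author's query: the aliases are made up, it covers FIRST_VALUE and LAST_VALUE but not the percentiles, and whether the plan is actually better would need checking on your own system.

```sql
-- Sketch: one row per year, using TOP (1) via APPLY instead of DISTINCT
-- over windowed functions. Aliases (y, f, l) are illustrative only.
SELECT y.OrderYear,
       f.TotalDue AS FirstTotalDue,
       l.TotalDue AS LastTotalDue
FROM (SELECT YEAR(OrderDate) AS OrderYear
      FROM Sales.SalesOrderHeader
      GROUP BY YEAR(OrderDate)) AS y
CROSS APPLY (SELECT TOP (1) o.TotalDue            -- earliest order in the year
             FROM Sales.SalesOrderHeader AS o
             WHERE o.OrderDate >= DATEFROMPARTS(y.OrderYear, 1, 1)
               AND o.OrderDate <  DATEFROMPARTS(y.OrderYear + 1, 1, 1)
             ORDER BY o.OrderDate, o.SalesOrderID) AS f
CROSS APPLY (SELECT TOP (1) o.TotalDue            -- latest order in the year
             FROM Sales.SalesOrderHeader AS o
             WHERE o.OrderDate >= DATEFROMPARTS(y.OrderYear, 1, 1)
               AND o.OrderDate <  DATEFROMPARTS(y.OrderYear + 1, 1, 1)
             ORDER BY o.OrderDate DESC, o.SalesOrderID DESC) AS l
ORDER BY y.OrderYear;
```

The date-range predicates are written against the unmodified OrderDate column so that an index on OrderDate, if one exists, can be used for each TOP (1) lookup.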

Handling special characters with FOR XML PATH('')

- by Rob Farley

Because I hate seeing &gt; or &amp; in my results… Since SQL Server 2005, we’ve been able to use FOR XML PATH('') to do string concatenation. I’ve blogged about it before several times. But I don’t think I’ve blogged about the fact that it all goes a bit wrong if you have special characters in the strings you’re concatenating. Generally, I don’t even worry about this. I should, but I don’t, particularly when the solution is so easy. Suppose I want to concatenate the list of user databases...(read more)
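The excerpt stops before showing a fix, so the following is not necessarily the author's solution. It is a minimal sketch of the problem and of one widely used workaround: adding the TYPE directive and pulling the text back out with the xml value() method. The sample strings are made up for illustration.

```sql
-- Hypothetical sample strings containing XML special characters.
DECLARE @names TABLE (name nvarchar(50));
INSERT @names (name) VALUES (N'R&D'), (N'Sales<EU>');

-- Plain FOR XML PATH(''): special characters come back entitized,
-- e.g. & becomes &amp; and < becomes &lt;.
SELECT STUFF((SELECT N', ' + name
              FROM @names
              ORDER BY name
              FOR XML PATH('')), 1, 2, N'') AS Entitized;

-- With TYPE the subquery returns the xml data type, and .value() extracts
-- the text content, so the original characters are preserved.
SELECT STUFF((SELECT N', ' + name
              FROM @names
              ORDER BY name
              FOR XML PATH(''), TYPE).value('.', 'nvarchar(max)'), 1, 2, N'') AS Clean;
```

STUFF(..., 1, 2, N'') is only there to strip the leading comma and space from the concatenated list.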

My new laptop - with a really nice battery option

- by Rob Farley

It was about time I got a new laptop, and so I made a phone-call to Dell to discuss my options. I decided not to get an SSD from them, because I’d rather choose one myself – the sales guy tells me that changing the HD doesn’t void my warranty, so that’s good (incidentally, I’d love to hear people’s recommendations for which SSD to get for my laptop). Unfortunately this machine only has one HD slot, but I figure that I’ll put lots of stuff onto external disks anyway. The machine I got was a Dell Studio XPS 16. It’s red (which suits my company), but also has the Intel® Core™ i7-820QM Processor, which is 4 Cores/8 Threads. Makes for a pretty Task Manager, but nothing like the one I saw at SQLBits last year (at 96 cores), or the one that my good friend James Rowland-Jones writes about here. But the reason for this post is actually something in the software that comes with the machine – you know, the stuff that most people uninstall at the earliest opportunity. I had just reinstalled the operating system, and was going through the utilities to get the drivers up-to-date, when I noticed that one of the Dell applications included an option to disable battery charging. So I installed it. And sure enough, I can tell the battery not to charge now. Clearly Dell see it as a temporary option, and one that’s designed for when you’re on a plane. But for me, I most often use my laptop with the power plugged in, which means I don’t need to have my battery continually topping itself up. So I really love this option, but I feel like it could go a little further. I’d like “Not Charging” to be the default option, and let me set it when I want to charge it (which should theoretically make my battery last longer). I also intend to work out how this option works, so that I can script it and put it into my StartUp options (so it can be the Default setting). Actually – if someone has already worked this out and can tell me what it does, then please feel free to let me know. 
Even better would be an external switch. I had a switch on my old laptop (a Dell Latitude) for WiFi, so that I could turn that off before I turned on the computer (this laptop doesn’t give me that option – no physical switch for flight mode). I guess it just means I’ll get used to leaving the WiFi off by default, and turning it on when I want it – might save myself some battery power that way too. Soon I’ll need to take the plunge and sync my iPhone with the new laptop. I’m a little worried that I might lose something – Apple’s messages about how my stuff will be wiped and replaced with what’s on the PC don’t fill me with confidence, as it’s a new PC that doesn’t have stuff on it. But having a new machine is definitely a nice experience, and one that I can recommend. I’m sure when I get around to buying an SSD I’ll feel like it’s shiny and new all over again!

When is a SQL function not a function?

- by Rob Farley

Should SQL Server even have functions? (Oh yeah – this is a T-SQL Tuesday post, hosted this month by Brad Schulz) Functions serve an important part of programming, in almost any language. A function is a piece of code that is designed to return something, as opposed to a piece of code which isn’t designed to return anything (which is known as a procedure). SQL Server is no different. You can call stored procedures, even from within other stored procedures, and you can call functions and use these in other queries. Stored procedures might query something, and therefore ‘return data’, but a function in SQL is considered to have the type of the thing returned, and can be used accordingly in queries. Consider the internal GETDATE() function. SELECT GETDATE(), SomeDatetimeColumn FROM dbo.SomeTable; There’s no logical difference between the field that is being returned by the function and the field that’s being returned by the table column. Both are the datetime field – if you didn’t have inside knowledge, you wouldn’t necessarily be able to tell which was which. And so as developers, we find ourselves wanting to create functions that return all kinds of things – functions which look up values based on codes, functions which do string manipulation, and so on. But it’s rubbish. Ok, it’s not all rubbish, but it mostly is. And this isn’t even considering the SARGability impact. It’s far more significant than that. (When I say the SARGability aspect, I mean “because you’re unlikely to have an index on the result of some function that’s applied to a column, so try to invert the function and query the column in an unchanged manner”) I’m going to consider the three main types of user-defined functions in SQL Server: Scalar Inline Table-Valued Multi-statement Table-Valued I could also look at user-defined CLR functions, including aggregate functions, but not today. 
I figure that most people don’t tend to get around to doing CLR functions, and I’m going to focus on the T-SQL-based user-defined functions. Most people split these types of function up into two types. So do I. Except that most people pick them based on ‘scalar or table-valued’. I’d rather go with ‘inline or not’. If it’s not inline, it’s rubbish. It really is. Let’s start by considering the two kinds of table-valued function, and compare them. These functions are going to return the sales for a particular salesperson in a particular year, from the AdventureWorks database. CREATE FUNCTION dbo.FetchSales_inline(@salespersonid int, @orderyear int) RETURNS TABLE AS RETURN ( SELECT e.LoginID as EmployeeLogin, o.OrderDate, o.SalesOrderID FROM Sales.SalesOrderHeader AS o LEFT JOIN HumanResources.Employee AS e ON e.EmployeeID = o.SalesPersonID WHERE o.SalesPersonID = @salespersonid AND o.OrderDate >= DATEADD(year,@orderyear-2000,'20000101') AND o.OrderDate < DATEADD(year,@orderyear-2000+1,'20000101') ) ; GO CREATE FUNCTION dbo.FetchSales_multi(@salespersonid int, @orderyear int) RETURNS @results TABLE ( EmployeeLogin nvarchar(512), OrderDate datetime, SalesOrderID int ) AS BEGIN INSERT @results (EmployeeLogin, OrderDate, SalesOrderID) SELECT e.LoginID, o.OrderDate, o.SalesOrderID FROM Sales.SalesOrderHeader AS o LEFT JOIN HumanResources.Employee AS e ON e.EmployeeID = o.SalesPersonID WHERE o.SalesPersonID = @salespersonid AND o.OrderDate >= DATEADD(year,@orderyear-2000,'20000101') AND o.OrderDate < DATEADD(year,@orderyear-2000+1,'20000101') ; RETURN END ; GO You’ll notice that I’m being nice and responsible with the use of the DATEADD function, so that I have SARGability on the OrderDate filter. Regular readers will be hoping I’ll show what’s going on in the execution plans here. Here I’ve run two SELECT * queries with the “Show Actual Execution Plan” option turned on. Notice that the ‘Query cost’ of the multi-statement version is just 2% of the ‘Batch cost’. 
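The two queries behind those plans are not shown in the text. They were presumably along the following lines; the parameter values here are an assumption (281 and 2003 are the values the post uses later with the scalar function).

```sql
-- Assumed calls; the post does not show the exact queries or parameter values.
SELECT * FROM dbo.FetchSales_inline(281, 2003);
SELECT * FROM dbo.FetchSales_multi(281, 2003);

-- A narrower variant that leaves out EmployeeLogin. With the inline function
-- the optimizer can see that the LEFT JOIN to Employee is not needed;
-- the multi-statement function still populates its whole table variable.
SELECT OrderDate, SalesOrderID FROM dbo.FetchSales_inline(281, 2003);
SELECT OrderDate, SalesOrderID FROM dbo.FetchSales_multi(281, 2003);
```

Running each pair in one batch with the actual plan turned on is what makes the relative 'Query cost' percentages comparable.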
But also notice there’s trickery going on. And it’s nothing to do with that extra index that I have on the OrderDate column. Trickery. Look at it – clearly, the first plan is showing us what’s going on inside the function, but the second one isn’t. The second one is blindly running the function, and then scanning the results. There’s a Sequence operator which is calling the TVF operator, and then calling a Table Scan to get the results of that function for the SELECT operator. But surely it still has to do all the work that the first one is doing... To see what’s actually going on, let’s look at the Estimated plan. Now, we see the same plans (almost) that we saw in the Actuals, but we have an extra one – the one that was used for the TVF. Here’s where we see the inner workings of it. You’ll probably recognise the right-hand side of the TVF’s plan as looking very similar to the first plan – but it’s now being called by a stack of other operators, including an INSERT statement to be able to populate the table variable that the multi-statement TVF requires. And the cost of the TVF is 57% of the batch! But it gets worse. Let’s consider what happens if we don’t need all the columns. We’ll leave out the EmployeeLogin column. Here, we see that the inline function call has been simplified down. It doesn’t need the Employee table. The join is redundant and has been eliminated from the plan, making it even cheaper. But the multi-statement plan runs the whole thing as before, only removing the extra column when the Table Scan is performed. A multi-statement function is a lot more powerful than an inline one. An inline function can only be the result of a single sub-query. It’s essentially the same as a parameterised view, because views demonstrate this same behaviour of extracting the definition of the view and using it in the outer query. A multi-statement function is clearly more powerful because it can contain far more complex logic. 
But a multi-statement function isn’t really a function at all. It’s a stored procedure. It’s wrapped up like a function, but behaves like a stored procedure. It would be completely unreasonable to expect that a stored procedure could be simplified down to recognise that not all the columns might be needed, but yet this is part of the pain associated with this procedural function situation. The biggest clue that a multi-statement function is more like a stored procedure than a function is the “BEGIN” and “END” statements that surround the code. If you try to create a multi-statement function without these statements, you’ll get an error – they are very much required. When I used to present on this kind of thing, I even used to call it “The Dangers of BEGIN and END”, and yes, I’ve written about this type of thing before in a similarly-named post over at my old blog. Now how about scalar functions... Suppose we wanted a scalar function to return the count of these. CREATE FUNCTION dbo.FetchSales_scalar(@salespersonid int, @orderyear int) RETURNS int AS BEGIN RETURN ( SELECT COUNT(*) FROM Sales.SalesOrderHeader AS o LEFT JOIN HumanResources.Employee AS e ON e.EmployeeID = o.SalesPersonID WHERE o.SalesPersonID = @salespersonid AND o.OrderDate >= DATEADD(year,@orderyear-2000,'20000101') AND o.OrderDate < DATEADD(year,@orderyear-2000+1,'20000101') ); END ; GO Notice the evil words? They’re required. Try to remove them, you just get an error. That’s right – any scalar function is procedural, despite the fact that you wrap up a sub-query inside that RETURN statement. It’s as ugly as anything. Hopefully this will change in future versions. Let’s have a look at how this is reflected in an execution plan. 
Here’s a query, its Actual plan, and its Estimated plan: SELECT e.LoginID, y.year, dbo.FetchSales_scalar(p.SalesPersonID, y.year) AS NumSales FROM (VALUES (2001),(2002),(2003),(2004)) AS y (year) CROSS JOIN Sales.SalesPerson AS p LEFT JOIN HumanResources.Employee AS e ON e.EmployeeID = p.SalesPersonID; We see here that the cost of the scalar function is about twice that of the outer query. Nicely, the query optimizer has worked out that it doesn’t need the Employee table, but that’s a bit of a red herring here. There’s actually something way more significant going on. If I look at the properties of that UDF operator, it tells me that the Estimated Subtree Cost is 0.337999. If I just run the query SELECT dbo.FetchSales_scalar(281,2003); we see that the UDF cost is still unchanged. You see, this 0.337999 is the cost of running the scalar function ONCE. But when we ran that query with the CROSS JOIN in it, we returned quite a few rows. 68 in fact. Could’ve been a lot more, if we’d had more salespeople or more years. And so we come to the biggest problem. This procedure (I don’t want to call it a function) is getting called 68 times – each call being about twice as expensive as the outer query. And because it’s calling it in a separate context, there is even more overhead that I haven’t considered here. The cheek of it, to say that the Compute Scalar operator here costs 0%! I know a number of IT projects that could’ve used that kind of costing method, but that’s another story that I’m not going to go into here. Let’s look at a better way. Suppose our scalar function had been implemented as an inline one. Then it could have been expanded out like a sub-query. 
It could’ve run something like this: SELECT e.LoginID, y.year, (SELECT COUNT(*) FROM Sales.SalesOrderHeader AS o LEFT JOIN HumanResources.Employee AS e ON e.EmployeeID = o.SalesPersonID WHERE o.SalesPersonID = p.SalesPersonID AND o.OrderDate >= DATEADD(year,y.year-2000,'20000101') AND o.OrderDate < DATEADD(year,y.year-2000+1,'20000101') ) AS NumSales FROM (VALUES (2001),(2002),(2003),(2004)) AS y (year) CROSS JOIN Sales.SalesPerson AS p LEFT JOIN HumanResources.Employee AS e ON e.EmployeeID = p.SalesPersonID; Don’t worry too much about the Scan of the SalesOrderHeader underneath a Nested Loop. If you remember from plenty of other posts on the matter, execution plans don’t push the data through. That Scan only runs once. The Index Spool sucks the data out of it and populates a structure that is used to feed the Stream Aggregate. The Index Spool operator gets called 68 times, but the Scan only once (the Number of Executions property demonstrates this). Here, the Query Optimizer has a full picture of what’s being asked, and can make the appropriate decision about how it accesses the data. It can simplify it down properly. To get this kind of behaviour from a function, we need it to be inline. But without inline scalar functions, we need to make our function be table-valued. Luckily, that’s ok. CREATE FUNCTION dbo.FetchSales_inline2(@salespersonid int, @orderyear int) RETURNS table AS RETURN (SELECT COUNT(*) as NumSales FROM Sales.SalesOrderHeader AS o LEFT JOIN HumanResources.Employee AS e ON e.EmployeeID = o.SalesPersonID WHERE o.SalesPersonID = @salespersonid AND o.OrderDate >= DATEADD(year,@orderyear-2000,'20000101') AND o.OrderDate < DATEADD(year,@orderyear-2000+1,'20000101') ); GO But we can’t use this as a scalar. Instead, we need to use it with the APPLY operator. 
SELECT e.LoginID, y.year, n.NumSales FROM (VALUES (2001),(2002),(2003),(2004)) AS y (year) CROSS JOIN Sales.SalesPerson AS p LEFT JOIN HumanResources.Employee AS e ON e.EmployeeID = p.SalesPersonID OUTER APPLY dbo.FetchSales_inline2(p.SalesPersonID, y.year) AS n; And now, we get the plan that we want for this query. All we’ve done is tell the function that it’s returning a table instead of a single value, and removed the BEGIN and END statements. We’ve had to name the column being returned, but what we’ve gained is an actual inline simplifiable function. And if we wanted it to return multiple columns, it could do that too. I really consider this function to be superior to the scalar function in every way. It does need to be handled differently in the outer query, but in many ways it’s a more elegant method there too. The function calls can be put amongst the FROM clause, where they can then be used in the WHERE or GROUP BY clauses without fear of calling the function multiple times (another horrible side effect of functions). So please. If you see BEGIN and END in a function, remember it’s not really a function, it’s a procedure. And then fix it. @rob_farley
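To illustrate the point about using the applied function's result elsewhere in the query, here is a small sketch that filters and sorts on NumSales. The threshold of 100 is an arbitrary value chosen for illustration. Because the function's COUNT(*) query always returns exactly one row, CROSS APPLY and OUTER APPLY behave the same way here.

```sql
-- Sketch: the inline function is applied once in the FROM clause, and its
-- column is then reused in WHERE and ORDER BY like any other column.
SELECT p.SalesPersonID, y.year, n.NumSales
FROM (VALUES (2001),(2002),(2003),(2004)) AS y (year)
CROSS JOIN Sales.SalesPerson AS p
CROSS APPLY dbo.FetchSales_inline2(p.SalesPersonID, y.year) AS n
WHERE n.NumSales > 100          -- arbitrary threshold, for illustration only
ORDER BY n.NumSales DESC;
```

With the scalar version, the same filter would mean writing dbo.FetchSales_scalar(...) in both the SELECT list and the WHERE clause.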

SQLBits VI – The sixth sets

- by Rob Farley

My involvement stopped with the tagline, but SQLBits VI is on tomorrow. The theme of the event is Performance Tuning, which has nothing to do with Bruce Willis or dead people – unless Bruce Willis has just become a database expert and been shot for dropping an index (some would say that’s a crime worthy of the death penalty). It’s a shame my involvement hasn’t been more, because it’s such a terrific event, and it would’ve been good to have been there for a second time. It’s a long way to...(read more)

SQL Spatial: Getting “nearest” calculations working properly

- by Rob Farley

If you’ve ever done spatial work with SQL Server, I hope you’ve come across the ‘nearest’ problem. You have five thousand stores around the world, and you want to identify the one that’s closest to a particular place. Maybe you want the store closest to the LobsterPot office in Adelaide, at -34.925806, 138.605073. Or our new US office, at 42.524929, -87.858244. Or maybe both! You know how to do this. You don’t want to use an aggregate MIN or MAX, because you want the whole row, telling you which store it is. You want to use TOP, and if you want to find the closest store for multiple locations, you use APPLY. Let’s do this (but I’m going to use addresses in AdventureWorks2012, as I don’t have a list of stores). Oh, and before I do, let’s make sure we have a spatial index in place. I’m going to use the default options. CREATE SPATIAL INDEX spin_Address ON Person.Address(SpatialLocation); And my actual query: WITH MyLocations AS (SELECT * FROM (VALUES ('LobsterPot Adelaide', geography::Point(-34.925806, 138.605073, 4326)), ('LobsterPot USA', geography::Point(42.524929, -87.858244, 4326)) ) t (Name, Geo)) SELECT l.Name, a.AddressLine1, a.City, s.Name AS [State], c.Name AS Country FROM MyLocations AS l CROSS APPLY ( SELECT TOP (1) * FROM Person.Address AS ad ORDER BY l.Geo.STDistance(ad.SpatialLocation) ) AS a JOIN Person.StateProvince AS s ON s.StateProvinceID = a.StateProvinceID JOIN Person.CountryRegion AS c ON c.CountryRegionCode = s.CountryRegionCode ; Great! This is definitely working. I know both those City locations, even if the AddressLine1s don’t quite ring a bell. I’m sure I’ll be able to find them next time I’m in the area. But of course what I’m concerned about from a querying perspective is what’s happened behind the scenes – the execution plan. This isn’t pretty. It’s not using my index. It’s sucking every row out of the Address table TWICE (which sucks), and then it’s sorting them by the distance to find the smallest one. 
It’s not pretty, and it takes a while. Mind you, I do like the fact that it saw an indexed view it could use for the State and Country details – that’s pretty neat. But yeah – users of my nifty website aren’t going to like how long that query takes. The frustrating thing is that I know that I can use the index to find locations that are within a particular distance of my locations quite easily, and Microsoft recommends this for solving the ‘nearest’ problem, as described at http://msdn.microsoft.com/en-au/library/ff929109.aspx. Now, in the first example on this page, it says that the query there will use the spatial index. But when I run it on my machine, it does nothing of the sort. I’m not particularly impressed. But what we see here is that parallelism has kicked in. In my scenario, it’s split the data up into 4 threads, but it’s still slow, and not using my index. It’s disappointing. But I can persuade it with hints! If I tell it to FORCESEEK, or use my index, or even turn off the parallelism with MAXDOP 1, then I get the index being used, and it’s a thing of beauty! Part of the plan is here: It’s massive, and it’s ugly, and it uses a TVF… but it’s quick. The way it works is to hook into the GeodeticTessellation function, which essentially finds where the point is and works outwards through the spatial index cells that surround it. This then provides a framework to be able to see into the spatial index for the items we want. You can read more about it at http://msdn.microsoft.com/en-us/library/bb895265.aspx#tessellation – including a bunch of pretty diagrams. It’s one of those times when we have a much more complex-looking plan, but just because of the good that’s going on. This tessellation stuff was introduced in SQL Server 2012. But my query isn’t using it. 
When I try to use the FORCESEEK hint on the Person.Address table, I get the friendly error: Msg 8622, Level 16, State 1, Line 1 Query processor could not produce a query plan because of the hints defined in this query. Resubmit the query without specifying any hints and without using SET FORCEPLAN. And I’m almost tempted to just give up and move back to the old method of checking increasingly large circles around my location. After all, I can even leverage multiple OUTER APPLY clauses just like I did in my recent Lookup post. WITH MyLocations AS (SELECT * FROM (VALUES ('LobsterPot Adelaide', geography::Point(-34.925806, 138.605073, 4326)), ('LobsterPot USA', geography::Point(42.524929, -87.858244, 4326)) ) t (Name, Geo)) SELECT l.Name, COALESCE(a1.AddressLine1,a2.AddressLine1,a3.AddressLine1), COALESCE(a1.City,a2.City,a3.City), s.Name AS [State], c.Name AS Country FROM MyLocations AS l OUTER APPLY ( SELECT TOP (1) * FROM Person.Address AS ad WHERE l.Geo.STDistance(ad.SpatialLocation) < 1000 ORDER BY l.Geo.STDistance(ad.SpatialLocation) ) AS a1 OUTER APPLY ( SELECT TOP (1) * FROM Person.Address AS ad WHERE l.Geo.STDistance(ad.SpatialLocation) < 5000 AND a1.AddressID IS NULL ORDER BY l.Geo.STDistance(ad.SpatialLocation) ) AS a2 OUTER APPLY ( SELECT TOP (1) * FROM Person.Address AS ad WHERE l.Geo.STDistance(ad.SpatialLocation) < 20000 AND a2.AddressID IS NULL ORDER BY l.Geo.STDistance(ad.SpatialLocation) ) AS a3 JOIN Person.StateProvince AS s ON s.StateProvinceID = COALESCE(a1.StateProvinceID,a2.StateProvinceID,a3.StateProvinceID) JOIN Person.CountryRegion AS c ON c.CountryRegionCode = s.CountryRegionCode ; But this isn’t friendly-looking at all, and I’d use the method recommended by Isaac Kunen, who uses a table of numbers for the expanding circles. It feels old-school though, when I’m dealing with SQL 2012 (and later) versions. So why isn’t my query doing what it’s supposed to? Remember the query... 
WITH MyLocations AS (SELECT * FROM (VALUES ('LobsterPot Adelaide', geography::Point(-34.925806, 138.605073, 4326)), ('LobsterPot USA', geography::Point(42.524929, -87.858244, 4326)) ) t (Name, Geo)) SELECT l.Name, a.AddressLine1, a.City, s.Name AS [State], c.Name AS Country FROM MyLocations AS l CROSS APPLY ( SELECT TOP (1) * FROM Person.Address AS ad ORDER BY l.Geo.STDistance(ad.SpatialLocation) ) AS a JOIN Person.StateProvince AS s ON s.StateProvinceID = a.StateProvinceID JOIN Person.CountryRegion AS c ON c.CountryRegionCode = s.CountryRegionCode ; Well, I just wasn’t reading http://msdn.microsoft.com/en-us/library/ff929109.aspx properly. The following requirements must be met for a Nearest Neighbor query to use a spatial index: A spatial index must be present on one of the spatial columns and the STDistance() method must use that column in the WHERE and ORDER BY clauses. The TOP clause cannot contain a PERCENT statement. The WHERE clause must contain a STDistance() method. If there are multiple predicates in the WHERE clause then the predicate containing STDistance() method must be connected by an AND conjunction to the other predicates. The STDistance() method cannot be in an optional part of the WHERE clause. The first expression in the ORDER BY clause must use the STDistance() method. Sort order for the first STDistance() expression in the ORDER BY clause must be ASC. All the rows for which STDistance returns NULL must be filtered out. Let’s start from the top. 1. Needs a spatial index on one of the columns that’s in the STDistance call. Yup, got the index. 2. No ‘PERCENT’. Yeah, I don’t have that. 3. The WHERE clause needs to use STDistance(). Ok, but I’m not filtering, so that should be fine. 4. Yeah, I don’t have multiple predicates. 5. The first expression in the ORDER BY is my distance, that’s fine. 6. Sort order is ASC, because otherwise we’d be starting with the ones that are furthest away, and that’s tricky. 7. 
All the rows for which STDistance returns NULL must be filtered out. But I don’t have any NULL values, so that shouldn’t affect me either. ...but something’s wrong. I do actually need to satisfy #3. And I do need to make sure #7 is being handled properly, because there are some situations (eg, differing SRIDs) where STDistance can return NULL. It says so at http://msdn.microsoft.com/en-us/library/bb933808.aspx – “STDistance() always returns null if the spatial reference IDs (SRIDs) of the geography instances do not match.” So if I simply make sure that I’m filtering out the rows that return NULL… …then it’s blindingly fast, I get the right results, and I’ve got the complex-but-brilliant plan that I wanted. It just wasn’t overly intuitive, despite being documented. @rob_farley
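The final query text is not shown at the point where the post says it filters out the NULL rows, so here is a minimal sketch of what that change could look like. It is the post's own query with one added predicate; writing the filter as an IS NOT NULL test on the STDistance() call is an assumption about how the author did it, chosen because it satisfies both requirement #3 (STDistance() in the WHERE clause) and requirement #7 (NULL distances filtered out).

```sql
-- Sketch only: the query from the post plus a single added WHERE predicate.
-- The IS NOT NULL form of the filter is an assumption, not quoted from the post.
WITH MyLocations AS
(SELECT * FROM (VALUES
    ('LobsterPot Adelaide', geography::Point(-34.925806, 138.605073, 4326)),
    ('LobsterPot USA',      geography::Point(42.524929, -87.858244, 4326))
 ) t (Name, Geo))
SELECT l.Name, a.AddressLine1, a.City, s.Name AS [State], c.Name AS Country
FROM MyLocations AS l
CROSS APPLY (
    SELECT TOP (1) *
    FROM Person.Address AS ad
    -- Requirements #3 and #7: STDistance() appears in the WHERE clause
    -- and rows where it returns NULL are removed.
    WHERE l.Geo.STDistance(ad.SpatialLocation) IS NOT NULL
    -- Requirements #5 and #6: first ORDER BY expression, ascending.
    ORDER BY l.Geo.STDistance(ad.SpatialLocation)
) AS a
JOIN Person.StateProvince AS s
    ON s.StateProvinceID = a.StateProvinceID
JOIN Person.CountryRegion AS c
    ON c.CountryRegionCode = s.CountryRegionCode
;
```

Whether the optimizer then picks the spatial index still depends on the instance and the data, so it is worth checking the actual execution plan rather than assuming it.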

Read the article
Re-running SSRS subscription jobs that have failed

- by Rob Farley

Sometimes, an SSRS subscription fails for some reason. It can be annoying, particularly as the appropriate response can be hard to see immediately. There may be a long list of jobs that failed one morning if a Mail Server is down, and trying to work out a way of running each one again can be painful. It’s almost an argument for using shared schedules a lot, but the problem with this is that there are bound to be other things on that shared schedule that you wouldn’t want to be re-run. Luckily, there’s a table in the ReportServer database called dbo.Subscriptions, which is where LastStatus of the Subscription is stored. Having found the subscriptions that you’re interested in, finding the SQL Agent Jobs that correspond to them can be frustrating. Luckily, the jobstep command contains the subscriptionid, so it’s possible to look them up based on that. And of course, once the jobs have been found, they can be executed easily enough. In this example, I produce a list of the commands to run the jobs. I can copy the results out and execute them. select 'exec msdb.dbo.sp_start_job @job_name = ''' + cast(j.name as varchar(40)) + '''' from msdb.dbo.sysjobs j join msdb.dbo.sysjobsteps js on js.job_id = j.job_id join [ReportServer].[dbo].[Subscriptions] s on js.command like '%' + cast(s.subscriptionid as varchar(40)) + '%' where s.LastStatus like 'Failure sending mail%'; Another option could be to return the job step commands directly (js.command in this query), but my preference is to run the job that contains the step.
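The alternative mentioned above, returning the job step commands themselves, can be sketched by reusing the same joins and selecting js.command instead of building an sp_start_job call. The column aliases are only illustrative, and the LastStatus filter is the one from the post, so it will need adjusting for other failure messages.

```sql
-- Sketch: list the job step command for each failed subscription,
-- using the same joins and filter as the query in the post.
select j.name     as job_name,      -- SQL Agent job that owns the step
       js.step_id,
       js.command as step_command   -- the command the step would run
from msdb.dbo.sysjobs j
join msdb.dbo.sysjobsteps js
    on js.job_id = j.job_id
join [ReportServer].[dbo].[Subscriptions] s
    on js.command like '%' + cast(s.subscriptionid as varchar(40)) + '%'
where s.LastStatus like 'Failure sending mail%';
```

Reviewing this output before running anything is a cheap way to confirm that each job really does map to the subscription you expect.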

Read the article
Stored Procedures with SSRS? Hmm… not so much

- by Rob Farley

Little Bobby Tables’ mother says you should always sanitise your data input. Except that I think she’s wrong. The SQL Injection aspect is for another post, where I’ll show you why I think SQL Injection is the same kind of attack as many other attacks, such as the old buffer overflow, but here I want to have a bit of a whinge about the way that some people sanitise data input, and even have a whinge about people who insist on using stored procedures for SSRS reports. Let me say that again, in case you missed it the first time: I want to have a whinge about people who insist on using stored procedures for SSRS reports. Let’s look at the data input sanitisation aspect – except that I’m going to call it ‘parameter validation’. I’m talking about code that looks like this: create procedure dbo.GetMonthSummaryPerSalesPerson(@eomdate datetime) as begin /* First check that @eomdate is a valid date */ if isdate(@eomdate) != 1 begin select 'Please enter a valid date' as ErrorMessage; return; end /* Then check that time has passed since @eomdate */ if datediff(day,@eomdate,sysdatetime()) < 5 begin select 'Sorry - EOM is not complete yet' as ErrorMessage; return; end /* If those checks have succeeded, return the data */ select SalesPersonID, count(*) as NumSales, sum(TotalDue) as TotalSales from Sales.SalesOrderHeader where OrderDate >= dateadd(month,-1,@eomdate) and OrderDate < @eomdate group by SalesPersonID order by SalesPersonID; end Notice that the code checks that a date has been entered. Seriously??!! This must only be to check for NULL values being passed in, because anything else would have to be a valid datetime to avoid an error. The other check is maybe fair enough, but I still don’t like it. The two problems I have with this stored procedure are the result sets and the small fact that the stored procedure even exists in the first place. But let’s consider the first one of these problems for starters. I’ll get to the second one in a moment. 
If you read Jes Borland (@grrl_geek)’s recent post about returning multiple result sets in Reporting Services, you’ll be aware that Reporting Services doesn’t support multiple result sets from a single query. And when it says ‘single query’, it includes ‘stored procedure call’. It’ll only handle the first result set that comes back. But that’s okay – we have RETURN statements, so our stored procedure will only ever return a single result set. Sometimes that result set might contain a single field called ErrorMessage, but it’s still only one result set. Except that it’s not okay, because Reporting Services needs to know what fields to expect. Your report needs to hook into your fields, so SSRS needs to have a way to get that information. For stored procs, it uses an option called FMTONLY. When Reporting Services tries to figure out what fields are going to be returned by a query (or stored procedure call), it doesn’t want to have to run the whole thing. That could take ages. (Maybe it’s seen some of the stored procedures I’ve had to deal with over the years!) So it turns on FMTONLY before it makes the call (and turns it off again afterwards). FMTONLY is designed to be able to figure out the shape of the output, without actually running the contents. It’s very useful, you might think. set fmtonly on exec dbo.GetMonthSummaryPerSalesPerson '20030401'; set fmtonly off Without the FMTONLY lines, this stored procedure returns a result set that has three columns and fourteen rows. But with FMTONLY turned on, those rows don’t come back. But what I do get back hurts Reporting Services. It doesn’t run the stored procedure at all. It just looks for anything that could be returned and pushes out a result set in that shape. Despite the fact that I’ve made sure that the logic will only ever return a single result set, the FMTONLY option kills me by returning three of them. It would have been much better to push these checks down into the query itself. 
alter procedure dbo.GetMonthSummaryPerSalesPerson(@eomdate datetime) as begin select SalesPersonID, count(*) as NumSales, sum(TotalDue) as TotalSales from Sales.SalesOrderHeader where /* Make sure that @eomdate is valid */ isdate(@eomdate) = 1 /* And that it's sufficiently past */ and datediff(day,@eomdate,sysdatetime()) >= 5 /* And now use it in the filter as appropriate */ and OrderDate >= dateadd(month,-1,@eomdate) and OrderDate < @eomdate group by SalesPersonID order by SalesPersonID; end Now if we run it with FMTONLY turned on, we get the single result set back. But let’s consider the execution plan when we pass in an invalid date. First let’s look at one that returns data. I’ve got a semi-useful index in place on OrderDate, which includes the SalesPersonID and TotalDue fields. It does the job, despite a hefty Sort operation. …compared to one that uses a future date: You might notice that the estimated costs are similar – the Index Seek is still 28%, the Sort is still 71%. But the size of that arrow coming out of the Index Seek is a whole bunch smaller. The coolest thing here is what’s going on with that Index Seek. Let’s look at some of the properties of it. Glance down it with me… Estimated CPU cost of 0.0005728, 387 estimated rows, estimated subtree cost of 0.0044385, ForceSeek false, Number of Executions 0. That’s right – it doesn’t run. So much for reading plans right-to-left... The key is the Filter on the left of it. It has a Startup Expression Predicate in it, which means that it doesn’t call anything further down the plan (to the right) if the predicate evaluates to false. Using this method, we can make sure that our stored procedure contains a single query, and therefore avoid any problems with multiple result sets. If we wanted, we could always use UNION ALL to make sure that we can return an appropriate error message. 
alter procedure dbo.GetMonthSummaryPerSalesPerson(@eomdate datetime) as begin select SalesPersonID, count(*) as NumSales, sum(TotalDue) as TotalSales, /*Placeholder: */ '' as ErrorMessage from Sales.SalesOrderHeader where /* Make sure that @eomdate is valid */ isdate(@eomdate) = 1 /* And that it's sufficiently past */ and datediff(day,@eomdate,sysdatetime()) >= 5 /* And now use it in the filter as appropriate */ and OrderDate >= dateadd(month,-1,@eomdate) and OrderDate < @eomdate group by SalesPersonID /* Now include the error messages */ union all select 0, 0, 0, 'Please enter a valid date' as ErrorMessage where isdate(@eomdate) != 1 union all select 0, 0, 0, 'Sorry - EOM is not complete yet' as ErrorMessage where datediff(day,@eomdate,sysdatetime()) < 5 order by SalesPersonID; end But still I don’t like it, because it’s now a stored procedure with a single query. And I don’t like stored procedures that should be functions. That’s right – I think this should be a function, and SSRS should call the function. And I apologise to those of you who are now planning a bonfire for me. Guy Fawkes’ night has already passed this year, so I think you miss out. (And I’m not going to remind you about when the PASS Summit is in 2012.) 
create function dbo.GetMonthSummaryPerSalesPerson(@eomdate datetime) returns table as return ( select SalesPersonID, count(*) as NumSales, sum(TotalDue) as TotalSales, '' as ErrorMessage from Sales.SalesOrderHeader where /* Make sure that @eomdate is valid */ isdate(@eomdate) = 1 /* And that it's sufficiently past */ and datediff(day,@eomdate,sysdatetime()) >= 5 /* And now use it in the filter as appropriate */ and OrderDate >= dateadd(month,-1,@eomdate) and OrderDate < @eomdate group by SalesPersonID union all select 0, 0, 0, 'Please enter a valid date' as ErrorMessage where isdate(@eomdate) != 1 union all select 0, 0, 0, 'Sorry - EOM is not complete yet' as ErrorMessage where datediff(day,@eomdate,sysdatetime()) < 5 ); We’ve had to lose the ORDER BY – but that’s fine, as that’s a client thing anyway. We can have our reports leverage this stored query still, but we’re recognising that it’s a query, not a procedure. A procedure is designed to DO stuff, not just return data. We even get entries in sys.columns that confirm what the shape of the result set actually is, which makes sense, because a table-valued function is the right mechanism to return data. And we get so much more flexibility with this. If you haven’t seen the simplification stuff that I’ve preached on before, jump over to http://bit.ly/SimpleRob and watch the video of when I broke a microphone and nearly fell off the stage in Wales. You’ll see the impact of being able to have a simplifiable query. You can also read the procedural functions post I wrote recently, if you didn’t follow the link from a few paragraphs ago. So if we want the list of SalesPeople that made any kind of sales in a given month, we can do something like: select SalesPersonID from dbo.GetMonthSummaryPerSalesPerson(@eomonth) order by SalesPersonID; This doesn’t need to look up the TotalDue field, which makes a simpler plan. 
select * from dbo.GetMonthSummaryPerSalesPerson(@eomonth) where SalesPersonID is not null order by SalesPersonID; This one can avoid having to do the work on the rows that don’t have a SalesPersonID value, pushing the predicate into the Index Seek rather than filtering the results that come back to the report. If we had joins involved, we might see some of those being simplified out. We also get the ability to include query hints in individual reports. We shift from having a single-use stored procedure to having a reusable stored query – and isn’t that one of the main points of modularisation? Stored procedures in Reporting Services are just a bit limited for my liking. They’re useful in plenty of ways, but if you insist on using stored procedures all the time rather than queries that use functions – that’s rubbish. @rob_farley
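The point about sys.columns confirming the shape of the result set can be checked with a catalog query along these lines. This is a sketch that assumes the function has been created as dbo.GetMonthSummaryPerSalesPerson, as in the post; the join to sys.types is only there to show readable type names.

```sql
-- Sketch: list the columns that the inline table-valued function exposes.
-- Assumes dbo.GetMonthSummaryPerSalesPerson exists as defined in the post.
select c.column_id,
       c.name as column_name,
       t.name as type_name,
       c.is_nullable
from sys.columns as c
join sys.types as t
    on t.user_type_id = c.user_type_id
where c.object_id = object_id('dbo.GetMonthSummaryPerSalesPerson')
order by c.column_id;
```

If the function is in place, this should list SalesPersonID, NumSales, TotalSales and ErrorMessage, which is the kind of fixed, discoverable shape a report needs.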

Read the article
How many people will be with you during 24HOP?

- by Rob Farley

In less than a week, SQLPASS hosts another 24 Hours of PASS event, this time with an array of 24 female speakers (in honour of this month being Women’s History Month). Interestingly, the committee has had a few people ask if there are rules about how the event can be viewed, such as “How many people from any one organisation can watch it?” or “Does it matter if a few people are crowded around the same screen?” From a licensing and marketing perspective, there is value in knowing how many people are watching the event, but there are no restrictions about how the thing is viewed. In fact – if you’re planning to watch any of these events, I want to suggest an idea: Book a meeting room in your office with a projector, and watch 24HOP in there. If you’re planning to have it streaming in the background while you work, obviously this makes life a bit trickier. But if you’re planning to treat it as a training event (a 2-day conference if you like) and block out a bit of time for it (as well you should – there’s going to be some great stuff in there), then why not do it in a way that makes it so that other people can see that you’re watching it, and potentially join you. When an event like this runs, we can see how many different ‘people’ are attending each LiveMeeting session. What we can’t tell is how many actual people there are represented. Jessica Moss spoke to the Adelaide SQL Server User Group a few weeks ago via LiveMeeting, and LiveMeeting told us there were less than a dozen people attending. Really there were at least three times that number, because all the people in the room with me weren’t included. I’d love to imagine that every LiveMeeting attendee represented a crowd in a room, watching a shared screen. So there’s my challenge – don’t let your LiveMeeting session represent just you. Find a way of involving other people. At the very least, you’ll be able to discuss it with them afterwards. 
Now stick a comment on this post to let me know how many people are going to be joining you. :) If you’re not registered for the event yet, get yourself over to the SQLPASS site and make it happen.

Read the article
Slide-decks from recent Adelaide SQL Server UG meetings

- by Rob Farley

The UK has been well represented this summer at the Adelaide SQL Server User Group, with presentations from Chris Testa-O’Neill (isn’t that the right link? Maybe try this one) and Martin Cairney. The slides are available here and here. I thought I’d particularly mention Martin’s, and how it’s relevant to this month’s T-SQL Tuesday. Martin spoke about Policy-Based Management and the Enterprise Policy Management Framework – something which is remarkably under-used, and yet which can really impact your ability to look after environments. If you have policies set up, then you can easily test each of your SQL instances to see if they are still satisfying a set of policies as defined. Automation (the topic of this month’s T-SQL Tuesday) should mean that your life is made easier, thereby enabling you to do more. It shouldn’t remove the human element, but should remove (most of) the human errors. People still need to manage the situation, and work out what needs to be done, etc. We haven’t reached a point where computers can replace people, but they are very good at replacing the mundaneness and monotony of our jobs. They’ve made our lives more interesting (although many would rightly argue that they have also made our lives more complex) by letting us focus on the stuff that changes. Martin named his talk Put Your Feet Up, which nicely expresses the fact that managing systems shouldn’t be about running around checking things all the time. It must be about having systems in place which tell you when things aren’t going well. It’s never quite as simple as being able to actually put your feet up, but certainly no system should require constant attention. It’s definitely a policy we at LobsterPot adhere to, whether it’s an alert to let us know that an ETL package has run successfully, or a script that generates some code for a report. 
If things can be automated, it reduces the chance of error, reduces the repetitive nature of work, and in general, keeps both consultants and clients much happier.

Read the article
Visualising data a different way with Pivot collections

- by Rob Farley

Roger’s been doing a great job extending PivotViewer recently, and you can find the list of LobsterPot pivots at http://pivot.lobsterpot.com.au. Many months back, the TED Talk that Gary Flake did about Pivot caught my imagination, and I did some research into it. At the time, most of what we did with Pivot was geared towards what we could do for clients, including making Pivot collections based on students at a school, and using it to browse PDF invoices by their various properties. We had actual commercial work based on Pivot collections back then, and it was all kinds of fun. Later, we made some collections for events that were happening, and even got featured in the TechEd Australia keynote. But I’m getting ahead of myself... let me explain the concept. A Pivot collection is an XML file (with .cxml extension) which lists Items, each linking to an image that’s stored in a Deep Zoom format (this means that it contains tiles like Bing Maps, so that the browser can request only the ones of interest according to the zoom level). This collection can be shown in a Silverlight application that uses the PivotViewer control, or in the Pivot Browser that’s available from getpivot.com. Filtering and sorting the items according to their facets (attributes, such as size, age, category, etc), the PivotViewer rearranges the way that these are shown in a very dynamic way. To quote Gary Flake, this lets us “see patterns which are otherwise hidden”. This browsing mechanism is very suited to a number of different methods, because it’s just that – browsing. It’s not searching, it’s more akin to window-shopping than doing an internet search. When we decided to put something together for conferences such as TechEd Australia 2010 and the PASS Summit 2010, we did some screen-scraping to provide a different view of data that was already available online. 
Nick Hodge and Michael Kordahi from Microsoft liked the idea a lot, and after a bit of tweaking, we produced one that Michael used in the TechEd Australia keynote to show the variety of talks on offer. It’s interesting to see a pattern in this data: The Office track has the most sessions, but if the Interactive Sessions and Instructor-Led Labs are removed, it drops down to only the sixth most popular track, with Cloud Computing taking over. This is something which just isn’t obvious when you look at an ordinary search tool. You get a much better feel for the data when moving around it like this. The more observant amongst you will have noticed some differences between the collection that Michael is demonstrating in the picture above and the screenshots I’ve shown. That’s because it’s been extended some more. At the SQLBits conference in the UK this year, I had some interesting discussions with the guys from Xpert360, particularly Phil Carter, who I’d met in 2009 at an earlier SQLBits conference. They had got around to producing a Pivot collection based on the SQLBits data, which we had been planning to do but ran out of time. We discussed some of the ways that Pivot could be used, including the ways that my old friend Howard Dierking had extended it for the MSDN Magazine. I’m not suggesting I influenced Xpert360 at all, but they certainly inspired us with some of their posts on the matter. So with LobsterPot guys David Gardiner and Roger Noble both having dabbled in Pivot collections (and Dave doing some for clients), I set Roger to work on extending it some more. He’s used various events and so on to be able to make an environment that allows us to do quick deployment of new collections, as well as showing the data in a grid view which behaves as if it were simply a third view of the data (the other two being the array of images and the ‘histogram’ view). 
I see PivotViewer as being a significant step in data visualisation – so much so that I feature it when I deliver talks on Spatial Data Visualisation methods. Any time when there is information that can be conveyed through an image, you have to ask yourself how best to show that image, and whether that image is the focal point. For Spatial data, the image is most often a map, and the map becomes the central mode for navigation. I show Pivot with postcode areas, since I can browse the postcodes based on their data, and many of the images are recognisable (to locals of South Australia). Naturally, the images could link through to the map itself, and so on, but generally people think of Spatial data in terms of navigating a map, which doesn’t always gel with the information you’re trying to extract. Roger’s even looking into ways to hook PivotViewer into the Bing Maps API, in a similar way to the Deep Earth project, displaying different levels of map detail according to how ‘zoomed in’ the images are. Some of the work that Dave did with one of the schools was generating the Deep Zoom tiles “on the fly”, based on images stored in a database, and Roger has produced a collection which uses images from flickr, that lets you move from one search term to another. Pulling the images down from flickr.com isn’t particularly ideal from a performance aspect, and flickr doesn’t store images in a small-enough format to really lend itself to this use, but you might agree that it’s an interesting concept which compares nicely to using Maps. I’m looking forward to future versions of the PivotViewer control, and hope they provide many more events that can be used, and even more hooks into it. Naturally, LobsterPot could help provide your business with a PivotViewer experience, but you can probably do a lot of it yourself too. There’s a thorough guide at getpivot.com, which is how we got into it. For some examples of what we’ve done, have a look at http://pivot.lobsterpot.com.au. 
I’d like to see PivotViewer really catch on as a data visualisation tool.

Read the article
Xorg does not see my monitor EDID

- by sean farley

Below is the output from my Xorg.0.

X.Org X Server 1.11.3
Release Date: 2011-12-16
[ 22.311] X Protocol Version 11, Revision 0
[ 22.311] Build Operating System: Linux 2.6.42-23-generic x86_64 Ubuntu
[ 22.311] Current Operating System: Linux sean-P55-USB3 3.2.0-34-generic #53-Ubuntu SMP Thu Nov 15 10:48:16 UTC 2012 x86_64
[ 22.311] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-3.2.0-34-generic root=UUID=0a34603e-aee9-44d1-8982-a5a5a38c3e4d ro quiet splash
[ 22.311] Build Date: 29 August 2012 12:12:33AM
[ 22.311] xorg-server 2:1.11.4-0ubuntu10.8 (For technical support please see http://www.ubuntu.com/support)
[ 22.311] Current version of pixman: 0.24.4
[ 22.311] Before reporting problems, check http://wiki.x.org to make sure that you have the latest version.
[ 22.311] Markers: (--) probed, (**) from config file, (==) default setting, (++) from command line, (!!) notice, (II) informational, (WW) warning, (EE) error, (NI) not implemented, (??) unknown.
[ 22.311] (==) Log file: "/var/log/Xorg.0.log", Time: Sat Nov 17 13:20:45 2012
[ 22.311] (==) Using config file: "/etc/X11/xorg.conf"
[ 22.311] (==) Using system config directory "/usr/share/X11/xorg.conf.d"
[ 22.311] (==) No Layout section. Using the first Screen section.
[ 22.311] (==) No screen section available. Using defaults.
[ 22.311] (**) |-->Screen "Default Screen Section" (0)
[ 22.311] (**) | |-->Monitor "<default monitor>"
[ 22.311] (==) No device specified for screen "Default Screen Section". Using the first device section listed.
[ 22.311] (**) | |-->Device "Default Device"
[ 22.311] (==) No monitor specified for screen "Default Screen Section". Using a default monitor configuration.
[ 22.311] (==) Automatically adding devices

I have searched all over, and followed lots of dead ends and unanswered questions on this issue. I need to get this monitor recognised so I can use the native resolution of 1600x1200. The Nvidia driver in Windows has no problem with this. The monitor is an old Iiyama HM204DT A. Is there a way of configuring Xorg manually to get these working? I have tried xrandr but this will not work. Output:-

sean@sean-P55-USB3:~$ xrandr
Screen 0: minimum 8 x 8, current 1152 x 864, maximum 16384 x 16384
DVI-I-0 connected 1152x864+0+0 (normal left inverted right x axis y axis) 0mm x 0mm
   1024x768 60.0 +
   1360x768 60.0 59.8
   1152x864 60.0*
   800x600 72.2 60.3 56.2
   680x384 119.9 119.6
   640x480 59.9
   512x384 120.0
   400x300 144.4
   320x240 120.1
DVI-I-1 disconnected (normal left inverted right x axis y axis)
DVI-I-2 disconnected (normal left inverted right x axis y axis)
HDMI-0 disconnected (normal left inverted right x axis y axis)
DVI-I-3 disconnected (normal left inverted right x axis y axis)

Tried Nvidia Xorg.config:

sean@sean-P55-USB3:~$ sudo nvidia-xconfig
[sudo] password for sean:
Using X configuration file: "/etc/X11/xorg.conf".
VALIDATION ERROR: Data incomplete in file /etc/X11/xorg.conf. Device section "Default Device" must have a Driver line.
Backed up file '/etc/X11/xorg.conf' as '/etc/X11/xorg.conf.nvidia-xconfig-original'
Backed up file '/etc/X11/xorg.conf' as '/etc/X11/xorg.conf.backup'
New X configuration file written to '/etc/X11/xorg.conf'

How do I insert a driver line? This is a bit of a pain as I want to use my Vectorworks CAD program in a WinXP Vbox at 1600x1200 but all virtual drives are restricted to the host screen resolution. Do I need to manually create EDID info in Xorg? I am slightly confused about how Xorg and Nvidia relate. Please help.

Read the article
Plan Operator Tuesday round-up

- by Rob Farley

Eighteen posts for T-SQL Tuesday #43 this month, discussing Plan Operators. I put them together and made the following clickable plan. It’s 1000px wide, so I hope you have a monitor wide enough. Let me explain this plan for you (people’s names are the links to the articles on their blogs – the same links as in the plan above). It was clearly a SELECT statement. Wayne Sheffield (@dbawayne) wrote about that, so we start with a SELECT physical operator, leveraging the logical operator Wayne Sheffield. The SELECT operator calls the Paul White operator, discussed by Jason Brimhall (@sqlrnnr) in his post. The Paul White operator is quite remarkable, and can consume three streams of data. Let’s look at those streams. The first pulls data from a Table Scan – Boris Hristov (@borishristov)’s post – using parallel threads (Bradley Ball – @sqlballs) that pull the data eagerly through a Table Spool (Oliver Asmus – @oliverasmus). A scalar operation is also performed on it, thanks to Jeffrey Verheul (@devjef)’s Compute Scalar operator. The second stream of data applies Evil (I figured that must mean a procedural TVF, but could’ve been anything), courtesy of Jason Strate (@stratesql). It performs this Evil on the merging of parallel streams (Steve Jones – @way0utwest), which suck data out of a Switch (Paul White – @sql_kiwi). This Switch operator is consuming data from up to four lookups, thanks to Kalen Delaney (@sqlqueen), Rick Krueger (@dataogre), Mickey Stuewe (@sqlmickey) and Kathi Kellenberger (@auntkathi). Unfortunately Kathi’s name is a bit long and has been truncated, just like in real plans. The last stream performs a join of two others via a Nested Loop (Matan Yungman – @matanyungman). One pulls data from a Spool (my post – @rob_farley) populated from a Table Scan (Jon Morisi). 
The other applies a catchall operator (the catchall is because Tamera Clark (@tameraclark) didn’t specify any particular operator, and a catchall is what gets shown when SSMS doesn’t know what to show. Surprisingly, it’s showing the yellow one, which is about cursors. Hopefully that’s not what Tamera planned, but anyway...) to the output from an Index Seek operator (Sebastian Meine – @sqlity). Lastly, I think everyone put in 110% effort, so that’s what all the operators cost. That didn’t leave anything for me, unfortunately, but that’s okay. Also, because he decided to use the Paul White operator, Jason Brimhall gets 0%, and his 110% was given to Paul’s Switch operator post. I hope you’ve enjoyed this T-SQL Tuesday, and have learned something extra about Plan Operators. Keep your eye out for next month’s one by watching the Twitter Hashtag #tsql2sday, and why not contribute a post to the party? Big thanks to Adam Machanic as usual for starting all this. @rob_farley

Read the article
24 hours to pass until 24 Hours of PASS

- by Rob Farley

There’s a bunch of stuff going on at the moment in the SQL world, so if you’ve missed this particular piece of news, let me tell you a bit about it. Twice a year, the SQL community puts on its biggest virtual event – 24 Hours of PASS. And the next one is tomorrow – March 21st, 2012. Twenty-four sessions, back-to-back, featuring a selection of some of the best presenters in the SQL world, speakers from all over the world, coming together in an online collaboration that so far has well over thirty thousand registrations across the presentations. Some people are signed up for all 24 sessions, some only one. Traditionally, LiveMeeting has been used as the platform for this event, but this year we’re going with a new platform – IBTalk. It promises big, and we’re hoping it won’t let us down. LiveMeeting has been great, and we thank Microsoft for providing it as a platform for the past few years. However, as the event has grown, we’ve found that a new idea is necessary. Last year a search was done for a new platform, and IBTalk ticked the right boxes. The feedback from the presenters and moderators so far has been overwhelmingly positive, and we’re hoping that this is going to really enhance the user experience. One of my favourite features of the platform is the language side. It provides a pretty good translation service. Users who join a session will see a flag on the left of the screen. If they click it, they can change the language to one of 15 on offer. Picking this changes all the labels on everything. It even translates the text in the Q&A window. What this means is that someone from Brazil can ask their question in Portuguese, and the presenter will see it in English. Then if the answer is typed in English, the questioner will be able to see the answer, also in Portuguese. Or they can switch to English to see it as the answerer typed it. I know there’s always the risk of bad translations going on, but I’ve heard good things about this translation service. 
But there’s more – IBTalk are providing staff to type up closed captioning live during the event. So if English isn’t your first language, don’t worry! Picking your language will also let you see subtitles in your chosen language. I’m hoping that this event is the start of PASS being able to reach people from all corners of the world. Wouldn’t it be great to find that this event is successful, and that the next 24HOP (later in the year, our Summit Preview event) has just as many non-English speakers tuning in as English speakers? If you haven’t been planning which sessions you’re going to attend, you really should get over to sqlpass.org/24hours and have a look through what’s on offer. There’s some amazing material from some of the industry’s brightest, covering a wide range of topics, from classic SQL areas to the brand new SQL 2012 features. There really should be something for every SQL professional. Check the time zones though – if you’re in the US you might be on Summer time, and an hour closer to GMT than normal. Massive thanks must go to Microsoft, SQL Sentry and Idera for sponsoring this event. Without sponsors we wouldn’t be able to put any of this on. These companies are helping 24HOP continue to grow into an event for the whole world. See you tomorrow! @rob_farley | #24hop | #sqlpass

Read the article
Exploding maps in Reporting Services 2008 R2

- by Rob Farley

Kaboom! Well, that was the imagery that secretly appeared in my mind when I saw “USA By State Exploded” in the list of installed maps in Report Builder 3.0 – part of the spatial offering of SQL Server Reporting Services 2008 R2. Alas, it just means that the borders are bigger. Clicking on it showed me. Unfortunately, I’m not interested in maps of the US. None of my clients are there (at least, not yet – feel free to get in touch if you want to change this ‘feature’ of my company). So instead, I’ve recently been getting hold of some data for Australian areas. I’ve just bought some PostCode shapes for South Australia, and will use this in demos for conferences and for showing clients how this kind of report can really impact their reporting. One of the companies I was talking to about getting shape files sent me a sample. So I chose the “ESRI shapefile” option you see above, and browsed to my file. It appeared in the window like this: Australians will immediately recognise this as the area around Wollongong, just south of Sydney. Well, apart from me. I didn’t. I had to put a Bing Maps layer behind it to work that out, but that’s not for this post. The thing that I discovered was that if I selected the Exploded USA option (but without clicking Next), and then chose my shape file, then my area around Wollongong would be exploded too! Huh! I think this is actually a bug, but a potentially useful one! Some further investigation (involving creating two identical reports, one with this exploded view, one without) showed that the Exploded View is done by reducing the ScaleFactor property of the PolygonLayer in the map control. The Exploded version has it below 1. If you set it above 1, your shapes overlap. I discovered this by accident… I guess I hadn’t looked through all the PolygonLayer options to work out what they all do. And because this post is about Reporting, it can qualify for this month’s T-SQL Tuesday, hosted by Aaron Nelson (@sqlvariant).

Read the article
LobsterPot Solutions in the USA

- by Rob Farley

We’re expanding! I’m thrilled to announce that Microsoft Gold Partner LobsterPot Solutions has started another branch, appointing the amazing Ted Krueger (5-time SQL MVP awardee) as the US lead. Ted is well-known in the SQL Server world, having written books on indexing, consulting and on being a DBA (not to mention contributing chapters to both MVP Deep Dives books). He is an expert on replication and high availability, and strong in the Business Intelligence space – vast experience which is both broad and deep. Ted is based in the south east corner of Wisconsin, just north of Chicago. He has been a consultant for eons and has helped many clients with their projects and problems, taking the role as both technical lead and consulting lead. He is also tireless in supporting and developing the SQL Server community, presenting at conferences across America, and helping people through his blog, Twitter and more. Despite all this – it’s neither his technical excellence with SQL Server nor his consulting skill that made me want him to lead LobsterPot’s US venture. I wanted Ted because of his values. In the time I’ve known Ted, I’ve found his integrity to be excellent, and found him to be morally beyond reproach. This is the biggest priority I have when finding people to represent the LobsterPot brand. I have no qualms in recommending Ted’s character or work ethic. It’s not just my thoughts on him – all my trusted friends that know Ted agree about this. So last week, LobsterPot Solutions LLC was formed in the United States, and in a couple of weeks, we will be open for business! LobsterPot Solutions can be contacted via email at [email protected], on the web at either www.lobsterpot.com.au or www.lobsterpotsolutions.com, and on Twitter as @lobsterpot_au and @lobsterpot_us. Ted Krueger blogs at LessThanDot, and can also be found on Twitter and LinkedIn. This post is cross-posted from http://lobsterpotsolutions.com/lobsterpot-solutions-in-the-usa

Read the article
The blocking nature of aggregates

- by Rob Farley

I wrote a post recently about how query tuning isn’t just about how quickly the query runs – that if you have something (such as SSIS) that is consuming your data (and probably introducing a bottleneck), then it might be more important to have a query which focuses on getting the first bit of data out. You can read that post here. In particular, we looked at two operators that could be used to ensure that a query returns only Distinct rows: Sort and Hash Match. The Sort operator pulls in all the data, sorts it (discarding duplicates), and then pushes out the remaining rows. The Hash Match operator performs a Hashing function on each row as it comes in, and then looks to see if it’s created a Hash it’s seen before. If not, it pushes the row out. The Sort method is quicker, but has to wait until it’s gathered all the data before it can do the sort, and therefore blocks the data flow. But that was my last post. This one’s a bit different. This post is going to look at how Aggregate functions work, which ties nicely into this month’s T-SQL Tuesday. I’ve frequently explained that DISTINCT and GROUP BY are essentially the same function, although DISTINCT is the poorer cousin because you have less control over it, and you can’t apply aggregate functions. Just like the operators used for Distinct, there are different flavours of Aggregate operators – coming in blocking and non-blocking varieties. The example I like to use to explain this is a pile of playing cards. If I’m handed a pile of cards and asked to count how many cards there are in each suit, it’s going to help if the cards are already ordered. Suppose I’m playing a game of Bridge: I can easily glance at my hand and count how many there are in each suit, because I keep the pile of cards in order. Moving from left to right, I could tell you I have four Hearts in my hand, even before I’ve got to the end. By telling you that I have four Hearts as soon as I know, I demonstrate the principle of a non-blocking operation. 
This is known as a Stream Aggregate operation. It requires input which is sorted by whichever columns the grouping is on, and it will release a row as soon as the group changes – when I encounter a Spade, I know I don’t have any more Hearts in my hand. Alternatively, if the pile of cards is not sorted, I won’t know how many Hearts I have until I’ve looked through all the cards. In fact, to count them, I basically need to put them into little piles, and when I’ve finished making all those piles, I can count how many there are in each. Because I don’t know any of the final numbers until I’ve seen all the cards, this is blocking. This performs the aggregate function using a Hash Match. Observant readers will remember this from my Distinct example. You might remember that my earlier Hash Match operation – used for Distinct Flow – wasn’t blocking. But this one is. They’re essentially doing a similar operation, applying a Hash function to some data and seeing if the set of values has been seen before, but this time it needs more information than the mere existence of a new set of values: it needs to consider how many of them there are. A lot is dependent here on whether the data coming out of the source is sorted or not, and this is largely determined by the indexes that are being used. If you look in the Properties of an Index Scan, you’ll be able to see whether the order of the data is required by the plan. A property called Ordered will demonstrate this. In this particular example, the second plan is significantly faster, but is dependent on having ordered data. In fact, if I force a Stream Aggregate on unordered data (which I’m doing by telling it to use a different index), a Sort operation is needed, which makes my plan a lot slower. This is all very straightforward stuff, and information that most people are fully aware of. 
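The card-counting analogy can be sketched in a few lines of Python. This is only an illustration of the blocking and non-blocking behaviour, not how SQL Server implements its operators; the function names (stream_aggregate, hash_aggregate, flow_distinct, consumed_before_first_output) are made up for this sketch.

```python
from collections import Counter


def stream_aggregate(rows):
    """Count rows per group, assuming the input is already sorted by group."""
    nothing = object()          # marker for "no group seen yet"
    current, count = nothing, 0
    for key in rows:
        if current is not nothing and key != current:
            yield current, count   # group changed: release it immediately
            count = 0
        current = key
        count += 1
    if current is not nothing:
        yield current, count       # last group, released at end of input


def hash_aggregate(rows):
    """Count rows per group for input in any order (blocking)."""
    piles = Counter()
    for key in rows:               # every row must be seen first
        piles[key] += 1
    yield from piles.items()       # only now can any total be released


def flow_distinct(rows):
    """Distinct via hashing: existence is enough, so nothing blocks."""
    seen = set()
    for key in rows:
        if key not in seen:
            seen.add(key)
            yield key              # new value: push it out straight away


def consumed_before_first_output(operator, rows):
    """Return (first output, number of input rows pulled to produce it)."""
    pulled = 0

    def source():
        nonlocal pulled
        for row in rows:
            pulled += 1
            yield row

    first = next(operator(source()))
    return first, pulled


# A 13-card Bridge hand, held in suit order.
hand = ["Hearts"] * 4 + ["Spades"] * 5 + ["Clubs"] * 4

print(consumed_before_first_output(stream_aggregate, hand))  # (('Hearts', 4), 5)
print(consumed_before_first_output(hash_aggregate, hand))    # (('Hearts', 4), 13)
print(consumed_before_first_output(flow_distinct, hand))     # ('Hearts', 1)
```

The stream version reports four Hearts after seeing the first Spade, the hash version has to look at all thirteen cards, and the distinct version can answer after the very first card because it never needs a count.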
I’m sure you’ve all read my good friend Paul White (@sql_kiwi)’s post on how the Query Optimizer chooses which type of aggregate function to apply. But let’s take a look at SQL Server Integration Services. SSIS gives us an Aggregate transformation for use in Data Flow Tasks, but it’s described as Blocking. The definitive article on Performance Tuning SSIS uses Sort and Aggregate as examples of Blocking Transformations. I’ve just shown you that Aggregate operations used by the Query Optimizer are not always blocking, but that the SSIS Aggregate component is an example of a blocking transformation. But is it always the case? After all, there are plenty of SSIS Performance Tuning talks out there that describe the value of sorted data in Data Flow Tasks, describing the IsSorted property that can be set through the Advanced Editor of your Source component. And so I set about testing the Aggregate transformation in SSIS, to prove for sure whether providing Sorted data would let the Aggregate transform behave like a Stream Aggregate. (Of course, I knew the answer already, but it helps to be able to demonstrate these things). A query that will produce a million rows in order was in order. Let me rephrase. I used a query which produced the numbers from 1 to 1000000, in a single field, ordered. The IsSorted flag was set on the source output, with the only column as SortKey 1. Performing an Aggregate function over this (counting the number of rows per distinct number) should produce an additional column with 1 in it. If this were being done in T-SQL, the ordered data would allow a Stream Aggregate to be used. In fact, if the Query Optimizer saw that the field had a Unique Index on it, it would be able to skip the Aggregate function completely, and just insert the value 1. This is a shortcut I wouldn’t be expecting from SSIS, but certainly the Stream behaviour would be nice. Unfortunately, it’s not the case. 
As you can see from the screenshots above, the data is pouring into the Aggregate function, and not being released until all million rows have been seen. It’s not doing a Stream Aggregate at all. This is expected behaviour. (I put that in bold, because I want you to realise this.) An SSIS transformation is a piece of code that runs. It’s a physical operation. When you write T-SQL and ask for an aggregation to be done, it’s a logical operation. The physical operation is either a Stream Aggregate or a Hash Match. In SSIS, you’re telling the system that you want a generic Aggregation, that will have to work with whatever data is passed in. I’m not saying that it wouldn’t be possible to make a sometimes-blocking aggregation component in SSIS. A Custom Component could be created which could detect whether the SortKeys columns of the input matched the Grouping columns of the Aggregation, and either call the blocking code or the non-blocking code as appropriate. One day I’ll make one of those, and publish it on my blog. I’ve done it before with a Script Component, but as Script components are single-use, I was able to handle the data knowing everything about my data flow already. As per my previous post – there are a lot of aspects in which tuning SSIS and tuning execution plans use similar concepts. In both situations, it really helps to have a feel for what’s going on behind the scenes. Considering whether an operation is blocking or not is extremely relevant to performance, and it’s not always obvious from the surface. In a future post, I’ll show the impact of blocking v non-blocking and synchronous v asynchronous components in SSIS, using some of LobsterPot’s Script Components and Custom Components as examples. When I get that sorted, I’ll make a Stream Aggregate component available for download.
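The sometimes-blocking idea described above can be sketched as follows. This is a plain Python illustration under stated assumptions, not an SSIS Custom Component and not the SSIS API: count_by, group_by and sort_keys are hypothetical names, with sort_keys standing in for the IsSorted/SortKey metadata on the input. The check is deliberately conservative: the grouping columns must be exactly the leading sort keys, in the same order.

```python
from collections import Counter
from itertools import groupby


def count_by(rows, group_by, sort_keys=()):
    """Count rows per group, streaming when the declared sort order allows it.

    rows      -- iterable of dicts (one dict per row)
    group_by  -- column names to group on
    sort_keys -- column names the input is declared to be sorted by
    """
    def key(row):
        return tuple(row[column] for column in group_by)

    if tuple(sort_keys[:len(group_by)]) == tuple(group_by):
        # Non-blocking path: groups arrive contiguously, so each one can
        # be released as soon as the next group starts.
        for group, members in groupby(rows, key=key):
            yield group, sum(1 for _ in members)
    else:
        # Blocking path: nothing is known for sure until every row is seen.
        counts = Counter(key(row) for row in rows)
        yield from counts.items()


# The million ordered numbers from the test above, generated lazily.
numbers = ({"n": n} for n in range(1, 1_000_001))
first = next(count_by(numbers, ["n"], sort_keys=["n"]))
print(first)  # ((1,), 1)
```

With the sort metadata supplied, the first count is available after only two rows have been read; without it, the same call would read all million rows before producing anything. Trusting the declared sort order is itself an assumption: if the input is not really sorted, the streaming path would emit the same group more than once.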

Read the article
