Search Results

Search found 2251 results on 91 pages for 'ken paul'.

Page 6/91 | < Previous Page | 2 3 4 5 6 7 8 9 10 11 12 13  | Next Page >

  • When is a Seek not a Seek?

    - by Paul White
    The following script creates a single-column clustered table containing the integers from 1 to 1,000 inclusive. IF OBJECT_ID(N'tempdb..#Test', N'U') IS NOT NULL DROP TABLE #Test ; GO CREATE TABLE #Test ( id INTEGER PRIMARY KEY CLUSTERED ); ; INSERT #Test (id) SELECT V.number FROM master.dbo.spt_values AS V WHERE V.[type] = N'P' AND V.number BETWEEN 1 AND 1000 ; Let’s say we need to find the rows with values from 100 to 170, excluding any values that divide exactly by 10.  One way to write that query would be: SELECT T.id FROM #Test AS T WHERE T.id IN ( 101,102,103,104,105,106,107,108,109, 111,112,113,114,115,116,117,118,119, 121,122,123,124,125,126,127,128,129, 131,132,133,134,135,136,137,138,139, 141,142,143,144,145,146,147,148,149, 151,152,153,154,155,156,157,158,159, 161,162,163,164,165,166,167,168,169 ) ; That query produces a pretty efficient-looking query plan: Knowing that the source column is defined as an INTEGER, we could also express the query this way: SELECT T.id FROM #Test AS T WHERE T.id >= 101 AND T.id <= 169 AND T.id % 10 > 0 ; We get a similar-looking plan: If you look closely, you might notice that the line connecting the two icons is a little thinner than before.  The first query is estimated to produce 61.9167 rows – very close to the 63 rows we know the query will return.  The second query presents a tougher challenge for SQL Server because it doesn’t know how to predict the selectivity of the modulo expression (T.id % 10 > 0).  Without that last line, the second query is estimated to produce 68.1667 rows – a slight overestimate.  Adding the opaque modulo expression results in SQL Server guessing at the selectivity.  As you may know, the selectivity guess for a greater-than operation is 30%, so the final estimate is 30% of 68.1667, which comes to 20.45 rows. The second difference is that the Clustered Index Seek is costed at 99% of the estimated total for the statement.  For some reason, the final SELECT operator is assigned a small cost of 0.0000484 units; I have absolutely no idea why this is so, or what it models.  Nevertheless, we can compare the total cost for both queries: the first one comes in at 0.0033501 units, and the second at 0.0034054.  The important point is that the second query is costed very slightly higher than the first, even though it is expected to produce many fewer rows (20.45 versus 61.9167). If you run the two queries, they produce exactly the same results, and both complete so quickly that it is impossible to measure CPU usage for a single execution.  We can, however, compare the I/O statistics for a single run by running the queries with STATISTICS IO ON: Table '#Test'. Scan count 63, logical reads 126, physical reads 0. Table '#Test'. Scan count 01, logical reads 002, physical reads 0. The query with the IN list uses 126 logical reads (and has a ‘scan count’ of 63), while the second query form completes with just 2 logical reads (and a ‘scan count’ of 1).  It is no coincidence that 126 = 63 * 2, by the way.  It is almost as if the first query is doing 63 seeks, compared to one for the second query. In fact, that is exactly what it is doing.  There is no indication of this in the graphical plan, or the tool-tip that appears when you hover your mouse over the Clustered Index Seek icon.  To see the 63 seek operations, you have click on the Seek icon and look in the Properties window (press F4, or right-click and choose from the menu): The Seek Predicates list shows a total of 63 seek operations – one for each of the values from the IN list contained in the first query.  I have expanded the first seek node to show the details; it is seeking down the clustered index to find the entry with the value 101.  Each of the other 62 nodes expands similarly, and the same information is contained (even more verbosely) in the XML form of the plan. Each of the 63 seek operations starts at the root of the clustered index B-tree and navigates down to the leaf page that contains the sought key value.  Our table is just large enough to need a separate root page, so each seek incurs 2 logical reads (one for the root, and one for the leaf).  We can see the index depth using the INDEXPROPERTY function, or by using the a DMV: SELECT S.index_type_desc, S.index_depth FROM sys.dm_db_index_physical_stats ( DB_ID(N'tempdb'), OBJECT_ID(N'tempdb..#Test', N'U'), 1, 1, DEFAULT ) AS S ; Let’s look now at the Properties window when the Clustered Index Seek from the second query is selected: There is just one seek operation, which starts at the root of the index and navigates the B-tree looking for the first key that matches the Start range condition (id >= 101).  It then continues to read records at the leaf level of the index (following links between leaf-level pages if necessary) until it finds a row that does not meet the End range condition (id <= 169).  Every row that meets the seek range condition is also tested against the Residual Predicate highlighted above (id % 10 > 0), and is only returned if it matches that as well. You will not be surprised that the single seek (with a range scan and residual predicate) is much more efficient than 63 singleton seeks.  It is not 63 times more efficient (as the logical reads comparison would suggest), but it is around three times faster.  Let’s run both query forms 10,000 times and measure the elapsed time: DECLARE @i INTEGER, @n INTEGER = 10000, @s DATETIME = GETDATE() ; SET NOCOUNT ON; SET STATISTICS XML OFF; ; WHILE @n > 0 BEGIN SELECT @i = T.id FROM #Test AS T WHERE T.id IN ( 101,102,103,104,105,106,107,108,109, 111,112,113,114,115,116,117,118,119, 121,122,123,124,125,126,127,128,129, 131,132,133,134,135,136,137,138,139, 141,142,143,144,145,146,147,148,149, 151,152,153,154,155,156,157,158,159, 161,162,163,164,165,166,167,168,169 ) ; SET @n -= 1; END ; PRINT DATEDIFF(MILLISECOND, @s, GETDATE()) ; GO DECLARE @i INTEGER, @n INTEGER = 10000, @s DATETIME = GETDATE() ; SET NOCOUNT ON ; WHILE @n > 0 BEGIN SELECT @i = T.id FROM #Test AS T WHERE T.id >= 101 AND T.id <= 169 AND T.id % 10 > 0 ; SET @n -= 1; END ; PRINT DATEDIFF(MILLISECOND, @s, GETDATE()) ; On my laptop, running SQL Server 2008 build 4272 (SP2 CU2), the IN form of the query takes around 830ms and the range query about 300ms.  The main point of this post is not performance, however – it is meant as an introduction to the next few parts in this mini-series that will continue to explore scans and seeks in detail. When is a seek not a seek?  When it is 63 seeks © Paul White 2011 email: [email protected] twitter: @SQL_kiwi

    Read the article

  • Fun with Aggregates

    - by Paul White
    There are interesting things to be learned from even the simplest queries.  For example, imagine you are given the task of writing a query to list AdventureWorks product names where the product has at least one entry in the transaction history table, but fewer than ten. One possible query to meet that specification is: SELECT p.Name FROM Production.Product AS p JOIN Production.TransactionHistory AS th ON p.ProductID = th.ProductID GROUP BY p.ProductID, p.Name HAVING COUNT_BIG(*) < 10; That query correctly returns 23 rows (execution plan and data sample shown below): The execution plan looks a bit different from the written form of the query: the base tables are accessed in reverse order, and the aggregation is performed before the join.  The general idea is to read all rows from the history table, compute the count of rows grouped by ProductID, merge join the results to the Product table on ProductID, and finally filter to only return rows where the count is less than ten. This ‘fully-optimized’ plan has an estimated cost of around 0.33 units.  The reason for the quote marks there is that this plan is not quite as optimal as it could be – surely it would make sense to push the Filter down past the join too?  To answer that, let’s look at some other ways to formulate this query.  This being SQL, there are any number of ways to write logically-equivalent query specifications, so we’ll just look at a couple of interesting ones.  The first query is an attempt to reverse-engineer T-SQL from the optimized query plan shown above.  It joins the result of pre-aggregating the history table to the Product table before filtering: SELECT p.Name FROM ( SELECT th.ProductID, cnt = COUNT_BIG(*) FROM Production.TransactionHistory AS th GROUP BY th.ProductID ) AS q1 JOIN Production.Product AS p ON p.ProductID = q1.ProductID WHERE q1.cnt < 10; Perhaps a little surprisingly, we get a slightly different execution plan: The results are the same (23 rows) but this time the Filter is pushed below the join!  The optimizer chooses nested loops for the join, because the cardinality estimate for rows passing the Filter is a bit low (estimate 1 versus 23 actual), though you can force a merge join with a hint and the Filter still appears below the join.  In yet another variation, the < 10 predicate can be ‘manually pushed’ by specifying it in a HAVING clause in the “q1” sub-query instead of in the WHERE clause as written above. The reason this predicate can be pushed past the join in this query form, but not in the original formulation is simply an optimizer limitation – it does make efforts (primarily during the simplification phase) to encourage logically-equivalent query specifications to produce the same execution plan, but the implementation is not completely comprehensive. Moving on to a second example, the following query specification results from phrasing the requirement as “list the products where there exists fewer than ten correlated rows in the history table”: SELECT p.Name FROM Production.Product AS p WHERE EXISTS ( SELECT * FROM Production.TransactionHistory AS th WHERE th.ProductID = p.ProductID HAVING COUNT_BIG(*) < 10 ); Unfortunately, this query produces an incorrect result (86 rows): The problem is that it lists products with no history rows, though the reasons are interesting.  The COUNT_BIG(*) in the EXISTS clause is a scalar aggregate (meaning there is no GROUP BY clause) and scalar aggregates always produce a value, even when the input is an empty set.  In the case of the COUNT aggregate, the result of aggregating the empty set is zero (the other standard aggregates produce a NULL).  To make the point really clear, let’s look at product 709, which happens to be one for which no history rows exist: -- Scalar aggregate SELECT COUNT_BIG(*) FROM Production.TransactionHistory AS th WHERE th.ProductID = 709;   -- Vector aggregate SELECT COUNT_BIG(*) FROM Production.TransactionHistory AS th WHERE th.ProductID = 709 GROUP BY th.ProductID; The estimated execution plans for these two statements are almost identical: You might expect the Stream Aggregate to have a Group By for the second statement, but this is not the case.  The query includes an equality comparison to a constant value (709), so all qualified rows are guaranteed to have the same value for ProductID and the Group By is optimized away. In fact there are some minor differences between the two plans (the first is auto-parameterized and qualifies for trivial plan, whereas the second is not auto-parameterized and requires cost-based optimization), but there is nothing to indicate that one is a scalar aggregate and the other is a vector aggregate.  This is something I would like to see exposed in show plan so I suggested it on Connect.  Anyway, the results of running the two queries show the difference at runtime: The scalar aggregate (no GROUP BY) returns a result of zero, whereas the vector aggregate (with a GROUP BY clause) returns nothing at all.  Returning to our EXISTS query, we could ‘fix’ it by changing the HAVING clause to reject rows where the scalar aggregate returns zero: SELECT p.Name FROM Production.Product AS p WHERE EXISTS ( SELECT * FROM Production.TransactionHistory AS th WHERE th.ProductID = p.ProductID HAVING COUNT_BIG(*) BETWEEN 1 AND 9 ); The query now returns the correct 23 rows: Unfortunately, the execution plan is less efficient now – it has an estimated cost of 0.78 compared to 0.33 for the earlier plans.  Let’s try adding a redundant GROUP BY instead of changing the HAVING clause: SELECT p.Name FROM Production.Product AS p WHERE EXISTS ( SELECT * FROM Production.TransactionHistory AS th WHERE th.ProductID = p.ProductID GROUP BY th.ProductID HAVING COUNT_BIG(*) < 10 ); Not only do we now get correct results (23 rows), this is the execution plan: I like to compare that plan to quantum physics: if you don’t find it shocking, you haven’t understood it properly :)  The simple addition of a redundant GROUP BY has resulted in the EXISTS form of the query being transformed into exactly the same optimal plan we found earlier.  What’s more, in SQL Server 2008 and later, we can replace the odd-looking GROUP BY with an explicit GROUP BY on the empty set: SELECT p.Name FROM Production.Product AS p WHERE EXISTS ( SELECT * FROM Production.TransactionHistory AS th WHERE th.ProductID = p.ProductID GROUP BY () HAVING COUNT_BIG(*) < 10 ); I offer that as an alternative because some people find it more intuitive (and it perhaps has more geek value too).  Whichever way you prefer, it’s rather satisfying to note that the result of the sub-query does not exist for a particular correlated value where a vector aggregate is used (the scalar COUNT aggregate always returns a value, even if zero, so it always ‘EXISTS’ regardless which ProductID is logically being evaluated). The following query forms also produce the optimal plan and correct results, so long as a vector aggregate is used (you can probably find more equivalent query forms): WHERE Clause SELECT p.Name FROM Production.Product AS p WHERE ( SELECT COUNT_BIG(*) FROM Production.TransactionHistory AS th WHERE th.ProductID = p.ProductID GROUP BY () ) < 10; APPLY SELECT p.Name FROM Production.Product AS p CROSS APPLY ( SELECT NULL FROM Production.TransactionHistory AS th WHERE th.ProductID = p.ProductID GROUP BY () HAVING COUNT_BIG(*) < 10 ) AS ca (dummy); FROM Clause SELECT q1.Name FROM ( SELECT p.Name, cnt = ( SELECT COUNT_BIG(*) FROM Production.TransactionHistory AS th WHERE th.ProductID = p.ProductID GROUP BY () ) FROM Production.Product AS p ) AS q1 WHERE q1.cnt < 10; This last example uses SUM(1) instead of COUNT and does not require a vector aggregate…you should be able to work out why :) SELECT q.Name FROM ( SELECT p.Name, cnt = ( SELECT SUM(1) FROM Production.TransactionHistory AS th WHERE th.ProductID = p.ProductID ) FROM Production.Product AS p ) AS q WHERE q.cnt < 10; The semantics of SQL aggregates are rather odd in places.  It definitely pays to get to know the rules, and to be careful to check whether your queries are using scalar or vector aggregates.  As we have seen, query plans do not show in which ‘mode’ an aggregate is running and getting it wrong can cause poor performance, wrong results, or both. © 2012 Paul White Twitter: @SQL_Kiwi email: [email protected]

    Read the article

  • Opening a View with a Table without a NavigationController

    - by Ken
    Hiya, I'm pretty new to developing with Cocoa Touch/XCode and I came across a problem. I'm making a sort of RSS reader for a newssite and I have 5 views of tables navigated with 5 tabs in a TabBarController. If someone selects a newsitem I want another view to open showing the complete newsitem. My problem is that it won't work. This is my code: - (NSInteger)tableView:(UITableView *)tableView numberOfRowsInSection(NSInteger)section{ return [[[self rssParser]rssItems]count]; } - (UITableViewCell *)tableView:(UITableView *)tableView cellForRowAtIndexPath:(NSIndexPath *)indexPath{ UITableViewCell * cell = [tableView dequeueReusableCellWithIdentifier:@"rssItemCell"]; if(nil == cell){ cell = [[[UITableViewCell alloc] initWithStyle:UITableViewCellStyleSubtitle reuseIdentifier:@"rssItemCell"]autorelease]; } cell.textLabel.text = [[[[self rssParser]rssItems]objectAtIndex:indexPath.row]title]; cell.detailTextLabel.text = [[[[self rssParser]rssItems]objectAtIndex:indexPath.row]description]; cell.accessoryType = UITableViewCellAccessoryDisclosureIndicator; return cell; } - (void)tableView:(UITableView *)tableView didSelectRowAtIndexPath:(NSIndexPath *)indexPath { [[self appDelegate] setCurrentlySelectedBlogItem:[[[self rssParser]rssItems]objectAtIndex:indexPath.row]]; [self.appDelegate loadNewsDetails]; } And it calls this method in my delegate: -(void)loadNewsDetails{ [[self rootController]pushViewController:detailController animated:YES]; } Please tell me what I'm doing wrong. BTW I do not want to use a NavigationController, just the tabbar I'm using. Thanks in advance, Ken

    Read the article

  • Install CLSQL on Mac OS X

    - by Ken
    I have SBCL installed (via macports/darwinports) on my Intel Core 2 Duo Macbook running 10.5.8. I've installed several libraries like this: (require 'asdf) (require 'asdf-install) (asdf-install:install 'cl-who) But when I tried to install CLSQL this way ('clsql) after it downloaded, I got this: ... ; registering #<SYSTEM CLSQL-UFFI {123D9E01}> as CLSQL-UFFI ; $ cd /Users/ken/.sbcl/site/clsql-5.0.5/uffi/; make cc -arch x86_64 -arch i386 -bundle /usr/lib/bundle1.o -flat_namespace -undefined suppress clsql_uffi.c -o clsql_uffi.dylib ld: duplicate symbol dyld_stub_binding_helper in /usr/lib/bundle1.o and /usr/lib/bundle1.o for architecture i386 ld: duplicate symbol dyld_stub_binding_helper in /usr/lib/bundle1.o and /usr/lib/bundle1.o for architecture x86_64 collect2: ld returned 1 exit status collect2: ld returned 1 exit status lipo: can't open input file: /var/folders/Nf/Nf4o5ArDFaWBH2OwtnWM3E+++TQ/-Tmp-//ccJyZxou.out (No such file or directory) make: *** [clsql_uffi.so] Error 1 Is there something I forgot, or some trick to get it to build on Mac OS X? I know very little about C libraries on the Mac these days, so I don't even know where to start on this. Thanks!

    Read the article

  • how to split a very large database on sql server

    - by ken jackson
    I have a 90 GB SQL Server database that I want to make more manageable. It stores stock data from 50+ different stocks from 2009 and 2010, and each stock is a separate table. Some tables have hundreds of millions of rows, and other have just a few million. What I want to do is somehow split the database, so that I don't have a single database file that is 90 GB. What I want is to be able to somehow magically split all the tables so that I can backup the 2009 data once and not have to keep on including it in the backup every time I backup the entire database, however, I would like the 2009 data to be included whenever I do a query. Is partitioning the database the way to go? Will it do the above for me, or will I need some other solution? I research partitioning, but I wasn't sure if that would solve all my problems. I wasn't able to find anything that would tell me whether or not it would migrate prexisting data, or whether it only worked for newly inserted data. Any help or pointers would be much appreciated. Thanks in advance, Ken

    Read the article

  • ImageViews sometimes not displaying in FrameLayout activity

    - by Ken
    The top level layout in my activity is a framelayout. I have completed, debugged and tested this app and it works exactly like it should in all respects on my g1 and on various emulators. But on 3.7-inch displays running 2.1+, some imageviews packed in a linearlayout are periodically not visible. I know that they are there because you can touch and drag them with effect in the app. So I assume somehow they have gotten under the SurfaceView that is the main component of the app. This is apparently so even though the SurfaceView is declared in the xml prior to the LinearLayout. However, the ImageViews IN the LinearLayout are added programmatically towards the end of onCreate(). Framelayout stacks everything that is added to it, one on top of the other--the only way you will see more than one child of a frame layout is if they are smaller than the screen and are placed apart from eachother. Oddly, sometimes the imageviews ARE visible--it is random. Anyway, I've been trying to combat this with framelayout.bringChildToFront(View v) on the linearlayout without success. I wonder if anyone has any insight into how the behavior could be random like that, and how I should code these imageviews to keep this from happening, and why the problem appears only to occur on 3.7 vs 3.2 inch screens (as it happens, the two 3.2-inch screens were both htc, so vendor might be factor too). [edit] Actually, I've determined that this is a 2.2 issue, not a screen size (or even vendor) issue. Can't ensure that ImageViews added to a framelayout with a SurfaceView in it will appear on top of the surfaceview. I ran some tests in the respective onDraw() methods and the imageviews are 'visible' (0), and nothing does anything to the alpha of the drawables, which are there as well at ondraw(). [/edit] Any insight would be welcomed. Ken T.

    Read the article

  • Explained: EF 6 and “Could not determine storage version; a valid storage connection or a version hint is required.”

    - by Ken Cox [MVP]
    I have a legacy ASP.NET 3.5 web site that I’ve upgraded to a .NET 4 web application. At the same time, I upgraded to Entity Framework 6. Suddenly one of the pages returned the following error: [ArgumentException: Could not determine storage version; a valid storage connection or a version hint is required.]    System.Data.SqlClient.SqlVersionUtils.GetSqlVersion(String versionHint) +11372412    System.Data.SqlClient.SqlProviderServices.GetDbProviderManifest(String versionHint) +91    System.Data.Common.DbProviderServices.GetProviderManifest(String manifestToken) +92 [ProviderIncompatibleException: The provider did not return a ProviderManifest instance.]    System.Data.Common.DbProviderServices.GetProviderManifest(String manifestToken) +11431433    System.Data.Metadata.Edm.Loader.InitializeProviderManifest(Action`3 addError) +11370982    System.Data.EntityModel.SchemaObjectModel.Schema.HandleAttribute(XmlReader reader) +216 A search of the error message didn’t turn up anything helpful except that someone mentioned that the error messages was bogus in his case. The page in question uses the ASP.NET EntityDataSource control, consumed by a Telerik RadGrid. This is a fabulous combination for putting a huge amount of functionality on a page in a very short time. Unfortunately, the 6.0.1 release of EF6 doesn’t support EntityDataSource. According to the people in charge, support is planned but there’s no timeline for an EntityDataSource build that works with EF6.  I’m not sure what to do in the meantime. Should I back out EF6 or manually wire up the RadGrid? The upshot is that you might want to rethink plans to upgrade to Entity Framework 6 for Web forms projects if they rely on that handy control. It might also help to spend a User voice vote here:  http://data.uservoice.com/forums/72025-entity-framework-feature-suggestions/suggestions/3702890-support-for-asp-net-entitydatasource-and-dynamicda

    Read the article

  • Entity Framework Code-First to Provide Replacement for ASP.NET Profile Provider

    - by Ken Cox [MVP]
    A while back, I coordinated a project to add support for the SQL Table Profile Provider in ASP.NET 4 Web Applications.  We urged Microsoft to improve ASP.NET’s built-in Profile support so our workaround wouldn’t be necessary. Instead, Microsoft plans to provide a replacement for ASP.NET Profile in a forthcoming release. In response to my feature suggestion on Connect, Microsoft says we should look for something even better using Entity Framework: “When code-first is officially released the final piece of a full replacement of the ASP.NET Profile will have arrived. Once code-first for EF4 is released, developers will have a really easy and very approachable way to create any arbitrary class, and automatically have the .NET Framework create a table to provide storage for that class. Furthermore developer will also have full LINQ-query capabilities against code-first classes. “ The downside is that there won’t be a way to retrofit this Profile replacement to pre- ASP.NET 4 Web applications. At least there’ll still be the MVP workaround code. It looks like it’s time for me to dig into a CTP of EF Code-First to see what’s available.   Scott Guthrie has been blogging about Code-First Development with Entity Framework 4. It’s not clear when the EF Code-First is coming, but my guess is that it’ll be part of the VS 2010/.NET 4 service pack.

    Read the article

  • Fix: Windows Live Mail Error ID 0x8004108D

    - by Ken Cox [MVP]
    Over the last few days Windows Live Mail stopped fetching my Hotmail. I assumed it was a problem with Microsoft’s NNTP Web service, so I went back to using the browser to check my Hotmail. Well, it didn’t right itself, so I started probing and discovered the fix… no default connection. To repair it: From the Tools menu, click Accounts. On the Accounts screen select Hotmail and then click Properties. On the Connection tab select Local Area Network and check the Always Connect to This Account Using...(read more)

    Read the article

  • Display ‘–Select–’ in an ASP.NET DropDownList

    - by Ken Cox [MVP]
    A purchaser of my book writes: “I would like a drop down list to display the text: "-Select-" initially instead of the first value of the data it is bound to.” Here you go…   <%@ Page Language="VB" %> <script runat="server">     Protected Sub Page_Load(ByVal sender As Object, _                            ...(read more)

    Read the article

  • Heaps of Trouble?

    - by Paul White NZ
    If you’re not already a regular reader of Brad Schulz’s blog, you’re missing out on some great material.  In his latest entry, he is tasked with optimizing a query run against tables that have no indexes at all.  The problem is, predictably, that performance is not very good.  The catch is that we are not allowed to create any indexes (or even new statistics) as part of our optimization efforts. In this post, I’m going to look at the problem from a slightly different angle, and present an alternative solution to the one Brad found.  Inevitably, there’s going to be some overlap between our entries, and while you don’t necessarily need to read Brad’s post before this one, I do strongly recommend that you read it at some stage; he covers some important points that I won’t cover again here. The Example We’ll use data from the AdventureWorks database, copied to temporary unindexed tables.  A script to create these structures is shown below: CREATE TABLE #Custs ( CustomerID INTEGER NOT NULL, TerritoryID INTEGER NULL, CustomerType NCHAR(1) COLLATE SQL_Latin1_General_CP1_CI_AI NOT NULL, ); GO CREATE TABLE #Prods ( ProductMainID INTEGER NOT NULL, ProductSubID INTEGER NOT NULL, ProductSubSubID INTEGER NOT NULL, Name NVARCHAR(50) COLLATE SQL_Latin1_General_CP1_CI_AI NOT NULL, ); GO CREATE TABLE #OrdHeader ( SalesOrderID INTEGER NOT NULL, OrderDate DATETIME NOT NULL, SalesOrderNumber NVARCHAR(25) COLLATE SQL_Latin1_General_CP1_CI_AI NOT NULL, CustomerID INTEGER NOT NULL, ); GO CREATE TABLE #OrdDetail ( SalesOrderID INTEGER NOT NULL, OrderQty SMALLINT NOT NULL, LineTotal NUMERIC(38,6) NOT NULL, ProductMainID INTEGER NOT NULL, ProductSubID INTEGER NOT NULL, ProductSubSubID INTEGER NOT NULL, ); GO INSERT #Custs ( CustomerID, TerritoryID, CustomerType ) SELECT C.CustomerID, C.TerritoryID, C.CustomerType FROM AdventureWorks.Sales.Customer C WITH (TABLOCK); GO INSERT #Prods ( ProductMainID, ProductSubID, ProductSubSubID, Name ) SELECT P.ProductID, P.ProductID, P.ProductID, P.Name FROM AdventureWorks.Production.Product P WITH (TABLOCK); GO INSERT #OrdHeader ( SalesOrderID, OrderDate, SalesOrderNumber, CustomerID ) SELECT H.SalesOrderID, H.OrderDate, H.SalesOrderNumber, H.CustomerID FROM AdventureWorks.Sales.SalesOrderHeader H WITH (TABLOCK); GO INSERT #OrdDetail ( SalesOrderID, OrderQty, LineTotal, ProductMainID, ProductSubID, ProductSubSubID ) SELECT D.SalesOrderID, D.OrderQty, D.LineTotal, D.ProductID, D.ProductID, D.ProductID FROM AdventureWorks.Sales.SalesOrderDetail D WITH (TABLOCK); The query itself is a simple join of the four tables: SELECT P.ProductMainID AS PID, P.Name, D.OrderQty, H.SalesOrderNumber, H.OrderDate, C.TerritoryID FROM #Prods P JOIN #OrdDetail D ON P.ProductMainID = D.ProductMainID AND P.ProductSubID = D.ProductSubID AND P.ProductSubSubID = D.ProductSubSubID JOIN #OrdHeader H ON D.SalesOrderID = H.SalesOrderID JOIN #Custs C ON H.CustomerID = C.CustomerID ORDER BY P.ProductMainID ASC OPTION (RECOMPILE, MAXDOP 1); Remember that these tables have no indexes at all, and only the single-column sampled statistics SQL Server automatically creates (assuming default settings).  The estimated query plan produced for the test query looks like this (click to enlarge): The Problem The problem here is one of cardinality estimation – the number of rows SQL Server expects to find at each step of the plan.  The lack of indexes and useful statistical information means that SQL Server does not have the information it needs to make a good estimate.  Every join in the plan shown above estimates that it will produce just a single row as output.  Brad covers the factors that lead to the low estimates in his post. In reality, the join between the #Prods and #OrdDetail tables will produce 121,317 rows.  It should not surprise you that this has rather dire consequences for the remainder of the query plan.  In particular, it makes a nonsense of the optimizer’s decision to use Nested Loops to join to the two remaining tables.  Instead of scanning the #OrdHeader and #Custs tables once (as it expected), it has to perform 121,317 full scans of each.  The query takes somewhere in the region of twenty minutes to run to completion on my development machine. A Solution At this point, you may be thinking the same thing I was: if we really are stuck with no indexes, the best we can do is to use hash joins everywhere. We can force the exclusive use of hash joins in several ways, the two most common being join and query hints.  A join hint means writing the query using the INNER HASH JOIN syntax; using a query hint involves adding OPTION (HASH JOIN) at the bottom of the query.  The difference is that using join hints also forces the order of the join, whereas the query hint gives the optimizer freedom to reorder the joins at its discretion. Adding the OPTION (HASH JOIN) hint results in this estimated plan: That produces the correct output in around seven seconds, which is quite an improvement!  As a purely practical matter, and given the rigid rules of the environment we find ourselves in, we might leave things there.  (We can improve the hashing solution a bit – I’ll come back to that later on). Faster Nested Loops It might surprise you to hear that we can beat the performance of the hash join solution shown above using nested loops joins exclusively, and without breaking the rules we have been set. The key to this part is to realize that a condition like (A = B) can be expressed as (A <= B) AND (A >= B).  Armed with this tremendous new insight, we can rewrite the join predicates like so: SELECT P.ProductMainID AS PID, P.Name, D.OrderQty, H.SalesOrderNumber, H.OrderDate, C.TerritoryID FROM #OrdDetail D JOIN #OrdHeader H ON D.SalesOrderID >= H.SalesOrderID AND D.SalesOrderID <= H.SalesOrderID JOIN #Custs C ON H.CustomerID >= C.CustomerID AND H.CustomerID <= C.CustomerID JOIN #Prods P ON P.ProductMainID >= D.ProductMainID AND P.ProductMainID <= D.ProductMainID AND P.ProductSubID = D.ProductSubID AND P.ProductSubSubID = D.ProductSubSubID ORDER BY D.ProductMainID OPTION (RECOMPILE, LOOP JOIN, MAXDOP 1, FORCE ORDER); I’ve also added LOOP JOIN and FORCE ORDER query hints to ensure that only nested loops joins are used, and that the tables are joined in the order they appear.  The new estimated execution plan is: This new query runs in under 2 seconds. Why Is It Faster? The main reason for the improvement is the appearance of the eager Index Spools, which are also known as index-on-the-fly spools.  If you read my Inside The Optimiser series you might be interested to know that the rule responsible is called JoinToIndexOnTheFly. An eager index spool consumes all rows from the table it sits above, and builds a index suitable for the join to seek on.  Taking the index spool above the #Custs table as an example, it reads all the CustomerID and TerritoryID values with a single scan of the table, and builds an index keyed on CustomerID.  The term ‘eager’ means that the spool consumes all of its input rows when it starts up.  The index is built in a work table in tempdb, has no associated statistics, and only exists until the query finishes executing. The result is that each unindexed table is only scanned once, and just for the columns necessary to build the temporary index.  From that point on, every execution of the inner side of the join is answered by a seek on the temporary index – not the base table. A second optimization is that the sort on ProductMainID (required by the ORDER BY clause) is performed early, on just the rows coming from the #OrdDetail table.  The optimizer has a good estimate for the number of rows it needs to sort at that stage – it is just the cardinality of the table itself.  The accuracy of the estimate there is important because it helps determine the memory grant given to the sort operation.  Nested loops join preserves the order of rows on its outer input, so sorting early is safe.  (Hash joins do not preserve order in this way, of course). The extra lazy spool on the #Prods branch is a further optimization that avoids executing the seek on the temporary index if the value being joined (the ‘outer reference’) hasn’t changed from the last row received on the outer input.  It takes advantage of the fact that rows are still sorted on ProductMainID, so if duplicates exist, they will arrive at the join operator one after the other. The optimizer is quite conservative about introducing index spools into a plan, because creating and dropping a temporary index is a relatively expensive operation.  It’s presence in a plan is often an indication that a useful index is missing. I want to stress that I rewrote the query in this way primarily as an educational exercise – I can’t imagine having to do something so horrible to a production system. Improving the Hash Join I promised I would return to the solution that uses hash joins.  You might be puzzled that SQL Server can create three new indexes (and perform all those nested loops iterations) faster than it can perform three hash joins.  The answer, again, is down to the poor information available to the optimizer.  Let’s look at the hash join plan again: Two of the hash joins have single-row estimates on their build inputs.  SQL Server fixes the amount of memory available for the hash table based on this cardinality estimate, so at run time the hash join very quickly runs out of memory. This results in the join spilling hash buckets to disk, and any rows from the probe input that hash to the spilled buckets also get written to disk.  The join process then continues, and may again run out of memory.  This is a recursive process, which may eventually result in SQL Server resorting to a bailout join algorithm, which is guaranteed to complete eventually, but may be very slow.  The data sizes in the example tables are not large enough to force a hash bailout, but it does result in multiple levels of hash recursion.  You can see this for yourself by tracing the Hash Warning event using the Profiler tool. The final sort in the plan also suffers from a similar problem: it receives very little memory and has to perform multiple sort passes, saving intermediate runs to disk (the Sort Warnings Profiler event can be used to confirm this).  Notice also that because hash joins don’t preserve sort order, the sort cannot be pushed down the plan toward the #OrdDetail table, as in the nested loops plan. Ok, so now we understand the problems, what can we do to fix it?  We can address the hash spilling by forcing a different order for the joins: SELECT P.ProductMainID AS PID, P.Name, D.OrderQty, H.SalesOrderNumber, H.OrderDate, C.TerritoryID FROM #Prods P JOIN #Custs C JOIN #OrdHeader H ON H.CustomerID = C.CustomerID JOIN #OrdDetail D ON D.SalesOrderID = H.SalesOrderID ON P.ProductMainID = D.ProductMainID AND P.ProductSubID = D.ProductSubID AND P.ProductSubSubID = D.ProductSubSubID ORDER BY D.ProductMainID OPTION (MAXDOP 1, HASH JOIN, FORCE ORDER); With this plan, each of the inputs to the hash joins has a good estimate, and no hash recursion occurs.  The final sort still suffers from the one-row estimate problem, and we get a single-pass sort warning as it writes rows to disk.  Even so, the query runs to completion in three or four seconds.  That’s around half the time of the previous hashing solution, but still not as fast as the nested loops trickery. Final Thoughts SQL Server’s optimizer makes cost-based decisions, so it is vital to provide it with accurate information.  We can’t really blame the performance problems highlighted here on anything other than the decision to use completely unindexed tables, and not to allow the creation of additional statistics. I should probably stress that the nested loops solution shown above is not one I would normally contemplate in the real world.  It’s there primarily for its educational and entertainment value.  I might perhaps use it to demonstrate to the sceptical that SQL Server itself is crying out for an index. Be sure to read Brad’s original post for more details.  My grateful thanks to him for granting permission to reuse some of his material. Paul White Email: [email protected] Twitter: @PaulWhiteNZ

    Read the article

  • Help A Hacker: Give ‘Em The Windows Source Code

    - by Ken Cox [MVP]
    The announcement of another Windows megapatch reminded me of a WikiLeaks story about Microsoft Windows that hasn’t attracted much attention. Alarmingly, we learn that the hackers have the Windows source code to study and test for vulnerabilities. Chinese hackers used the knowledge to breach Google’s accounts and servers: “In 2003, the CNITSEC signed a Government Security Program (GSP) international agreement with Microsoft that allowed select companies such as TOPSEC access to Microsoft source code in order to secure the Windows platform” “CNITSEC enterprises has recruited Chinese hackers in support of nationally-funded "network attack scientific research projects." From June 2002 to March 2003, TOPSEC employed a known Chinese hacker, Lin Yong (a.k.a. Lion and owner of the Honker Union of China), as senior security service engineer…” Windows is widely seen as unsecurable. It doesn’t help that Chinese government-funded hackers are probing the source code for vulnerabilities. It seems odd that people who didn’t write the code can find vulnerabilities faster than the owners of the code. Perhaps the U.S. government should hire its own hackers to go over the same Windows source code and then tell Microsoft how to secure its product?

    Read the article

  • EF 4’s PluralizationService Class: A Singularly Impossible Plurality

    - by Ken Cox [MVP]
    Entity Framework’s new 4.0 designer does its best to generate correct plural and singular forms of object names. This magic is done through the PluralizationService Class found in the System.Data.Entity.Design.PluralizationServices namespace and in the System.Data.Entity.Design.dll assembly. [Before you ask… Yes, I’ll post my example page, the service, and the project source code as soon as my ISP makes ASP.NET 4 RTM available. Stay tuned.] Anyone who speaks English is brutally aware of the ridiculous...(read more)

    Read the article

  • First look at the cloud with Google App Engine and Udacity

    - by Ken Hortsch
    Udacity is free online university and offers a CS253 intro to web development class.  Since I am currently a web developer/architect in ASP.Net (and recent project has brought me back to Java) it is a bit light.  However, it does offer me a nice problem set and my first exposure to writing cloud apps with GAE and Google Datastore. Steve Huffman, who developed reddit, is the instructor and does offer nice real-life stories.  Give it a look.

    Read the article

  • Fix: WCF - The type provided as the Service attribute value in the ServiceHost directive could not

    - by Ken Cox [MVP]
    I wanted to expose some raw data to users in my current ASP.NET 3.5 web site project. I created a subdirectory called ‘datafeeds’ and added a WCF Data Service. I wired the dataservice up to the Entity Framework class and, on running the ItemDataService.svc file, was greeted with: The type  <> provided as the Service attribute value in the ServiceHost directive could not be found So why couldn’t it find the class? It was right there in the… oops! Instead of putting the ItemDataService.vb...(read more)

    Read the article

  • Extracting the Date from a DateTime in Entity Framework 4 and LINQ

    - by Ken Cox [MVP]
    In my current ASP.NET 4 project, I’m displaying dates in a GridDateTimeColumn of Telerik’s ASP.NET Radgrid control. I don’t care about the time stuff, so my DataFormatString shows only the date bits: <telerik:GridDateTimeColumn FilterControlWidth="100px"   DataField="DateCreated" HeaderText="Created"    SortExpression="DateCreated" ReadOnly="True"    UniqueName="DateCreated" PickerType="DatePicker"    DataFormatString="{0:dd MMM yy}"> My problem was that I couldn’t get the built-in column filtering (it uses Telerik’s DatePicker control) to behave.  The DatePicker assumes that the time is 00:00:00 but the data would have times like 09:22:21. So, when you select a date and apply the EqualTo filter, you get no results. You would get results if all the time portions were 00:00:00. In essence, I wanted my Entity Framework query to give the DatePicker what it wanted… a Date without the Time portion. Fortunately, EF4 provides the TruncateTime  function. After you include Imports System.Data.Objects.EntityFunctions You’ll find that your EF queries will accept the TruncateTime function. Here’s my routine: Protected Sub RadGrid1_NeedDataSource _     (ByVal source As Object, _      ByVal e As Telerik.Web.UI.GridNeedDataSourceEventArgs) _     Handles RadGrid1.NeedDataSource     Dim ent As New OfficeBookDBEntities1     Dim TopBOMs = From t In ent.TopBom, i In ent.Items _                   Where t.BusActivityID = busActivityID _       And i.BusActivityID And t.ItemID = i.RecordID _       Order By t.DateUpdated Descending _       Select New With {.TopBomID = t.TopBomID, .ItemID = t.ItemID, _                        .PartNumber = i.PartNumber, _                        .Description = i.Description, .Notes = t.Notes, _                        .DateCreated = TruncateTime(t.DateCreated), _                        .DateUpdated = TruncateTime(t.DateUpdated)}     RadGrid1.DataSource = TopBOMs End Sub Now when I select March 14, 2011 on the DatePicker, the filter doesn’t stumble on time values that don’t make sense. Full Disclosure: Telerik gives me (and other developer MVPs) free copies of their suite.

    Read the article

  • So…is it a Seek or a Scan?

    - by Paul White
    You’re probably most familiar with the terms ‘Seek’ and ‘Scan’ from the graphical plans produced by SQL Server Management Studio (SSMS).  The image to the left shows the most common ones, with the three types of scan at the top, followed by four types of seek.  You might look to the SSMS tool-tip descriptions to explain the differences between them: Not hugely helpful are they?  Both mention scans and ranges (nothing about seeks) and the Index Seek description implies that it will not scan the index entirely (which isn’t necessarily true). Recall also yesterday’s post where we saw two Clustered Index Seek operations doing very different things.  The first Seek performed 63 single-row seeking operations; and the second performed a ‘Range Scan’ (more on those later in this post).  I hope you agree that those were two very different operations, and perhaps you are wondering why there aren’t different graphical plan icons for Range Scans and Seeks?  I have often wondered about that, and the first person to mention it after yesterday’s post was Erin Stellato (twitter | blog): Before we go on to make sense of all this, let’s look at another example of how SQL Server confusingly mixes the terms ‘Scan’ and ‘Seek’ in different contexts.  The diagram below shows a very simple heap table with two columns, one of which is the non-clustered Primary Key, and the other has a non-unique non-clustered index defined on it.  The right hand side of the diagram shows a simple query, it’s associated query plan, and a couple of extracts from the SSMS tool-tip and Properties windows. Notice the ‘scan direction’ entry in the Properties window snippet.  Is this a seek or a scan?  The different references to Scans and Seeks are even more pronounced in the XML plan output that the graphical plan is based on.  This fragment is what lies behind the single Index Seek icon shown above: You’ll find the same confusing references to Seeks and Scans throughout the product and its documentation. Making Sense of Seeks Let’s forget all about scans for a moment, and think purely about seeks.  Loosely speaking, a seek is the process of navigating an index B-tree to find a particular index record, most often at the leaf level.  A seek starts at the root and navigates down through the levels of the index to find the point of interest: Singleton Lookups The simplest sort of seek predicate performs this traversal to find (at most) a single record.  This is the case when we search for a single value using a unique index and an equality predicate.  It should be readily apparent that this type of search will either find one record, or none at all.  This operation is known as a singleton lookup.  Given the example table from before, the following query is an example of a singleton lookup seek: Sadly, there’s nothing in the graphical plan or XML output to show that this is a singleton lookup – you have to infer it from the fact that this is a single-value equality seek on a unique index.  The other common examples of a singleton lookup are bookmark lookups – both the RID and Key Lookup forms are singleton lookups (an RID lookup finds a single record in a heap from the unique row locator, and a Key Lookup does much the same thing on a clustered table).  If you happen to run your query with STATISTICS IO ON, you will notice that ‘Scan Count’ is always zero for a singleton lookup. Range Scans The other type of seek predicate is a ‘seek plus range scan’, which I will refer to simply as a range scan.  The seek operation makes an initial descent into the index structure to find the first leaf row that qualifies, and then performs a range scan (either backwards or forwards in the index) until it reaches the end of the scan range. The ability of a range scan to proceed in either direction comes about because index pages at the same level are connected by a doubly-linked list – each page has a pointer to the previous page (in logical key order) as well as a pointer to the following page.  The doubly-linked list is represented by the green and red dotted arrows in the index diagram presented earlier.  One subtle (but important) point is that the notion of a ‘forward’ or ‘backward’ scan applies to the logical key order defined when the index was built.  In the present case, the non-clustered primary key index was created as follows: CREATE TABLE dbo.Example ( key_col INTEGER NOT NULL, data INTEGER NOT NULL, CONSTRAINT [PK dbo.Example key_col] PRIMARY KEY NONCLUSTERED (key_col ASC) ) ; Notice that the primary key index specifies an ascending sort order for the single key column.  This means that a forward scan of the index will retrieve keys in ascending order, while a backward scan would retrieve keys in descending key order.  If the index had been created instead on key_col DESC, a forward scan would retrieve keys in descending order, and a backward scan would return keys in ascending order. A range scan seek predicate may have a Start condition, an End condition, or both.  Where one is missing, the scan starts (or ends) at one extreme end of the index, depending on the scan direction.  Some examples might help clarify that: the following diagram shows four queries, each of which performs a single seek against a column holding every integer from 1 to 100 inclusive.  The results from each query are shown in the blue columns, and relevant attributes from the Properties window appear on the right: Query 1 specifies that all key_col values less than 5 should be returned in ascending order.  The query plan achieves this by seeking to the start of the index leaf (there is no explicit starting value) and scanning forward until the End condition (key_col < 5) is no longer satisfied (SQL Server knows it can stop looking as soon as it finds a key_col value that isn’t less than 5 because all later index entries are guaranteed to sort higher). Query 2 asks for key_col values greater than 95, in descending order.  SQL Server returns these results by seeking to the end of the index, and scanning backwards (in descending key order) until it comes across a row that isn’t greater than 95.  Sharp-eyed readers may notice that the end-of-scan condition is shown as a Start range value.  This is a bug in the XML show plan which bubbles up to the Properties window – when a backward scan is performed, the roles of the Start and End values are reversed, but the plan does not reflect that.  Oh well. Query 3 looks for key_col values that are greater than or equal to 10, and less than 15, in ascending order.  This time, SQL Server seeks to the first index record that matches the Start condition (key_col >= 10) and then scans forward through the leaf pages until the End condition (key_col < 15) is no longer met. Query 4 performs much the same sort of operation as Query 3, but requests the output in descending order.  Again, we have to mentally reverse the Start and End conditions because of the bug, but otherwise the process is the same as always: SQL Server finds the highest-sorting record that meets the condition ‘key_col < 25’ and scans backward until ‘key_col >= 20’ is no longer true. One final point to note: seek operations always have the Ordered: True attribute.  This means that the operator always produces rows in a sorted order, either ascending or descending depending on how the index was defined, and whether the scan part of the operation is forward or backward.  You cannot rely on this sort order in your queries of course (you must always specify an ORDER BY clause if order is important) but SQL Server can make use of the sort order internally.  In the four queries above, the query optimizer was able to avoid an explicit Sort operator to honour the ORDER BY clause, for example. Multiple Seek Predicates As we saw yesterday, a single index seek plan operator can contain one or more seek predicates.  These seek predicates can either be all singleton seeks or all range scans – SQL Server does not mix them.  For example, you might expect the following query to contain two seek predicates, a singleton seek to find the single record in the unique index where key_col = 10, and a range scan to find the key_col values between 15 and 20: SELECT key_col FROM dbo.Example WHERE key_col = 10 OR key_col BETWEEN 15 AND 20 ORDER BY key_col ASC ; In fact, SQL Server transforms the singleton seek (key_col = 10) to the equivalent range scan, Start:[key_col >= 10], End:[key_col <= 10].  This allows both range scans to be evaluated by a single seek operator.  To be clear, this query results in two range scans: one from 10 to 10, and one from 15 to 20. Final Thoughts That’s it for today – tomorrow we’ll look at monitoring singleton lookups and range scans, and I’ll show you a seek on a heap table. Yes, a seek.  On a heap.  Not an index! If you would like to run the queries in this post for yourself, there’s a script below.  Thanks for reading! IF OBJECT_ID(N'dbo.Example', N'U') IS NOT NULL BEGIN DROP TABLE dbo.Example; END ; -- Test table is a heap -- Non-clustered primary key on 'key_col' CREATE TABLE dbo.Example ( key_col INTEGER NOT NULL, data INTEGER NOT NULL, CONSTRAINT [PK dbo.Example key_col] PRIMARY KEY NONCLUSTERED (key_col) ) ; -- Non-unique non-clustered index on the 'data' column CREATE NONCLUSTERED INDEX [IX dbo.Example data] ON dbo.Example (data) ; -- Add 100 rows INSERT dbo.Example WITH (TABLOCKX) ( key_col, data ) SELECT key_col = V.number, data = V.number FROM master.dbo.spt_values AS V WHERE V.[type] = N'P' AND V.number BETWEEN 1 AND 100 ; -- ================ -- Singleton lookup -- ================ ; -- Single value equality seek in a unique index -- Scan count = 0 when STATISTIS IO is ON -- Check the XML SHOWPLAN SELECT E.key_col FROM dbo.Example AS E WHERE E.key_col = 32 ; -- =========== -- Range Scans -- =========== ; -- Query 1 SELECT E.key_col FROM dbo.Example AS E WHERE E.key_col <= 5 ORDER BY E.key_col ASC ; -- Query 2 SELECT E.key_col FROM dbo.Example AS E WHERE E.key_col > 95 ORDER BY E.key_col DESC ; -- Query 3 SELECT E.key_col FROM dbo.Example AS E WHERE E.key_col >= 10 AND E.key_col < 15 ORDER BY E.key_col ASC ; -- Query 4 SELECT E.key_col FROM dbo.Example AS E WHERE E.key_col >= 20 AND E.key_col < 25 ORDER BY E.key_col DESC ; -- Final query (singleton + range = 2 range scans) SELECT E.key_col FROM dbo.Example AS E WHERE E.key_col = 10 OR E.key_col BETWEEN 15 AND 20 ORDER BY E.key_col ASC ; -- === TIDY UP === DROP TABLE dbo.Example; © 2011 Paul White email: [email protected] twitter: @SQL_Kiwi

    Read the article

  • Hello Operator, My Switch Is Bored

    - by Paul White
    This is a post for T-SQL Tuesday #43 hosted by my good friend Rob Farley. The topic this month is Plan Operators. I haven’t taken part in T-SQL Tuesday before, but I do like to write about execution plans, so this seemed like a good time to start. This post is in two parts. The first part is primarily an excuse to use a pretty bad play on words in the title of this blog post (if you’re too young to know what a telephone operator or a switchboard is, I hate you). The second part of the post looks at an invisible query plan operator (so to speak). 1. My Switch Is Bored Allow me to present the rare and interesting execution plan operator, Switch: Books Online has this to say about Switch: Following that description, I had a go at producing a Fast Forward Cursor plan that used the TOP operator, but had no luck. That may be due to my lack of skill with cursors, I’m not too sure. The only application of Switch in SQL Server 2012 that I am familiar with requires a local partitioned view: CREATE TABLE dbo.T1 (c1 int NOT NULL CHECK (c1 BETWEEN 00 AND 24)); CREATE TABLE dbo.T2 (c1 int NOT NULL CHECK (c1 BETWEEN 25 AND 49)); CREATE TABLE dbo.T3 (c1 int NOT NULL CHECK (c1 BETWEEN 50 AND 74)); CREATE TABLE dbo.T4 (c1 int NOT NULL CHECK (c1 BETWEEN 75 AND 99)); GO CREATE VIEW V1 AS SELECT c1 FROM dbo.T1 UNION ALL SELECT c1 FROM dbo.T2 UNION ALL SELECT c1 FROM dbo.T3 UNION ALL SELECT c1 FROM dbo.T4; Not only that, but it needs an updatable local partitioned view. We’ll need some primary keys to meet that requirement: ALTER TABLE dbo.T1 ADD CONSTRAINT PK_T1 PRIMARY KEY (c1);   ALTER TABLE dbo.T2 ADD CONSTRAINT PK_T2 PRIMARY KEY (c1);   ALTER TABLE dbo.T3 ADD CONSTRAINT PK_T3 PRIMARY KEY (c1);   ALTER TABLE dbo.T4 ADD CONSTRAINT PK_T4 PRIMARY KEY (c1); We also need an INSERT statement that references the view. Even more specifically, to see a Switch operator, we need to perform a single-row insert (multi-row inserts use a different plan shape): INSERT dbo.V1 (c1) VALUES (1); And now…the execution plan: The Constant Scan manufactures a single row with no columns. The Compute Scalar works out which partition of the view the new value should go in. The Assert checks that the computed partition number is not null (if it is, an error is returned). The Nested Loops Join executes exactly once, with the partition id as an outer reference (correlated parameter). The Switch operator checks the value of the parameter and executes the corresponding input only. If the partition id is 0, the uppermost Clustered Index Insert is executed, adding a row to table T1. If the partition id is 1, the next lower Clustered Index Insert is executed, adding a row to table T2…and so on. In case you were wondering, here’s a query and execution plan for a multi-row insert to the view: INSERT dbo.V1 (c1) VALUES (1), (2); Yuck! An Eager Table Spool and four Filters! I prefer the Switch plan. My guess is that almost all the old strategies that used a Switch operator have been replaced over time, using things like a regular Concatenation Union All combined with Start-Up Filters on its inputs. Other new (relative to the Switch operator) features like table partitioning have specific execution plan support that doesn’t need the Switch operator either. This feels like a bit of a shame, but perhaps it is just nostalgia on my part, it’s hard to know. Please do let me know if you encounter a query that can still use the Switch operator in 2012 – it must be very bored if this is the only possible modern usage! 2. Invisible Plan Operators The second part of this post uses an example based on a question Dave Ballantyne asked using the SQL Sentry Plan Explorer plan upload facility. If you haven’t tried that yet, make sure you’re on the latest version of the (free) Plan Explorer software, and then click the Post to SQLPerformance.com button. That will create a site question with the query plan attached (which can be anonymized if the plan contains sensitive information). Aaron Bertrand and I keep a close eye on questions there, so if you have ever wanted to ask a query plan question of either of us, that’s a good way to do it. The problem The issue I want to talk about revolves around a query issued against a calendar table. The script below creates a simplified version and adds 100 years of per-day information to it: USE tempdb; GO CREATE TABLE dbo.Calendar ( dt date NOT NULL, isWeekday bit NOT NULL, theYear smallint NOT NULL,   CONSTRAINT PK__dbo_Calendar_dt PRIMARY KEY CLUSTERED (dt) ); GO -- Monday is the first day of the week for me SET DATEFIRST 1;   -- Add 100 years of data INSERT dbo.Calendar WITH (TABLOCKX) (dt, isWeekday, theYear) SELECT CA.dt, isWeekday = CASE WHEN DATEPART(WEEKDAY, CA.dt) IN (6, 7) THEN 0 ELSE 1 END, theYear = YEAR(CA.dt) FROM Sandpit.dbo.Numbers AS N CROSS APPLY ( VALUES (DATEADD(DAY, N.n - 1, CONVERT(date, '01 Jan 2000', 113))) ) AS CA (dt) WHERE N.n BETWEEN 1 AND 36525; The following query counts the number of weekend days in 2013: SELECT Days = COUNT_BIG(*) FROM dbo.Calendar AS C WHERE theYear = 2013 AND isWeekday = 0; It returns the correct result (104) using the following execution plan: The query optimizer has managed to estimate the number of rows returned from the table exactly, based purely on the default statistics created separately on the two columns referenced in the query’s WHERE clause. (Well, almost exactly, the unrounded estimate is 104.289 rows.) There is already an invisible operator in this query plan – a Filter operator used to apply the WHERE clause predicates. We can see it by re-running the query with the enormously useful (but undocumented) trace flag 9130 enabled: Now we can see the full picture. The whole table is scanned, returning all 36,525 rows, before the Filter narrows that down to just the 104 we want. Without the trace flag, the Filter is incorporated in the Clustered Index Scan as a residual predicate. It is a little bit more efficient than using a separate operator, but residual predicates are still something you will want to avoid where possible. The estimates are still spot on though: Anyway, looking to improve the performance of this query, Dave added the following filtered index to the Calendar table: CREATE NONCLUSTERED INDEX Weekends ON dbo.Calendar(theYear) WHERE isWeekday = 0; The original query now produces a much more efficient plan: Unfortunately, the estimated number of rows produced by the seek is now wrong (365 instead of 104): What’s going on? The estimate was spot on before we added the index! Explanation You might want to grab a coffee for this bit. Using another trace flag or two (8606 and 8612) we can see that the cardinality estimates were exactly right initially: The highlighted information shows the initial cardinality estimates for the base table (36,525 rows), the result of applying the two relational selects in our WHERE clause (104 rows), and after performing the COUNT_BIG(*) group by aggregate (1 row). All of these are correct, but that was before cost-based optimization got involved :) Cost-based optimization When cost-based optimization starts up, the logical tree above is copied into a structure (the ‘memo’) that has one group per logical operation (roughly speaking). The logical read of the base table (LogOp_Get) ends up in group 7; the two predicates (LogOp_Select) end up in group 8 (with the details of the selections in subgroups 0-6). These two groups still have the correct cardinalities as trace flag 8608 output (initial memo contents) shows: During cost-based optimization, a rule called SelToIdxStrategy runs on group 8. It’s job is to match logical selections to indexable expressions (SARGs). It successfully matches the selections (theYear = 2013, is Weekday = 0) to the filtered index, and writes a new alternative into the memo structure. The new alternative is entered into group 8 as option 1 (option 0 was the original LogOp_Select): The new alternative is to do nothing (PhyOp_NOP = no operation), but to instead follow the new logical instructions listed below the NOP. The LogOp_GetIdx (full read of an index) goes into group 21, and the LogOp_SelectIdx (selection on an index) is placed in group 22, operating on the result of group 21. The definition of the comparison ‘the Year = 2013’ (ScaOp_Comp downwards) was already present in the memo starting at group 2, so no new memo groups are created for that. New Cardinality Estimates The new memo groups require two new cardinality estimates to be derived. First, LogOp_Idx (full read of the index) gets a predicted cardinality of 10,436. This number comes from the filtered index statistics: DBCC SHOW_STATISTICS (Calendar, Weekends) WITH STAT_HEADER; The second new cardinality derivation is for the LogOp_SelectIdx applying the predicate (theYear = 2013). To get a number for this, the cardinality estimator uses statistics for the column ‘theYear’, producing an estimate of 365 rows (there are 365 days in 2013!): DBCC SHOW_STATISTICS (Calendar, theYear) WITH HISTOGRAM; This is where the mistake happens. Cardinality estimation should have used the filtered index statistics here, to get an estimate of 104 rows: DBCC SHOW_STATISTICS (Calendar, Weekends) WITH HISTOGRAM; Unfortunately, the logic has lost sight of the link between the read of the filtered index (LogOp_GetIdx) in group 22, and the selection on that index (LogOp_SelectIdx) that it is deriving a cardinality estimate for, in group 21. The correct cardinality estimate (104 rows) is still present in the memo, attached to group 8, but that group now has a PhyOp_NOP implementation. Skipping over the rest of cost-based optimization (in a belated attempt at brevity) we can see the optimizer’s final output using trace flag 8607: This output shows the (incorrect, but understandable) 365 row estimate for the index range operation, and the correct 104 estimate still attached to its PhyOp_NOP. This tree still has to go through a few post-optimizer rewrites and ‘copy out’ from the memo structure into a tree suitable for the execution engine. One step in this process removes PhyOp_NOP, discarding its 104-row cardinality estimate as it does so. To finish this section on a more positive note, consider what happens if we add an OVER clause to the query aggregate. This isn’t intended to be a ‘fix’ of any sort, I just want to show you that the 104 estimate can survive and be used if later cardinality estimation needs it: SELECT Days = COUNT_BIG(*) OVER () FROM dbo.Calendar AS C WHERE theYear = 2013 AND isWeekday = 0; The estimated execution plan is: Note the 365 estimate at the Index Seek, but the 104 lives again at the Segment! We can imagine the lost predicate ‘isWeekday = 0’ as sitting between the seek and the segment in an invisible Filter operator that drops the estimate from 365 to 104. Even though the NOP group is removed after optimization (so we don’t see it in the execution plan) bear in mind that all cost-based choices were made with the 104-row memo group present, so although things look a bit odd, it shouldn’t affect the optimizer’s plan selection. I should also mention that we can work around the estimation issue by including the index’s filtering columns in the index key: CREATE NONCLUSTERED INDEX Weekends ON dbo.Calendar(theYear, isWeekday) WHERE isWeekday = 0 WITH (DROP_EXISTING = ON); There are some downsides to doing this, including that changes to the isWeekday column may now require Halloween Protection, but that is unlikely to be a big problem for a static calendar table ;)  With the updated index in place, the original query produces an execution plan with the correct cardinality estimation showing at the Index Seek: That’s all for today, remember to let me know about any Switch plans you come across on a modern instance of SQL Server! Finally, here are some other posts of mine that cover other plan operators: Segment and Sequence Project Common Subexpression Spools Why Plan Operators Run Backwards Row Goals and the Top Operator Hash Match Flow Distinct Top N Sort Index Spools and Page Splits Singleton and Range Seeks Bitmaps Hash Join Performance Compute Scalar © 2013 Paul White – All Rights Reserved Twitter: @SQL_Kiwi

    Read the article

  • ASP.NET/IIS Fix: The 'Microsoft.Jet.OLEDB.4.0' provider is not registered on the local machine.

    - by Ken Cox [MVP]
    In my latest ASP.NET project, I refresh the sample data using an Excel spreadsheet from the client. After upgrading to Windows Server 2008 R2, I suddenly discovered this error: The 'Microsoft.Jet.OLEDB.4.0' provider is not registered on the local machine. The error message is totally bogus! The problem is that I’m running IIS on a 64-bit machine and the ol’ OLEDB thingy just isn’t up with the times. To fix it, go into the IIS Manager and find out which Application Pool the site is using. In my case...(read more)

    Read the article

  • Fix: Azure Disabled over 49 cents? Beware of using a Java Virtual Machine on Microsoft Azure

    - by Ken Cox [MVP]
    I love my MSDN Azure account. I can spin up a demo/dev app or VM in seconds. In fact, it is so easy to create a virtual machine that Azure shut down my whole account! Last night I spun up a Java Virtual Machine to play with some Android stuff. My mistake was that I didn’t read the Virtual Machine pricing warning: “I have a MSDN Azure Benefit subscription. Can I use my monthly Azure credits to purchase Oracle software?” “No, Azure credits in our MSDN offers are not applicable to Oracle software. In order to purchase Oracle software in the MSDN Azure Benefit subscription, customers need to turn off their {0} spending limit and pay at the regular pay-as-you-go rate. Otherwise, Oracle usage will hit the {1} spending limit and the subscription will be immediately disabled.”  Immediately disabled? Yup. Everything connected to the subscription was shut off, deallocated, rendered useless - even the free Web sites and the free Sendgrid email service.  The fix? I had to remove the spending limit from my account so I could pay $0.49 (49 cents) for the JVM usage. I still had $134.10 in credits remaining for regular usage with 6 days left in the billing month.  Now the restoration/clean-up begins… figuring out how to get the web sites and services back online.  To me, the preferable way would be for Azure to warn me when setting up a JVM that I had no way of paying for the service. In the alternative, shut down just the offending services – the ones that can’t be covered by the regular credits. What a mess.

    Read the article

  • Ruby but not Rails on my Resume

    - by Ken Bloom
    I have listed Ruby as a skill on my resume becuase I've been programming in Ruby for 5 years while I work on my Ph.D. thesis. I've mostly been using it to implement natural language processing algorithms. I'm starting to look for a job, and I posted my resume to a few sites (as an extra bonus when applying to certain on-target jobs). Now I get recruiters calling me to offer me Ruby on Rails jobs. The problem is that I've never learned Rails. It was never relevant to what I'm doing for my Ph.D. How do you recommend handling this situation to avoid wasting my time and theirs? (And learning Rails probably isn't an option until I finish my thesis.) Can my resume be adjusted to make this clearer? Should it be adjusted? Should I just politely tell them on the phone that I don't know Rails? By the way, the relevant part of my resume simply says: Skills: Programming Languages: C, C++, Java, Scala, Ruby, LaTeX Databases: MySQL, XML, XPath and lists a few other skill areas that couldn't possibly be confused with a Rails developer.

    Read the article

  • EF 4 Pluralization Update

    - by Ken Cox [MVP]
    I previously wrote about playing with EF 4’s PluralizationService class . Now that OrcsWeb is running ASP.NET 4, you can play with my little pluralization page and its WCF service online. The source code (such as it is!) can be downloaded from the MSDN Code Gallery here: http://code.msdn.microsoft.com/PluralizationService BTW, one annoyance is that the WDSL still includes the default namespace:  namespace="http://tempuri.org/" I swatted a couple of these instances, but if you know...(read more)

    Read the article

  • Operation MVC

    - by Ken Lovely, MCSE, MCDBA, MCTS
    It was time to create a new site. I figured VS 2010 is out so I should write it using MVC and Entity Framework. I have been very happy with MVC. My boss has had me making an administration web site in MVC2 but using 2008. I think one of the greatest features of MVC is you get to work with root of the app. It is kind of like being an iron worker; you get to work with the metal, mold it from scratch. Getting my articles out of my database and onto web pages was by far easier with MVC than it was with regular ASP.NET. This code is what I use to post the article to that page. It's pretty straightforward. The link in the menu is passes the id which is simply the url to the page. It looks for that url in the database and returns the rest of the article.   DataResults dr = new DataResults(); string title = string.Empty; string article = string.Empty; foreach (var D in dr.ReturnArticle(ViewData["PageName"].ToString())) { title = D.Title; article = D.Article; } public   List<CurrentArticle> ReturnArticle(string id) { var resultlist = new List<CurrentArticle>(); DBDataContext context = new DBDataContext(); var results = from D in context.MyContents where D.MVCURL.Contains(id) select D;foreach (var result in results) { CurrentArticle ca = new CurrentArticle(); ca.Title = result.Title; ca.Article = result.Article; ca.Summary = result.Summary; resultlist.Add(ca); } return resultlist;}

    Read the article

< Previous Page | 2 3 4 5 6 7 8 9 10 11 12 13  | Next Page >