clustered - Page 4 - Developer IT

Has anyone installed NServiceBus onto a Microsoft clustered server?

- by David

I am trying to install NServiceBus onto a clustered win2k3 host. The configuration utility provided (runner.exe) throw some errors that I did not catch, and it now runs correctly. When running NServiceBus.Host.exe i am get this error repeatedly: System.Transactions.TransactionAbortedException: The transaction has aborted. --- System.Transactions.TransactionManagerCommunicationException: Communication with the underlying transaction manager has failed. --- System.Runtime.InteropServices.COMException (0x8004D01B): The Transaction Manager is not available. (Exception from HRESULT: 0x8004D01B) at System.Transactions.Oletx.IDtcProxyShimFactory.ConnectToProxy(String nodeName, Guid resourceManagerIdentifier, IntPtr managedIdentifier, Boolean& nodeNameMatches, UInt32& whereaboutsSize, CoTaskMemHandle& whereaboutsBuffer, IResourceManagerShim& resourceManagerShim) at System.Transactions.Oletx.DtcTransactionManager.Initialize() --- End of inner exception stack trace --- at System.Transactions.Oletx.OletxTransactionManager.ProxyException(COMException comException) at System.Transactions.Oletx.DtcTransactionManager.Initialize() at System.Transactions.Oletx.DtcTransactionManager.get_ProxyShimFactory() at System.Transactions.Oletx.OletxTransactionManager.CreateTransaction(TransactionOptions properties) at System.Transactions.TransactionStatePromoted.EnterState(InternalTransaction tx) --- End of inner exception stack trace --- at System.Transactions.TransactionStateAborted.CheckForFinishedTransaction(InternalTransaction tx) at System.Transactions.EnlistableStates.Promote(InternalTransaction tx) at System.Transactions.Transaction.Promote() at System.Transactions.TransactionInterop.ConvertToOletxTransaction(Transaction transaction) at System.Transactions.TransactionInterop.GetDtcTransaction(Transaction transaction) at System.Messaging.MessageQueue.StaleSafeReceiveMessage(UInt32 timeout, Int32 action, MQPROPS properties, NativeOverlapped* overlapped, ReceiveCallback receiveCallback, CursorHandle cursorHandle, IntPtr transaction) at System.Messaging.MessageQueue.ReceiveCurrent(TimeSpan timeout, Int32 action, CursorHandle cursor, MessagePropertyFilter filter, MessageQueueTransaction internalTransaction, MessageQueueTransactionType transactionType) at System.Messaging.MessageQueue.Receive(TimeSpan timeout, MessageQueueTransactionType transactionType) at NServiceBus.Unicast.Transport.Msmq.MsmqTransport.ReceiveMessageFromQueueAfterPeekWasSuccessful() in d:\BuildAgent-02\work\672d81652eaca4e1\src\impl\unicast\NServiceBus.Unicast.Msmq\MsmqTransport.cs:line 551 Has anyone successfully put NServiceBus onto a clustered server, if so, how did you get it working?

Read the article

Moving a single Windows 2008 box to a clustered implementation?

- by chris

I have a system that currently runs on a single box, Windows 2008 Enterprise. This is just used as a web server. What is involved in creating a cluster? Basically doing this for availability reasons - the load on the system will be pretty light.

Read the article

How can I uninstall a clustered SQL instance if the cluster has been destroyed?

- by Bob

First time going through this scenario, and apparently I did it very wrong. On the DB servers I deleted the cluster group that held SQL and Reporting Services. I then destroyed the cluster. Then I tried to uninstall SQL. No dice. SQL still thinks its part of the non-existant cluster and will not let me uninstall it. I went into the Maintenance menu of the SQL setup and tried to Remove Node...nope. Unless I find a way out of this I will have to rebuild the OS if I can't get SQL off the box.

Read the article

Adding a clustered index to a SQL table: what dangers exist for a live production system?

- by MoSlo

Right, keep in mind i need to describe this by abstracting all possible confidential info: I've been put in charge of a 10-year old transactional system of which the majority business logic is implemented at database level (triggers, stored procedures etc). Win2000 server, MSSQL 2000 Enterprise. No immediate plans for replacing/updating the system are being considered :( The core process is a program that executes transactions - specifically, it executes a stored procedure with various parameters, lets call it sp_ProcessTrans. The program executes the stored procedure at asynchronous intervals. By itself, things work fine. But there are 30 instances of this program on remotely located workstations, all of them asynchronously executing sp_ProcessTrans and then retrieving data from the SQL server (execution is pretty regular - ranging 0 to 60 times a minute, depending on what items the program instance is responsible for) . Performance of the system has dropped considerably with 10 yrs of data growth: the reason is the deadlocks and specifically deadlock wait times. The deadlock is on the Employee table. I have discovered: In sp_ProcessTrans' execution, it selects from an Employee table 7 times (dont ask) The select is done on a field that is NOT the primary key No index exists on this field. Thus a table scan is performed. 7 times. per transaction So the reason for deadlocks is clear. I created a non-unique ordered clustered index on the field (field looks good, almost unique, NUM(7), very rarely changes). Immediate improvement in the test environment. The problem is that i cannot simulate the deadlocks in a test environment (I'd need 30 workstations; i'd need to simulate 'realistic' activity on those stations, so visualization is out). I need to know if i must schedule downtime. Creating an index shouldn't be a risky operation for MSSQL, but is there any danger (data corruption in transactions/select statements/extra wait time etc) to create this field index on the production database while the transactions are still taking place? (although i can select a time when transactions are fairly quiet through the 30 stations) Are there any hidden dangers i'm not seeing (not looking forward to needing to restore the DB if something goes wrong, restoring would take a lot of time with 10yrs of data).

Read the article

Whether to use UNION or OR in SQL Server Queries

- by Dinesh Asanka

Recently I came across with an article on DB2 about using Union instead of OR. So I thought of carrying out a research on SQL Server on what scenarios UNION is optimal in and which scenarios OR would be best. I will analyze this with a few scenarios using samples taken from the AdventureWorks database Sales.SalesOrderDetail table. Scenario 1: Selecting all columns So we are going to select all columns and you have a non-clustered index on the ProductID column. --Query 1 : OR SELECT * FROM Sales.SalesOrderDetail WHERE ProductID = 714 OR ProductID =709 OR ProductID =998 OR ProductID =875 OR ProductID =976 OR ProductID =874 --Query 2 : UNION SELECT * FROM Sales.SalesOrderDetail WHERE ProductID = 714 UNION SELECT * FROM Sales.SalesOrderDetail WHERE ProductID = 709 UNION SELECT * FROM Sales.SalesOrderDetail WHERE ProductID = 998 UNION SELECT * FROM Sales.SalesOrderDetail WHERE ProductID = 875 UNION SELECT * FROM Sales.SalesOrderDetail WHERE ProductID = 976 UNION SELECT * FROM Sales.SalesOrderDetail WHERE ProductID = 874 So query 1 is using OR and the later is using UNION. Let us analyze the execution plans for these queries. Query 1 Query 2 As expected Query 1 will use Clustered Index Scan but Query 2, uses all sorts of things. In this case, since it is using multiple CPUs you might have CX_PACKET waits as well. Let’s look at the profiler results for these two queries: CPU Reads Duration Row Counts OR 78 1252 389 3854 UNION 250 7495 660 3854 You can see from the above table the UNION query is not performing well as the OR query though both are retuning same no of rows (3854).These results indicate that, for the above scenario UNION should be used. Scenario 2: Non-Clustered and Clustered Index Columns only --Query 1 : OR SELECT ProductID,SalesOrderID, SalesOrderDetailID FROM Sales.SalesOrderDetail WHERE ProductID = 714 OR ProductID =709 OR ProductID =998 OR ProductID =875 OR ProductID =976 OR ProductID =874 GO --Query 2 : UNION SELECT ProductID,SalesOrderID, SalesOrderDetailID FROM Sales.SalesOrderDetail WHERE ProductID = 714 UNION SELECT ProductID,SalesOrderID, SalesOrderDetailID FROM Sales.SalesOrderDetail WHERE ProductID = 709 UNION SELECT ProductID,SalesOrderID, SalesOrderDetailID FROM Sales.SalesOrderDetail WHERE ProductID = 998 UNION SELECT ProductID,SalesOrderID, SalesOrderDetailID FROM Sales.SalesOrderDetail WHERE ProductID = 875 UNION SELECT ProductID,SalesOrderID, SalesOrderDetailID FROM Sales.SalesOrderDetail WHERE ProductID = 976 UNION SELECT ProductID,SalesOrderID, SalesOrderDetailID FROM Sales.SalesOrderDetail WHERE ProductID = 874 GO So this time, we will be selecting only index columns, which means these queries will avoid a data page lookup. As in the previous case we will analyze the execution plans: Query 1 Query 2 Again, Query 2 is more complex than Query 1. Let us look at the profile analysis: CPU Reads Duration Row Counts OR 0 24 208 3854 UNION 0 38 193 3854 In this analyzis, there is only slight difference between OR and UNION. Scenario 3: Selecting all columns for different fields Up to now, we were using only one column (ProductID) in the where clause. What if we have two columns for where clauses and let us assume both are covered by non-clustered indexes? --Query 1 : OR SELECT * FROM Sales.SalesOrderDetail WHERE ProductID = 714 OR CarrierTrackingNumber LIKE 'D0B8%' --Query 2 : UNION SELECT * FROM Sales.SalesOrderDetail WHERE ProductID = 714 UNION SELECT * FROM Sales.SalesOrderDetail WHERE CarrierTrackingNumber LIKE 'D0B8%' Query 1 Query 2: As we can see, the query plan for the second query has improved. Let us see the profiler results. CPU Reads Duration Row Counts OR 47 1278 443 1228 UNION 31 1334 400 1228 So in this case too, there is little difference between OR and UNION. Scenario 4: Selecting Clustered index columns for different fields Now let us go only with clustered indexes: --Query 1 : OR SELECT * FROM Sales.SalesOrderDetail WHERE ProductID = 714 OR CarrierTrackingNumber LIKE 'D0B8%' --Query 2 : UNION SELECT * FROM Sales.SalesOrderDetail WHERE ProductID = 714 UNION SELECT * FROM Sales.SalesOrderDetail WHERE CarrierTrackingNumber LIKE 'D0B8%' Query 1 Query 2 Now both execution plans are almost identical except is an additional Stream Aggregate is used in the first query. This means UNION has advantage over OR in this scenario. Let us see profiler results for these queries again. CPU Reads Duration Row Counts OR 0 319 366 1228 UNION 0 50 193 1228 Now see the differences, in this scenario UNION has somewhat of an advantage over OR. Conclusion Using UNION or OR depends on the scenario you are faced with. So you need to do your analyzing before selecting the appropriate method. Also, above the four scenarios are not all an exhaustive list of scenarios, I selected those for the broad description purposes only.

Read the article

Has anyone used UDS (Unified Development Server) on a clustered system?

- by gregturn

UDS (formerly known as Forte 4GL) is the platform our current system is running on. I remember finding a flag in a forum that documented how to deploy this on a Veritas cluster. However, now that we have modern hardware coming in, I can't find that note anywhere in Google.

Read the article

Can I open a clustered MQ queue for writing in perl?

- by JohnMcG

If I have a Websphere MQ queue defined on another queue manager in the cluster, is there a way I can open it for writing using the perl interface? Everything I've tried brings back mqrc 2085.

Read the article

SQL SERVER – Shrinking Database is Bad – Increases Fragmentation – Reduces Performance

- by pinaldave

Earlier, I had written two articles related to Shrinking Database. I wrote about why Shrinking Database is not good. SQL SERVER – SHRINKDATABASE For Every Database in the SQL Server SQL SERVER – What the Business Says Is Not What the Business Wants I received many comments on Why Database Shrinking is bad. Today we will go over a very interesting example that I have created for the same. Here are the quick steps of the example. Create a test database Create two tables and populate with data Check the size of both the tables Size of database is very low Check the Fragmentation of one table Fragmentation will be very low Truncate another table Check the size of the table Check the fragmentation of the one table Fragmentation will be very low SHRINK Database Check the size of the table Check the fragmentation of the one table Fragmentation will be very HIGH REBUILD index on one table Check the size of the table Size of database is very HIGH Check the fragmentation of the one table Fragmentation will be very low Here is the script for the same. USE MASTER GO CREATE DATABASE ShrinkIsBed GO USE ShrinkIsBed GO -- Name of the Database and Size SELECT name, (size*8) Size_KB FROM sys.database_files GO -- Create FirstTable CREATE TABLE FirstTable (ID INT, FirstName VARCHAR(100), LastName VARCHAR(100), City VARCHAR(100)) GO -- Create Clustered Index on ID CREATE CLUSTERED INDEX [IX_FirstTable_ID] ON FirstTable ( [ID] ASC ) ON [PRIMARY] GO -- Create SecondTable CREATE TABLE SecondTable (ID INT, FirstName VARCHAR(100), LastName VARCHAR(100), City VARCHAR(100)) GO -- Create Clustered Index on ID CREATE CLUSTERED INDEX [IX_SecondTable_ID] ON SecondTable ( [ID] ASC ) ON [PRIMARY] GO -- Insert One Hundred Thousand Records INSERT INTO FirstTable (ID,FirstName,LastName,City) SELECT TOP 100000 ROW_NUMBER() OVER (ORDER BY a.name) RowID, 'Bob', CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name)%2 = 1 THEN 'Smith' ELSE 'Brown' END, CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 1 THEN 'New York' WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 5 THEN 'San Marino' WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 3 THEN 'Los Angeles' ELSE 'Houston' END FROM sys.all_objects a CROSS JOIN sys.all_objects b GO -- Name of the Database and Size SELECT name, (size*8) Size_KB FROM sys.database_files GO -- Insert One Hundred Thousand Records INSERT INTO SecondTable (ID,FirstName,LastName,City) SELECT TOP 100000 ROW_NUMBER() OVER (ORDER BY a.name) RowID, 'Bob', CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name)%2 = 1 THEN 'Smith' ELSE 'Brown' END, CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 1 THEN 'New York' WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 5 THEN 'San Marino' WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 3 THEN 'Los Angeles' ELSE 'Houston' END FROM sys.all_objects a CROSS JOIN sys.all_objects b GO -- Name of the Database and Size SELECT name, (size*8) Size_KB FROM sys.database_files GO -- Check Fragmentations in the database SELECT avg_fragmentation_in_percent, fragment_count FROM sys.dm_db_index_physical_stats (DB_ID(), OBJECT_ID('SecondTable'), NULL, NULL, 'LIMITED') GO Let us check the table size and fragmentation. Now let us TRUNCATE the table and check the size and Fragmentation. USE MASTER GO CREATE DATABASE ShrinkIsBed GO USE ShrinkIsBed GO -- Name of the Database and Size SELECT name, (size*8) Size_KB FROM sys.database_files GO -- Create FirstTable CREATE TABLE FirstTable (ID INT, FirstName VARCHAR(100), LastName VARCHAR(100), City VARCHAR(100)) GO -- Create Clustered Index on ID CREATE CLUSTERED INDEX [IX_FirstTable_ID] ON FirstTable ( [ID] ASC ) ON [PRIMARY] GO -- Create SecondTable CREATE TABLE SecondTable (ID INT, FirstName VARCHAR(100), LastName VARCHAR(100), City VARCHAR(100)) GO -- Create Clustered Index on ID CREATE CLUSTERED INDEX [IX_SecondTable_ID] ON SecondTable ( [ID] ASC ) ON [PRIMARY] GO -- Insert One Hundred Thousand Records INSERT INTO FirstTable (ID,FirstName,LastName,City) SELECT TOP 100000 ROW_NUMBER() OVER (ORDER BY a.name) RowID, 'Bob', CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name)%2 = 1 THEN 'Smith' ELSE 'Brown' END, CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 1 THEN 'New York' WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 5 THEN 'San Marino' WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 3 THEN 'Los Angeles' ELSE 'Houston' END FROM sys.all_objects a CROSS JOIN sys.all_objects b GO -- Name of the Database and Size SELECT name, (size*8) Size_KB FROM sys.database_files GO -- Insert One Hundred Thousand Records INSERT INTO SecondTable (ID,FirstName,LastName,City) SELECT TOP 100000 ROW_NUMBER() OVER (ORDER BY a.name) RowID, 'Bob', CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name)%2 = 1 THEN 'Smith' ELSE 'Brown' END, CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 1 THEN 'New York' WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 5 THEN 'San Marino' WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 3 THEN 'Los Angeles' ELSE 'Houston' END FROM sys.all_objects a CROSS JOIN sys.all_objects b GO -- Name of the Database and Size SELECT name, (size*8) Size_KB FROM sys.database_files GO -- Check Fragmentations in the database SELECT avg_fragmentation_in_percent, fragment_count FROM sys.dm_db_index_physical_stats (DB_ID(), OBJECT_ID('SecondTable'), NULL, NULL, 'LIMITED') GO You can clearly see that after TRUNCATE, the size of the database is not reduced and it is still the same as before TRUNCATE operation. After the Shrinking database operation, we were able to reduce the size of the database. If you notice the fragmentation, it is considerably high. The major problem with the Shrink operation is that it increases fragmentation of the database to very high value. Higher fragmentation reduces the performance of the database as reading from that particular table becomes very expensive. One of the ways to reduce the fragmentation is to rebuild index on the database. Let us rebuild the index and observe fragmentation and database size. -- Rebuild Index on FirstTable ALTER INDEX IX_SecondTable_ID ON SecondTable REBUILD GO -- Name of the Database and Size SELECT name, (size*8) Size_KB FROM sys.database_files GO -- Check Fragmentations in the database SELECT avg_fragmentation_in_percent, fragment_count FROM sys.dm_db_index_physical_stats (DB_ID(), OBJECT_ID('SecondTable'), NULL, NULL, 'LIMITED') GO You can notice that after rebuilding, Fragmentation reduces to a very low value (almost same to original value); however the database size increases way higher than the original. Before rebuilding, the size of the database was 5 MB, and after rebuilding, it is around 20 MB. Regular rebuilding the index is rebuild in the same user database where the index is placed. This usually increases the size of the database. Look at irony of the Shrinking database. One person shrinks the database to gain space (thinking it will help performance), which leads to increase in fragmentation (reducing performance). To reduce the fragmentation, one rebuilds index, which leads to size of the database to increase way more than the original size of the database (before shrinking). Well, by Shrinking, one did not gain what he was looking for usually. Rebuild indexing is not the best suggestion as that will create database grow again. I have always remembered the excellent post from Paul Randal regarding Shrinking the database is bad. I suggest every one to read that for accuracy and interesting conversation. Let us run following script where we Shrink the database and REORGANIZE. -- Name of the Database and Size SELECT name, (size*8) Size_KB FROM sys.database_files GO -- Check Fragmentations in the database SELECT avg_fragmentation_in_percent, fragment_count FROM sys.dm_db_index_physical_stats (DB_ID(), OBJECT_ID('SecondTable'), NULL, NULL, 'LIMITED') GO -- Shrink the Database DBCC SHRINKDATABASE (ShrinkIsBed); GO -- Name of the Database and Size SELECT name, (size*8) Size_KB FROM sys.database_files GO -- Check Fragmentations in the database SELECT avg_fragmentation_in_percent, fragment_count FROM sys.dm_db_index_physical_stats (DB_ID(), OBJECT_ID('SecondTable'), NULL, NULL, 'LIMITED') GO -- Rebuild Index on FirstTable ALTER INDEX IX_SecondTable_ID ON SecondTable REORGANIZE GO -- Name of the Database and Size SELECT name, (size*8) Size_KB FROM sys.database_files GO -- Check Fragmentations in the database SELECT avg_fragmentation_in_percent, fragment_count FROM sys.dm_db_index_physical_stats (DB_ID(), OBJECT_ID('SecondTable'), NULL, NULL, 'LIMITED') GO You can see that REORGANIZE does not increase the size of the database or remove the fragmentation. Again, I no way suggest that REORGANIZE is the solution over here. This is purely observation using demo. Read the blog post of Paul Randal. Following script will clean up the database -- Clean up USE MASTER GO ALTER DATABASE ShrinkIsBed SET SINGLE_USER WITH ROLLBACK IMMEDIATE GO DROP DATABASE ShrinkIsBed GO There are few valid cases of the Shrinking database as well, but that is not covered in this blog post. We will cover that area some other time in future. Additionally, one can rebuild index in the tempdb as well, and we will also talk about the same in future. Brent has written a good summary blog post as well. Are you Shrinking your database? Well, when are you going to stop Shrinking it? Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: Pinal Dave, PostADay, SQL, SQL Authority, SQL Index, SQL Performance, SQL Query, SQL Scripts, SQL Server, SQL Tips and Tricks, SQLServer, T SQL, Technology

Read the article

SQL Server - Rebuilding Indexes

- by Renso

Goal: Rebuild indexes in SQL server. This can be done one at a time or with the example script below to rebuild all index for a specified table or for all tables in a given database. Why? The data in indexes gets fragmented over time. That means that as the index grows, the newly added rows to the index are physically stored in other sections of the allocated database storage space. Kind of like when you load your Christmas shopping into the trunk of your car and it is full you continue to load some on the back seat, in the same way some storage buffer is created for your index but once that runs out the data is then stored in other storage space and your data in your index is no longer stored in contiguous physical pages. To access the index the database manager has to "string together" disparate fragments to create the full-index and create one contiguous set of pages for that index. Defragmentation fixes that. What does the fragmentation affect?Depending of course on how large the table is and how fragmented the data is, can cause SQL Server to perform unnecessary data reads, slowing down SQL Server’s performance.Which index to rebuild?As a rule consider that when reorganize a table's clustered index, all other non-clustered indexes on that same table will automatically be rebuilt. A table can only have one clustered index.How to rebuild all the index for one table:The DBCC DBREINDEX command will not automatically rebuild all of the indexes on a given table in a databaseHow to rebuild all indexes for all tables in a given database:USE [myDB] -- enter your database name hereDECLARE @tableName varchar(255)DECLARE TableCursor CURSOR FORSELECT table_name FROM information_schema.tablesWHERE table_type = 'base table'OPEN TableCursorFETCH NEXT FROM TableCursor INTO @tableNameWHILE @@FETCH_STATUS = 0BEGINDBCC DBREINDEX(@tableName,' ',90) --a fill factor of 90%FETCH NEXT FROM TableCursor INTO @tableNameENDCLOSE TableCursorDEALLOCATE TableCursorWhat does this script do?Reindexes all indexes in all tables of the given database. Each index is filled with a fill factor of 90%. While the command DBCC DBREINDEX runs and rebuilds the indexes, that the table becomes unavailable for use by your users temporarily until the rebuild has completed, so don't do this during production hours as it will create a shared lock on the tables, although it will allow for read-only uncommitted data reads; i.e.e SELECT.What is the fill factor?Is the percentage of space on each index page for storing data when the index is created or rebuilt. It replaces the fill factor when the index was created, becoming the new default for the index and for any other nonclustered indexes rebuilt because a clustered index is rebuilt. When fillfactor is 0, DBCC DBREINDEX uses the fill factor value last specified for the index. This value is stored in the sys.indexes catalog view. If fillfactor is specified, table_name and index_name must be specified. If fillfactor is not specified, the default fill factor, 100, is used.How do I determine the level of fragmentation?Run the DBCC SHOWCONTIG command. However this requires you to specify the ID of both the table and index being. To make it a lot easier by only requiring you to specify the table name and/or index you can run this script:DECLARE@ID int,@IndexID int,@IndexName varchar(128)--Specify the table and index namesSELECT @IndexName = ‘index_name’ --name of the indexSET @ID = OBJECT_ID(‘table_name’) -- name of the tableSELECT @IndexID = IndIDFROM sysindexesWHERE id = @ID AND name = @IndexName--Show the level of fragmentationDBCC SHOWCONTIG (@id, @IndexID)Here is an example:DBCC SHOWCONTIG scanning 'Tickets' table...Table: 'Tickets' (1829581556); index ID: 1, database ID: 13TABLE level scan performed.- Pages Scanned................................: 915- Extents Scanned..............................: 119- Extent Switches..............................: 281- Avg. Pages per Extent........................: 7.7- Scan Density [Best Count:Actual Count].......: 40.78% [115:282]- Logical Scan Fragmentation ..................: 16.28%- Extent Scan Fragmentation ...................: 99.16%- Avg. Bytes Free per Page.....................: 2457.0- Avg. Page Density (full).....................: 69.64%DBCC execution completed. If DBCC printed error messages, contact your system administrator.What's important here?The Scan Density; Ideally it should be 100%. As time goes by it drops as fragmentation occurs. When the level drops below 75%, you should consider re-indexing.Here are the results of the same table and clustered index after running the script:DBCC SHOWCONTIG scanning 'Tickets' table...Table: 'Tickets' (1829581556); index ID: 1, database ID: 13TABLE level scan performed.- Pages Scanned................................: 692- Extents Scanned..............................: 87- Extent Switches..............................: 86- Avg. Pages per Extent........................: 8.0- Scan Density [Best Count:Actual Count].......: 100.00% [87:87]- Logical Scan Fragmentation ..................: 0.00%- Extent Scan Fragmentation ...................: 22.99%- Avg. Bytes Free per Page.....................: 639.8- Avg. Page Density (full).....................: 92.10%DBCC execution completed. If DBCC printed error messages, contact your system administrator.What's different?The Scan Density has increased from 40.78% to 100%; no fragmentation on the clustered index. Note that since we rebuilt the clustered index, all other index were also rebuilt.

Read the article

SQL SERVER – Renaming Index – Index Naming Conventions

- by pinaldave

If you are regular reader of this blog, you must be aware of that there are two kinds of blog posts 1) I share what I learn recently 2) I share what I learn and request your participation. Today’s blog post is where I need your opinion to make this blog post a good reference for future. Background Story Recently I came across system where users have changed the name of the few of the table to match their new standard naming convention. The name of the table should be self explanatory and they should have explain their purpose without either opening it or reading documentations. Well, not every time this is possible but again this should be the goal of any database modeler. Well, I no way encourage the name of the tables to be too long like ‘ContainsDetailsofNewInvoices’. May be the name of the table should be ‘Invoices’ and table should contain a column with New/Processed bit filed to indicate if the invoice is processed or not (if necessary). Coming back to original story, the database had several tables of which the name were changed. Story Continues… To continue the story let me take simple example. There was a table with the name ’ReceivedInvoices’, it was changed to new name as ‘TblInvoices’. As per their new naming standard they had to prefix every talbe with the words ‘Tbl’ and prefix every view with the letters ‘Vw’. Personally I do not see any need of the prefix but again, that issue is not here to discuss. Now after changing the name of the table they faced very interesting situation. They had few indexes on the table which had name of the table. Let us take an example. Old Name of Table: ReceivedInvoice Old Name of Index: Index_ReceivedInvoice1 Here is the new names New Name of Table: TblInvoices New Name of Index: ??? Well, their dilemma was what should be the new naming convention of the Indexes. Here is a quick proposal of the Index naming convention. Do let me know your opinion. If Index is Primary Clustered Index: PK_TableName If Index is Non-clustered Index: IX_TableName_ColumnName1_ColumnName2… If Index is Unique Non-clustered Index: UX_TableName_ColumnName1_ColumnName2… If Index is Columnstore Non-clustered Index: CL_TableName Here ColumnName is the column on which index is created. As there can be only one Primary Key Index and Columnstore Index per table, they do not require ColumnName in the name of the index. The purpose of this new naming convention is to increase readability. When any user come across this index, without opening their properties or definition, user can will know the details of the index. T-SQL script to Rename Indexes Here is quick T-SQL script to rename Indexes EXEC sp_rename N'SchemaName.TableName.IndexName', N'New_IndexName', N'INDEX'; GO Your Contribute Please Well, the organization has already defined above four guidelines, personally I follow very similar guidelines too. I have seen many variations like adding prefixes CL for Clustered Index and NCL for Non-clustered Index. I have often seen many not using UX prefix for Unique Index but rather use generic IX prefix only. Now do you think if they have missed anything in the coding standard. Is NCI and CI prefixed required to additionally describe the index names. I have once received suggestion to even add fill factor in the index name – which I do not recommend at all. What do you think should be ideal name of the index, so it explains all the most important properties? Additionally, you are welcome to vote if you believe changing the name of index is just waste of time and energy. Note: The purpose of the blog post is to encourage all to participate with their ideas. I will write follow up blog posts in future compiling all the suggestions. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: PostADay, SQL, SQL Authority, SQL Index, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology

Read the article

Changing the indexing on existing table in SQL Server 2000

- by Raj

Guys, Here is the scenario: SQL Server 2000 (8.0.2055) Table currently has 478 million rows of data. The Primary Key column is an INT with IDENTITY. There is an Unique Constraint imposed on two other columns with a Non-Clustered Index. This is a vendor application and we are only responsible for maintaining the DB. Now the vendor has recommended doing the following "to improve performance" Drop the PK and Clustered Index Drop the non-clustered index on the two columns with the UNIQUE CONSTRAINT Recreate the PK, with a NON-CLUSTERED index Create a CLUSTERED index on the two columns with the UNIQUE CONSTRAINT I am not convinced that this is the right thing to do. I have a number of concerns. By dropping the PK and indexes, you will be creating a heap with 478 million rows of data. Then creating a CLUSTERED INDEX on two columns would be a really mammoth task. Would creating another table with the same structure and new indexing scheme and then copying the data over, dropping the old table and renaming the new one be a better approach? I am also not sure how the stored procs will react. Will they continue using the cached execution plan, considering that they are not being explicitly recompiled. I am simply not able to understand what kind of "performance improvement" this change will provide. I think that this will actually have the reverse effect. All thoughts welcome. Thanks in advance, Raj

Read the article

Advanced TSQL Tuning: Why Internals Knowledge Matters

- by Paul White

There is much more to query tuning than reducing logical reads and adding covering nonclustered indexes. Query tuning is not complete as soon as the query returns results quickly in the development or test environments. In production, your query will compete for memory, CPU, locks, I/O and other resources on the server. Today’s entry looks at some tuning considerations that are often overlooked, and shows how deep internals knowledge can help you write better TSQL. As always, we’ll need some example data. In fact, we are going to use three tables today, each of which is structured like this: Each table has 50,000 rows made up of an INTEGER id column and a padding column containing 3,999 characters in every row. The only difference between the three tables is in the type of the padding column: the first table uses CHAR(3999), the second uses VARCHAR(MAX), and the third uses the deprecated TEXT type. A script to create a database with the three tables and load the sample data follows: USE master; GO IF DB_ID('SortTest') IS NOT NULL DROP DATABASE SortTest; GO CREATE DATABASE SortTest COLLATE LATIN1_GENERAL_BIN; GO ALTER DATABASE SortTest MODIFY FILE ( NAME = 'SortTest', SIZE = 3GB, MAXSIZE = 3GB ); GO ALTER DATABASE SortTest MODIFY FILE ( NAME = 'SortTest_log', SIZE = 256MB, MAXSIZE = 1GB, FILEGROWTH = 128MB ); GO ALTER DATABASE SortTest SET ALLOW_SNAPSHOT_ISOLATION OFF ; ALTER DATABASE SortTest SET AUTO_CLOSE OFF ; ALTER DATABASE SortTest SET AUTO_CREATE_STATISTICS ON ; ALTER DATABASE SortTest SET AUTO_SHRINK OFF ; ALTER DATABASE SortTest SET AUTO_UPDATE_STATISTICS ON ; ALTER DATABASE SortTest SET AUTO_UPDATE_STATISTICS_ASYNC ON ; ALTER DATABASE SortTest SET PARAMETERIZATION SIMPLE ; ALTER DATABASE SortTest SET READ_COMMITTED_SNAPSHOT OFF ; ALTER DATABASE SortTest SET MULTI_USER ; ALTER DATABASE SortTest SET RECOVERY SIMPLE ; USE SortTest; GO CREATE TABLE dbo.TestCHAR ( id INTEGER IDENTITY (1,1) NOT NULL, padding CHAR(3999) NOT NULL, CONSTRAINT [PK dbo.TestCHAR (id)] PRIMARY KEY CLUSTERED (id), ) ; CREATE TABLE dbo.TestMAX ( id INTEGER IDENTITY (1,1) NOT NULL, padding VARCHAR(MAX) NOT NULL, CONSTRAINT [PK dbo.TestMAX (id)] PRIMARY KEY CLUSTERED (id), ) ; CREATE TABLE dbo.TestTEXT ( id INTEGER IDENTITY (1,1) NOT NULL, padding TEXT NOT NULL, CONSTRAINT [PK dbo.TestTEXT (id)] PRIMARY KEY CLUSTERED (id), ) ; -- ============= -- Load TestCHAR (about 3s) -- ============= INSERT INTO dbo.TestCHAR WITH (TABLOCKX) ( padding ) SELECT padding = REPLICATE(CHAR(65 + (Data.n % 26)), 3999) FROM ( SELECT TOP (50000) n = ROW_NUMBER() OVER (ORDER BY (SELECT 0)) - 1 FROM master.sys.columns C1, master.sys.columns C2, master.sys.columns C3 ORDER BY n ASC ) AS Data ORDER BY Data.n ASC ; -- ============ -- Load TestMAX (about 3s) -- ============ INSERT INTO dbo.TestMAX WITH (TABLOCKX) ( padding ) SELECT CONVERT(VARCHAR(MAX), padding) FROM dbo.TestCHAR ORDER BY id ; -- ============= -- Load TestTEXT (about 5s) -- ============= INSERT INTO dbo.TestTEXT WITH (TABLOCKX) ( padding ) SELECT CONVERT(TEXT, padding) FROM dbo.TestCHAR ORDER BY id ; -- ========== -- Space used -- ========== -- EXECUTE sys.sp_spaceused @objname = 'dbo.TestCHAR'; EXECUTE sys.sp_spaceused @objname = 'dbo.TestMAX'; EXECUTE sys.sp_spaceused @objname = 'dbo.TestTEXT'; ; CHECKPOINT ; That takes around 15 seconds to run, and shows the space allocated to each table in its output: To illustrate the points I want to make today, the example task we are going to set ourselves is to return a random set of 150 rows from each table. The basic shape of the test query is the same for each of the three test tables: SELECT TOP (150) T.id, T.padding FROM dbo.Test AS T ORDER BY NEWID() OPTION (MAXDOP 1) ; Test 1 – CHAR(3999) Running the template query shown above using the TestCHAR table as the target, we find that the query takes around 5 seconds to return its results. This seems slow, considering that the table only has 50,000 rows. Working on the assumption that generating a GUID for each row is a CPU-intensive operation, we might try enabling parallelism to see if that speeds up the response time. Running the query again (but without the MAXDOP 1 hint) on a machine with eight logical processors, the query now takes 10 seconds to execute – twice as long as when run serially. Rather than attempting further guesses at the cause of the slowness, let’s go back to serial execution and add some monitoring. The script below monitors STATISTICS IO output and the amount of tempdb used by the test query. We will also run a Profiler trace to capture any warnings generated during query execution. DECLARE @read BIGINT, @write BIGINT ; SELECT @read = SUM(num_of_bytes_read), @write = SUM(num_of_bytes_written) FROM tempdb.sys.database_files AS DBF JOIN sys.dm_io_virtual_file_stats(2, NULL) AS FS ON FS.file_id = DBF.file_id WHERE DBF.type_desc = 'ROWS' ; SET STATISTICS IO ON ; SELECT TOP (150) TC.id, TC.padding FROM dbo.TestCHAR AS TC ORDER BY NEWID() OPTION (MAXDOP 1) ; SET STATISTICS IO OFF ; SELECT tempdb_read_MB = (SUM(num_of_bytes_read) - @read) / 1024. / 1024., tempdb_write_MB = (SUM(num_of_bytes_written) - @write) / 1024. / 1024., internal_use_MB = ( SELECT internal_objects_alloc_page_count / 128.0 FROM sys.dm_db_task_space_usage WHERE session_id = @@SPID ) FROM tempdb.sys.database_files AS DBF JOIN sys.dm_io_virtual_file_stats(2, NULL) AS FS ON FS.file_id = DBF.file_id WHERE DBF.type_desc = 'ROWS' ; Let’s take a closer look at the statistics and query plan generated from this: Following the flow of the data from right to left, we see the expected 50,000 rows emerging from the Clustered Index Scan, with a total estimated size of around 191MB. The Compute Scalar adds a column containing a random GUID (generated from the NEWID() function call) for each row. With this extra column in place, the size of the data arriving at the Sort operator is estimated to be 192MB. Sort is a blocking operator – it has to examine all of the rows on its input before it can produce its first row of output (the last row received might sort first). This characteristic means that Sort requires a memory grant – memory allocated for the query’s use by SQL Server just before execution starts. In this case, the Sort is the only memory-consuming operator in the plan, so it has access to the full 243MB (248,696KB) of memory reserved by SQL Server for this query execution. Notice that the memory grant is significantly larger than the expected size of the data to be sorted. SQL Server uses a number of techniques to speed up sorting, some of which sacrifice size for comparison speed. Sorts typically require a very large number of comparisons, so this is usually a very effective optimization. One of the drawbacks is that it is not possible to exactly predict the sort space needed, as it depends on the data itself. SQL Server takes an educated guess based on data types, sizes, and the number of rows expected, but the algorithm is not perfect. In spite of the large memory grant, the Profiler trace shows a Sort Warning event (indicating that the sort ran out of memory), and the tempdb usage monitor shows that 195MB of tempdb space was used – all of that for system use. The 195MB represents physical write activity on tempdb, because SQL Server strictly enforces memory grants – a query cannot ‘cheat’ and effectively gain extra memory by spilling to tempdb pages that reside in memory. Anyway, the key point here is that it takes a while to write 195MB to disk, and this is the main reason that the query takes 5 seconds overall. If you are wondering why using parallelism made the problem worse, consider that eight threads of execution result in eight concurrent partial sorts, each receiving one eighth of the memory grant. The eight sorts all spilled to tempdb, resulting in inefficiencies as the spilled sorts competed for disk resources. More importantly, there are specific problems at the point where the eight partial results are combined, but I’ll cover that in a future post. CHAR(3999) Performance Summary: 5 seconds elapsed time 243MB memory grant 195MB tempdb usage 192MB estimated sort set 25,043 logical reads Sort Warning Test 2 – VARCHAR(MAX) We’ll now run exactly the same test (with the additional monitoring) on the table using a VARCHAR(MAX) padding column: DECLARE @read BIGINT, @write BIGINT ; SELECT @read = SUM(num_of_bytes_read), @write = SUM(num_of_bytes_written) FROM tempdb.sys.database_files AS DBF JOIN sys.dm_io_virtual_file_stats(2, NULL) AS FS ON FS.file_id = DBF.file_id WHERE DBF.type_desc = 'ROWS' ; SET STATISTICS IO ON ; SELECT TOP (150) TM.id, TM.padding FROM dbo.TestMAX AS TM ORDER BY NEWID() OPTION (MAXDOP 1) ; SET STATISTICS IO OFF ; SELECT tempdb_read_MB = (SUM(num_of_bytes_read) - @read) / 1024. / 1024., tempdb_write_MB = (SUM(num_of_bytes_written) - @write) / 1024. / 1024., internal_use_MB = ( SELECT internal_objects_alloc_page_count / 128.0 FROM sys.dm_db_task_space_usage WHERE session_id = @@SPID ) FROM tempdb.sys.database_files AS DBF JOIN sys.dm_io_virtual_file_stats(2, NULL) AS FS ON FS.file_id = DBF.file_id WHERE DBF.type_desc = 'ROWS' ; This time the query takes around 8 seconds to complete (3 seconds longer than Test 1). Notice that the estimated row and data sizes are very slightly larger, and the overall memory grant has also increased very slightly to 245MB. The most marked difference is in the amount of tempdb space used – this query wrote almost 391MB of sort run data to the physical tempdb file. Don’t draw any general conclusions about VARCHAR(MAX) versus CHAR from this – I chose the length of the data specifically to expose this edge case. In most cases, VARCHAR(MAX) performs very similarly to CHAR – I just wanted to make test 2 a bit more exciting. MAX Performance Summary: 8 seconds elapsed time 245MB memory grant 391MB tempdb usage 193MB estimated sort set 25,043 logical reads Sort warning Test 3 – TEXT The same test again, but using the deprecated TEXT data type for the padding column: DECLARE @read BIGINT, @write BIGINT ; SELECT @read = SUM(num_of_bytes_read), @write = SUM(num_of_bytes_written) FROM tempdb.sys.database_files AS DBF JOIN sys.dm_io_virtual_file_stats(2, NULL) AS FS ON FS.file_id = DBF.file_id WHERE DBF.type_desc = 'ROWS' ; SET STATISTICS IO ON ; SELECT TOP (150) TT.id, TT.padding FROM dbo.TestTEXT AS TT ORDER BY NEWID() OPTION (MAXDOP 1, RECOMPILE) ; SET STATISTICS IO OFF ; SELECT tempdb_read_MB = (SUM(num_of_bytes_read) - @read) / 1024. / 1024., tempdb_write_MB = (SUM(num_of_bytes_written) - @write) / 1024. / 1024., internal_use_MB = ( SELECT internal_objects_alloc_page_count / 128.0 FROM sys.dm_db_task_space_usage WHERE session_id = @@SPID ) FROM tempdb.sys.database_files AS DBF JOIN sys.dm_io_virtual_file_stats(2, NULL) AS FS ON FS.file_id = DBF.file_id WHERE DBF.type_desc = 'ROWS' ; This time the query runs in 500ms. If you look at the metrics we have been checking so far, it’s not hard to understand why: TEXT Performance Summary: 0.5 seconds elapsed time 9MB memory grant 5MB tempdb usage 5MB estimated sort set 207 logical reads 596 LOB logical reads Sort warning SQL Server’s memory grant algorithm still underestimates the memory needed to perform the sorting operation, but the size of the data to sort is so much smaller (5MB versus 193MB previously) that the spilled sort doesn’t matter very much. Why is the data size so much smaller? The query still produces the correct results – including the large amount of data held in the padding column – so what magic is being performed here? TEXT versus MAX Storage The answer lies in how columns of the TEXT data type are stored. By default, TEXT data is stored off-row in separate LOB pages – which explains why this is the first query we have seen that records LOB logical reads in its STATISTICS IO output. You may recall from my last post that LOB data leaves an in-row pointer to the separate storage structure holding the LOB data. SQL Server can see that the full LOB value is not required by the query plan until results are returned, so instead of passing the full LOB value down the plan from the Clustered Index Scan, it passes the small in-row structure instead. SQL Server estimates that each row coming from the scan will be 79 bytes long – 11 bytes for row overhead, 4 bytes for the integer id column, and 64 bytes for the LOB pointer (in fact the pointer is rather smaller – usually 16 bytes – but the details of that don’t really matter right now). OK, so this query is much more efficient because it is sorting a very much smaller data set – SQL Server delays retrieving the LOB data itself until after the Sort starts producing its 150 rows. The question that normally arises at this point is: Why doesn’t SQL Server use the same trick when the padding column is defined as VARCHAR(MAX)? The answer is connected with the fact that if the actual size of the VARCHAR(MAX) data is 8000 bytes or less, it is usually stored in-row in exactly the same way as for a VARCHAR(8000) column – MAX data only moves off-row into LOB storage when it exceeds 8000 bytes. The default behaviour of the TEXT type is to be stored off-row by default, unless the ‘text in row’ table option is set suitably and there is room on the page. There is an analogous (but opposite) setting to control the storage of MAX data – the ‘large value types out of row’ table option. By enabling this option for a table, MAX data will be stored off-row (in a LOB structure) instead of in-row. SQL Server Books Online has good coverage of both options in the topic In Row Data. The MAXOOR Table The essential difference, then, is that MAX defaults to in-row storage, and TEXT defaults to off-row (LOB) storage. You might be thinking that we could get the same benefits seen for the TEXT data type by storing the VARCHAR(MAX) values off row – so let’s look at that option now. This script creates a fourth table, with the VARCHAR(MAX) data stored off-row in LOB pages: CREATE TABLE dbo.TestMAXOOR ( id INTEGER IDENTITY (1,1) NOT NULL, padding VARCHAR(MAX) NOT NULL, CONSTRAINT [PK dbo.TestMAXOOR (id)] PRIMARY KEY CLUSTERED (id), ) ; EXECUTE sys.sp_tableoption @TableNamePattern = N'dbo.TestMAXOOR', @OptionName = 'large value types out of row', @OptionValue = 'true' ; SELECT large_value_types_out_of_row FROM sys.tables WHERE [schema_id] = SCHEMA_ID(N'dbo') AND name = N'TestMAXOOR' ; INSERT INTO dbo.TestMAXOOR WITH (TABLOCKX) ( padding ) SELECT SPACE(0) FROM dbo.TestCHAR ORDER BY id ; UPDATE TM WITH (TABLOCK) SET padding.WRITE (TC.padding, NULL, NULL) FROM dbo.TestMAXOOR AS TM JOIN dbo.TestCHAR AS TC ON TC.id = TM.id ; EXECUTE sys.sp_spaceused @objname = 'dbo.TestMAXOOR' ; CHECKPOINT ; Test 4 – MAXOOR We can now re-run our test on the MAXOOR (MAX out of row) table: DECLARE @read BIGINT, @write BIGINT ; SELECT @read = SUM(num_of_bytes_read), @write = SUM(num_of_bytes_written) FROM tempdb.sys.database_files AS DBF JOIN sys.dm_io_virtual_file_stats(2, NULL) AS FS ON FS.file_id = DBF.file_id WHERE DBF.type_desc = 'ROWS' ; SET STATISTICS IO ON ; SELECT TOP (150) MO.id, MO.padding FROM dbo.TestMAXOOR AS MO ORDER BY NEWID() OPTION (MAXDOP 1, RECOMPILE) ; SET STATISTICS IO OFF ; SELECT tempdb_read_MB = (SUM(num_of_bytes_read) - @read) / 1024. / 1024., tempdb_write_MB = (SUM(num_of_bytes_written) - @write) / 1024. / 1024., internal_use_MB = ( SELECT internal_objects_alloc_page_count / 128.0 FROM sys.dm_db_task_space_usage WHERE session_id = @@SPID ) FROM tempdb.sys.database_files AS DBF JOIN sys.dm_io_virtual_file_stats(2, NULL) AS FS ON FS.file_id = DBF.file_id WHERE DBF.type_desc = 'ROWS' ; TEXT Performance Summary: 0.3 seconds elapsed time 245MB memory grant 0MB tempdb usage 193MB estimated sort set 207 logical reads 446 LOB logical reads No sort warning The query runs very quickly – slightly faster than Test 3, and without spilling the sort to tempdb (there is no sort warning in the trace, and the monitoring query shows zero tempdb usage by this query). SQL Server is passing the in-row pointer structure down the plan and only looking up the LOB value on the output side of the sort. The Hidden Problem There is still a huge problem with this query though – it requires a 245MB memory grant. No wonder the sort doesn’t spill to tempdb now – 245MB is about 20 times more memory than this query actually requires to sort 50,000 records containing LOB data pointers. Notice that the estimated row and data sizes in the plan are the same as in test 2 (where the MAX data was stored in-row). The optimizer assumes that MAX data is stored in-row, regardless of the sp_tableoption setting ‘large value types out of row’. Why? Because this option is dynamic – changing it does not immediately force all MAX data in the table in-row or off-row, only when data is added or actually changed. SQL Server does not keep statistics to show how much MAX or TEXT data is currently in-row, and how much is stored in LOB pages. This is an annoying limitation, and one which I hope will be addressed in a future version of the product. So why should we worry about this? Excessive memory grants reduce concurrency and may result in queries waiting on the RESOURCE_SEMAPHORE wait type while they wait for memory they do not need. 245MB is an awful lot of memory, especially on 32-bit versions where memory grants cannot use AWE-mapped memory. Even on a 64-bit server with plenty of memory, do you really want a single query to consume 0.25GB of memory unnecessarily? That’s 32,000 8KB pages that might be put to much better use. The Solution The answer is not to use the TEXT data type for the padding column. That solution happens to have better performance characteristics for this specific query, but it still results in a spilled sort, and it is hard to recommend the use of a data type which is scheduled for removal. I hope it is clear to you that the fundamental problem here is that SQL Server sorts the whole set arriving at a Sort operator. Clearly, it is not efficient to sort the whole table in memory just to return 150 rows in a random order. The TEXT example was more efficient because it dramatically reduced the size of the set that needed to be sorted. We can do the same thing by selecting 150 unique keys from the table at random (sorting by NEWID() for example) and only then retrieving the large padding column values for just the 150 rows we need. The following script implements that idea for all four tables: SET STATISTICS IO ON ; WITH TestTable AS ( SELECT * FROM dbo.TestCHAR ), TopKeys AS ( SELECT TOP (150) id FROM TestTable ORDER BY NEWID() ) SELECT T1.id, T1.padding FROM TestTable AS T1 WHERE T1.id = ANY (SELECT id FROM TopKeys) OPTION (MAXDOP 1) ; WITH TestTable AS ( SELECT * FROM dbo.TestMAX ), TopKeys AS ( SELECT TOP (150) id FROM TestTable ORDER BY NEWID() ) SELECT T1.id, T1.padding FROM TestTable AS T1 WHERE T1.id IN (SELECT id FROM TopKeys) OPTION (MAXDOP 1) ; WITH TestTable AS ( SELECT * FROM dbo.TestTEXT ), TopKeys AS ( SELECT TOP (150) id FROM TestTable ORDER BY NEWID() ) SELECT T1.id, T1.padding FROM TestTable AS T1 WHERE T1.id IN (SELECT id FROM TopKeys) OPTION (MAXDOP 1) ; WITH TestTable AS ( SELECT * FROM dbo.TestMAXOOR ), TopKeys AS ( SELECT TOP (150) id FROM TestTable ORDER BY NEWID() ) SELECT T1.id, T1.padding FROM TestTable AS T1 WHERE T1.id IN (SELECT id FROM TopKeys) OPTION (MAXDOP 1) ; SET STATISTICS IO OFF ; All four queries now return results in much less than a second, with memory grants between 6 and 12MB, and without spilling to tempdb. The small remaining inefficiency is in reading the id column values from the clustered primary key index. As a clustered index, it contains all the in-row data at its leaf. The CHAR and VARCHAR(MAX) tables store the padding column in-row, so id values are separated by a 3999-character column, plus row overhead. The TEXT and MAXOOR tables store the padding values off-row, so id values in the clustered index leaf are separated by the much-smaller off-row pointer structure. This difference is reflected in the number of logical page reads performed by the four queries: Table 'TestCHAR' logical reads 25511 lob logical reads 000 Table 'TestMAX'. logical reads 25511 lob logical reads 000 Table 'TestTEXT' logical reads 00412 lob logical reads 597 Table 'TestMAXOOR' logical reads 00413 lob logical reads 446 We can increase the density of the id values by creating a separate nonclustered index on the id column only. This is the same key as the clustered index, of course, but the nonclustered index will not include the rest of the in-row column data. CREATE UNIQUE NONCLUSTERED INDEX uq1 ON dbo.TestCHAR (id); CREATE UNIQUE NONCLUSTERED INDEX uq1 ON dbo.TestMAX (id); CREATE UNIQUE NONCLUSTERED INDEX uq1 ON dbo.TestTEXT (id); CREATE UNIQUE NONCLUSTERED INDEX uq1 ON dbo.TestMAXOOR (id); The four queries can now use the very dense nonclustered index to quickly scan the id values, sort them by NEWID(), select the 150 ids we want, and then look up the padding data. The logical reads with the new indexes in place are: Table 'TestCHAR' logical reads 835 lob logical reads 0 Table 'TestMAX' logical reads 835 lob logical reads 0 Table 'TestTEXT' logical reads 686 lob logical reads 597 Table 'TestMAXOOR' logical reads 686 lob logical reads 448 With the new index, all four queries use the same query plan (click to enlarge): Performance Summary: 0.3 seconds elapsed time 6MB memory grant 0MB tempdb usage 1MB sort set 835 logical reads (CHAR, MAX) 686 logical reads (TEXT, MAXOOR) 597 LOB logical reads (TEXT) 448 LOB logical reads (MAXOOR) No sort warning I’ll leave it as an exercise for the reader to work out why trying to eliminate the Key Lookup by adding the padding column to the new nonclustered indexes would be a daft idea Conclusion This post is not about tuning queries that access columns containing big strings. It isn’t about the internal differences between TEXT and MAX data types either. It isn’t even about the cool use of UPDATE .WRITE used in the MAXOOR table load. No, this post is about something else: Many developers might not have tuned our starting example query at all – 5 seconds isn’t that bad, and the original query plan looks reasonable at first glance. Perhaps the NEWID() function would have been blamed for ‘just being slow’ – who knows. 5 seconds isn’t awful – unless your users expect sub-second responses – but using 250MB of memory and writing 200MB to tempdb certainly is! If ten sessions ran that query at the same time in production that’s 2.5GB of memory usage and 2GB hitting tempdb. Of course, not all queries can be rewritten to avoid large memory grants and sort spills using the key-lookup technique in this post, but that’s not the point either. The point of this post is that a basic understanding of execution plans is not enough. Tuning for logical reads and adding covering indexes is not enough. If you want to produce high-quality, scalable TSQL that won’t get you paged as soon as it hits production, you need a deep understanding of execution plans, and as much accurate, deep knowledge about SQL Server as you can lay your hands on. The advanced database developer has a wide range of tools to use in writing queries that perform well in a range of circumstances. By the way, the examples in this post were written for SQL Server 2008. They will run on 2005 and demonstrate the same principles, but you won’t get the same figures I did because 2005 had a rather nasty bug in the Top N Sort operator. Fair warning: if you do decide to run the scripts on a 2005 instance (particularly the parallel query) do it before you head out for lunch… This post is dedicated to the people of Christchurch, New Zealand. © 2011 Paul White email: @[email protected] twitter: @SQL_Kiwi

Read the article

Seeking on a Heap, and Two Useful DMVs

- by Paul White

So far in this mini-series on seeks and scans, we have seen that a simple ‘seek’ operation can be much more complex than it first appears. A seek can contain one or more seek predicates – each of which can either identify at most one row in a unique index (a singleton lookup) or a range of values (a range scan). When looking at a query plan, we will often need to look at the details of the seek operator in the Properties window to see how many operations it is performing, and what type of operation each one is. As you saw in the first post in this series, the number of hidden seeking operations can have an appreciable impact on performance. Measuring Seeks and Scans I mentioned in my last post that there is no way to tell from a graphical query plan whether you are seeing a singleton lookup or a range scan. You can work it out – if you happen to know that the index is defined as unique and the seek predicate is an equality comparison, but there’s no separate property that says ‘singleton lookup’ or ‘range scan’. This is a shame, and if I had my way, the query plan would show different icons for range scans and singleton lookups – perhaps also indicating whether the operation was one or more of those operations underneath the covers. In light of all that, you might be wondering if there is another way to measure how many seeks of either type are occurring in your system, or for a particular query. As is often the case, the answer is yes – we can use a couple of dynamic management views (DMVs): sys.dm_db_index_usage_stats and sys.dm_db_index_operational_stats. Index Usage Stats The index usage stats DMV contains counts of index operations from the perspective of the Query Executor (QE) – the SQL Server component that is responsible for executing the query plan. It has three columns that are of particular interest to us: user_seeks – the number of times an Index Seek operator appears in an executed plan user_scans – the number of times a Table Scan or Index Scan operator appears in an executed plan user_lookups – the number of times an RID or Key Lookup operator appears in an executed plan An operator is counted once per execution (generating an estimated plan does not affect the totals), so an Index Seek that executes 10,000 times in a single plan execution adds 1 to the count of user seeks. Even less intuitively, an operator is also counted once per execution even if it is not executed at all. I will show you a demonstration of each of these things later in this post. Index Operational Stats The index operational stats DMV contains counts of index and table operations from the perspective of the Storage Engine (SE). It contains a wealth of interesting information, but the two columns of interest to us right now are: range_scan_count – the number of range scans (including unrestricted full scans) on a heap or index structure singleton_lookup_count – the number of singleton lookups in a heap or index structure This DMV counts each SE operation, so 10,000 singleton lookups will add 10,000 to the singleton lookup count column, and a table scan that is executed 5 times will add 5 to the range scan count. The Test Rig To explore the behaviour of seeks and scans in detail, we will need to create a test environment. The scripts presented here are best run on SQL Server 2008 Developer Edition, but the majority of the tests will work just fine on SQL Server 2005. A couple of tests use partitioning, but these will be skipped if you are not running an Enterprise-equivalent SKU. Ok, first up we need a database: USE master; GO IF DB_ID('ScansAndSeeks') IS NOT NULL DROP DATABASE ScansAndSeeks; GO CREATE DATABASE ScansAndSeeks; GO USE ScansAndSeeks; GO ALTER DATABASE ScansAndSeeks SET ALLOW_SNAPSHOT_ISOLATION OFF ; ALTER DATABASE ScansAndSeeks SET AUTO_CLOSE OFF, AUTO_SHRINK OFF, AUTO_CREATE_STATISTICS OFF, AUTO_UPDATE_STATISTICS OFF, PARAMETERIZATION SIMPLE, READ_COMMITTED_SNAPSHOT OFF, RESTRICTED_USER ; Notice that several database options are set in particular ways to ensure we get meaningful and reproducible results from the DMVs. In particular, the options to auto-create and update statistics are disabled. There are also three stored procedures, the first of which creates a test table (which may or may not be partitioned). The table is pretty much the same one we used yesterday: The table has 100 rows, and both the key_col and data columns contain the same values – the integers from 1 to 100 inclusive. The table is a heap, with a non-clustered primary key on key_col, and a non-clustered non-unique index on the data column. The only reason I have used a heap here, rather than a clustered table, is so I can demonstrate a seek on a heap later on. The table has an extra column (not shown because I am too lazy to update the diagram from yesterday) called padding – a CHAR(100) column that just contains 100 spaces in every row. It’s just there to discourage SQL Server from choosing table scan over an index + RID lookup in one of the tests. The first stored procedure is called ResetTest: CREATE PROCEDURE dbo.ResetTest @Partitioned BIT = 'false' AS BEGIN SET NOCOUNT ON ; IF OBJECT_ID(N'dbo.Example', N'U') IS NOT NULL BEGIN DROP TABLE dbo.Example; END ; -- Test table is a heap -- Non-clustered primary key on 'key_col' CREATE TABLE dbo.Example ( key_col INTEGER NOT NULL, data INTEGER NOT NULL, padding CHAR(100) NOT NULL DEFAULT SPACE(100), CONSTRAINT [PK dbo.Example key_col] PRIMARY KEY NONCLUSTERED (key_col) ) ; IF @Partitioned = 'true' BEGIN -- Enterprise, Trial, or Developer -- required for partitioning tests IF SERVERPROPERTY('EngineEdition') = 3 BEGIN EXECUTE (' DROP TABLE dbo.Example ; IF EXISTS ( SELECT 1 FROM sys.partition_schemes WHERE name = N''PS'' ) DROP PARTITION SCHEME PS ; IF EXISTS ( SELECT 1 FROM sys.partition_functions WHERE name = N''PF'' ) DROP PARTITION FUNCTION PF ; CREATE PARTITION FUNCTION PF (INTEGER) AS RANGE RIGHT FOR VALUES (20, 40, 60, 80, 100) ; CREATE PARTITION SCHEME PS AS PARTITION PF ALL TO ([PRIMARY]) ; CREATE TABLE dbo.Example ( key_col INTEGER NOT NULL, data INTEGER NOT NULL, padding CHAR(100) NOT NULL DEFAULT SPACE(100), CONSTRAINT [PK dbo.Example key_col] PRIMARY KEY NONCLUSTERED (key_col) ) ON PS (key_col); '); END ELSE BEGIN RAISERROR('Invalid SKU for partition test', 16, 1); RETURN; END; END ; -- Non-unique non-clustered index on the 'data' column CREATE NONCLUSTERED INDEX [IX dbo.Example data] ON dbo.Example (data) ; -- Add 100 rows INSERT dbo.Example WITH (TABLOCKX) ( key_col, data ) SELECT key_col = V.number, data = V.number FROM master.dbo.spt_values AS V WHERE V.[type] = N'P' AND V.number BETWEEN 1 AND 100 ; END; GO The second stored procedure, ShowStats, displays information from the Index Usage Stats and Index Operational Stats DMVs: CREATE PROCEDURE dbo.ShowStats @Partitioned BIT = 'false' AS BEGIN -- Index Usage Stats DMV (QE) SELECT index_name = ISNULL(I.name, I.type_desc), scans = IUS.user_scans, seeks = IUS.user_seeks, lookups = IUS.user_lookups FROM sys.dm_db_index_usage_stats AS IUS JOIN sys.indexes AS I ON I.object_id = IUS.object_id AND I.index_id = IUS.index_id WHERE IUS.database_id = DB_ID(N'ScansAndSeeks') AND IUS.object_id = OBJECT_ID(N'dbo.Example', N'U') ORDER BY I.index_id ; -- Index Operational Stats DMV (SE) IF @Partitioned = 'true' SELECT index_name = ISNULL(I.name, I.type_desc), partitions = COUNT(IOS.partition_number), range_scans = SUM(IOS.range_scan_count), single_lookups = SUM(IOS.singleton_lookup_count) FROM sys.dm_db_index_operational_stats ( DB_ID(N'ScansAndSeeks'), OBJECT_ID(N'dbo.Example', N'U'), NULL, NULL ) AS IOS JOIN sys.indexes AS I ON I.object_id = IOS.object_id AND I.index_id = IOS.index_id GROUP BY I.index_id, -- Key I.name, I.type_desc ORDER BY I.index_id; ELSE SELECT index_name = ISNULL(I.name, I.type_desc), range_scans = SUM(IOS.range_scan_count), single_lookups = SUM(IOS.singleton_lookup_count) FROM sys.dm_db_index_operational_stats ( DB_ID(N'ScansAndSeeks'), OBJECT_ID(N'dbo.Example', N'U'), NULL, NULL ) AS IOS JOIN sys.indexes AS I ON I.object_id = IOS.object_id AND I.index_id = IOS.index_id GROUP BY I.index_id, -- Key I.name, I.type_desc ORDER BY I.index_id; END; The final stored procedure, RunTest, executes a query written against the example table: CREATE PROCEDURE dbo.RunTest @SQL VARCHAR(8000), @Partitioned BIT = 'false' AS BEGIN -- No execution plan yet SET STATISTICS XML OFF ; -- Reset the test environment EXECUTE dbo.ResetTest @Partitioned ; -- Previous call will throw an error if a partitioned -- test was requested, but SKU does not support it IF @@ERROR = 0 BEGIN -- IO statistics and plan on SET STATISTICS XML, IO ON ; -- Test statement EXECUTE (@SQL) ; -- Plan and IO statistics off SET STATISTICS XML, IO OFF ; EXECUTE dbo.ShowStats @Partitioned; END; END; The Tests The first test is a simple scan of the heap table: EXECUTE dbo.RunTest @SQL = 'SELECT * FROM Example'; The top result set comes from the Index Usage Stats DMV, so it is the Query Executor’s (QE) view. The lower result is from Index Operational Stats, which shows statistics derived from the actions taken by the Storage Engine (SE). We see that QE performed 1 scan operation on the heap, and SE performed a single range scan. Let’s try a single-value equality seek on a unique index next: EXECUTE dbo.RunTest @SQL = 'SELECT key_col FROM Example WHERE key_col = 32'; This time we see a single seek on the non-clustered primary key from QE, and one singleton lookup on the same index by the SE. Now for a single-value seek on the non-unique non-clustered index: EXECUTE dbo.RunTest @SQL = 'SELECT data FROM Example WHERE data = 32'; QE shows a single seek on the non-clustered non-unique index, but SE shows a single range scan on that index – not the singleton lookup we saw in the previous test. That makes sense because we know that only a single-value seek into a unique index is a singleton seek. A single-value seek into a non-unique index might retrieve any number of rows, if you think about it. The next query is equivalent to the IN list example seen in the first post in this series, but it is written using OR (just for variety, you understand): EXECUTE dbo.RunTest @SQL = 'SELECT data FROM Example WHERE data = 32 OR data = 33'; The plan looks the same, and there’s no difference in the stats recorded by QE, but the SE shows two range scans. Again, these are range scans because we are looking for two values in the data column, which is covered by a non-unique index. I’ve added a snippet from the Properties window to show that the query plan does show two seek predicates, not just one. Now let’s rewrite the query using BETWEEN: EXECUTE dbo.RunTest @SQL = 'SELECT data FROM Example WHERE data BETWEEN 32 AND 33'; Notice the seek operator only has one predicate now – it’s just a single range scan from 32 to 33 in the index – as the SE output shows. For the next test, we will look up four values in the key_col column: EXECUTE dbo.RunTest @SQL = 'SELECT key_col FROM Example WHERE key_col IN (2,4,6,8)'; Just a single seek on the PK from the Query Executor, but four singleton lookups reported by the Storage Engine – and four seek predicates in the Properties window. On to a more complex example: EXECUTE dbo.RunTest @SQL = 'SELECT * FROM Example WITH (INDEX([PK dbo.Example key_col])) WHERE key_col BETWEEN 1 AND 8'; This time we are forcing use of the non-clustered primary key to return eight rows. The index is not covering for this query, so the query plan includes an RID lookup into the heap to fetch the data and padding columns. The QE reports a seek on the PK and a lookup on the heap. The SE reports a single range scan on the PK (to find key_col values between 1 and 8), and eight singleton lookups on the heap. Remember that a bookmark lookup (RID or Key) is a seek to a single value in a ‘unique index’ – it finds a row in the heap or cluster from a unique RID or clustering key – so that’s why lookups are always singleton lookups, not range scans. Our next example shows what happens when a query plan operator is not executed at all: EXECUTE dbo.RunTest @SQL = 'SELECT key_col FROM Example WHERE key_col = 8 AND @@TRANCOUNT < 0'; The Filter has a start-up predicate which is always false (if your @@TRANCOUNT is less than zero, call CSS immediately). The index seek is never executed, but QE still records a single seek against the PK because the operator appears once in an executed plan. The SE output shows no activity at all. This next example is 2008 and above only, I’m afraid: EXECUTE dbo.RunTest @SQL = 'SELECT * FROM Example WHERE key_col BETWEEN 1 AND 30', @Partitioned = 'true'; This is the first example to use a partitioned table. QE reports a single seek on the heap (yes – a seek on a heap), and the SE reports two range scans on the heap. SQL Server knows (from the partitioning definition) that it only needs to look at partitions 1 and 2 to find all the rows where key_col is between 1 and 30 – the engine seeks to find the two partitions, and performs a range scan seek on each partition. The final example for today is another seek on a heap – try to work out the output of the query before running it! EXECUTE dbo.RunTest @SQL = 'SELECT TOP (2) WITH TIES * FROM Example WHERE key_col BETWEEN 1 AND 50 ORDER BY $PARTITION.PF(key_col) DESC', @Partitioned = 'true'; Notice the lack of an explicit Sort operator in the query plan to enforce the ORDER BY clause, and the backward range scan. © 2011 Paul White email: [email protected] twitter: @SQL_Kiwi

Read the article

Instructions on how to configure a WebLogic Cluster and use it with Oracle Http Server

- by Laurent Goldsztejn

On October 17th I delivered a webcast on WebLogic Clustering that included a demo with Apache as the proxy server. I realized that many steps are needed to set up the configuration I used during the demo. The purpose of this article is to go through these steps to show how quickly and easily one can define a new cluster and then proxy requests via an Oracle Http Server (OHS). The domain configuration wizard offers the option to create a cluster. The administration console or WLST, the Weblogic scripting tool can also be used to define a new cluster. It can be created at any time but the servers that will participate in it cannot be in a running state. Cluster Creation using the configuration wizard Network and architecture requirements need to be considered while choosing between unicast and multicast. Multicast Vs. Unicast with WebLogic Clustering is of great help to make the best decision between the two messaging modes. In addition, Configure Cluster offers details on each single field displayed above. After this initial configuration page, individual servers could be assigned to this newly created cluster although servers can be added later to the cluster. What is not recommended is for the Admin server to participate in a cluster as the main purpose of the Admin server is to perform the bulk of the processing for the domain. Servers need to stop before being assigned to a cluster. There is also no minimum number of servers that have to participate in the cluster. At this point the configuration should be done and the cluster created successfully. This can easily be verified from the console. Each clustered managed server can be launched to join the cluster. At startup the following messages should be logged for each clustered managed server: <Notice> <WeblogicServer> <BEA-000365> <Server state changed to STARTING> <Notice> <Cluster> <BEA-000197> <Listening for announcements from cluster using messaging_mode cluster messaging> <Notice> <Cluster> <BEA-000133> <Waiting to synchronize with other running members of cluster_name> It's time to try sending requests to the cluster and we will do this with the help of Oracle Http Server to play the role of a proxy server to demonstrate load balancing. Proxy Server configuration The first step is to download Weblogic Server Web Server Plugin that will enhance the web server by handling requests aimed at being sent to the Weblogic cluster. For our test Oracle Http Server (OHS) will be used. However plug-ins are also available for Apache Http server, Microsoft Internet Information Server (IIS), Oracle iPlanet Webserver or even WebLogic Server with the HttpClusterServlet. Once OHS is installed on the system, the configuration file, mod_wl_ohs.conf, will need to be altered to include Weblogic proxy specifics. First of all, add the following directive to instruct Apache to load the Weblogic shared object module extracted from the plugins file just downloaded. LoadModule weblogic_module modules/mod_wl_ohs.so and then create an IfModule directive to encapsulate the following location block so that proxy will be enabled by path (each request including /wls will be directed directly to the WebLogic Cluster). You could also proxy requests by MIME type using MatchExpression in the Location block. <IfModule weblogic_module> <Location /wls> SetHandler weblogic-handler PathTrim /wls WebLogicCluster MS1_URL:port,MS2_URL:port Debug ON WLLogFile c:/tmp/global_proxy.log WLTempDir "c:/myTemp" DebugConfigInfo On </Location> </IfModule> SetHandler specifies the handler for the plug-in module PathTrim will instruct the plug-in to trim /w ls from the URL before forwarding the request to the cluster. The list of WebLogic Servers defined in WeblogicCluster could contain a mixed set of clustered and single servers. However, the dynamic list returned for this parameter will only contain valid clustered servers and may contain more servers if not all clustered servers are listed in WeblogicCluster. Testing proxy and load balancing It's time to start OHS web server which should at this point be configured correctly to proxy requests to the clustered servers. By default round-robin is the load balancing strategy set by WebLogic. Testing the load balancing can be easily done by disabling cookies on your browser given that a request containing a cookie attempts to connect to the primary server. If that attempt fails, the plug-in attempts to make a connection to the next available server in the list in a round-robin fashion. With cookies enabled, you could use two different browsers to test the load balancing with a JSP page that contains the following: <%@ page contentType="text/html; charset=iso-8859-1" language="java" %> <% String path = request.getContextPath(); String getProtocol=request.getScheme(); String getDomain=request.getServerName(); String getPort=Integer.toString(request.getLocalPort()); String getPath = getProtocol+"://"+getDomain+":"+getPort+path+"/"; %> <html> <body> Receiving Server <%=getPath%> </body> </html> Assuming that you name the JSP page Test.jsp and the webapp that contains it TestApp, your browsers should open the following URL: http://localhost/wls/TestApp/Test.jsp Each browser should connect to a different clustered server and this simple JSP should confirm that. The webapp that contains the JSP needs to be deployed to the cluster. You can also verify that the load is correctly balanced by looking at the proxy log file. Each request generates a set of log entries that starts with : timestamp ================New Request: Each request is associated with a primary server and a secondary server if one is available. For our test request, the following entries should appear in the log as well:Using Uri /wls/TestApp/Test.jsp After trimming path: '/TestApp/Test.jsp' The final request string is '/TestApp/Test.jsp' If an exception occurs, it should also be logged in the proxy log file with the prefix:timestamp *******Exception type WeblogicBridgeConfig DebugConfigInfo enables runtime statistics and the production of configuration information. For security purposes, this parameter should be turned off in production. http://webserver_host:port/path/xyz.jsp?__WebLogicBridgeConfig will display a proxy bridge page detailing the plugin configuration followed by runtime statistics which could help in diagnosing issues along with the analyzing of the proxy log file. In our example the url would be: http://localhost/wls/TestApp/Test.jsp?__WebLogicBridgeConfig Here is how the top section of the screen can look like: The bottom part of the page contains runtime statistics, here is a snippet of it (unrelated with the previous JSP example). This entire plugin configuration should be very similar with other web servers, what varies is the name of the proxy server configuration file. So, as you can see, it only takes a few minutes to configure a Weblogic cluster and get servers to join it.

Read the article

When is a Seek not a Seek?

- by Paul White

The following script creates a single-column clustered table containing the integers from 1 to 1,000 inclusive. IF OBJECT_ID(N'tempdb..#Test', N'U') IS NOT NULL DROP TABLE #Test ; GO CREATE TABLE #Test ( id INTEGER PRIMARY KEY CLUSTERED ); ; INSERT #Test (id) SELECT V.number FROM master.dbo.spt_values AS V WHERE V.[type] = N'P' AND V.number BETWEEN 1 AND 1000 ; Let’s say we need to find the rows with values from 100 to 170, excluding any values that divide exactly by 10. One way to write that query would be: SELECT T.id FROM #Test AS T WHERE T.id IN ( 101,102,103,104,105,106,107,108,109, 111,112,113,114,115,116,117,118,119, 121,122,123,124,125,126,127,128,129, 131,132,133,134,135,136,137,138,139, 141,142,143,144,145,146,147,148,149, 151,152,153,154,155,156,157,158,159, 161,162,163,164,165,166,167,168,169 ) ; That query produces a pretty efficient-looking query plan: Knowing that the source column is defined as an INTEGER, we could also express the query this way: SELECT T.id FROM #Test AS T WHERE T.id >= 101 AND T.id <= 169 AND T.id % 10 > 0 ; We get a similar-looking plan: If you look closely, you might notice that the line connecting the two icons is a little thinner than before. The first query is estimated to produce 61.9167 rows – very close to the 63 rows we know the query will return. The second query presents a tougher challenge for SQL Server because it doesn’t know how to predict the selectivity of the modulo expression (T.id % 10 > 0). Without that last line, the second query is estimated to produce 68.1667 rows – a slight overestimate. Adding the opaque modulo expression results in SQL Server guessing at the selectivity. As you may know, the selectivity guess for a greater-than operation is 30%, so the final estimate is 30% of 68.1667, which comes to 20.45 rows. The second difference is that the Clustered Index Seek is costed at 99% of the estimated total for the statement. For some reason, the final SELECT operator is assigned a small cost of 0.0000484 units; I have absolutely no idea why this is so, or what it models. Nevertheless, we can compare the total cost for both queries: the first one comes in at 0.0033501 units, and the second at 0.0034054. The important point is that the second query is costed very slightly higher than the first, even though it is expected to produce many fewer rows (20.45 versus 61.9167). If you run the two queries, they produce exactly the same results, and both complete so quickly that it is impossible to measure CPU usage for a single execution. We can, however, compare the I/O statistics for a single run by running the queries with STATISTICS IO ON: Table '#Test'. Scan count 63, logical reads 126, physical reads 0. Table '#Test'. Scan count 01, logical reads 002, physical reads 0. The query with the IN list uses 126 logical reads (and has a ‘scan count’ of 63), while the second query form completes with just 2 logical reads (and a ‘scan count’ of 1). It is no coincidence that 126 = 63 * 2, by the way. It is almost as if the first query is doing 63 seeks, compared to one for the second query. In fact, that is exactly what it is doing. There is no indication of this in the graphical plan, or the tool-tip that appears when you hover your mouse over the Clustered Index Seek icon. To see the 63 seek operations, you have click on the Seek icon and look in the Properties window (press F4, or right-click and choose from the menu): The Seek Predicates list shows a total of 63 seek operations – one for each of the values from the IN list contained in the first query. I have expanded the first seek node to show the details; it is seeking down the clustered index to find the entry with the value 101. Each of the other 62 nodes expands similarly, and the same information is contained (even more verbosely) in the XML form of the plan. Each of the 63 seek operations starts at the root of the clustered index B-tree and navigates down to the leaf page that contains the sought key value. Our table is just large enough to need a separate root page, so each seek incurs 2 logical reads (one for the root, and one for the leaf). We can see the index depth using the INDEXPROPERTY function, or by using the a DMV: SELECT S.index_type_desc, S.index_depth FROM sys.dm_db_index_physical_stats ( DB_ID(N'tempdb'), OBJECT_ID(N'tempdb..#Test', N'U'), 1, 1, DEFAULT ) AS S ; Let’s look now at the Properties window when the Clustered Index Seek from the second query is selected: There is just one seek operation, which starts at the root of the index and navigates the B-tree looking for the first key that matches the Start range condition (id >= 101). It then continues to read records at the leaf level of the index (following links between leaf-level pages if necessary) until it finds a row that does not meet the End range condition (id <= 169). Every row that meets the seek range condition is also tested against the Residual Predicate highlighted above (id % 10 > 0), and is only returned if it matches that as well. You will not be surprised that the single seek (with a range scan and residual predicate) is much more efficient than 63 singleton seeks. It is not 63 times more efficient (as the logical reads comparison would suggest), but it is around three times faster. Let’s run both query forms 10,000 times and measure the elapsed time: DECLARE @i INTEGER, @n INTEGER = 10000, @s DATETIME = GETDATE() ; SET NOCOUNT ON; SET STATISTICS XML OFF; ; WHILE @n > 0 BEGIN SELECT @i = T.id FROM #Test AS T WHERE T.id IN ( 101,102,103,104,105,106,107,108,109, 111,112,113,114,115,116,117,118,119, 121,122,123,124,125,126,127,128,129, 131,132,133,134,135,136,137,138,139, 141,142,143,144,145,146,147,148,149, 151,152,153,154,155,156,157,158,159, 161,162,163,164,165,166,167,168,169 ) ; SET @n -= 1; END ; PRINT DATEDIFF(MILLISECOND, @s, GETDATE()) ; GO DECLARE @i INTEGER, @n INTEGER = 10000, @s DATETIME = GETDATE() ; SET NOCOUNT ON ; WHILE @n > 0 BEGIN SELECT @i = T.id FROM #Test AS T WHERE T.id >= 101 AND T.id <= 169 AND T.id % 10 > 0 ; SET @n -= 1; END ; PRINT DATEDIFF(MILLISECOND, @s, GETDATE()) ; On my laptop, running SQL Server 2008 build 4272 (SP2 CU2), the IN form of the query takes around 830ms and the range query about 300ms. The main point of this post is not performance, however – it is meant as an introduction to the next few parts in this mini-series that will continue to explore scans and seeks in detail. When is a seek not a seek? When it is 63 seeks © Paul White 2011 email: [email protected] twitter: @SQL_kiwi

Read the article

BizTalk Server 2009 - Architecture Options

- by StuartBrierley

I recently needed to put forward a proposal for a BizTalk 2009 implementation and as a part of this needed to describe some of the basic architecture options available for consideration. While I already had an idea of the type of environment that I would be looking to recommend, I felt that presenting a range of options while trying to explain some of the strengths and weaknesses of those options was a good place to start. These outline architecture options should be equally valid for any version of BizTalk Server from 2004, through 2006 and R2, up to 2009. The following diagram shows a crude representation of the common implementation options to consider when designing a BizTalk environment. Each of these options provides differing levels of resilience in the case of failure or disaster, with the later options also providing more scope for performance tuning and scalability. Some of the options presented above make use of clustering. Clustering may best be described as a technology that automatically allows one physical server to take over the tasks and responsibilities of another physical server that has failed. Given that all computer hardware and software will eventually fail, the goal of clustering is to ensure that mission-critical applications will have little or no downtime when such a failure occurs. Clustering can also be configured to provide load balancing, which should generally lead to performance gains and increased capacity and throughput. (A) Single Servers This option is the most basic BizTalk implementation that should be considered. It involves the deployment of a single BizTalk server in conjunction with a single SQL server. This configuration does not provide for any resilience in the case of the failure of either server. It is however the cheapest and easiest to implement option of those available. Using a single BizTalk server does not provide for the level of performance tuning that is otherwise available when using more than one BizTalk server in a cluster. The common edition of BizTalk used in single server implementations is the standard edition. It should be noted however that if future demand requires increased capacity for a solution, this BizTalk edition is limited to scaling up the implementation and not scaling out the number of servers in use. Any need to scale out the solution would require an upgrade to the enterprise edition of BizTalk. (B) Single BizTalk Server with Clustered SQL Servers This option uses a single BizTalk server with a cluster of SQL servers. By utilising clustered SQL servers we can ensure that there is some resilience to the implementation in respect of the databases that BizTalk relies on to operate. The clustering of two SQL servers is possible with the standard edition but to go beyond this would require the enterprise level edition. While this option offers improved resilience over option (A) it does still present a potential single point of failure at the BizTalk server. Using a single BizTalk server does not provide for the level of performance tuning that is otherwise available when using more than one BizTalk server in a cluster. The common edition of BizTalk used in single server implementations is the standard edition. It should be noted however that if future demand requires increased capacity for a solution, this BizTalk edition is limited to scaling up the implementation and not scaling out the number of servers in use. You are also unable to take advantage of multiple message boxes, which would allow us to balance the SQL load in the event of any bottlenecks in this area of the implementation. Any need to scale out the solution would require an upgrade to the enterprise edition of BizTalk. (C) Clustered BizTalk Servers with Clustered SQL Servers This option makes use of a cluster of BizTalk servers with a cluster of SQL servers to offer high availability and resilience in the case of failure of either of the server types involved. Clustering of BizTalk is only available with the enterprise edition of the product. Clustering of two SQL servers is possible with the standard edition but to go beyond this would require the enterprise level edition. The use of a BizTalk cluster also provides for the ability to balance load across the servers and gives more scope for performance tuning any implemented solutions. It is also possible to add more BizTalk servers to an existing cluster, giving scope for scaling out the solution as future demand requires. This might be seen as the middle cost option, providing a good level of protection in the case of failure, a decent level of future proofing, but at a higher cost than the single BizTalk server implementations. (D) Clustered BizTalk Servers with Clustered SQL Servers – with disaster recovery/service continuity This option is similar to that offered by (C) and makes use of a cluster of BizTalk servers with a cluster of SQL servers to offer high availability and resilience in case of failure of either of the server types involved. Clustering of BizTalk is only available with the enterprise edition of the product. Clustering of two SQL servers is possible with the standard edition but to go beyond this would require the enterprise level edition. As with (C) the use of a BizTalk cluster also provides for the ability to balance load across the servers and gives more scope for performance tuning the implemented solution. It is also possible to add more BizTalk servers to an existing cluster, giving scope for scaling the solution out as future demand requires. In this scenario however, we would be including some form of disaster recovery or service continuity. An example of this would be making use of multiple sites, with the BizTalk server cluster operating across sites to offer resilience in case of the loss of one or more sites. In this scenario there are options available for the SQL implementation depending on the network implementation; making use of either one cluster per site or a single SQL cluster across the network. A multi-site SQL implementation would require some form of data replication across the sites involved. This is obviously an expensive and complex option, but does provide an extraordinary amount of protection in the case of failure.

Read the article

Cardinality Estimation Bug with Lookups in SQL Server 2008 onward

- by Paul White

Cost-based optimization stands or falls on the quality of cardinality estimates (expected row counts). If the optimizer has incorrect information to start with, it is quite unlikely to produce good quality execution plans except by chance. There are many ways we can provide good starting information to the optimizer, and even more ways for cardinality estimation to go wrong. Good database people know this, and work hard to write optimizer-friendly queries with a schema and metadata (e.g. statistics) that reduce the chances of poor cardinality estimation producing a sub-optimal plan. Today, I am going to look at a case where poor cardinality estimation is Microsoft’s fault, and not yours. SQL Server 2005 SELECT th.ProductID, th.TransactionID, th.TransactionDate FROM Production.TransactionHistory AS th WHERE th.ProductID = 1 AND th.TransactionDate BETWEEN '20030901' AND '20031231'; The query plan on SQL Server 2005 is as follows (if you are using a more recent version of AdventureWorks, you will need to change the year on the date range from 2003 to 2007): There is an Index Seek on ProductID = 1, followed by a Key Lookup to find the Transaction Date for each row, and finally a Filter to restrict the results to only those rows where Transaction Date falls in the range specified. The cardinality estimate of 45 rows at the Index Seek is exactly correct. The table is not very large, there are up-to-date statistics associated with the index, so this is as expected. The estimate for the Key Lookup is also exactly right. Each lookup into the Clustered Index to find the Transaction Date is guaranteed to return exactly one row. The plan shows that the Key Lookup is expected to be executed 45 times. The estimate for the Inner Join output is also correct – 45 rows from the seek joining to one row each time, gives 45 rows as output. The Filter estimate is also very good: the optimizer estimates 16.9951 rows will match the specified range of transaction dates. Eleven rows are produced by this query, but that small difference is quite normal and certainly nothing to worry about here. All good so far. SQL Server 2008 onward The same query executed against an identical copy of AdventureWorks on SQL Server 2008 produces a different execution plan: The optimizer has pushed the Filter conditions seen in the 2005 plan down to the Key Lookup. This is a good optimization – it makes sense to filter rows out as early as possible. Unfortunately, it has made a bit of a mess of the cardinality estimates. The post-Filter estimate of 16.9951 rows seen in the 2005 plan has moved with the predicate on Transaction Date. Instead of estimating one row, the plan now suggests that 16.9951 rows will be produced by each clustered index lookup – clearly not right! This misinformation also confuses SQL Sentry Plan Explorer: Plan Explorer shows 765 rows expected from the Key Lookup (it multiplies a rounded estimate of 17 rows by 45 expected executions to give 765 rows total). Workarounds One workaround is to provide a covering non-clustered index (avoiding the lookup avoids the problem of course): CREATE INDEX nc1 ON Production.TransactionHistory (ProductID) INCLUDE (TransactionDate); With the Transaction Date filter applied as a residual predicate in the same operator as the seek, the estimate is again as expected: We could also force the use of the ultimate covering index (the clustered one): SELECT th.ProductID, th.TransactionID, th.TransactionDate FROM Production.TransactionHistory AS th WITH (INDEX(1)) WHERE th.ProductID = 1 AND th.TransactionDate BETWEEN '20030901' AND '20031231'; Summary Providing a covering non-clustered index for all possible queries is not always practical, and scanning the clustered index will rarely be optimal. Nevertheless, these are the best workarounds we have today. In the meantime, watch out for poor cardinality estimates when a predicate is applied as part of a lookup. The worst thing is that the estimate after the lookup join in the 2008+ plans is wrong. It’s not hopelessly wrong in this particular case (45 versus 16.9951 is not the end of the world) but it easily can be much worse, and there’s not much you can do about it. Any decisions made by the optimizer after such a lookup could be based on very wrong information – which can only be bad news. If you think this situation should be improved, please vote for this Connect item. © 2012 Paul White – All Rights Reserved twitter: @SQL_Kiwi email: [email protected]

Read the article

SQL Table stored as a Heap - the dangers within

- by MikeD

Nearly all of the time I create a table, I include a primary key, and often that PK is implemented as a clustered index. Those two don't always have to go together, but in my world they almost always do. On a recent project, I was working on a data warehouse and a set of SSIS packages to import data from an OLTP database into my data warehouse. The data I was importing from the business database into the warehouse was mostly new rows, sometimes updates to existing rows, and sometimes deletes. I decided to use the MERGE statement to implement the insert, update or delete in the data warehouse, I found it quite performant to have a stored procedure that extracted all the new, updated, and deleted rows from the source database and dump it into a working table in my data warehouse, then run a stored proc in the warehouse that was the MERGE statement that took the rows from the working table and updated the real fact table. Use Warehouse CREATE TABLE Integration.MergePolicy (PolicyId int, PolicyTypeKey int, Premium money, Deductible money, EffectiveDate date, Operation varchar(5)) CREATE TABLE fact.Policy (PolicyKey int identity primary key, PolicyId int, PolicyTypeKey int, Premium money, Deductible money, EffectiveDate date) CREATE PROC Integration.MergePolicy as begin begin tran Merge fact.Policy as tgtUsing Integration.MergePolicy as SrcOn (tgt.PolicyId = Src.PolicyId) When not matched by Target then Insert (PolicyId, PolicyTypeKey, Premium, Deductible, EffectiveDate)values (src.PolicyId, src.PolicyTypeKey, src.Premium, src.Deductible, src.EffectiveDate) When matched and src.Operation = 'U' then Update set PolicyTypeKey = src.PolicyTypeKey,Premium = src.Premium,Deductible = src.Deductible,EffectiveDate = src.EffectiveDate When matched and src.Operation = 'D' then Delete ;delete from Integration.WorkPolicy commit end Notice that my worktable (Integration.MergePolicy) doesn't have any primary key or clustered index. I didn't think this would be a problem, since it was relatively small table and was empty after each time I ran the stored proc. For one of the work tables, during the initial loads of the warehouse, it was getting about 1.5 million rows inserted, processed, then deleted. Also, because of a bug in the extraction process, the same 1.5 million rows (plus a few hundred more each time) was getting inserted, processed, and deleted. This was being sone on a fairly hefty server that was otherwise unused, and no one was paying any attention to the time it was taking. This week I received a backup of this database and loaded it on my laptop to troubleshoot the problem, and of course it took a good ten minutes or more to run the process. However, what seemed strange to me was that after I fixed the problem and happened to run the merge sproc when the work table was completely empty, it still took almost ten minutes to complete. I immediately looked back at the MERGE statement to see if I had some sort of outer join that meant it would be scanning the target table (which had about 2 million rows in it), then turned on the execution plan output to see what was happening under the hood. Running the stored procedure again took a long time, and the plan output didn't show me much - 55% on the MERGE statement, and 45% on the DELETE statement, and table scans on the work table in both places. I was surprised at the relative cost of the DELETE statement, because there were really 0 rows to delete, but I was expecting to see the table scans. (I was beginning now to suspect that my problem was because the work table was being stored as a heap.) Then I turned on STATS_IO and ran the sproc again. The output was quite interesting.Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.Table 'Policy'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.Table 'MergePolicy'. Scan count 1, logical reads 433276, physical reads 60, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. I've reproduced the above from memory, the details aren't exact, but the essential bit was the very high number of logical reads on the table stored as a heap. Even just doing a SELECT Count(*) from Integration.MergePolicy incurred that sort of output, even though the result was always 0. I suppose I should research more on the allocation and deallocation of pages to tables stored as a heap, but I haven't, and my original assumption that a table stored as a heap with no rows would only need to read one page to answer any query was definitely proven wrong. It's likely that some sort of physical defragmentation of the table may have cleaned that up, but it seemed that the easiest answer was to put a clustered index on the table. After doing so, the execution plan showed a cluster index scan, and the IO stats showed only a single page read. (I aborted my first attempt at adding a clustered index on the table because it was taking too long - instead I ran TRUNCATE TABLE Integration.MergePolicy first and added the clustered index, both of which took very little time). I suspect I may not have noticed this if I had used TRUNCATE TABLE Integration.MergePolicy instead of DELETE FROM Integration.MergePolicy, since I'm guessing that the truncate operation does some rather quick releasing of pages allocated to the heap table. In the future, I will likely be much more careful to have a clustered index on every table I use, even the working tables. Mike

Read the article

No proper kmeans clustering of images in matlab

- by user3237134

I am having 1200 face images in my training set.There are 2989 test face images. I am using eigen faces (PCA) for feature extraction. I am using kmeans clustering. Source code I tried: IDX = kmeans(z,5); clustercount=accumarray(IDX, ones(size(IDX))); disp(clustercount); Problem: Images are not clustered properly. Same faces should be clustered. But different faces are being clustered. Questions: Should I have to use still more face images for training? How accuracy of clustering can be achieved? What is the solution?

Read the article

So…is it a Seek or a Scan?

- by Paul White

You’re probably most familiar with the terms ‘Seek’ and ‘Scan’ from the graphical plans produced by SQL Server Management Studio (SSMS). The image to the left shows the most common ones, with the three types of scan at the top, followed by four types of seek. You might look to the SSMS tool-tip descriptions to explain the differences between them: Not hugely helpful are they? Both mention scans and ranges (nothing about seeks) and the Index Seek description implies that it will not scan the index entirely (which isn’t necessarily true). Recall also yesterday’s post where we saw two Clustered Index Seek operations doing very different things. The first Seek performed 63 single-row seeking operations; and the second performed a ‘Range Scan’ (more on those later in this post). I hope you agree that those were two very different operations, and perhaps you are wondering why there aren’t different graphical plan icons for Range Scans and Seeks? I have often wondered about that, and the first person to mention it after yesterday’s post was Erin Stellato (twitter | blog): Before we go on to make sense of all this, let’s look at another example of how SQL Server confusingly mixes the terms ‘Scan’ and ‘Seek’ in different contexts. The diagram below shows a very simple heap table with two columns, one of which is the non-clustered Primary Key, and the other has a non-unique non-clustered index defined on it. The right hand side of the diagram shows a simple query, it’s associated query plan, and a couple of extracts from the SSMS tool-tip and Properties windows. Notice the ‘scan direction’ entry in the Properties window snippet. Is this a seek or a scan? The different references to Scans and Seeks are even more pronounced in the XML plan output that the graphical plan is based on. This fragment is what lies behind the single Index Seek icon shown above: You’ll find the same confusing references to Seeks and Scans throughout the product and its documentation. Making Sense of Seeks Let’s forget all about scans for a moment, and think purely about seeks. Loosely speaking, a seek is the process of navigating an index B-tree to find a particular index record, most often at the leaf level. A seek starts at the root and navigates down through the levels of the index to find the point of interest: Singleton Lookups The simplest sort of seek predicate performs this traversal to find (at most) a single record. This is the case when we search for a single value using a unique index and an equality predicate. It should be readily apparent that this type of search will either find one record, or none at all. This operation is known as a singleton lookup. Given the example table from before, the following query is an example of a singleton lookup seek: Sadly, there’s nothing in the graphical plan or XML output to show that this is a singleton lookup – you have to infer it from the fact that this is a single-value equality seek on a unique index. The other common examples of a singleton lookup are bookmark lookups – both the RID and Key Lookup forms are singleton lookups (an RID lookup finds a single record in a heap from the unique row locator, and a Key Lookup does much the same thing on a clustered table). If you happen to run your query with STATISTICS IO ON, you will notice that ‘Scan Count’ is always zero for a singleton lookup. Range Scans The other type of seek predicate is a ‘seek plus range scan’, which I will refer to simply as a range scan. The seek operation makes an initial descent into the index structure to find the first leaf row that qualifies, and then performs a range scan (either backwards or forwards in the index) until it reaches the end of the scan range. The ability of a range scan to proceed in either direction comes about because index pages at the same level are connected by a doubly-linked list – each page has a pointer to the previous page (in logical key order) as well as a pointer to the following page. The doubly-linked list is represented by the green and red dotted arrows in the index diagram presented earlier. One subtle (but important) point is that the notion of a ‘forward’ or ‘backward’ scan applies to the logical key order defined when the index was built. In the present case, the non-clustered primary key index was created as follows: CREATE TABLE dbo.Example ( key_col INTEGER NOT NULL, data INTEGER NOT NULL, CONSTRAINT [PK dbo.Example key_col] PRIMARY KEY NONCLUSTERED (key_col ASC) ) ; Notice that the primary key index specifies an ascending sort order for the single key column. This means that a forward scan of the index will retrieve keys in ascending order, while a backward scan would retrieve keys in descending key order. If the index had been created instead on key_col DESC, a forward scan would retrieve keys in descending order, and a backward scan would return keys in ascending order. A range scan seek predicate may have a Start condition, an End condition, or both. Where one is missing, the scan starts (or ends) at one extreme end of the index, depending on the scan direction. Some examples might help clarify that: the following diagram shows four queries, each of which performs a single seek against a column holding every integer from 1 to 100 inclusive. The results from each query are shown in the blue columns, and relevant attributes from the Properties window appear on the right: Query 1 specifies that all key_col values less than 5 should be returned in ascending order. The query plan achieves this by seeking to the start of the index leaf (there is no explicit starting value) and scanning forward until the End condition (key_col < 5) is no longer satisfied (SQL Server knows it can stop looking as soon as it finds a key_col value that isn’t less than 5 because all later index entries are guaranteed to sort higher). Query 2 asks for key_col values greater than 95, in descending order. SQL Server returns these results by seeking to the end of the index, and scanning backwards (in descending key order) until it comes across a row that isn’t greater than 95. Sharp-eyed readers may notice that the end-of-scan condition is shown as a Start range value. This is a bug in the XML show plan which bubbles up to the Properties window – when a backward scan is performed, the roles of the Start and End values are reversed, but the plan does not reflect that. Oh well. Query 3 looks for key_col values that are greater than or equal to 10, and less than 15, in ascending order. This time, SQL Server seeks to the first index record that matches the Start condition (key_col >= 10) and then scans forward through the leaf pages until the End condition (key_col < 15) is no longer met. Query 4 performs much the same sort of operation as Query 3, but requests the output in descending order. Again, we have to mentally reverse the Start and End conditions because of the bug, but otherwise the process is the same as always: SQL Server finds the highest-sorting record that meets the condition ‘key_col < 25’ and scans backward until ‘key_col >= 20’ is no longer true. One final point to note: seek operations always have the Ordered: True attribute. This means that the operator always produces rows in a sorted order, either ascending or descending depending on how the index was defined, and whether the scan part of the operation is forward or backward. You cannot rely on this sort order in your queries of course (you must always specify an ORDER BY clause if order is important) but SQL Server can make use of the sort order internally. In the four queries above, the query optimizer was able to avoid an explicit Sort operator to honour the ORDER BY clause, for example. Multiple Seek Predicates As we saw yesterday, a single index seek plan operator can contain one or more seek predicates. These seek predicates can either be all singleton seeks or all range scans – SQL Server does not mix them. For example, you might expect the following query to contain two seek predicates, a singleton seek to find the single record in the unique index where key_col = 10, and a range scan to find the key_col values between 15 and 20: SELECT key_col FROM dbo.Example WHERE key_col = 10 OR key_col BETWEEN 15 AND 20 ORDER BY key_col ASC ; In fact, SQL Server transforms the singleton seek (key_col = 10) to the equivalent range scan, Start:[key_col >= 10], End:[key_col <= 10]. This allows both range scans to be evaluated by a single seek operator. To be clear, this query results in two range scans: one from 10 to 10, and one from 15 to 20. Final Thoughts That’s it for today – tomorrow we’ll look at monitoring singleton lookups and range scans, and I’ll show you a seek on a heap table. Yes, a seek. On a heap. Not an index! If you would like to run the queries in this post for yourself, there’s a script below. Thanks for reading! IF OBJECT_ID(N'dbo.Example', N'U') IS NOT NULL BEGIN DROP TABLE dbo.Example; END ; -- Test table is a heap -- Non-clustered primary key on 'key_col' CREATE TABLE dbo.Example ( key_col INTEGER NOT NULL, data INTEGER NOT NULL, CONSTRAINT [PK dbo.Example key_col] PRIMARY KEY NONCLUSTERED (key_col) ) ; -- Non-unique non-clustered index on the 'data' column CREATE NONCLUSTERED INDEX [IX dbo.Example data] ON dbo.Example (data) ; -- Add 100 rows INSERT dbo.Example WITH (TABLOCKX) ( key_col, data ) SELECT key_col = V.number, data = V.number FROM master.dbo.spt_values AS V WHERE V.[type] = N'P' AND V.number BETWEEN 1 AND 100 ; -- ================ -- Singleton lookup -- ================ ; -- Single value equality seek in a unique index -- Scan count = 0 when STATISTIS IO is ON -- Check the XML SHOWPLAN SELECT E.key_col FROM dbo.Example AS E WHERE E.key_col = 32 ; -- =========== -- Range Scans -- =========== ; -- Query 1 SELECT E.key_col FROM dbo.Example AS E WHERE E.key_col <= 5 ORDER BY E.key_col ASC ; -- Query 2 SELECT E.key_col FROM dbo.Example AS E WHERE E.key_col > 95 ORDER BY E.key_col DESC ; -- Query 3 SELECT E.key_col FROM dbo.Example AS E WHERE E.key_col >= 10 AND E.key_col < 15 ORDER BY E.key_col ASC ; -- Query 4 SELECT E.key_col FROM dbo.Example AS E WHERE E.key_col >= 20 AND E.key_col < 25 ORDER BY E.key_col DESC ; -- Final query (singleton + range = 2 range scans) SELECT E.key_col FROM dbo.Example AS E WHERE E.key_col = 10 OR E.key_col BETWEEN 15 AND 20 ORDER BY E.key_col ASC ; -- === TIDY UP === DROP TABLE dbo.Example; © 2011 Paul White email: [email protected] twitter: @SQL_Kiwi

Read the article

SQL 2005 indexed queries slower than unindexed queries

- by uos??

Adding a seemingly perfectly index is having an unexpectedly adverse affect on a query performance... -- [Data] has a predictable structure and a simple clustered index of the primary key: ALTER TABLE [dbo].[Data] ADD PRIMARY KEY CLUSTERED ( [ID] ) -- My query, joins on itself looking for a certain kind of "overlapping" records SELECT DISTINCT [Data].ID AS [ID] FROM dbo.[Data] AS [Data] JOIN dbo.[Data] AS [Compared] ON [Data].[A] = [Compared].[A] AND [Data].[B] = [Compared].[B] AND [Data].[C] = [Compared].[C] AND ([Data].[D] = [Compared].[D] OR [Data].[E] = [Compared].[E]) AND [Data].[F] <> [Compared].[F] WHERE 1=1 AND [Data].[A] = @A AND @CS <= [Data].[C] AND [Data].[C] < @CE -- Between a range [Data] has about a quarter-million records so far, 10% to 50% of the data satisfies the where clause depending on @A, @CS, and @CE. As is, the query takes 1 second to return about 300 rows when querying 10%, and 30 seconds to return 3000 rows when querying 50% of the data. Curiously, the estimated/actual execution plan indicates two parallel Clustered Index Scans, but the clustered index is only of the ID, which isn't part of the conditions of the query, only the output. ?? If I add this hand-crafted [IDX_A_B_C_D_E_F] index which I fully expected to improve performance, the query slows down by a factor of 8 (8 seconds for 10% & 4 minutes for 50%). The estimated/actual execution plans show an Index Seek, which seems like the right thing to be doing, but why so slow?? CREATE UNIQUE INDEX [IDX_A_B_C_D_E_F] ON [dbo].[Data] ([A], [B], [C], [D], [E], [F]) INCLUDE ([ID], [X], [Y], [Z]); The Data Engine Tuning wizard suggests a similar index with no noticeable difference in performance from this one. Moving AND [Data].[F] <> [Compared].[F] from the join condition to the where clause makes no difference in performance. I need these and other indexes for other queries. I'm sure I could hint that the query should refer to the Clustered Index, since that's currently winning - but we all know it is not as optimized as it could be, and without a proper index, I can expect the performance will get much worse with additional data. What gives?

Read the article

Tables with no Primary Key

- by Matt Hamilton

I have several tables whose only unique data is a uniqueidentifier (a Guid) column. Because guids are non-sequential (and they're client-side generated so I can't use newsequentialid()), I have made a non-primary, non-clustered index on this ID field rather than giving the tables a clustered primary key. I'm wondering what the performance implications are for this approach. I've seen some people suggest that tables should have an auto-incrementing ("identity") int as a clustered primary key even if it doesn't have any meaning, as it means that the database engine itself can use that value to quickly look up a row instead of having to use a bookmark. My database is merge-replicated across a bunch of servers, so I've shied away from identity int columns as they're a bit hairy to get right in replication. What are your thoughts? Should tables have primary keys? Or is it ok to not have any clustered indexes if there are no sensible columns to index that way?

Read the article

Meaning of Primary Key to Microsoft SQL Server 2008

- by usr

What meaning does the concept of a primary key have to the database engine of SQL Server? I don't mean the clustered/nonclustered index created on the "ID" column, i mean the constraint object "primary key". Does it matter if it exists or not? Alternatives: alter table add primary key clustered alter table create clustered index Does it make a difference?

Read the article

OCFS2 Now Certified for E-Business Suite Release 12 Application Tiers

- by sergio.leunissen

Steven Chan writes that OCFS2 is now certified for use as a clustered filesystem for sharing files between all of your E-Business Suite application tier servers. OCFS2 (Oracle Cluster File System 2) is a free, open source, general-purpose, extent-based clustered file system which Oracle developed and contributed to the Linux community. It was accepted into Linux kernel 2.6.16.OCFS2 is included in Oracle Enterprise Linux (OEL) and supported under Unbreakable Linux support.

Read the article

DBCC CHECKDB on VVLDB and latches (Or: My Pain is Your Gain)

- by Argenis

Does your CHECKDB hurt, Argenis? There is a classic blog series by Paul Randal [blog|twitter] called “CHECKDB From Every Angle” which is pretty much mandatory reading for anybody who’s even remotely considering going for the MCM certification, or its replacement (the Microsoft Certified Solutions Master: Data Platform – makes my fingers hurt just from typing it). Of particular interest is the post “Consistency Options for a VLDB” – on it, Paul provides solid, timeless advice (I use the word “timeless” because it was written in 2007, and it all applies today!) on how to perform checks on very large databases. Well, here I was trying to figure out how to make CHECKDB run faster on a restored copy of one of our databases, which happens to exceed 7TB in size. The whole thing was taking several days on multiple systems, regardless of the storage used – SAS, SATA or even SSD…and I actually didn’t pay much attention to how long it was taking, or even bothered to look at the reasons why - as long as it was finishing okay and found no consistency errors. Yes – I know. That was a huge mistake, as corruption found in a database several days after taking place could only allow for further spread of the corruption – and potentially large data loss. In the last two weeks I increased my attention towards this problem, as we noticed that CHECKDB was taking EVEN LONGER on brand new all-flash storage in the SAN! I couldn’t really explain it, and were almost ready to blame the storage vendor. The vendor told us that they could initially see the server driving decent I/O – around 450Mb/sec, and then it would settle at a very slow rate of 10Mb/sec or so. “Hum”, I thought – “CHECKDB is just not pushing the I/O subsystem hard enough”. Perfmon confirmed the vendor’s observations. Dreaded @BlobEater What was CHECKDB doing all the time while doing so little I/O? Eating Blobs. It turns out that CHECKDB was taking an extremely long time on one of our frankentables, which happens to be have 35 billion rows (yup, with a b) and sucks up several terabytes of space in the database. We do have a project ongoing to purge/split/partition this table, so it’s just a matter of time before we deal with it. But the reality today is that CHECKDB is coming to a screeching halt in performance when dealing with this particular table. Checking sys.dm_os_waiting_tasks and sys.dm_os_latch_stats showed that LATCH_EX (DBCC_OBJECT_METADATA) was by far the top wait type. I remembered hearing recently about that wait from another post that Paul Randal made, but that was related to computed-column indexes, and in fact, Paul himself reminded me of his article via twitter. But alas, our pathologic table had no non-clustered indexes on computed columns. I knew that latches are used by the database engine to do internal synchronization – but how could I help speed this up? After all, this is stuff that doesn’t have a lot of knobs to tweak. (There’s a fantastic level 500 talk by Bob Ward from Microsoft CSS [blog|twitter] called “Inside SQL Server Latches” given at PASS 2010 – and you can check it out here. DISCLAIMER: I assume no responsibility for any brain melting that might ensue from watching Bob’s talk!) Failed Hypotheses Earlier on this week I flew down to Palo Alto, CA, to visit our Headquarters – and after having a great time with my Monkey peers, I was relaxing on the plane back to Seattle watching a great talk by SQL Server MVP and fellow MCM Maciej Pilecki [twitter] called “Masterclass: A Day in the Life of a Database Transaction” where he discusses many different topics related to transaction management inside SQL Server. Very good stuff, and when I got home it was a little late – that slow DBCC CHECKDB that I had been dealing with was way in the back of my head. As I was looking at the problem at hand earlier on this week, I thought “How about I set the database to read-only?” I remembered one of the things Maciej had (jokingly) said in his talk: “if you don’t want locking and blocking, set the database to read-only” (or something to that effect, pardon my loose memory). I immediately killed the CHECKDB which had been running painfully for days, and set the database to read-only mode. Then I ran DBCC CHECKDB against it. It started going really fast (even a bit faster than before), and then throttled down again to around 10Mb/sec. All sorts of expletives went through my head at the time. Sure enough, the same latching scenario was present. Oh well. I even spent some time trying to figure out if NUMA was hurting performance. Folks on Twitter made suggestions in this regard (thanks, Lonny! [twitter]) …Eureka? This past Friday I was still scratching my head about the whole thing; I was ready to start profiling with XPERF to see if I could figure out which part of the engine was to blame and then get Microsoft to look at the evidence. After getting a bunch of good news I’ll blog about separately, I sat down for a figurative smack down with CHECKDB before the weekend. And then the light bulb went on. A sparse column. I thought that I couldn’t possibly be experiencing the same scenario that Paul blogged about back in March showing extreme latching with non-clustered indexes on computed columns. Did I even have a non-clustered index on my sparse column? As it turns out, I did. I had one filtered non-clustered index – with the sparse column as the index key (and only column). To prove that this was the problem, I went and setup a test. Yup, that'll do it The repro is very simple for this issue: I tested it on the latest public builds of SQL Server 2008 R2 SP2 (CU6) and SQL Server 2012 SP1 (CU4). First, create a test database and a test table, which only needs to contain a sparse column: CREATE DATABASE SparseColTest; GO USE SparseColTest; GO CREATE TABLE testTable (testCol smalldatetime SPARSE NULL); GO INSERT INTO testTable (testCol) VALUES (NULL); GO 1000000 That’s 1 million rows, and even though you’re inserting NULLs, that’s going to take a while. In my laptop, it took 3 minutes and 31 seconds. Next, we run DBCC CHECKDB against the database: DBCC CHECKDB('SparseColTest') WITH NO_INFOMSGS, ALL_ERRORMSGS; This runs extremely fast, as least on my test rig – 198 milliseconds. Now let’s create a filtered non-clustered index on the sparse column: CREATE NONCLUSTERED INDEX [badBadIndex] ON testTable (testCol) WHERE testCol IS NOT NULL; With the index in place now, let’s run DBCC CHECKDB one more time: DBCC CHECKDB('SparseColTest') WITH NO_INFOMSGS, ALL_ERRORMSGS; In my test system this statement completed in 11433 milliseconds. 11.43 full seconds. Quite the jump from 198 milliseconds. I went ahead and dropped the filtered non-clustered indexes on the restored copy of our production database, and ran CHECKDB against that. We went down from 7+ days to 19 hours and 20 minutes. Cue the “Argenis is not impressed” meme, please, Mr. LaRock. My pain is your gain, folks. Go check to see if you have any of such indexes – they’re likely causing your consistency checks to run very, very slow. Happy CHECKDBing, -Argenis ps: I plan to file a Connect item for this issue – I consider it a pretty serious bug in the engine. After all, filtered indexes were invented BECAUSE of the sparse column feature – and it makes a lot of sense to use them together. Watch this space and my twitter timeline for a link.

Search Results

Search found 544 results on 22 pages for 'clustered'.

Page 4/22 | < Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12 | Next Page >

- by David

- by chris

- by Bob

- by MoSlo

- by Dinesh Asanka

- by gregturn

- by JohnMcG

- by pinaldave

- by Renso

- by pinaldave

- by Raj

- by Paul White

- by Paul White

- by Laurent Goldsztejn

- by Paul White

- by StuartBrierley

- by Paul White

- by MikeD

- by user3237134

- by Paul White

- by uos??

- by Matt Hamilton

- by usr

- by sergio.leunissen

- by Argenis

< Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12 | Next Page >