Search Results

Search found 4685 results on 188 pages for 'queries'.

Page 43/188 | < Previous Page | 39 40 41 42 43 44 45 46 47 48 49 50 | Next Page >

SQL SERVER – Weekly Series – Memory Lane – #048

- by Pinal Dave

Here is the list of selected articles of SQLAuthority.com across all these years. Instead of just listing all the articles I have selected a few of my most favorite articles and have listed them here with additional notes below it. Let me know which one of the following is your favorite article from memory lane. 2007 Order of Result Set of SELECT Statement on Clustered Indexed Table When ORDER BY is Not Used Above theory is true in most of the cases. However SQL Server does not use that logic when returning the resultset. SQL Server always returns the resultset which it can return fastest.In most of the cases the resultset which can be returned fastest is the resultset which is returned using clustered index. Effect of TRANSACTION on Local Variable – After ROLLBACK and After COMMIT One of the Jr. Developer asked me this question (What will be the Effect of TRANSACTION on Local Variable – After ROLLBACK and After COMMIT?) while I was rushing to an important meeting. I was getting late so I asked him to talk with his Application Tech Lead. When I came back from meeting both of them were looking for me. They said they are confused. I quickly wrote down following example for them. 2008 SQL SERVER – Guidelines and Coding Standards Complete List Download Coding standards and guidelines are very important for any developer on the path of a successful career. A coding standard is a set of guidelines, rules and regulations on how to write code. Coding standards should be flexible enough or should take care of the situation where they should not prevent best practices for coding. They are basically the guidelines that one should follow for better understanding. Download Guidelines and Coding Standards complete List Download Get Answer in Float When Dividing of Two Integer Many times we have requirements of some calculations amongst different fields in Tables. One of the software developers here was trying to calculate some fields having integer values and divide it which gave incorrect results in integer where accurate results including decimals was expected. Puzzle – Computed Columns Datatype Explanation SQL Server automatically does a cast to the data type having the highest precedence. So the result of INT and INT will be INT, but INT and FLOAT will be FLOAT because FLOAT has a higher precedence. If you want a different data type, you need to do an EXPLICIT cast. Renaming SP is Not Good Idea – Renaming Stored Procedure Does Not Update sys.procedures I have written many articles about renaming a tables, columns and procedures SQL SERVER – How to Rename a Column Name or Table Name, here I found something interesting about renaming the stored procedures and felt like sharing it with you all. The interesting fact is that when we rename a stored procedure using SP_Rename command, the Stored Procedure is successfully renamed. But when we try to test the procedure using sp_helptext, the procedure will be having the old name instead of new names. 2009 Insert Values of Stored Procedure in Table – Use Table Valued Function It is clear from the result set that , where I have converted stored procedure logic into the table valued function, is much better in terms of logic as it saves a large number of operations. However, this option should be used carefully. The performance of the stored procedure is “usually” better than that of functions. Interesting Observation – Index on Index View Used in Similar Query Recently, I was working on an optimization project for one of the largest organizations. While working on one of the queries, we came across a very interesting observation. We found that there was a query on the base table and when the query was run, it used the index, which did not exist in the base table. On careful examination, we found that the query was using the index that was on another view. This was very interesting as I have personally never experienced a scenario like this. In simple words, “Query on the base table can use the index created on the indexed view of the same base table.” Interesting Observation – Execution Plan and Results of Aggregate Concatenation Queries Working with SQL Server has never seemed to be monotonous – no matter how long one has worked with it. Quite often, I come across some excellent comments that I feel like acknowledging them as blog posts. Recently, I wrote an article on SQL SERVER – Execution Plan and Results of Aggregate Concatenation Queries Depend Upon Expression Location, which is well received in the community. 2010 I encourage all of you to go through complete series and write your own on the subject. If you write an article and send it to me, I will publish it on this blog with due credit to you. If you write on your own blog, I will update this blog post pointing to your blog post. SQL SERVER – ORDER BY Does Not Work – Limitation of the View 1 SQL SERVER – Adding Column is Expensive by Joining Table Outside View – Limitation of the View 2 SQL SERVER – Index Created on View not Used Often – Limitation of the View 3 SQL SERVER – SELECT * and Adding Column Issue in View – Limitation of the View 4 SQL SERVER – COUNT(*) Not Allowed but COUNT_BIG(*) Allowed – Limitation of the View 5 SQL SERVER – UNION Not Allowed but OR Allowed in Index View – Limitation of the View 6 SQL SERVER – Cross Database Queries Not Allowed in Indexed View – Limitation of the View 7 SQL SERVER – Outer Join Not Allowed in Indexed Views – Limitation of the View 8 SQL SERVER – SELF JOIN Not Allowed in Indexed View – Limitation of the View 9 SQL SERVER – Keywords View Definition Must Not Contain for Indexed View – Limitation of the View 10 SQL SERVER – View Over the View Not Possible with Index View – Limitations of the View 11 2011 Startup Parameters Easy to Configure If you are a regular reader of this blog, you must be aware that I have written about SQL Server Denali recently. Here is the quickest way to reach into the screen where we can change the startup parameters. Go to SQL Server Configuration Manager >> SQL Server Services >> Right Click on the Server >> Properties >> Startup Parameters 2012 Validating Unique Columnname Across Whole Database I sometimes come across very strange requirements and often I do not receive a proper explanation of the same. Here is the one of those examples. For example “Our business requirement is when we add new column we want it unique across current database.” Read the solution to this strange request in this blog post. Excel Losing Decimal Values When Value Pasted from SSMS ResultSet It is very common when users are coping the resultset to Excel, the floating point or decimals are missed. The solution is very much simple and it requires a small adjustment in the Excel. By default Excel is very smart and when it detects the value which is getting pasted is numeric it changes the column format to accommodate that. Basic Calculation and PEMDAS Order of Operation Read this interesting blog post for fantastic conversation about the subject. Copy Column Headers from Resultset – SQL in Sixty Seconds #027 – Video http://www.youtube.com/watch?v=x_-3tLqTRv0 Delete From Multiple Table – Update Multiple Table in Single Statement There are two questions which I get every single day multiple times. In my gmail, I have created standard canned reply for them. Let us see the questions here. I want to delete from multiple table in a single statement how will I do it? I want to update multiple table in a single statement how will I do it? Read the answer in the blog post. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Memory Lane, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology

Read the article
SQL SERVER – Weekly Series – Memory Lane – #050

- by Pinal Dave

Here is the list of selected articles of SQLAuthority.com across all these years. Instead of just listing all the articles I have selected a few of my most favorite articles and have listed them here with additional notes below it. Let me know which one of the following is your favorite article from memory lane. 2007 Executing Remote Stored Procedure – Calling Stored Procedure on Linked Server In this example we see two different methods of how to call Stored Procedures remotely. Connection Property of SQL Server Management Studio SSMS A very simple example of the how to build connection properties for SQL Server with the help of SSMS. Sample Example of RANKING Functions – ROW_NUMBER, RANK, DENSE_RANK, NTILE SQL Server has a total of 4 ranking functions. Ranking functions return a ranking value for each row in a partition. All the ranking functions are non-deterministic. T-SQL Script to Add Clustered Primary Key Jr. DBA asked me three times in a day, how to create Clustered Primary Key. I gave him following sample example. That was the last time he asked “How to create Clustered Primary Key to table?” 2008 2008 – TRIM() Function – User Defined Function SQL Server does not have functions which can trim leading or trailing spaces of any string at the same time. SQL does have LTRIM() and RTRIM() which can trim leading and trailing spaces respectively. SQL Server 2008 also does not have TRIM() function. User can easily use LTRIM() and RTRIM() together and simulate TRIM() functionality. http://www.youtube.com/watch?v=1-hhApy6MHM 2009 Earlier I have written two different articles on the subject Remove Bookmark Lookup. This article is as part 3 of original article. Please read the first two articles here before continuing reading this article. Query Optimization – Remove Bookmark Lookup – Remove RID Lookup – Remove Key Lookup Query Optimization – Remove Bookmark Lookup – Remove RID Lookup – Remove Key Lookup – Part 2 Query Optimization – Remove Bookmark Lookup – Remove RID Lookup – Remove Key Lookup – Part 3 Interesting Observation – Query Hint – FORCE ORDER SQL Server never stops to amaze me. As regular readers of this blog already know that besides conducting corporate training, I work on large-scale projects on query optimizations and server tuning projects. In one of the recent projects, I have noticed that a Junior Database Developer used the query hint Force Order; when I asked for details, I found out that the basic concept was not properly understood by him. Queries Waiting for Memory Allocation to Execute In one of the recent projects, I was asked to create a report of queries that are waiting for memory allocation. The reason was that we were doubtful regarding whether the memory was sufficient for the application. The following query can be useful in similar cases. Queries that do not have to wait on a memory grant will not appear in the result set of following query. 2010 Quickest Way to Identify Blocking Query and Resolution – Dirty Solution As the title suggests, this is quite a dirty solution; it’s not as elegant as you expect. However, it works totally fine. Simple Explanation of Data Type Precedence While I was working on creating a question for SQL SERVER – SQL Quiz – The View, The Table and The Clustered Index Confusion, I had actually created yet another question along with this question. However, I felt that the one which is posted on the SQL Quiz is much better than this one because what makes that more challenging question is that it has a multiple answer. Encrypted Stored Procedure and Activity Monitor I recently had received questionable if any stored procedure is encrypted can we see its definition in Activity Monitor.Answer is - No. Let us do a quick test. Let us create following Stored Procedure and then launch the Activity Monitor and check the text. Indexed View always Use Index on Table A single table can have maximum 249 non clustered indexes and 1 clustered index. In SQL Server 2008, a single table can have maximum 999 non clustered indexes and 1 clustered index. It is widely believed that a table can have only 1 clustered index, and this belief is true. I have some questions for all of you. Let us assume that I am creating view from the table itself and then create a clustered index on it. In my view, I am selecting the complete table itself. 2011 Detecting Database Case Sensitive Property using fn_helpcollations() I received a question on how to determine the case sensitivity of the database. The quick answer to this is to identify the collation of the database and check the properties of the collation. I have previously written how one can identify database collation. Once you have figured out the collation of the database, you can put that in the WHERE condition of the following T-SQL and then check the case sensitivity from the description. Server Side Paging in SQL Server CE (Compact Edition) SQL Server Denali is coming up with new T-SQL of Paging. I have written about the same earlier.SQL SERVER – Server Side Paging in SQL Server Denali – A Better Alternative, SQL SERVER – Server Side Paging in SQL Server Denali Performance Comparison, SQL SERVER – Server Side Paging in SQL Server Denali – Part2 What is very interesting is that SQL Server CE 4.0 have the same feature introduced. Here is the quick example of the same. To run the script in the example, you will have to do installWebmatrix 4.0 and download sample database. Once done you can run following script. Why I am Going to Attend PASS Summit Unite 2011 The four-day event will be marked by a lot of learning, sharing, and networking, which will help me increase both my knowledge and contacts. Every year, PASS Summit provides me a golden opportunity to build my network as well as to identify and meet potential customers or employees. 2012 Manage Help Settings – CTRL + ALT + F1 This is very interesting read as my daughter once accidently came across a screen in SQL Server Management Studio. It took me 2-3 minutes to figure out how she has created the same screen. Recover the Accidentally Renamed Table “I accidentally renamed table in my SSMS. I was scrolling very fast and I made mistakes. It was either because I double clicked or clicked on F2 (shortcut key for renaming). However, I have made the mistake and now I have no idea how to fix this. If you have renamed the table, I think you pretty much is out of luck. Here are few things which you can do which can give you an idea about what your table name can be if you are lucky. Identify Numbers of Non Clustered Index on Tables for Entire Database Here is the script which will give you numbers of non clustered indexes on any table in entire database. Identify Most Resource Intensive Queries – SQL in Sixty Seconds #029 – Video Here is the complete complete script which I have used in the SQL in Sixty Seconds Video. Thanks Harsh for important Tip in the comment. http://www.youtube.com/watch?v=3kDHC_Tjrns Advanced Data Quality Services with Melissa Data – Azure Data Market For the purposes of the review, I used a database I had in an Excel spreadsheet with name and address information. Upon a cursory inspection, there are miscellaneous problems with these records; some addresses are missing ZIP codes, others missing a city, and some records are slightly misspelled or have unparsed suites. With DQS, I can easily add a knowledge base to help standardize my values, such as for state abbreviations. But how do I know that my address is correct? Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Memory Lane, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology

Read the article
Query optimization using composite indexes

- by xmarch

Many times, during the process of creating a new Coherence application, developers do not pay attention to the way cache queries are constructed; they only check that these queries comply with functional specs. Later, performance testing shows that these perform poorly and it is then when developers start working on improvements until the non-functional performance requirements are met. This post describes the optimization process of a real-life scenario, where using a composite attribute index has brought a radical improvement in query execution times. The execution times went down from 4 seconds to 2 milliseconds! E-commerce solution based on Oracle ATG – Endeca In the context of a new e-commerce solution based on Oracle ATG – Endeca, Oracle Coherence has been used to calculate and store SKU prices. In this architecture, a Coherence cache stores the final SKU prices used for Endeca baseline indexing. Each SKU price is calculated from a base SKU price and a series of calculations based on information from corporate global discounts. Corporate global discounts information is stored in an auxiliary Coherence cache with over 800.000 entries. In particular, to obtain each price the process needs to execute six queries over the global discount cache. After the implementation was finished, we discovered that the most expensive steps in the price calculation discount process were the global discounts cache query. This query has 10 parameters and is executed 6 times for each SKU price calculation. The steps taken to optimise this query are described below; Starting point Initial query was: String filter = "levelId = :iLevelId AND salesCompanyId = :iSalesCompanyId AND salesChannelId = :iSalesChannelId "+ "AND departmentId = :iDepartmentId AND familyId = :iFamilyId AND brand = :iBrand AND manufacturer = :iManufacturer "+ "AND areaId = :iAreaId AND endDate >= :iEndDate AND startDate <= :iStartDate"; Map<String, Object> params = new HashMap<String, Object>(10); // Fill all parameters. params.put("iLevelId", xxxx); // Executing filter. Filter globalDiscountsFilter = QueryHelper.createFilter(filter, params); NamedCache globalDiscountsCache = CacheFactory.getCache(CacheConstants.GLOBAL_DISCOUNTS_CACHE_NAME); Set applicableDiscounts = globalDiscountsCache.entrySet(globalDiscountsFilter); With the small dataset used for development the cache queries performed very well. However, when carrying out performance testing with a real-world sample size of 800,000 entries, each query execution was taking more than 4 seconds. First round of optimizations The first optimisation step was the creation of separate Coherence index for each of the 10 attributes used by the filter. This avoided object deserialization while executing the query. Each index was created as follows: globalDiscountsCache.addIndex(new ReflectionExtractor("getXXX" ) , false, null); After adding these indexes the query execution time was reduced to between 450 ms and 1s. However, these execution times were still not good enough. Second round of optimizations In this optimisation phase a Coherence query explain plan was used to identify how many entires each index reduced the results set by, along with the cost in ms of executing that part of the query. Though the explain plan showed that all the indexes for the query were being used, it also showed that the ordering of the query parameters was "sub-optimal". Parameters associated to object attributes with high-cardinality should appear at the beginning of the filter, or more specifically, the attributes that filters out the highest of number records should be placed at the beginning. But examining corporate global discount data we realized that depending on the values of the parameters used in the query the “good” order for the attributes was different. In particular, if the attributes brand and family had specific values it was more optimal to have a different query changing the order of the attributes. Ultimately, we ended up with three different optimal variants of the query that were used in its relevant cases: String filter = "brand = :iBrand AND familyId = :iFamilyId AND departmentId = :iDepartmentId AND levelId = :iLevelId "+ "AND manufacturer = :iManufacturer AND endDate >= :iEndDate AND salesCompanyId = :iSalesCompanyId "+ "AND areaId = :iAreaId AND salesChannelId = :iSalesChannelId AND startDate <= :iStartDate"; String filter = "familyId = :iFamilyId AND departmentId = :iDepartmentId AND levelId = :iLevelId AND brand = :iBrand "+ "AND manufacturer = :iManufacturer AND endDate >= :iEndDate AND salesCompanyId = :iSalesCompanyId "+ "AND areaId = :iAreaId AND salesChannelId = :iSalesChannelId AND startDate <= :iStartDate"; String filter = "brand = :iBrand AND departmentId = :iDepartmentId AND familyId = :iFamilyId AND levelId = :iLevelId "+ "AND manufacturer = :iManufacturer AND endDate >= :iEndDate AND salesCompanyId = :iSalesCompanyId "+ "AND areaId = :iAreaId AND salesChannelId = :iSalesChannelId AND startDate <= :iStartDate"; Using the appropriate query depending on the value of brand and family parameters the query execution time dropped to between 100 ms and 150 ms. But these these execution times were still not good enough and the solution was cumbersome. Third and last round of optimizations The third and final optimization was to introduce a composite index. However, this did mean that it was not possible to use the Coherence Query Language (CohQL), as composite indexes are not currently supporte in CohQL. As the original query had 8 parameters using EqualsFilter, 1 using GreaterEqualsFilter and 1 using LessEqualsFilter, the composite index was built for the 8 attributes using EqualsFilter. The final query had an EqualsFilter for the multiple extractor, a GreaterEqualsFilter and a LessEqualsFilter for the 2 remaining attributes. All individual indexes were dropped except the ones being used for LessEqualsFilter and GreaterEqualsFilter. We were now running in an scenario with an 8-attributes composite filter and 2 single attribute filters. The composite index created was as follows: ValueExtractor[] ve = { new ReflectionExtractor("getSalesChannelId" ), new ReflectionExtractor("getLevelId" ), new ReflectionExtractor("getAreaId" ), new ReflectionExtractor("getDepartmentId" ), new ReflectionExtractor("getFamilyId" ), new ReflectionExtractor("getManufacturer" ), new ReflectionExtractor("getBrand" ), new ReflectionExtractor("getSalesCompanyId" )}; MultiExtractor me = new MultiExtractor(ve); NamedCache globalDiscountsCache = CacheFactory.getCache(CacheConstants.GLOBAL_DISCOUNTS_CACHE_NAME); globalDiscountsCache.addIndex(me, false, null); And the final query was: ValueExtractor[] ve = { new ReflectionExtractor("getSalesChannelId" ), new ReflectionExtractor("getLevelId" ), new ReflectionExtractor("getAreaId" ), new ReflectionExtractor("getDepartmentId" ), new ReflectionExtractor("getFamilyId" ), new ReflectionExtractor("getManufacturer" ), new ReflectionExtractor("getBrand" ), new ReflectionExtractor("getSalesCompanyId" )}; MultiExtractor me = new MultiExtractor(ve); // Fill composite parameters.String SalesCompanyId = xxxx;...AndFilter composite = new AndFilter(new EqualsFilter(me, Arrays.asList(iSalesChannelId, iLevelId, iAreaId, iDepartmentId, iFamilyId, iManufacturer, iBrand, SalesCompanyId)), new GreaterEqualsFilter(new ReflectionExtractor("getEndDate" ), iEndDate)); AndFilter finalFilter = new AndFilter(composite, new LessEqualsFilter(new ReflectionExtractor("getStartDate" ), iStartDate)); NamedCache globalDiscountsCache = CacheFactory.getCache(CacheConstants.GLOBAL_DISCOUNTS_CACHE_NAME); Set applicableDiscounts = globalDiscountsCache.entrySet(finalFilter); Using this composite index the query improved dramatically and the execution time dropped to between 2 ms and 4 ms. These execution times completely met the non-functional performance requirements . It should be noticed than when using the composite index the order of the attributes inside the ValueExtractor was not relevant.

Read the article
“Query cost (relative to the batch)” <> Query cost relative to batch

- by Dave Ballantyne

OK, so that is quite a contradictory title, but unfortunately it is true that a common misconception is that the query with the highest percentage relative to batch is the worst performing. Simply put, it is a lie, or more accurately we dont understand what these figures mean. Consider the two below simple queries: SELECT * FROM Person.BusinessEntity JOIN Person.BusinessEntityAddress ON Person.BusinessEntity.BusinessEntityID = Person.BusinessEntityAddress.BusinessEntityID go SELECT * FROM Sales.SalesOrderDetail JOIN Sales.SalesOrderHeader ON Sales.SalesOrderDetail.SalesOrderID = Sales.SalesOrderHeader.SalesOrderID After executing these and looking at the plans, I see this : So, a 13% / 87% split , but 13% / 87% of WHAT ? CPU ? Duration ? Reads ? Writes ? or some magical weighted algorithm ? In a Profiler trace of the two we can find the metrics we are interested in. CPU and duration are well out but what about reads (210 and 1935)? To save you doing the maths, though you are more than welcome to, that’s a 90.2% / 9.8% split. Close, but no cigar. Lets try a different tact. Looking at the execution plan the “Estimated Subtree cost” of query 1 is 0.29449 and query 2 its 1.96596. Again to save you the maths that works out to 13.03% and 86.97%, round those and thats the figures we are after. But, what is the worrying word there ? “Estimated”. So these are not “actual” execution costs, but what’s the problem in comparing the estimated costs to derive a meaning of “Most Costly”. Well, in the case of simple queries such as the above , probably not a lot. In more complicated queries , a fair bit. By modifying the second query to also show the total number of lines on each order SELECT *,COUNT(*) OVER (PARTITION BY Sales.SalesOrderDetail.SalesOrderID) FROM Sales.SalesOrderDetail JOIN Sales.SalesOrderHeader ON Sales.SalesOrderDetail.SalesOrderID = Sales.SalesOrderHeader.SalesOrderID The split in percentages is now 6% / 94% and the profiler metrics are : Even more of a discrepancy. Estimates can be out with actuals for a whole host of reasons, scalar UDF’s are a particular bug bear of mine and in-fact the cost of a udf call is entirely hidden inside the execution plan. It always estimates to 0 (well, a very small number). Take for instance the following udf Create Function dbo.udfSumSalesForCustomer(@CustomerId integer) returns money as begin Declare @Sum money Select @Sum= SUM(SalesOrderHeader.TotalDue) from Sales.SalesOrderHeader where CustomerID = @CustomerId return @Sum end If we have two statements , one that fires the udf and another that doesn't: Select CustomerID from Sales.Customer order by CustomerID go Select CustomerID,dbo.udfSumSalesForCustomer(Customer.CustomerID) from Sales.Customer order by CustomerID The costs relative to batch is a 50/50 split, but the has to be an actual cost of firing the udf. Indeed profiler shows us : No where even remotely near 50/50!!!! Moving forward to window framing functionality in SQL Server 2012 the optimizer sees ROWS and RANGE ( see here for their functional differences) as the same ‘cost’ too SELECT SalesOrderDetailID,SalesOrderId, SUM(LineTotal) OVER(PARTITION BY salesorderid ORDER BY Salesorderdetailid RANGE unbounded preceding) from Sales.SalesOrderdetail go SELECT SalesOrderDetailID,SalesOrderId, SUM(LineTotal) OVER(PARTITION BY salesorderid ORDER BY Salesorderdetailid Rows unbounded preceding) from Sales.SalesOrderdetail By now it wont be a great display to show you the Profiler trace reads a *tiny* bit different. So moral of the story, Percentage relative to batch can give a rough ‘finger in the air’ measurement, but dont rely on it as fact.

Read the article
More SQL Smells

- by Nick Harrison

Let's continue exploring some of the SQL Smells from Phil's list. He has been putting together. Datatype mis-matches in predicates that rely on implicit conversion.(Plamen Ratchev) This is a great example poking holes in the whole theory of "If it works it's not broken" Queries will this probably will generally work and give the correct response. In fact, without careful analysis, you probably may be completely oblivious that there is even a problem. This subtle little problem will needlessly complicate queries and slow them down regardless of the indexes applied. Consider this example: CREATE TABLE [dbo].[Page]( [PageId] [int] IDENTITY(1,1) NOT NULL, [Title] [varchar](75) NOT NULL, [Sequence] [int] NOT NULL, [ThemeId] [int] NOT NULL, [CustomCss] [text] NOT NULL, [CustomScript] [text] NOT NULL, [PageGroupId] [int] NOT NULL; CREATE PROCEDURE PageSelectBySequence ( @sequenceMin smallint , @sequenceMax smallint ) AS BEGIN SELECT [PageId] , [Title] , [Sequence] , [ThemeId] , [CustomCss] , [CustomScript] , [PageGroupId] FROM [CMS].[dbo].[Page] WHERE Sequence BETWEEN @sequenceMin AND @SequenceMax END Note that the Sequence column is defined as int while the sequence parameter is defined as a small int. The problem is that the database may have to do a lot of type conversions to evaluate the query. In some cases, this may even negate the indexes that you have in place. Using Correlated subqueries instead of a join (Dave_Levy/ Plamen Ratchev) There are two main problems here. The first is a little subjective, since this is a non-standard way of expressing the query, it is harder to understand. The other problem is much more objective and potentially problematic. You are taking much of the control away from the optimizer. Written properly, such a query may well out perform a corresponding query written with traditional joins. More likely than not, performance will degrade. Whenever you assume that you know better than the optimizer, you will most likely be wrong. This is the fundmental problem with any hint. Consider a query like this: SELECT Page.Title , Page.Sequence , Page.ThemeId , Page.CustomCss , Page.CustomScript , PageEffectParams.Name , PageEffectParams.Value , ( SELECT EffectName FROM dbo.Effect WHERE EffectId = dbo.PageEffects.EffectId ) AS EffectName FROM Page INNER JOIN PageEffect ON Page.PageId = PageEffects.PageId INNER JOIN PageEffectParam ON PageEffects.PageEffectId = PageEffectParams.PageEffectId This can and should be written as: SELECT Page.Title , Page.Sequence , Page.ThemeId , Page.CustomCss , Page.CustomScript , PageEffectParams.Name , PageEffectParams.Value , EffectName FROM Page INNER JOIN PageEffect ON Page.PageId = PageEffects.PageId INNER JOIN PageEffectParam ON PageEffects.PageEffectId = PageEffectParams.PageEffectId INNER JOIN dbo.Effect ON dbo.Effects.EffectId = dbo.PageEffects.EffectId The correlated query may just as easily show up in the where clause. It's not a good idea in the select clause or the where clause. Few or No comments. This one is a bit more complicated and controversial. All comments are not created equal. Some comments are helpful and need to be included. Other comments are not necessary and may indicate a problem. I tend to follow the rule of thumb that comments that explain why are good. Comments that explain how are bad. Many people may be shocked to hear the idea of a bad comment, but hear me out. If a comment is needed to explain what is going on or how it works, the logic is too complex and needs to be simplified. Comments that explain why are good. Comments may explain why the sql is needed are good. Comments that explain where the sql is used are good. Comments that explain how tables are related should not be needed if the sql is well written. If they are needed, you need to consider reworking the sql or simplify your data model. Use of functions in a WHERE clause. (Anil Das) Calling a function in the where clause will often negate the indexing strategy. The function will be called for every record considered. This will often a force a full table scan on the tables affected. Calling a function will not guarantee that there is a full table scan, but there is a good chance that it will. If you find that you often need to write queries using a particular function, you may need to add a column to the table that has the function already applied.

Read the article
SQL SERVER – How to Ignore Columnstore Index Usage in Query

- by pinaldave

Earlier I wrote about SQL SERVER – Fundamentals of Columnstore Index and very first question I received in email was as following. “We are using SQL Server 2012 CTP3 and so far so good. In our data warehouse solution we have created 1 non-clustered columnstore index on our large fact table. We have very unique situation but your article did not cover it. We are running few queries on our fact table which is working very efficiently but there is one query which earlier was running very fine but after creating this non-clustered columnstore index this query is running very slow. We dropped the columnstore index and suddenly this one query is running fast but other queries which were benefited by this columnstore index it is running slow. Any workaround in this situation?” In summary the question in simple words “How can we ignore using columnstore index in selective queries?” Very interesting question – you can use I can understand there may be the cases when columnstore index is not ideal and needs to be ignored the same. You can use the query hint IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX to ignore the columnstore index. SQL Server Engine will use any other index which is best after ignoring the columnstore index. Here is the quick script to prove the same. We will first create sample database and then create columnstore index on the same. Once columnstore index is created we will write simple query. This query will use columnstore index. We will then show the usage of the query hint. USE AdventureWorks GO -- Create New Table CREATE TABLE [dbo].[MySalesOrderDetail]( [SalesOrderID] [int] NOT NULL, [SalesOrderDetailID] [int] NOT NULL, [CarrierTrackingNumber] [nvarchar](25) NULL, [OrderQty] [smallint] NOT NULL, [ProductID] [int] NOT NULL, [SpecialOfferID] [int] NOT NULL, [UnitPrice] [money] NOT NULL, [UnitPriceDiscount] [money] NOT NULL, [LineTotal] [numeric](38, 6) NOT NULL, [rowguid] [uniqueidentifier] NOT NULL, [ModifiedDate] [datetime] NOT NULL ) ON [PRIMARY] GO -- Create clustered index CREATE CLUSTERED INDEX [CL_MySalesOrderDetail] ON [dbo].[MySalesOrderDetail] ( [SalesOrderDetailID]) GO -- Create Sample Data Table -- WARNING: This Query may run upto 2-10 minutes based on your systems resources INSERT INTO [dbo].[MySalesOrderDetail] SELECT S1.* FROM Sales.SalesOrderDetail S1 GO 100 -- Create ColumnStore Index CREATE NONCLUSTERED COLUMNSTORE INDEX [IX_MySalesOrderDetail_ColumnStore] ON [MySalesOrderDetail] (UnitPrice, OrderQty, ProductID) GO Now we have created columnstore index so if we run following query it will use for sure the same index. -- Select Table with regular Index SELECT ProductID, SUM(UnitPrice) SumUnitPrice, AVG(UnitPrice) AvgUnitPrice, SUM(OrderQty) SumOrderQty, AVG(OrderQty) AvgOrderQty FROM [dbo].[MySalesOrderDetail] GROUP BY ProductID ORDER BY ProductID GO We can specify Query Hint IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX as described in following query and it will not use columnstore index. -- Select Table with regular Index SELECT ProductID, SUM(UnitPrice) SumUnitPrice, AVG(UnitPrice) AvgUnitPrice, SUM(OrderQty) SumOrderQty, AVG(OrderQty) AvgOrderQty FROM [dbo].[MySalesOrderDetail] GROUP BY ProductID ORDER BY ProductID OPTION (IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX) GO Let us clean up the database. -- Cleanup DROP INDEX [IX_MySalesOrderDetail_ColumnStore] ON [dbo].[MySalesOrderDetail] GO TRUNCATE TABLE dbo.MySalesOrderDetail GO DROP TABLE dbo.MySalesOrderDetail GO Again, make sure that you use hint sparingly and understanding the proper implication of the same. Make sure that you test it with and without hint and select the best option after review of your administrator. Here is the question for you – have you started to use SQL Server 2012 for your validation and development (not on production)? It will be interesting to know the answer. Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: PostADay, SQL, SQL Authority, SQL Index, SQL Optimization, SQL Performance, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology

Read the article
July, the 31 Days of SQL Server DMO’s – Day 19 (sys.dm_exec_query_stats)

- by Tamarick Hill

The sys.dm_exec_query_stats DMV is one of the most useful DMV’s out there when it comes to performance tuning. If you have been keeping up with this blog series this month, you know that I started out on Day 1 reviewing many of the DMV’s within the ‘exec’ namespace. I’m not sure how I missed this one considering how valuable it is, but hey, they say it’s better late than never right?? On Day 7 and Day 8 we reviewed the sys.dm_exec_procedure_stats and sys.dm_exec_trigger_stats respectively. This sys.dm_exec_query_stats DMV is very similar to these two. As a matter of fact, this DMV will return all of the information you saw in the other two DMV’s, but in addition to that, you can see stats for all queries that have cached execution plans on your server. You can even see stats for statements that are ran Ad-Hoc as long as they are still cached in the buffer pool. To better illustrate this DMV, let have a quick look at it: SELECT * FROM sys.dm_exec_query_stats As you can see, there is a lot of information returned from this DMV. I wont go into detail about each and every one of these columns, but I will touch on a few of them briefly. The first column is the ‘sql_handle’, which if you remember from Day 4 of our blog series, I explained how you can use this column to extract the actual SQL text that was executed. The next columns statement_start_offset and statement_end_offset provide you a way of extracting the exact SQL statement that was executed as part of a batch. The plan_handle column is used to extract the Execution plan that was used, which we talked about during Day 5 of this blog series. Later in the result set, you have columns to identify how many times a particular statement was executed, how much CPU time it used, how many reads/writes it performed, the duration, how many rows were returned, etc. These columns provide you with a solid avenue to begin your performance optimization. The last column I will touch on is the query_plan_hash column. A lot of times when you have Dynamic SQL running on your server, you have similar statements with different parameter values being passed in. Many times these types of statements will get similar execution plans and then a Binary hash value can be generated based on these similar plans. This query plan hash can be used to find the cost of all queries that have similar execution plans and then you can tune based on that plan to improve the performance of all of the individual queries. This is a very powerful way of identifying and tuning Ad-hoc statements that run on your server. As I stated earlier, this sys.dm_exec_query_stats DMV is a very powerful and recommended DMV for performance tuning. You are able to quickly identify statements that are running on your server and analyze their impact on system resources. Using this DMV to track down the biggest performance killers on your server will allow you to make the biggest gains once you focus your tuning efforts on those top offenders. For more information about this DMV, please see the below Books Online link: http://msdn.microsoft.com/en-us/library/ms189741.aspx Follow me on Twitter @PrimeTimeDBA

Read the article
SQLAuthority News – Amazon Gift Card Raffle for Beta Tester Feedback for NuoDB

- by pinaldave

As regular readers know I’ve been spending some time working with the NuoDB beta software. They contacted me last week and asked if I would give you a chance to try their new web-based console for their scalable, SQL-compliant database. They have just put out their final beta release, Beta 9. It contains a preview of a new web-based “NuoConsole” that will replace and extend the functionality of their current desktop version. I haven’t spent any time with the new console yet but a really quick look tells me it should make it easier to do deeper monitoring than the older one. It also looks like they have added query-level reporting through the console. I will try to play with it soon. NuoDB is doing a last, big push to get some more feedback from developers before they release their 1.0 product sometime in the next several weeks. Since the console is new, they are especially interested in some quick feedback on it before general availability. For SQLAuthority readers only, NuoDB will raffle off three $50 Amazon gift cards in exchange for your feedback on the NuoConsole preview. Here’s how to Enter Download NuoDBeta 9 here You must build a domain before you can start the console. Launch the Web Console. Windows Code: start java -jar jarnuodbwebconsole.jar Mac, Linux, Solaris, Unix Code: java -jar jar/nuodbwebconsole.jar Access the Web Console: Code: http://localhost:8080 When you have tried it out, go to a short (8 question) survey to enter the raffle Click here for the survey You must complete the survey before midnight EDT on October 17, 2012. Here’s what else they are saying about this last beta before general availability: Beta 9 now supports the Zend PHP framework so that PHP developers can directly integrate web applications with NuoDB. Multi-threaded HDFS support – NuoDB Storage Managers can now be configured to persist data to the high performance Hadoop distributed file system (HDFS). Beta 9 optimizes for multi-thread I/O streams at maximum performance. This enhancement allows users to make Hadoop their core storage with no extra effort which is a pretty cool idea. Improved Performance –On a single transaction node, Beta 9 offers performance comparable with MySQL and MariaDB. As additional nodes are added, NuoDB performance improves significantly at near linear scale. Query & Explain Plan Logging – Beta 9 introduces SQL explain plans for your queries. Qualify queries with the word “EXPLAIN” and NuoDB will respond with the details of the execution plan allowing performance optimization to SQL. Through the NuoConsole, you can now kill hung or long running queries. Java App Server Support – Beta 9 now supports leading Web JEE app servers including JBoss, Tomcat, and ColdFusion. They’ve also reported: Improved PHP/PDO drivers Support for Drupal Faster Ruby on Rails driver The Hibernate Dialect supports version 4.1 And good news for my readers: numerous SQL enhancements They will share the results of the web console feedback with me. I’ll let you know how it goes. Also the winner of their last contest was Jaime Martínez Lafargue! Do leave a comment here once you complete the survey. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: SQL Authority Tagged: NuoDB

Read the article
Big Data – Data Mining with Hive – What is Hive? – What is HiveQL (HQL)? – Day 15 of 21

- by Pinal Dave

In yesterday’s blog post we learned the importance of the operational database in Big Data Story. In this article we will understand what is Hive and HQL in Big Data Story. Yahoo started working on PIG (we will understand that in the next blog post) for their application deployment on Hadoop. The goal of Yahoo to manage their unstructured data. Similarly Facebook started deploying their warehouse solutions on Hadoop which has resulted in HIVE. The reason for going with HIVE is because the traditional warehousing solutions are getting very expensive. What is HIVE? Hive is a datawarehouseing infrastructure for Hadoop. The primary responsibility is to provide data summarization, query and analysis. It supports analysis of large datasets stored in Hadoop’s HDFS as well as on the Amazon S3 filesystem. The best part of HIVE is that it supports SQL-Like access to structured data which is known as HiveQL (or HQL) as well as big data analysis with the help of MapReduce. Hive is not built to get a quick response to queries but it it is built for data mining applications. Data mining applications can take from several minutes to several hours to analysis the data and HIVE is primarily used there. HIVE Organization The data are organized in three different formats in HIVE. Tables: They are very similar to RDBMS tables and contains rows and tables. Hive is just layered over the Hadoop File System (HDFS), hence tables are directly mapped to directories of the filesystems. It also supports tables stored in other native file systems. Partitions: Hive tables can have more than one partition. They are mapped to subdirectories and file systems as well. Buckets: In Hive data may be divided into buckets. Buckets are stored as files in partition in the underlying file system. Hive also has metastore which stores all the metadata. It is a relational database containing various information related to Hive Schema (column types, owners, key-value data, statistics etc.). We can use MySQL database over here. What is HiveSQL (HQL)? Hive query language provides the basic SQL like operations. Here are few of the tasks which HQL can do easily. Create and manage tables and partitions Support various Relational, Arithmetic and Logical Operators Evaluate functions Download the contents of a table to a local directory or result of queries to HDFS directory Here is the example of the HQL Query: SELECT upper(name), salesprice FROM sales; SELECT category, count(1) FROM products GROUP BY category; When you look at the above query, you can see they are very similar to SQL like queries. Tomorrow In tomorrow’s blog post we will discuss about very important components of the Big Data Ecosystem – Pig. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Read the article
Getting Started with StreamInsight 2.1

- by Roman Schindlauer

If you're just beginning to get familiar with StreamInsight, you may be looking for a way to get started. What are the basics? How can I get my first StreamInsight application running so I can see how it works? Where is the 'front door' that will get me going? If that describes you, then this blog entry might be just what you need. If you're already a StreamInsight wiz, keep reading anyway - you may find some helpful links here that you weren't aware of. But here's what we'd like from you experienced readers in particular: if you know of other good resources that we missed, please feel free to add them in the comments below. We appreciate you sharing your expertise. The Book The basic documentation for StreamInsight is located in the MSDN Library (Microsoft StreamInsight 2.1). You'll notice that previous versions of StreamInsight are still there (1.2 and 2.0), but if you're just getting started you can stick to the 2.1 section. The documentation has been organized to function as reference material, which is fine after you're familiar with the technology. But if you're trying to learn the basics, you might want to take a different path instead of just starting at the top. The following is one map you can use. What Is StreamInsight? Here is a sequence of topics that should give you a good overview of what StreamInsight is and how it works: Overview answers the question, "what is it?" StreamInsight Server Architecture gives you a quick look at a high-level architectural drawing StreamInsight Concepts lays out an overview of the basic components Deploying StreamInsight Entities to a StreamInsight Server describes the mechanics of how these components work together Getting an Example Running Once you have this background, go ahead and install StreamInsight and get a basic example up and running: Installation download and install the software StreamInsight Examples walk through a set of 3 simple StreamInsight applications that work together to demonstrate what you learned in the topics above; you can copy and paste the code into Visual Studio, compile, and run That's it - you now have a real, functioning StreamInsight system! Now that you have a handle on the basics, you might want to start digging deeper. Digging Deeper Here's a suggested path through the documentation to help you understand the next layer of StreamInsight technologies: Using Event Sources and Event Sinks sources supply data and sinks consume it; this topic gives you an overview of how they work Publishing and Connecting to the StreamInsight Server practical details on how to set up a StreamInsight server A Hitchhiker’s Guide to StreamInsight 2.1 Queries queries are the heart of how StreamInsight performs data analytics, and this whitepaper will help you really understand how they work Using StreamInsight LINQ root through this section for technical details on specific query components Using the StreamInsight Event Flow Debugger in addition to troubleshooting, the debugger is a great way to learn more about what goes on inside a StreamInsight application And Even Deeper Finally, to get a handle on some of the more complex things you can do with StreamInsight, dig into these: Input and Output Adapters adapters can be useful for handling more complex sources and sinks Building Resilient StreamInsight Applications a resilient application is able to recover from system failures Operations this section will help you monitor and troubleshoot a running StreamInsight system The StreamInsight Community As you're designing and developing your StreamInsight solutions, you probably will find it helpful to see working examples or to learn tips and tricks from others. Or maybe you need a place to post a vexing question. Here are some community resources that we have found useful. If you know of others, please add them in the comments below. Code samples and tools Official StreamInsight code samples Introduction to LinqPad Driver for StreamInsight 2.1 - LinqPad is a very useful tool for developing queries The following case studies are based on earlier versions of StreamInsight, but they still are useful examples: Microsoft Media Analytics - real-time monitoring and analytic Edgenet - responding to information from multiple source ICONICS - managing energy usage Blogs Microsoft StreamInsight Ruminations of J.net Richard Seroter's Architecture Musings pluralsight Forums MSDN StreamInsight Forum stackoverflow Training Microsoft StreamInsight Fundamentals (“Introducing StreamInsight” is free) from pluralsight Twitter @streaminsight You’re a StreamInsight Expert That should get you going. Please add any other resources you have found useful in the comments below. Regards, The StreamInsight Team

Read the article
Looking under the hood of SSRS

- by Jim Giercyk

SSRS is a powerful tool, but there is very little available to measure it’s performance or view the SSRS execution log or catalog in detail. Here are a few simple queries that will give you insight to the system that you never had before. ACTIVE REPORTS: Have you ever seen your SQL Server performance take a nose dive due to a long-running report? If the SPID is executing under a generic Report ID, or it is a scheduled job, you may have no way to tell which report is killing your server. Running this query will show you which reports are executing at a given time, and WHO is executing them. USE ReportServerNative SELECT runningjobs.computername, runningjobs.requestname, runningjobs.startdate, users.username, Datediff(s,runningjobs.startdate, Getdate()) / 60 AS 'Active Minutes' FROM runningjobs INNER JOIN users ON runningjobs.userid = users.userid ORDER BY runningjobs.startdate SSRS CATALOG: We have all asked “What was the last thing that changed”, or better yet, “Who in the world did that!”. Here is a query that will show all of the reports in your SSRS catalog, when they were created and changed, and by who. USE ReportServerNative SELECT DISTINCT catalog.PATH, catalog.name, users.username AS [Created By], catalog.creationdate, users_1.username AS [Modified By], catalog.modifieddate FROM catalog INNER JOIN users ON catalog.createdbyid = users.userid INNER JOIN users AS users_1 ON catalog.modifiedbyid = users_1.userid INNER JOIN executionlogstorage ON catalog.itemid = executionlogstorage.reportid WHERE ( catalog.name <> '' ) SSRS EXECUTION LOG: Sometimes we need to know what was happening on the SSRS report server at a given time in the past. This query will help you do just that. You will need to set the timestart and timeend in the WHERE clause to suit your needs. USE ReportServerNative SELECT catalog.name AS report, executionlogstorage.username AS [User], executionlogstorage.timestart, executionlogstorage.timeend, Datediff(mi,e.timestart,e.timeend) AS ‘Time In Minutes', catalog.modifieddate AS [Report Last Modified], users.username FROM catalog (nolock) INNER JOIN executionlogstorage e (nolock) ON catalog.itemid = executionlogstorage.reportid INNER JOIN users (nolock) ON catalog.modifiedbyid = users.userid WHERE executionlogstorage.timestart >= Dateadd(s, -1, '03/31/2012') AND executionlogstorage.timeend <= Dateadd(DAY, 1, '04/02/2012') LONG RUNNING REPORTS: This query will show the longest running reports over a given time period. Note that the “>5” in the WHERE clause sets the report threshold at 5 minutes, so anything that ran less than 5 minutes will not appear in the result set. Adjust the threshold and start/end times to your liking. With this information in hand, you can better optimize your system by tweaking the longest running reports first. USE ReportServerNative SELECT executionlogstorage.instancename, catalog.PATH, catalog.name, executionlogstorage.username, executionlogstorage.timestart, executionlogstorage.timeend, Datediff(mi, e.timestart, e.timeend) AS 'Minutes', executionlogstorage.timedataretrieval, executionlogstorage.timeprocessing, executionlogstorage.timerendering, executionlogstorage.[RowCount], users_1.username AS createdby, CONVERT(VARCHAR(10), catalog.creationdate, 101) AS 'Creation Date', users.username AS modifiedby, CONVERT(VARCHAR(10), catalog.modifieddate, 101) AS 'Modified Date' FROM executionlogstorage e INNER JOIN catalog ON executionlogstorage.reportid = catalog.itemid INNER JOIN users ON catalog.modifiedbyid = users.userid INNER JOIN users AS users_1 ON catalog.createdbyid = users_1.userid WHERE ( e.timestart > '03/31/2012' ) AND ( e.timestart <= '04/02/2012' ) AND Datediff(mi, e.timestart, e.timeend) > 5 AND catalog.name <> '' ORDER BY 'Minutes' DESC I have used these queries to build SSRS reports that I can refer to quickly, and export to Excel if I need to report or quantify my findings. I encourage you to look at the data in the ReportServerNative database on your report server to understand the queries and create some of your own. For instance, you may want a query to determine which reports are using which shared data sources. Work smarter, not harder!

Read the article
Tuning Red Gate: #1 of Many

- by Grant Fritchey

Everyone runs into performance issues at some point. Same thing goes for Red Gate software. Some of our internal systems were running into some serious bottlenecks. It just so happens that we have this nice little SQL Server monitoring tool. What if I were to, oh, I don't know, use the monitoring tool to identify the bottlenecks, figure out the causes and then apply a fix (where possible) and then start the whole thing all over again? Just a crazy thought. OK, I was asked to. This is my first time looking through these servers, so here's how I'd go about using SQL Monitor to get a quick health check, sort of like checking the vitals on a patient. First time opening up our internal SQL Monitor instance and I was greeted with this: Oh my. Maybe I need to get our internal guys to read my blog. Anyway, I know that there are two servers where most of the load is. I'll drill down on the first. I'm selecting the server, not the instance, by clicking on the server name. That opens up the Global Overview page for the server. The information here much more applicable to the "oh my gosh, I have a problem now" type of monitoring. But, looking at this, I am seeing something immediately. There are four(4) drives on the system. The C:\ has an average read time of 16.9ms, more than double the others. Is that a problem? Not sure, but it's something I'll look at. It's write time is higher too. I'll keep drilling down, first, to the unclosed alerts on the server. Now things get interesting. SQL Monitor has a number of different types of alerts, some related to error states, others to service status, and then some related to performance. Guess what I'm seeing a bunch of right here: Long running queries and long job durations. If you check the dates, they're all recent, within the last 24 hours. If they had just been old, uncleared alerts, I wouldn't be that concerned. But with all these, all performance related, and all in the last 24 hours, yeah, I'm concerned. At this point, I could just start responding to the Alerts. If I click on one of the the Long-running query alerts, I'll get all kinds of cool data that can help me determine why the query ran long. But, I'm not in a reactive mode here yet. I'm still gathering data, trying to understand how the server works. I have the information that we're generating a lot of performance alerts, let's sock that away for the moment. Instead, I'm going to back up and look at the Global Overview for the SQL Instance. It shows all the databases on the server and their status. Then it shows a number of basic metrics about the SQL Server instance, again for that "what's happening now" view or things. Then, down at the bottom, there is the Top 10 expensive queries list: This is great stuff. And no, not because I can see the top queries for the last 5 minutes, but because I can adjust that out 3 days. Now I can see where some serious pain is occurring over the last few days. Databases have been blocked out to protect the guilty. That's it for the moment. I have enough knowledge of what's going on in the system that I can start to try to figure out why the system is running slowly. But, I want to look a little more at some historical data, to understand better how this server is behaving. More next time.

Read the article
SQL SERVER – Puzzle #1 – Querying Pattern Ranges and Wild Cards

- by Pinal Dave

Note: Read at the end of the blog post how you can get five Joes 2 Pros Book #1 and a surprise gift. I have been blogging for almost 7 years and every other day I receive questions about Querying Pattern Ranges. The most common way to solve the problem is to use Wild Cards. However, not everyone knows how to use wild card properly. SQL Queries 2012 Joes 2 Pros Volume 1 – The SQL Queries 2012 Hands-On Tutorial for Beginners Book On Amazon | Book On Flipkart Learn SQL Server get all the five parts combo kit Kit on Amazon | Kit on Flipkart Many people know wildcards are great for finding patterns in character data. There are also some special sequences with wildcards that can give you even more power. This series from SQL Queries 2012 Joes 2 Pros® Volume 1 will show you some of these cool tricks. All supporting files are available with a free download from the www.Joes2Pros.com web site. This example is from the SQL 2012 series Volume 1 in the file SQLQueries2012Vol1Chapter2.2Setup.sql. If you need help setting up then look in the “Free Videos” section on Joes2Pros under “Getting Started” called “How to install your labs” Querying Pattern Ranges The % wildcard character represents any number of characters of any length. Let’s find all first names that end in the letter ‘A’. By using the percentage ‘%’ sign with the letter ‘A’, we achieve this goal using the code sample below: SELECT * FROM Employee WHERE FirstName LIKE '%A' To find all FirstName values beginning with the letters ‘A’ or ‘B’ we can use two predicates in our WHERE clause, by separating them with the OR statement. Finding names beginning with an ‘A’ or ‘B’ is easy and this works fine until we want a larger range of letters as in the example below for ‘A’ thru ‘K’: SELECT * FROM Employee WHERE FirstName LIKE 'A%' OR FirstName LIKE 'B%' OR FirstName LIKE 'C%' OR FirstName LIKE 'D%' OR FirstName LIKE 'E%' OR FirstName LIKE 'F%' OR FirstName LIKE 'G%' OR FirstName LIKE 'H%' OR FirstName LIKE 'I%' OR FirstName LIKE 'J%' OR FirstName LIKE 'K%' The previous query does find FirstName values beginning with the letters ‘A’ thru ‘K’. However, when a query requires a large range of letters, the LIKE operator has an even better option. Since the first letter of the FirstName field can be ‘A’, ‘B’, ‘C’, ‘D’, ‘E’, ‘F’, ‘G’, ‘H’, ‘I’, ‘J’ or ‘K’, simply list all these choices inside a set of square brackets followed by the ‘%’ wildcard, as in the example below: SELECT * FROM Employee WHERE FirstName LIKE '[ABCDEFGHIJK]%' A more elegant example of this technique recognizes that all these letters are in a continuous range, so we really only need to list the first and last letter of the range inside the square brackets, followed by the ‘%’ wildcard allowing for any number of characters after the first letter in the range. Note: A predicate that uses a range will not work with the ‘=’ operator (equals sign). It will neither raise an error, nor produce a result set. --Bad query (will not error or return any records) SELECT * FROM Employee WHERE FirstName = '[A-K]%' Question: You want to find all first names that start with the letters A-M in your Customer table and end with the letter Z. Which SQL code would you use? a. SELECT * FROM Customer WHERE FirstName LIKE 'm%z' b. SELECT * FROM Customer WHERE FirstName LIKE 'a-m%z' c. SELECT * FROM Customer WHERE FirstName LIKE 'a-m%z' d. SELECT * FROM Customer WHERE FirstName LIKE '[a-m]%z' e. SELECT * FROM Customer WHERE FirstName LIKE '[a-m]z%' f. SELECT * FROM Customer WHERE FirstName LIKE '[a-m]%z' g. SELECT * FROM Customer WHERE FirstName LIKE '[a-m]z%' Contest Leave a valid answer before June 18, 2013 in the comment section. 5 winners will be selected from all the valid answers and will receive Joes 2 Pros Book #1. 1 Lucky person will get a surprise gift from Joes 2 Pros. The contest is open for all the countries where Amazon ships the book (USA, UK, Canada, India and many others). Special Note: Read all the options before you provide valid answer as there is a small trick hidden in answers. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Joes 2 Pros, PostADay, SQL, SQL Authority, SQL Puzzle, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology

Read the article
PASS Summit 2012: keynote and Mobile BI announcements #sqlpass

- by Marco Russo (SQLBI)

Today at PASS Summit 2012 there have been several announcements during the keynote. Moreover, other news have not been highlighted in the keynote but are equally if not more important for the BI community. Let’s start from the big news in the keynote (other details on SQL Server Blog): Hekaton: this is the codename for in-memory OLTP technology that will appear (I suppose) in the next release of the SQL Server relational engine. The improvement in performance and scalability is impressive and it enables new scenarios. I’m curious to see whether it can be used also to improve ETL performance and how it differs from using SSD technology. Updates on Columnstore: In the next major release of SQL Server the columnstore indexes will be updatable and it will be possible to create a clustered index with Columnstore index. This is really a great news for near real-time reporting needs! Polybase: in 2013 it will debut SQL Server 2012 Parallel Data Warehouse (PDW), which will include the Polybase technology. By using Polybase a single T-SQL query will run queries across relational data and Hadoop data. A single query language for both. Sounds really interesting for using BigData in a more integrated way with existing relational databases. And, of course, to load a data warehouse using BigData, which is the ultimate goal that we all BI Pro have, right? SQL Server 2012 SP1: the Service Pack 1 for SQL Server 2012 is available now and it enable the use of PowerPivot for SharePoint and Power View on a SharePoint 2013 installation with Excel 2013. Power View works with Multidimensional cube: the long-awaited feature of being able to use PowerPivot with Multidimensional cubes has been shown by Amir Netz in an amazing demonstration during the keynote. The interesting thing is that the data model behind was based on a many-to-many relationship (something that is not fully supported by Power View with Tabular models). Another interesting aspect is that it is Analysis Services 2012 that supports DAX queries run on a Multidimensional model, enabling the use of any future tool generating DAX queries on top of a Multidimensional model. There are still no info about availability by now, but this is *not* included in SQL Server 2012 SP1. So what about Mobile BI? Well, even if not announced during the keynote, there is a dedicated session on this topic and there are very important news in this area: iOS, Android and Microsoft mobile platforms: the commitment is to get data exploration and visualization capabilities working within June 2013. This should impact at least Power View and SharePoint/Excel Services. This is the type of UI experience we are all waiting for, in order to satisfy the requests coming from users and customers. The important news here is that native applications will be available for both iOS and Windows 8 so it seems that Android will be supported initially only through the web. Unfortunately we haven’t seen any demo, so it’s not clear what will be the offline navigation experience (and whether there will be one). But at least we know that Microsoft is working on native applications in this area. I’m not too surprised that HTML5 is not the magic bullet for all the platforms. The next PASS Business Analytics conference in 2013 seems a good place to see this in action, even if I hope we don’t have to wait other six months before seeing some demo of native BI applications on mobile platforms! Viewing Reporting Services reports on iPad is supported starting with SQL Server 2012 SP1, which has been released today. This is another good reason to install SP1 on SQL Server 2012. If you are at PASS Summit 2012, come and join me, Alberto Ferrari and Chris Webb at our book signing event tomorrow, Thursday 8 2012, at the bookstore between 12:00pm and 12:30pm, or follow one of our sessions!

Read the article
SQL Server Optimizer Malfunction?

- by Tony Davis

There was a sharp intake of breath from the audience when Adam Machanic declared the SQL Server optimizer to be essentially "stuck in 1997". It was during his fascinating "Query Tuning Mastery: Manhandling Parallelism" session at the recent PASS SQL Summit. Paraphrasing somewhat, Adam (blog | @AdamMachanic) offered a convincing argument that the optimizer often delivers flawed plans based on assumptions that are no longer valid with today’s hardware. In 1997, when Microsoft engineers re-designed the database engine for SQL Server 7.0, SQL Server got its initial implementation of a cost-based optimizer. Up to SQL Server 2000, the developer often had to deploy a steady stream of hints in SQL statements to combat the occasionally wilful plan choices made by the optimizer. However, with each successive release, the optimizer has evolved and improved in its decision-making. It is still prone to the occasional stumble when we tackle difficult problems, join large numbers of tables, perform complex aggregations, and so on, but for most of us, most of the time, the optimizer purrs along efficiently in the background. Adam, however, challenged further any assumption that the current optimizer is competent at providing the most efficient plans for our more complex analytical queries, and in particular of offering up correctly parallelized plans. He painted a picture of a present where complex analytical queries have become ever more prevalent; where disk IO is ever faster so that reads from disk come into buffer cache faster than ever; where the improving RAM-to-data ratio means that we have a better chance of finding our data in cache. Most importantly, we have more CPUs at our disposal than ever before. To get these queries to perform, we not only need to have the right indexes, but also to be able to split the data up into subsets and spread its processing evenly across all these available CPUs. Improvements such as support for ColumnStore indexes are taking things in the right direction, but, unfortunately, deficiencies in the current Optimizer mean that SQL Server is yet to be able to exploit properly all those extra CPUs. Adam’s contention was that the current optimizer uses essentially the same costing model for many of its core operations as it did back in the days of SQL Server 7, based on assumptions that are no longer valid. One example he gave was a "slow disk" bias that may have been valid back in 1997 but certainly is not on modern disk systems. Essentially, the optimizer assesses the relative cost of serial versus parallel plans based on the assumption that there is no IO cost benefit from parallelization, only CPU. It assumes that a single request will saturate the IO channel, and so a query would not run any faster if we parallelized IO because the disk system simply wouldn’t be able to handle the extra pressure. As such, the optimizer often decides that a serial plan is lower cost, often in cases where a parallel plan would improve performance dramatically. It was challenging and thought provoking stuff, as were his techniques for driving parallelism through query logic based on subsets of rows that define the "grain" of the query. I highly recommend you catch the session if you missed it. I’m interested to hear though, when and how often people feel the force of the optimizer’s shortcomings. Barring mistakes, such as stale statistics, how often do you feel the Optimizer fails to find the plan you think it should, and what are the most common causes? Is it fighting to induce it toward parallelism? Combating unexpected plans, arising from table partitioning? Something altogether more prosaic? Cheers, Tony.

Read the article
The five steps of business intelligence adoption: where are you?

- by Red Gate Software BI Tools Team

When I was in Orlando and New York last month, I spoke to a lot of business intelligence users. What they told me suggested a path of BI adoption. The user’s place on the path depends on the size and sophistication of their organisation. Step 1: A company with a database of customer transactions will often want to examine particular data, like revenue and unit sales over the last period for each product and territory. To do this, they probably use simple SQL queries or stored procedures to produce data on demand. Step 2: The results from step one are saved in an Excel document, so business users can analyse them with filters or pivot tables. Alternatively, SQL Server Reporting Services (SSRS) might be used to generate a report of the SQL query for display on an intranet page. Step 3: If these queries are run frequently, or business users want to explore data from multiple sources more freely, it may become necessary to create a new database structured for analysis rather than CRUD (create, retrieve, update, and delete). For example, data from more than one system — plus external information — may be incorporated into a data warehouse. This can become ‘one source of truth’ for the business’s operational activities. The warehouse will probably have a simple ‘star’ schema, with fact tables representing the measures to be analysed (e.g. unit sales, revenue) and dimension tables defining how this data is aggregated (e.g. by time, region or product). Reports can be generated from the warehouse with Excel, SSRS or other tools. Step 4: Not too long ago, Microsoft introduced an Excel plug-in, PowerPivot, which allows users to bring larger volumes of data into Excel documents and create links between multiple tables. These BISM Tabular documents can be created by the database owners or other expert Excel users and viewed by anyone with Excel PowerPivot. Sometimes, business users may use PowerPivot to create reports directly from the primary database, bypassing the need for a data warehouse. This can introduce problems when there are misunderstandings of the database structure or no single ‘source of truth’ for key data. Step 5: Steps three or four are often enough to satisfy business intelligence needs, especially if users are sophisticated enough to work with the warehouse in Excel or SSRS. However, sometimes the relationships between data are too complex or the queries which aggregate across periods, regions etc are too slow. In these cases, it can be necessary to formalise how the data is analysed and pre-build some of the aggregations. To do this, a business intelligence professional will typically use SQL Server Analysis Services (SSAS) to create a multidimensional model — or “cube” — that more simply represents key measures and aggregates them across specified dimensions. Step five is where our tool, SSAS Compare, becomes useful, as it helps review and deploy changes from development to production. For us at Red Gate, the primary value of SSAS Compare is to establish a dialog with BI users, so we can develop a portfolio of products that support creation and deployment across a range of report and model types. For example, PowerPivot and the new BISM Tabular model create a potential customer base for tools that extend beyond BI professionals. We’re interested in learning where people are in this story, so we’ve created a six-question survey to find out. Whether you’re at step one or step five, we’d love to know how you use BI so we can decide how to build tools that solve your problems. So if you have a sixty seconds to spare, tell us on the survey!

Read the article
How to get decent MySQL driver perfomance in Ruby

- by Zombies

I notice that I am getting very poor performance for either or both inserts and queries. The queries themselves are basic and can execute with no delay directly from mysql. The ruby script that I wrote is only 1 thread, so only 1 connection is being used, and never closed unless the script is terminated. Pretty basic, I am just trying to insert a lot of rows. There is a look-up or two to get a surrogate key, or to check for duplicates, but the complexity is just O(n). Also, it isn't like there are millions of records, so again the queries themselves take no time to run. I am using: Ruby 1.9.1 Gem/driver:ruby-mysql 2.9.2 MySQL 5.1.37-1ubuntu5.1 ^ all 32 bit versions on a 32bit ubuntu distro I am getting about 1-2 inserts per second, pretty slow. I know a lot of people will suggest to change drivers, but that means I have some refactoring and resting to do. So I would really appreciate any help, but please if you do recomend that at least say why you do (eg: if you have used ruby-mysql x.x.x before and found another mysql driver to be better).ruby-mysql 2.9.2 What I would like to know: How can I improve performance with ruby-mysql 2.9.2 If and only if I cannot do this with ruby-mysql 2.9.2, what should I do?

Read the article
Lazy loading of Blob properties of one class

- by Khosro

Hi, My class contains "summary" and "title" properties those are Blob and other properties. Code:(I write some part of class) public class News extends BaseEntity{ @Lob @Basic(fetch = FetchType.LAZY) public String getSummary() { return summary; } @Lob @Basic(fetch = FetchType.LAZY) public String getTitle() { return title; } @Temporal(TemporalType.TIMESTAMP) public Date getPublishDate() { return publishDate; } } I instrument this class to lazy load of Blob properties using this class "org.hibernate.tool.instrument.javassist.InstrumentTask". When i write this code to retrieve only summary of new , newsDAO.findByid(1L).getSummary(); Hibernate generates these queries: Hibernate: select news0_.id as id1_, news0_.entityVersion as entityVe2_1_, news0_.publishDate as publish15_1_, news0_.url as url1_ from News news0_ Hibernate: select news_.summary as summary1_, news_.title as title1_ from News news_ where news_.id=? I have two qurestions: 1.I only want to retrieve "summary" property not "title" property,but Hibernate queries show that it also retrieve "title" property,Why this happens(i want to lazy load of "property")? It seems when i load one of Blob property ,Hibernate loads all of them.Why?(This is my main question) 2.Why Hibernate generates two queries for retrieving only "summary" property of news? Khosro.

Read the article
Alternatives to decompiling MS Access MDE files

- by booyaa

I've been tasked with finding a suitable tool to decompile MDE files. The MDEs were created by staff who have since left (familar story eh?) and we do not have access to the originally MDB files. The reason we need access to the original code is that the data source is changing (the backend as well as some of the table and queries) and we need a way to update queries. An example of a change, in a SELECT statement where is the WHERE clause looks for zero as a string ("0") rather than an integer. I'm aware that unless you use the services of people like EverythingAccess.com its unlikely you will ever get the source code back. My main query is to ask for alternative methods to decompiling code. An example of the kinds of methods I'm thinking about is to spy on the traffic between the app the the ODBC DSN using tcpdump. I might then be able to write code to translate the data source queries between the old and new systems. Ideally I'd prefer a solution that is application centric rather than one that analyses all network traffic. I should add one caveat, no doubt most of you are thinking the best solution is to rewrite the code, based on its perceived functionality. This is the option we're not considering (at the moment).

Read the article
Help with fql.multiQuery

- by Daniel Schaffer

I'm playing around with the Facebook API's fql.multiQuery method. I'm just using the API Test Console, and trying to get a successful response but can't seem to figure out exactly what it wants. Here's the text I'm entering into the "queries" field: {"tags" : "select subject from photo_tag where subject != 601599551 and pid in ( select pid from photo_tag where subject = 601599551 ) and subject in ( select uid2 from friend where uid1 = 601599551 )", "foo" : "select uid from user where uid = 601599551"} All it'll give me is a queries parameter: array expected. error. I've also tried just about every permutation I could think of involving wrapping the name/query pairs in their own curly braces, adding brackets, adding whitespace, removing whitespace in case it didn't want an associative array (for those watching the edits, I just found out about these wonderful things now... oy), all to no avail. Is there something painfully obvious I'm missing here, or do I need to make like Chuck Norris Jon Skeet and simply will it to do my bidding? Update: A note to anyone finding this question now: The fql.multiquery test console appears to be broken. You can test your query by clicking on the generated url in the test console and manually adding the "queries" parameter into the querystring.

Read the article
Several Small, Specific, MySQL Query Cache Questions

- by Robbie

I've look all over the web and in the questions asked here about MySQL caching and most of them seem very non-specific about a couple of questions that I have about performance and MySQL query caching. Specifically I want answers to these questions, assume for all questions that I have the query cache enabled and it is of type 2, or "DEMAND": Is the query cache per table, per database, or per server? Meaning if I have the cache size set to X and have T tables and D databases will I be caching TX, DX, or X amount of data? If I have table T1 which I regularly use the SQL_CACHE hint on for SELECT queries and table T2 which I never do, when I query T2 with a SELECT query will it check through the cache first before performing the query? *Note: I don't want to use the SQL_NO_CACHE for all T2 queries.* Assume the same situation as in question 2. If I alter (INSERT, DELETE) table T2 will any processing be done on the cache? For answers to 2 and 3, is this processing time negligible if T2 is constantly being altered and is the target of a majority of my SELECT queries?

Read the article
dynamically horizontal scalable key value store

- by Zubair

Hi, Is there a key value store that will give me the following: Allow me to simply add and remove nodes and will redstribute the data automatically Allow me to remove nodes and still have 2 extra data nodes to provide redundancy Allow me to store text or images up to 1GB in size Can store small size data up to 100TB of data Fast (so will allow queries to be performed on top of it) Make all this transparent to the client Works on Ubuntu/FreeBSD or Mac Free or open source I basically want something I can use a "single", and not have to worry about having memcached, a db, and several storage components so yes, I do want a database "silver bullet" you could say. Thanks Zubair Answers so far: MogileFS on top of BackBlaze - As far as I can see this is just a filesystem, and after some research it only seems to be appropriate for large image files Tokyo Tyrant - Needs lightcloud. This doesn't auto scale as you add new nodes. I did look into this and it seems it is very fast for queries which fit onto a single node though Riak - This is one I am looking into myself, but I don't have any results yet Amazon S3 - Is anyone using this as their sole persistance layer in production? From what I have seen it seems to be used for storage of images as complex queries are too expensive @shaman suggested Cassandra - definitely one I am looking into So far it seems that there is no database or key value store that fulfills the criteria I mentioned, not even after offering a bounty of 100 points did the question get answered!

Read the article
Should we have a database independent SQL like query language in Django? [closed]

- by Yugal Jindle

Note : I know we have Django ORM already that keeps things database independent and converts to the database specific SQL queries. Once things starts getting complicated it is preferred to write raw SQL queries for better efficiency. When you write raw sql queries your code gets trapped with the database you are using. I also understand its important to use the full power of your database that can-not be achieved with the django orm alone. My Question : Until I use any database specific feature, why should one be trapped with the database. For instance : We have a query with multiple joins and we decided to write a raw sql query. Now, that makes my website postgres specific. Even when I have not used any postgres specific feature. I feel there should be some fake sql language which can translate to any database's sql query. Even Django's ORM can be built over it. So, that if you go out of ORM but not database specific - you can still remain database independent. I asked the same question to Jacob Kaplan Moss (In person) : He advised me to stay with the database that I like and endure its whole power, to which I agree. But my point was not that we should be database independent. My point is we should be database independent until we use a database specific feature. Please explain, why should be there a fake sql layer over the actual sql ?

Read the article
How best to use XPath with very large XML files in .NET?

- by glenatron

I need to do some processing on fairly large XML files ( large here being potentially upwards of a gigabyte ) in C# including performing some complex xpath queries. The problem I have is that the standard way I would normally do this through the System.XML libraries likes to load the whole file into memory before it does anything with it, which can cause memory problems with files of this size. I don't need to be updating the files at all just reading them and querying the data contained in them. Some of the XPath queries are quite involved and go across several levels of parent-child type relationship - I'm not sure whether this will affect the ability to use a stream reader rather than loading the data into memory as a block. One way I can see of making it work is to perform the simple analysis using a stream-based approach and perhaps wrapping the XPath statements into XSLT transformations that I could run across the files afterward, although it seems a little convoluted. Alternately I know that there are some elements that the XPath queries will not run across, so I guess I could break the document up into a series of smaller fragments based on it's original tree structure, which could perhaps be small enough to process in memory without causing too much havoc. I've tried to explain my objective here so if I'm barking up totally the wrong tree in terms of general approach I'm sure you folks can set me right...

Read the article
SQL Server insert performance

- by Jose

I have an insert query that gets generated like this INSERT INTO InvoiceDetail (LegacyId,InvoiceId,DetailTypeId,Fee,FeeTax,Investigatorid,SalespersonId,CreateDate,CreatedById,IsChargeBack,Expense,RepoAgentId,PayeeName,ExpensePaymentId,AdjustDetailId) VALUES(1,1,2,1500.0000,0.0000,163,1002,'11/30/2001 12:00:00 AM',1116,0,550.0000,850,NULL,@ExpensePay1,NULL); DECLARE @InvDetail1 INT; SET @InvDetail1 = (SELECT @@IDENTITY); This query is generated for only 110K rows. It takes 30 minutes for all of these query's to execute I checked the query plan and the largest % nodes are A Clustered Index Insert at 57% query cost which has a long xml that I don't want to post. A Table Spool which is 38% query cost <RelOp AvgRowSize="35" EstimateCPU="5.01038E-05" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimateRows="1" LogicalOp="Eager Spool" NodeId="80" Parallel="false" PhysicalOp="Table Spool" EstimatedTotalSubtreeCost="0.0466109"> <OutputList> <ColumnReference Database="[SkipPro]" Schema="[dbo]" Table="[InvoiceDetail]" Column="InvoiceId" /> <ColumnReference Database="[SkipPro]" Schema="[dbo]" Table="[InvoiceDetail]" Column="InvestigatorId" /> <ColumnReference Column="Expr1054" /> <ColumnReference Column="Expr1055" /> </OutputList> <Spool PrimaryNodeId="3" /> </RelOp> So my question is what is there that I can do to improve the speed of this thing? I already run ALTER TABLE TABLENAME NOCHECK CONSTRAINTS ALL Before the queries and then ALTER TABLE TABLENAME NOCHECK CONSTRAINTS ALL after the queries. And that didn't shave off hardly anything off of the time. Know I am running these queries in a .NET application that uses a SqlCommand object to send the query. I then tried to output the sql commands to a file and then execute it using sqlcmd, but I wasn't getting any updates on how it was doing, so I gave up on that. Any ideas or hints or help?

Read the article

Search Results

Search found 4685 results on 188 pages for 'queries'.

Page 43/188 | < Previous Page | 39 40 41 42 43 44 45 46 47 48 49 50 | Next Page >

- by Pinal Dave

- by Pinal Dave

- by xmarch

- by Dave Ballantyne

- by Nick Harrison

- by pinaldave

- by Tamarick Hill

- by pinaldave

- by Pinal Dave

- by Roman Schindlauer

- by Jim Giercyk

- by Grant Fritchey

- by Pinal Dave

- by Marco Russo (SQLBI)

- by Tony Davis

- by Red Gate Software BI Tools Team

- by Zombies

- by Khosro

- by booyaa

- by Daniel Schaffer

- by Robbie

- by Zubair

- by Yugal Jindle

- by glenatron

- by Jose

< Previous Page | 39 40 41 42 43 44 45 46 47 48 49 50 | Next Page >