Search Results

Search found 4788 results on 192 pages for 'adhoc queries'.

Page 59/192 | < Previous Page | 55 56 57 58 59 60 61 62 63 64 65 66  | Next Page >

  • The SSIS tuning tip that everyone misses

    - by Rob Farley
    I know that everyone misses this, because I’m yet to find someone who doesn’t have a bit of an epiphany when I describe this. When tuning Data Flows in SQL Server Integration Services, people see the Data Flow as moving from the Source to the Destination, passing through a number of transformations. What people don’t consider is the Source, getting the data out of a database. Remember, the source of data for your Data Flow is not your Source Component. It’s wherever the data is, within your database, probably on a disk somewhere. You need to tune your query to optimise it for SSIS, and this is what most people fail to do. I’m not suggesting that people don’t tune their queries – there’s plenty of information out there about making sure that your queries run as fast as possible. But for SSIS, it’s not about how fast your query runs. Let me say that again, but in bolder text: The speed of an SSIS Source is not about how fast your query runs. If your query is used in a Source component for SSIS, the thing that matters is how fast it starts returning data. In particular, those first 10,000 rows to populate that first buffer, ready to pass down the rest of the transformations on its way to the Destination. Let’s look at a very simple query as an example, using the AdventureWorks database: We’re picking the different Weight values out of the Product table, and it’s doing this by scanning the table and doing a Sort. It’s a Distinct Sort, which means that the duplicates are discarded. It'll be no surprise to see that the data produced is sorted. Obvious, I know, but I'm making a comparison to what I'll do later. Before I explain the problem here, let me jump back into the SSIS world... If you’ve investigated how to tune an SSIS flow, then you’ll know that some SSIS Data Flow Transformations are known to be Blocking, some are Partially Blocking, and some are simply Row transformations. Take the SSIS Sort transformation, for example. I’m using a larger data set for this, because my small list of Weights won’t demonstrate it well enough. Seven buffers of data came out of the source, but none of them could be pushed past the Sort operator, just in case the last buffer contained the data that would be sorted into the first buffer. This is a blocking operation. Back in the land of T-SQL, we consider our Distinct Sort operator. It’s also blocking. It won’t let data through until it’s seen all of it. If you weren’t okay with blocking operations in SSIS, why would you be happy with them in an execution plan? The source of your data is not your OLE DB Source. Remember this. The source of your data is the NCIX/CIX/Heap from which it’s being pulled. Picture it like this... the data flowing from the Clustered Index, through the Distinct Sort operator, into the SELECT operator, where a series of SSIS Buffers are populated, flowing (as they get full) down through the SSIS transformations. Alright, I know that I’m taking some liberties here, because the two queries aren’t the same, but consider the visual. The data is flowing from your disk and through your execution plan before it reaches SSIS, so you could easily find that a blocking operation in your plan is just as painful as a blocking operation in your SSIS Data Flow. Luckily, T-SQL gives us a brilliant query hint to help avoid this. OPTION (FAST 10000) This hint means that it will choose a query which will optimise for the first 10,000 rows – the default SSIS buffer size. And the effect can be quite significant. First let’s consider a simple example, then we’ll look at a larger one. Consider our weights. We don’t have 10,000, so I’m going to use OPTION (FAST 1) instead. You’ll notice that the query is more expensive, using a Flow Distinct operator instead of the Distinct Sort. This operator is consuming 84% of the query, instead of the 59% we saw from the Distinct Sort. But the first row could be returned quicker – a Flow Distinct operator is non-blocking. The data here isn’t sorted, of course. It’s in the same order that it came out of the index, just with duplicates removed. As soon as a Flow Distinct sees a value that it hasn’t come across before, it pushes it out to the operator on its left. It still has to maintain the list of what it’s seen so far, but by handling it one row at a time, it can push rows through quicker. Overall, it’s a lot more work than the Distinct Sort, but if the priority is the first few rows, then perhaps that’s exactly what we want. The Query Optimizer seems to do this by optimising the query as if there were only one row coming through: This 1 row estimation is caused by the Query Optimizer imagining the SELECT operation saying “Give me one row” first, and this message being passed all the way along. The request might not make it all the way back to the source, but in my simple example, it does. I hope this simple example has helped you understand the significance of the blocking operator. Now I’m going to show you an example on a much larger data set. This data was fetching about 780,000 rows, and these are the Estimated Plans. The data needed to be Sorted, to support further SSIS operations that needed that. First, without the hint. ...and now with OPTION (FAST 10000): A very different plan, I’m sure you’ll agree. In case you’re curious, those arrows in the top one are 780,000 rows in size. In the second, they’re estimated to be 10,000, although the Actual figures end up being 780,000. The top one definitely runs faster. It finished several times faster than the second one. With the amount of data being considered, these numbers were in minutes. Look at the second one – it’s doing Nested Loops, across 780,000 rows! That’s not generally recommended at all. That’s “Go and make yourself a coffee” time. In this case, it was about six or seven minutes. The faster one finished in about a minute. But in SSIS-land, things are different. The particular data flow that was consuming this data was significant. It was being pumped into a Script Component to process each row based on previous rows, creating about a dozen different flows. The data flow would take roughly ten minutes to run – ten minutes from when the data first appeared. The query that completes faster – chosen by the Query Optimizer with no hints, based on accurate statistics (rather than pretending the numbers are smaller) – would take a minute to start getting the data into SSIS, at which point the ten-minute flow would start, taking eleven minutes to complete. The query that took longer – chosen by the Query Optimizer pretending it only wanted the first 10,000 rows – would take only ten seconds to fill the first buffer. Despite the fact that it might have taken the database another six or seven minutes to get the data out, SSIS didn’t care. Every time it wanted the next buffer of data, it was already available, and the whole process finished in about ten minutes and ten seconds. When debugging SSIS, you run the package, and sit there waiting to see the Debug information start appearing. You look for the numbers on the data flow, and seeing operators going Yellow and Green. Without the hint, I’d sit there for a minute. With the hint, just ten seconds. You can imagine which one I preferred. By adding this hint, it felt like a magic wand had been waved across the query, to make it run several times faster. It wasn’t the case at all – but it felt like it to SSIS.

    Read the article

  • SQL Server CTE Basics

    The CTE was introduced into standard SQL in order to simplify various classes of SQL Queries for which a derived table just wasn't suitable. For some reason, it can be difficult to grasp the techniques of using it. Well, that's before Rob Sheldon explained it all so clearly for us.

    Read the article

  • Useful utilities - LINQPAD

    - by TATWORTH
    Recently I came across LINQPAD at http://www.linqpad.net/ a free utility by Joseph Alabahari. This is an excellent tool for developing and testing LINQ queries before you incorporate them into your C# programs. If you get stuck as I did at one point recently there is the MSDN Linq forum at http://forums.microsoft.com/MSDN/ShowForum.aspx?siteid=1&ForumID=123 where  you can ask for help.

    Read the article

  • Interesting links week #7

    - by erwin21
    Below a list of interesting links that I found this week: Frontend: HTML5 Peeks, Pokes and Pointers HTML 5 Markup that Gracefully Degrades Mobile Sites vs. Media Queries Development: Register your HTTP modules at runtime without config mobl - Open Source Language For Mobile Development PageMethod an easier and faster approach for Asp.Net AJAX Interested in more interesting links follow me at twitter http://twitter.com/erwingriekspoor

    Read the article

  • Decoding the SQL Server Index Structure

    A deep dive into the implementation of indexes in SQL Server 2008 R2. This is information that you must know in order to tune your queries for optimum performance. Partial scans of indexes are now possible! SQL Server monitoring made easy "Keeping an eye on our many SQL Server instances is much easier with SQL Response." Mike Lile.Download a free trial of SQL Response now.

    Read the article

  • Temp Table Recompiles

    - by Derek D.
    If you landed on this article, then you most likely know that temp tables can cause recompilation. This happens because temp tables are treated just like regular tables by the SQL Server Engine. When the tables (in which underlying queries rely on) change significantly, SQL Server detects this change (using auto update statistics) [...]

    Read the article

  • Responsive Design: Media query fix for IE10 on Windows Phone 8

    - by ihaynes
    Originally posted on: http://geekswithblogs.net/ihaynes/archive/2013/07/01/responsive-design-media-query-fix-for-ie10-on--windows.aspxThe version of IE10 on Windows Phone 8 apparently has a bug which results in media queries not seeing the correct device width.This post from Devhammer explains all.http://devhammer.net/responsive-design-fix-for-windows-phone-8-device-adaptationI'd not noticed this on the WP8 Emulator which proves yet again that testing on real devices is essential.

    Read the article

  • Is there any way to test how will the site perform under load

    - by Pankaj Upadhyay
    I have made an Asp.net MVC website and hosted it on a shared hosting provider. Since my website surrounds a very generic idea, it might have number of concurrent users sometime in future. So, I was thinking of a way to test my website for on-load performance. Like how will the site perform when 100 or 1000 users are online at the same time and surfing the website. This will also make me understand whether my LINQ queries are well written or not.

    Read the article

  • After 10 Years, MySQL Still the Right Choice for ScienceLogic's "Best Network Monitoring System on the Planet"

    - by Rebecca Hansen
    ScienceLogic has a pretty fantastic network monitoring appliance.  So good in fact that InfoWorld gave it their "2013 Best Network Monitoring System on the Planet" award.  Inside their "ultraflexible, ultrascalable, carrier-grade" enterprise appliance, ScienceLogic relies on MySQL and has since their start in 2003.  Check out some of the things they've been able to do with MySQL and their reasons for continuing to use MySQL in these highlights from our new MySQL ScienceLogic case study. Science Logic's larger customers use their appliance to monitor and manage  20,000+ devices, each of which generates a steady stream of data and a workload that is 85% write. On a large system, the MySQL database: Averages 8,000 queries every second or about 1 billion queries a day Can reach 175,000 tables and up to 20 million rows in a single table Is 2 terabytes on average and up to 6 terabytes "We told our customers they could add more and more devices. With MySQL, we haven't had any problems. When our customers have problems, we get calls. Not getting calls is a huge benefit." Matt Luebke, ScienceLogic Chief Software Architect.? ScienceLogic was approached by a number of Big Data / NoSQL vendors, but decided against using a NoSQL-only solution. Said Matt, "There are times when you really need SQL. NoSQL can't show me the top 10 users of CPU, or show me the bottom ten consumer of hard disk. That's why we weren't interested in changing and why we are very interested in MySQL 5.6. It's great that it can do relational and key-value using memcached." The ScienceLogic team is very cautious about putting only very stable technology into their product, and according to Matt, MySQL has been very stable: "We've been using MySQL for 10 years and we have never had any reliability problems. Ever." ScienceLogic now uses SSDs for their write-intensive appliance and that change alone has helped them achieve a 5x performance increase. Learn more>> ScienceLogic MySQL Case Study MySQL 5.6 InnoDB Compression options for better SSD performance Tuning MySQL 5.6 for Great Product Performance - on demand webinar Developer and DBA Guide to MySQL 5.6 white paper Guide to MySQL and NoSQL: The Best of Both Worlds white paper

    Read the article

  • Does Your Customer Engagement Create an Ah Feeling?

    - by Richard Lefebvre
    An (Oracle CX Blog) article by Christina McKeon Companies that successfully engage customers all have one thing in common. They make it seem easy for the customer to get what they need. No one would argue that brands don’t want to leave customers with this “ah” feeling. Since 94% of customers who have a low-effort service experience will buy from that company again, it makes financial sense for brands.1 Some brands are thinking differently about how they engage their customers to create ah feelings. How do they do it? Toyota is a great example of using smart assistance technology to understand customer intent and answer questions before customers hit the submit button online. What is unique in this situation is that Toyota captures intent while customers are filling out email forms. Toyota analyzes the data in the form and suggests responses before the customer sends the email. The customer gets the right answer, and the email never makes it to your contact center — which makes you and the customer happy. Most brands are fully aware of chat as a service channel, but some brands take chat to a whole new level. Beauty.com, part of the drugstore.com and Walgreens family of brands, uses live chat to replicate the personal experience that one would find at high-end department store cosmetic counters. Trained beauty advisors, all with esthetician or beauty counter experience, engage in live chat sessions with online shoppers to share immediate advice on the best products for their personal needs. Agents can watch customer activity online and determine the right time to reach out and offer help, just as help would be offered in a brick-and-mortar store. And, agents can co-browse along with the customer helping customers with online check-out. These personal chat discussions also give Beauty.com the opportunity to present products, advertise promotions, and resolve customer issues when they arise. Beauty.com converts approximately 25% of chat sessions into product orders. Photobox, the European market leader in online photo services, wanted to deliver personal and responsive service to its 24 million members. It ensures customer inquiries on personalized photo products are routed based on agent knowledge so customers get what they need from the company experts. By using a queuing system to ensure that the agent with the most appropriate knowledge handles the query, agent productivity increased while response times to 1,500 customer queries per day decreased. A real-time dashboard prevents agents from being overloaded with queries. This approach has produced financial results with a 15% increase in sales to existing customers and a 45% increase in orders from newly referred customers.

    Read the article

  • know more on /etc/hosts

    - by Habi
    Can somebody explain what does this mean? Explanation to each line will be helpful. I have mentioned some of my queries in comments too. 127.0.0.1 localhost //According to @Dave, it's machine ip. 127.0.1.1 dell-Inspiron-342 // then what is this ip of? The following lines are desirable for IPv6 capable hosts ::1 ip6-localhost ip6-loopback fe00::0 ip6-localnet ff00::0 ip6-mcastprefix ff02::1 ip6-allnodes ff02::2 ip6-allrouters

    Read the article

  • SQL SERVER – Get 2 of My Books FREE at Koenig Tech Day – Where Technologies Converge!

    - by pinaldave
    As a regular reader of my blog – you must be aware of that I love to write books and talk about various subjects of my book. The founders of Koenig Solutions are my very old friends, I know them for many years. They have been my biggest supporter of my books. Coming weekend they have a technology event at their Bangalore Location. Every attendee of the technology event will get a set of two books worth Rs. 450 – ‘SQL Server Interview Questions And Answers‘ and ‘SQL Wait Stats Joes 2 Pros‘. I am going to cover a couple of topics of the books and present  as well. I am very confident that every attendee will be having a great time. I will be covering following subjects: SQL Server Tricks and Tips for Blazing Fast Performance Slow Running Queries (SQL) are the most common problem that developers face while working with SQL Server. While it is easy to blame the SQL Server for unsatisfactory performance, however the issue often persists with the way queries have been written, and how SQL Server has been set up. The session will focus on the ways of identifying problems that slow down SQL Servers, and tricks to fix them. Developers will walk out with scripts and knowledge that can be applied to their servers, immediately post the session. After the session is over – I will point to what exact location in the book where you can continue for the further learning. I am pretty excited, this is more like book reading but in entire different format. The one day event will cover four technologies in four separate interactive sessions on: Microsoft SQL Server Security VMware/Virtualization ASP.NET MVC Date of the event: Dec 15, 2012 9 AM to 6PM. Location of the event:  Koenig Solutions Ltd. # 47, 4th Block, 100 feet Road, 3rd Floor, Opp to Shanthi Sagar, Koramangala, Bangalore- 560034 Mobile : 09008096122 Office : 080- 41127140 Organizers have informed me that there are very limited seats for this event and technical session based on my book will start at Sharp 9 AM. If you show up late there are chances that you will not get any seats. Registration for the event is a MUST. Please visit this link for further information. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, SQLAuthority Author Visit, SQLAuthority News, T SQL, Technology

    Read the article

  • When is a Seek not a Seek?

    - by Paul White
    The following script creates a single-column clustered table containing the integers from 1 to 1,000 inclusive. IF OBJECT_ID(N'tempdb..#Test', N'U') IS NOT NULL DROP TABLE #Test ; GO CREATE TABLE #Test ( id INTEGER PRIMARY KEY CLUSTERED ); ; INSERT #Test (id) SELECT V.number FROM master.dbo.spt_values AS V WHERE V.[type] = N'P' AND V.number BETWEEN 1 AND 1000 ; Let’s say we need to find the rows with values from 100 to 170, excluding any values that divide exactly by 10.  One way to write that query would be: SELECT T.id FROM #Test AS T WHERE T.id IN ( 101,102,103,104,105,106,107,108,109, 111,112,113,114,115,116,117,118,119, 121,122,123,124,125,126,127,128,129, 131,132,133,134,135,136,137,138,139, 141,142,143,144,145,146,147,148,149, 151,152,153,154,155,156,157,158,159, 161,162,163,164,165,166,167,168,169 ) ; That query produces a pretty efficient-looking query plan: Knowing that the source column is defined as an INTEGER, we could also express the query this way: SELECT T.id FROM #Test AS T WHERE T.id >= 101 AND T.id <= 169 AND T.id % 10 > 0 ; We get a similar-looking plan: If you look closely, you might notice that the line connecting the two icons is a little thinner than before.  The first query is estimated to produce 61.9167 rows – very close to the 63 rows we know the query will return.  The second query presents a tougher challenge for SQL Server because it doesn’t know how to predict the selectivity of the modulo expression (T.id % 10 > 0).  Without that last line, the second query is estimated to produce 68.1667 rows – a slight overestimate.  Adding the opaque modulo expression results in SQL Server guessing at the selectivity.  As you may know, the selectivity guess for a greater-than operation is 30%, so the final estimate is 30% of 68.1667, which comes to 20.45 rows. The second difference is that the Clustered Index Seek is costed at 99% of the estimated total for the statement.  For some reason, the final SELECT operator is assigned a small cost of 0.0000484 units; I have absolutely no idea why this is so, or what it models.  Nevertheless, we can compare the total cost for both queries: the first one comes in at 0.0033501 units, and the second at 0.0034054.  The important point is that the second query is costed very slightly higher than the first, even though it is expected to produce many fewer rows (20.45 versus 61.9167). If you run the two queries, they produce exactly the same results, and both complete so quickly that it is impossible to measure CPU usage for a single execution.  We can, however, compare the I/O statistics for a single run by running the queries with STATISTICS IO ON: Table '#Test'. Scan count 63, logical reads 126, physical reads 0. Table '#Test'. Scan count 01, logical reads 002, physical reads 0. The query with the IN list uses 126 logical reads (and has a ‘scan count’ of 63), while the second query form completes with just 2 logical reads (and a ‘scan count’ of 1).  It is no coincidence that 126 = 63 * 2, by the way.  It is almost as if the first query is doing 63 seeks, compared to one for the second query. In fact, that is exactly what it is doing.  There is no indication of this in the graphical plan, or the tool-tip that appears when you hover your mouse over the Clustered Index Seek icon.  To see the 63 seek operations, you have click on the Seek icon and look in the Properties window (press F4, or right-click and choose from the menu): The Seek Predicates list shows a total of 63 seek operations – one for each of the values from the IN list contained in the first query.  I have expanded the first seek node to show the details; it is seeking down the clustered index to find the entry with the value 101.  Each of the other 62 nodes expands similarly, and the same information is contained (even more verbosely) in the XML form of the plan. Each of the 63 seek operations starts at the root of the clustered index B-tree and navigates down to the leaf page that contains the sought key value.  Our table is just large enough to need a separate root page, so each seek incurs 2 logical reads (one for the root, and one for the leaf).  We can see the index depth using the INDEXPROPERTY function, or by using the a DMV: SELECT S.index_type_desc, S.index_depth FROM sys.dm_db_index_physical_stats ( DB_ID(N'tempdb'), OBJECT_ID(N'tempdb..#Test', N'U'), 1, 1, DEFAULT ) AS S ; Let’s look now at the Properties window when the Clustered Index Seek from the second query is selected: There is just one seek operation, which starts at the root of the index and navigates the B-tree looking for the first key that matches the Start range condition (id >= 101).  It then continues to read records at the leaf level of the index (following links between leaf-level pages if necessary) until it finds a row that does not meet the End range condition (id <= 169).  Every row that meets the seek range condition is also tested against the Residual Predicate highlighted above (id % 10 > 0), and is only returned if it matches that as well. You will not be surprised that the single seek (with a range scan and residual predicate) is much more efficient than 63 singleton seeks.  It is not 63 times more efficient (as the logical reads comparison would suggest), but it is around three times faster.  Let’s run both query forms 10,000 times and measure the elapsed time: DECLARE @i INTEGER, @n INTEGER = 10000, @s DATETIME = GETDATE() ; SET NOCOUNT ON; SET STATISTICS XML OFF; ; WHILE @n > 0 BEGIN SELECT @i = T.id FROM #Test AS T WHERE T.id IN ( 101,102,103,104,105,106,107,108,109, 111,112,113,114,115,116,117,118,119, 121,122,123,124,125,126,127,128,129, 131,132,133,134,135,136,137,138,139, 141,142,143,144,145,146,147,148,149, 151,152,153,154,155,156,157,158,159, 161,162,163,164,165,166,167,168,169 ) ; SET @n -= 1; END ; PRINT DATEDIFF(MILLISECOND, @s, GETDATE()) ; GO DECLARE @i INTEGER, @n INTEGER = 10000, @s DATETIME = GETDATE() ; SET NOCOUNT ON ; WHILE @n > 0 BEGIN SELECT @i = T.id FROM #Test AS T WHERE T.id >= 101 AND T.id <= 169 AND T.id % 10 > 0 ; SET @n -= 1; END ; PRINT DATEDIFF(MILLISECOND, @s, GETDATE()) ; On my laptop, running SQL Server 2008 build 4272 (SP2 CU2), the IN form of the query takes around 830ms and the range query about 300ms.  The main point of this post is not performance, however – it is meant as an introduction to the next few parts in this mini-series that will continue to explore scans and seeks in detail. When is a seek not a seek?  When it is 63 seeks © Paul White 2011 email: [email protected] twitter: @SQL_kiwi

    Read the article

  • Sql Server Express Profiler

    - by csharp-source.net
    Sql Server Express Profiler is a profiler for MS SQL Server 2005 Express . SQL Server Express Edition Profiler provides the most of functionality standard profiler does, such as choosing events to profile, setting filters, etc. But it doesn't provide professional tools for profiling sql queries. This project is a .NET WinForms Application and in future AJAX-enabled web site which provides functionality of Microsoft SQL Profiler.

    Read the article

  • #DAX Query Plan in SQL Server 2012 #Tabular

    - by Marco Russo (SQLBI)
    The SQL Server Profiler provides you many information regarding the internal behavior of DAX queries sent to a BISM Tabular model. Similar to MDX, also in DAX there is a Formula Engine (FE) and a Storage Engine (SE). The SE is usually handled by Vertipaq (unless you are using DirectQuery mode) and Vertipaq SE Query classes of events gives you a SQL-like syntax that represents the query sent to the storage engine. Another interesting class of events is the DAX Query Plan , which contains a couple...(read more)

    Read the article

  • SQLAuthority News – Download Whitepaper – SQL Server Analysis Services to Hive

    - by pinaldave
    The SQL Server Analysis Service is a very interesting subject and I always have enjoyed learning about it. You can read my earlier article over here. Big Data is my new interest and I have been exploring it recently. During this weekend this blog post caught my attention and I enjoyed reading it. Big Data is the next big thing. The growth is predicted to be 60% per year till 2016. There is no single solution to the growing need of the big data available in the market right now as well there is no one solution in the business intelligence eco-system available as well. However, the need of the solution is ever increasing. I am personally Klout user. You can see my Klout profile over. I do understand what Klout is trying to achieve – a single place to measure the influence of the person. However, it works a bit mysteriously. There are plenty of social media available currently in the internet world. The biggest problem all the social media faces is that everybody opens an account but hardly people logs back in. To overcome this issue and have returned visitors Klout has come up with the system where visitors can give 5/10 K+ to other users in a particular area. Looking at all the activities Klout is doing it is indeed big consumer of the Big Data as well it is early adopter of the big data and Hadoop based system.  Klout has to 1 trillion rows of data to be analyzed as well have nearly thousand terabyte warehouse. Hive the language used for Big Data supports Ad-Hoc Queries using HiveQL there are always better solutions. The alternate solution would be using SQL Server Analysis Services (SSAS) along with HiveQL. As there is no direct method to achieve there are few common workarounds already in place. A new ODBC driver from Klout has broken through the limitation and SQL Server Relation Engine can be used as an intermediate stage before SSAS. In this white paper the same solutions have been discussed in the depth. The white paper discusses following important concepts. The Klout Big Data solution Big Data Analytics based on Analysis Services Hadoop/Hive and Analysis Services integration Limitations of direct connectivity Pass-through queries to linked servers Best practices and lessons learned This white paper discussed all the important concepts which have enabled Klout to go go to the next level with all the offerings as well helped efficiency by offering a few out of the box solutions. I personally enjoy reading this white paper and I encourage all of you to do so. SQL Server Analysis Services to Hive Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, SQL White Papers, T SQL, Technology

    Read the article

  • Fun with Aggregates

    - by Paul White
    There are interesting things to be learned from even the simplest queries.  For example, imagine you are given the task of writing a query to list AdventureWorks product names where the product has at least one entry in the transaction history table, but fewer than ten. One possible query to meet that specification is: SELECT p.Name FROM Production.Product AS p JOIN Production.TransactionHistory AS th ON p.ProductID = th.ProductID GROUP BY p.ProductID, p.Name HAVING COUNT_BIG(*) < 10; That query correctly returns 23 rows (execution plan and data sample shown below): The execution plan looks a bit different from the written form of the query: the base tables are accessed in reverse order, and the aggregation is performed before the join.  The general idea is to read all rows from the history table, compute the count of rows grouped by ProductID, merge join the results to the Product table on ProductID, and finally filter to only return rows where the count is less than ten. This ‘fully-optimized’ plan has an estimated cost of around 0.33 units.  The reason for the quote marks there is that this plan is not quite as optimal as it could be – surely it would make sense to push the Filter down past the join too?  To answer that, let’s look at some other ways to formulate this query.  This being SQL, there are any number of ways to write logically-equivalent query specifications, so we’ll just look at a couple of interesting ones.  The first query is an attempt to reverse-engineer T-SQL from the optimized query plan shown above.  It joins the result of pre-aggregating the history table to the Product table before filtering: SELECT p.Name FROM ( SELECT th.ProductID, cnt = COUNT_BIG(*) FROM Production.TransactionHistory AS th GROUP BY th.ProductID ) AS q1 JOIN Production.Product AS p ON p.ProductID = q1.ProductID WHERE q1.cnt < 10; Perhaps a little surprisingly, we get a slightly different execution plan: The results are the same (23 rows) but this time the Filter is pushed below the join!  The optimizer chooses nested loops for the join, because the cardinality estimate for rows passing the Filter is a bit low (estimate 1 versus 23 actual), though you can force a merge join with a hint and the Filter still appears below the join.  In yet another variation, the < 10 predicate can be ‘manually pushed’ by specifying it in a HAVING clause in the “q1” sub-query instead of in the WHERE clause as written above. The reason this predicate can be pushed past the join in this query form, but not in the original formulation is simply an optimizer limitation – it does make efforts (primarily during the simplification phase) to encourage logically-equivalent query specifications to produce the same execution plan, but the implementation is not completely comprehensive. Moving on to a second example, the following query specification results from phrasing the requirement as “list the products where there exists fewer than ten correlated rows in the history table”: SELECT p.Name FROM Production.Product AS p WHERE EXISTS ( SELECT * FROM Production.TransactionHistory AS th WHERE th.ProductID = p.ProductID HAVING COUNT_BIG(*) < 10 ); Unfortunately, this query produces an incorrect result (86 rows): The problem is that it lists products with no history rows, though the reasons are interesting.  The COUNT_BIG(*) in the EXISTS clause is a scalar aggregate (meaning there is no GROUP BY clause) and scalar aggregates always produce a value, even when the input is an empty set.  In the case of the COUNT aggregate, the result of aggregating the empty set is zero (the other standard aggregates produce a NULL).  To make the point really clear, let’s look at product 709, which happens to be one for which no history rows exist: -- Scalar aggregate SELECT COUNT_BIG(*) FROM Production.TransactionHistory AS th WHERE th.ProductID = 709;   -- Vector aggregate SELECT COUNT_BIG(*) FROM Production.TransactionHistory AS th WHERE th.ProductID = 709 GROUP BY th.ProductID; The estimated execution plans for these two statements are almost identical: You might expect the Stream Aggregate to have a Group By for the second statement, but this is not the case.  The query includes an equality comparison to a constant value (709), so all qualified rows are guaranteed to have the same value for ProductID and the Group By is optimized away. In fact there are some minor differences between the two plans (the first is auto-parameterized and qualifies for trivial plan, whereas the second is not auto-parameterized and requires cost-based optimization), but there is nothing to indicate that one is a scalar aggregate and the other is a vector aggregate.  This is something I would like to see exposed in show plan so I suggested it on Connect.  Anyway, the results of running the two queries show the difference at runtime: The scalar aggregate (no GROUP BY) returns a result of zero, whereas the vector aggregate (with a GROUP BY clause) returns nothing at all.  Returning to our EXISTS query, we could ‘fix’ it by changing the HAVING clause to reject rows where the scalar aggregate returns zero: SELECT p.Name FROM Production.Product AS p WHERE EXISTS ( SELECT * FROM Production.TransactionHistory AS th WHERE th.ProductID = p.ProductID HAVING COUNT_BIG(*) BETWEEN 1 AND 9 ); The query now returns the correct 23 rows: Unfortunately, the execution plan is less efficient now – it has an estimated cost of 0.78 compared to 0.33 for the earlier plans.  Let’s try adding a redundant GROUP BY instead of changing the HAVING clause: SELECT p.Name FROM Production.Product AS p WHERE EXISTS ( SELECT * FROM Production.TransactionHistory AS th WHERE th.ProductID = p.ProductID GROUP BY th.ProductID HAVING COUNT_BIG(*) < 10 ); Not only do we now get correct results (23 rows), this is the execution plan: I like to compare that plan to quantum physics: if you don’t find it shocking, you haven’t understood it properly :)  The simple addition of a redundant GROUP BY has resulted in the EXISTS form of the query being transformed into exactly the same optimal plan we found earlier.  What’s more, in SQL Server 2008 and later, we can replace the odd-looking GROUP BY with an explicit GROUP BY on the empty set: SELECT p.Name FROM Production.Product AS p WHERE EXISTS ( SELECT * FROM Production.TransactionHistory AS th WHERE th.ProductID = p.ProductID GROUP BY () HAVING COUNT_BIG(*) < 10 ); I offer that as an alternative because some people find it more intuitive (and it perhaps has more geek value too).  Whichever way you prefer, it’s rather satisfying to note that the result of the sub-query does not exist for a particular correlated value where a vector aggregate is used (the scalar COUNT aggregate always returns a value, even if zero, so it always ‘EXISTS’ regardless which ProductID is logically being evaluated). The following query forms also produce the optimal plan and correct results, so long as a vector aggregate is used (you can probably find more equivalent query forms): WHERE Clause SELECT p.Name FROM Production.Product AS p WHERE ( SELECT COUNT_BIG(*) FROM Production.TransactionHistory AS th WHERE th.ProductID = p.ProductID GROUP BY () ) < 10; APPLY SELECT p.Name FROM Production.Product AS p CROSS APPLY ( SELECT NULL FROM Production.TransactionHistory AS th WHERE th.ProductID = p.ProductID GROUP BY () HAVING COUNT_BIG(*) < 10 ) AS ca (dummy); FROM Clause SELECT q1.Name FROM ( SELECT p.Name, cnt = ( SELECT COUNT_BIG(*) FROM Production.TransactionHistory AS th WHERE th.ProductID = p.ProductID GROUP BY () ) FROM Production.Product AS p ) AS q1 WHERE q1.cnt < 10; This last example uses SUM(1) instead of COUNT and does not require a vector aggregate…you should be able to work out why :) SELECT q.Name FROM ( SELECT p.Name, cnt = ( SELECT SUM(1) FROM Production.TransactionHistory AS th WHERE th.ProductID = p.ProductID ) FROM Production.Product AS p ) AS q WHERE q.cnt < 10; The semantics of SQL aggregates are rather odd in places.  It definitely pays to get to know the rules, and to be careful to check whether your queries are using scalar or vector aggregates.  As we have seen, query plans do not show in which ‘mode’ an aggregate is running and getting it wrong can cause poor performance, wrong results, or both. © 2012 Paul White Twitter: @SQL_Kiwi email: [email protected]

    Read the article

  • SQL Azure Database Size Calculator

    - by kaleidoscope
    A neat trick on how to measure your database size in SQL Azure.  Here are the exact queries you can run to do it: Select Sum (reserved_page_count) * 8.0 / 1024 From sys.dm_db_partition_stats GO Select sys.objects.name, sum (reserved_page_count) * 8.0 / 1024 From sys.dm_db_partition_stats, sys.objects Where sys.dm_db_partition_stats.object_id = sys.objects.object_id Group by sys.objects.name The first one will give you the size of your database in MB and the second one will do the same, but break it out for each object in your database. http://www.azurejournal.com/2010/03/sql-azure-database-size-calculator/   Ritesh, D

    Read the article

  • Good SEO Depends on Your Use of Good Keywords

    In SEO, keywords are of highest significance. Keywords are words or phrases that search engines use in order to correspond internet pages with search queries. It is vital to improve your web site with strategic keywords in order to maximise aimed at traffic. You'll use keywords in both your on-page and off-page optimization.

    Read the article

  • Slow in the Application but Fast in SQL Server Management Studio - from Erland

    - by Greg Low
    Our MVP buddy Erland Sommarskog doesn't post articles that often but when he does, you should read them. His latest post is here: http://www.sommarskog.se/query-plan-mysteries.html It talks about why a query might be slow when sent from an application but fast when you execute it in SSMS. But it covers way more than that. There is a great deal of good info on how queries are executed and query plans generated. Highly recommended!...(read more)

    Read the article

  • More on PHP and Oracle 11gR2 Improvements to Client Result Caching

    - by christopher.jones
    Oracle 11.2 brought several improvements to Client Result Caching. CRC is way for the results of queries to be cached in the database client process for reuse.  In an Oracle OpenWorld presentation "Best Practices for Developing Performant Application" my colleague Luxi Chidambaran had a (non-PHP generated) graph for the Niles benchmark that shows a DB CPU reduction up to 600% and response times up to 22% faster when using CRC. Sometimes CRC is called the "Consistent Client Cache" because Oracle automatically invalidates the cache if table data is changed.  This makes it easy to use without needing application logic rewrites. There are a few simple database settings to turn on and tune CRC, so management is also easy. PHP OCI8 as a "client" of the database can use CRC.  The cache is per-process, so plan carefully before caching large data sets.  Tables that are candidates for caching are look-up tables where the network transfer cost dominates. CRC is really easy in 11.2 - I'll get to that in a moment.  It was also pretty easy in Oracle 11.1 but it needed some tiny application changes.  In PHP it was used like: $s = oci_parse($c, "select /*+ result_cache */ * from employees"); oci_execute($s, OCI_NO_AUTO_COMMIT); // Use OCI_DEFAULT in OCI8 <= 1.3 oci_fetch_all($s, $res); I blogged about this in the past.  The query had to include a specific hint that you wanted the results cached, and you needed to turn off auto committing during execution either with the OCI_DEFAULT flag or its new, better-named alias OCI_NO_AUTO_COMMIT.  The no-commit flag rule didn't seem reasonable to me because most people wouldn't be specific about the commit state for a query. Now in Oracle 11.2, DBAs can now nominate tables for caching, either with CREATE TABLE or ALTER TABLE.  That means you don't need the query hint anymore.  As well, the no-commit flag requirement has been lifted.  Your code can now look like: $s = oci_parse($c, "select * from employees"); oci_execute($s); oci_fetch_all($s, $res); Since your code probably already looks like this, your DBA can find the top queries in the database and simply tune the system by turning on CRC in the database and issuing an ALTER TABLE statement for candidate tables.  Voila. Another CRC improvement in Oracle 11.2 is that it works with DRCP connection pooling. There is some fine print about what is and isn't cached, check the Oracle manuals for details.  If you're using 11.1 or non-DRCP "dedicated servers" then make sure you use oci_pconnect() persistent connections.  Also in PHP don't bind strings in the query, although binding as SQLT_INT is OK.

    Read the article

  • SQL SERVER Subquery or Join Various Options SQL Server Engine knows the Best

    This is followup post of my earlier article SQL SERVER Convert IN to EXISTS Performance Talk, after reading all the comments I have received I felt that I could write more on the same subject to clear few things out. First let us run following four queries, all of them are giving exactly [...]...Did you know that DotNetSlackers also publishes .net articles written by top known .net Authors? We already have over 80 articles in several categories including Silverlight. Take a look: here.

    Read the article

  • What does the impression and ctr means in google webmaster

    - by KoolKabin
    I am checking google webmaster tools. I entered the search queries section. There i found alot keywords and their impression and ctr etc. I clicked on one of the query keyword there it shows the keyword and position in search result, but when i go to google.com and type the specified keyword it shows no impressions too... how do i measure find my site's impression on google.com my site: http://www.trekkingandtoursnepal.com keyword: trekking nepal

    Read the article

  • Getting Started with Columnstore Index in SQL Server 2014 – Part 1

    Column Store Index, which improves performance of data warehouse queries several folds, was first introduced in SQL Server 2012. In this article series Arshad Ali talks about how you can get started with using enhanced columnstore index features in SQL Server 2014 and do some performance tests to understand the benefits. Deployment Manager 2 is now free!The new version includes tons of new features and we've launched a completely free Starter Edition! Get Deployment Manager here

    Read the article

  • Synonyms made easy

    The Custom Search team is always working to provide more relevant results, and improving user queries is a big part of that goal. We've shown you how to...

    Read the article

< Previous Page | 55 56 57 58 59 60 61 62 63 64 65 66  | Next Page >