Search Results

Search found 754 results on 31 pages for 'aggregate'.


  • LINQ aggregate left join on SQL CE

    - by P Daddy
    What I need is such a simple, easy query, it blows me away how much work I've done just trying to do it in LINQ. In T-SQL, it would be: SELECT I.InvoiceID, I.CustomerID, I.Amount AS AmountInvoiced, I.Date AS InvoiceDate, ISNULL(SUM(P.Amount), 0) AS AmountPaid, I.Amount - ISNULL(SUM(P.Amount), 0) AS AmountDue FROM Invoices I LEFT JOIN Payments P ON I.InvoiceID = P.InvoiceID WHERE I.Date between @start and @end GROUP BY I.InvoiceID, I.CustomerID, I.Amount, I.Date ORDER BY AmountDue DESC The best equivalent LINQ expression I've come up with, took me much longer to do: var invoices = ( from I in Invoices where I.Date >= start && I.Date <= end join P in Payments on I.InvoiceID equals P.InvoiceID into payments select new{ I.InvoiceID, I.CustomerID, AmountInvoiced = I.Amount, InvoiceDate = I.Date, AmountPaid = ((decimal?)payments.Select(P=>P.Amount).Sum()).GetValueOrDefault(), AmountDue = I.Amount - ((decimal?)payments.Select(P=>P.Amount).Sum()).GetValueOrDefault() } ).OrderByDescending(row=>row.AmountDue); This gets an equivalent result set when run against SQL Server. Using a SQL CE database, however, changes things. The T-SQL stays almost the same. I only have to change ISNULL to COALESCE. Using the same LINQ expression, however, results in an error: There was an error parsing the query. [ Token line number = 4, Token line offset = 9,Token in error = SELECT ] So we look at the generated SQL code: SELECT [t3].[InvoiceID], [t3].[CustomerID], [t3].[Amount] AS [AmountInvoiced], [t3].[Date] AS [InvoiceDate], [t3].[value] AS [AmountPaid], [t3].[value2] AS [AmountDue] FROM ( SELECT [t0].[InvoiceID], [t0].[CustomerID], [t0].[Amount], [t0].[Date], COALESCE(( SELECT SUM([t1].[Amount]) FROM [Payments] AS [t1] WHERE [t0].[InvoiceID] = [t1].[InvoiceID] ),0) AS [value], [t0].[Amount] - (COALESCE(( SELECT SUM([t2].[Amount]) FROM [Payments] AS [t2] WHERE [t0].[InvoiceID] = [t2].[InvoiceID] ),0)) AS [value2] FROM [Invoices] AS [t0] ) AS [t3] WHERE ([t3].[Date] >= @p0) AND ([t3].[Date] <= @p1) ORDER BY [t3].[value2] DESC Ugh! Okay, so it's ugly and inefficient when run against SQL Server, but we're not supposed to care, since it's supposed to be quicker to write, and the performance difference shouldn't be that large. But it just doesn't work against SQL CE, which apparently doesn't support subqueries within the SELECT list. In fact, I've tried several different left join queries in LINQ, and they all seem to have the same problem. Even: from I in Invoices join P in Payments on I.InvoiceID equals P.InvoiceID into payments select new{I, payments} generates: SELECT [t0].[InvoiceID], [t0].[CustomerID], [t0].[Amount], [t0].[Date], [t1].[InvoiceID] AS [InvoiceID2], [t1].[Amount] AS [Amount2], [t1].[Date] AS [Date2], ( SELECT COUNT(*) FROM [Payments] AS [t2] WHERE [t0].[InvoiceID] = [t2].[InvoiceID] ) AS [value] FROM [Invoices] AS [t0] LEFT OUTER JOIN [Payments] AS [t1] ON [t0].[InvoiceID] = [t1].[InvoiceID] ORDER BY [t0].[InvoiceID] which also results in the error: There was an error parsing the query. [ Token line number = 2, Token line offset = 5,Token in error = SELECT ] So how can I do a simple left join on a SQL CE database using LINQ? Am I wasting my time?
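
    A possible workaround (a sketch only, not tested against SQL CE): since the correlated subquery in the SELECT list is what CE rejects, let it run two flat queries instead (the invoices for the date range, and the payment totals grouped by InvoiceID) and do the left join, SUM and ordering in memory with LINQ to Objects. Names follow the query above.

        // Invoices in the date range (simple WHERE, no subquery).
        var invoiceRows = Invoices
            .Where(i => i.Date >= start && i.Date <= end)
            .ToList();

        // Payment totals per invoice. This should translate to a plain
        // GROUP BY ... SUM(...); if even that fails on CE, insert
        // .AsEnumerable() before GroupBy and aggregate in memory.
        var paidByInvoice = Payments
            .GroupBy(p => p.InvoiceID)
            .Select(g => new { InvoiceID = g.Key, Paid = g.Sum(p => p.Amount) })
            .ToDictionary(x => x.InvoiceID, x => x.Paid);

        // The "left join" and arithmetic, done client side.
        var report =
            (from i in invoiceRows
             let paid = paidByInvoice.ContainsKey(i.InvoiceID) ? paidByInvoice[i.InvoiceID] : 0m
             select new
             {
                 i.InvoiceID,
                 i.CustomerID,
                 AmountInvoiced = i.Amount,
                 InvoiceDate = i.Date,
                 AmountPaid = paid,
                 AmountDue = i.Amount - paid
             })
            .OrderByDescending(r => r.AmountDue)
            .ToList();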

    Read the article

  • Linqify this: Aggregate a subset of a list

    - by JMarsch
    Suppose I have a list or array of items, and I want to sum a subset of the items in the list. (in my case, it happens to always be a sequential subset). Here's the old-fashioned way: int sum = 0; for(int i = startIndex; i <= stopIndex; i++) sum += myList[i].TheValue; return sum; What's the best way to linqify that code?
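
    One possible LINQ-to-Objects one-liner, assuming startIndex and stopIndex are valid, inclusive bounds exactly as in the loop:

        // Sum the inclusive range [startIndex, stopIndex].
        int sum = myList
            .Skip(startIndex)
            .Take(stopIndex - startIndex + 1)
            .Sum(item => item.TheValue);

    Skip still walks the skipped prefix, so for a very large list an index-based form such as Enumerable.Range(startIndex, stopIndex - startIndex + 1).Sum(i => myList[i].TheValue) avoids that.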

    Read the article

  • Accessing some aggregate functions in a LINQ datasource in a GridView

    - by Stephen Pellicer
    I am working on a traditional WebForms project. In the project I am trying out some Linq datasources with plans to eventually migrate to an MVC architecture. I am still very new to Linq. I have a GridView using a Linq datasource. The entities I am showing have a to-many relationship, and I would like to get the maximum value of a column on the many side of the relationship. I can show properties of the base entity in the GridView: <asp:TemplateField HeaderText="Number" SortExpression="tJobBase.tJob.JobNumber"> <ItemTemplate> <asp:Label ID="Label1" runat="server" Text='<%# Bind("tJobBase.tJob.JobNumber") %>'> </asp:Label> </ItemTemplate> </asp:TemplateField> I can also show the count of the related collection: <asp:TemplateField HeaderText="Number" SortExpression="tJobBase.tJob.tHourlies.Count"> <ItemTemplate> <asp:Label ID="Label1" runat="server" Text='<%# Bind("tJobBase.tJob.tHourlies.Count") %>'> </asp:Label> </ItemTemplate> </asp:TemplateField> Is there a way to get the max value of a column called WeekEnding in the tHourlies collection to show in the GridView?
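
    One possible approach (a sketch; the tHourly type and property names are assumptions based on the markup above): Bind() only accepts a simple property path, but Eval() can hand the collection to a small code-behind helper that computes the maximum.

        // Code-behind helper: latest WeekEnding in the bound collection, or null if it is empty.
        // Requires System.Linq; assumes the collection is an IEnumerable<tHourly>.
        protected static DateTime? LatestWeekEnding(object hourlies)
        {
            var items = hourlies as IEnumerable<tHourly>;
            return (items != null && items.Any())
                ? (DateTime?)items.Max(h => h.WeekEnding)
                : null;
        }

    The label in the ItemTemplate would then use Text='<%# LatestWeekEnding(Eval("tJobBase.tJob.tHourlies")) %>' instead of Bind.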

    Read the article

  • Core Data: Multiple conditions inside relational aggregate operations

    - by Uzaak
    I have an SQLite table used by Core Data with the following elements: Name: John LastName: Foobar Age: 23 Name: Bob LastName: Baz Age: 37 Name: Peter LastName: Fooqux Age: 32 Name: John LastName: Bar Age: 29 Those are all in a to-many relationship from another object "Company". I need to query the database and retrieve all Company objects with employees called "John" but whose last name does NOT contain "Foo". I did go as far as to make the following predicate: [NSPredicate predicateWithFormat:@"ANY employee.name = 'John'"]; How do I get to filter only by companies whose Johns don't have "Foo" in their last names?

    Read the article

  • how to aggregate this data in R

    - by stevejb
    Hello, I have a data frame in R with the following structure. > testData date exch.code comm.code oi 1 1997-12-30 CBT 1 468710 2 1997-12-23 CBT 1 457165 3 1997-12-19 CBT 1 461520 4 1997-12-16 CBT 1 444190 5 1997-12-09 CBT 1 446190 6 1997-12-02 CBT 1 443085 .... 77827 2004-10-26 NYME 967 10038 77828 2004-10-19 NYME 967 9910 77829 2004-10-12 NYME 967 10195 77830 2004-09-28 NYME 967 9970 77831 2004-08-31 NYME 967 9155 77832 2004-08-24 NYME 967 8655 What I want to do is produce a table that shows, for a given date and commodity, the total oi across every exchange code. So the rows would be made up of unique(testData$date), the columns would be unique(testData$comm.code), and each cell would be the total oi over all exch.codes on a given day. Thanks,

    Read the article

  • Aggregate path counts using HierarchyID

    - by austincav
    Business problem - understand process fallout using analytics data. Here is what we have done so far: Build a dictionary table with every possible process step Find each process "start" Find the last step for each start Join dictionary table to last step to find path to final step In the final report output we end up with a list of paths for each start to each final step: User Fallout Step HierarchyID.ToString() A 1/1/1 B 1/1/1/1/1 C 1/1/1/1 D 1/1/1 E 1/1 What this means is that five users (A-E) started the process. Assume only User B finished, the other four did not. Since this is a simple example (without branching) we want the output to look as follows: Step Unique Users 1 5 2 5 3 4 4 2 5 1 The easiest solution I could think of is to take each hierarchyID.ToString(), parse that out into a set of subpaths, JOIN back to the dictionary table, and output using GROUP BY. Given the volume of data, I'd like to use the built-in HierarchyID functions, e.g. IsAncestorOf. Any ideas or thoughts how I could write this? Maybe a recursive CTE?

    Read the article

  • Linq Aggregate on object and List

    - by Kris-I
    I do this query with NHibernate: var test = _session.CreateCriteria(typeof(Estimation)) .SetFetchMode("EstimationItems", FetchMode.Eager) .List(); An "Estimation" can have several "EstimationItems" (Quantity, Price and ProductId). I'd like a list of "Estimation" with these constraints: one line per "Estimation" code shown on the picture (e.g. 2011/0001 and 2011/0003); for each estimation (i.e. on each line), the number of "EstimationItems"; and for each estimation, the total price (Quantity * Price) of its "EstimationItems". I hope the structure is clearer with the picture below. Thanks,
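
    A sketch of one possible projection, done in memory after the criteria query (property names such as Code, Quantity and Price are assumptions based on the description; with an eager collection fetch the root entities can come back duplicated, which the Distinct() below collapses via reference equality, the same effect as a DistinctRootEntity result transformer on the criteria):

        var lines = test.Cast<Estimation>()
            .Distinct()
            .Select(e => new
            {
                e.Code,                                            // e.g. "2011/0001"
                ItemCount  = e.EstimationItems.Count,
                TotalPrice = e.EstimationItems.Sum(i => i.Quantity * i.Price)
            })
            .ToList();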

    Read the article

  • How to cast in XML for aggregate functions

    - by renegm
    In SQL Server 2008, I need to execute a query like this: DECLARE @x AS xml SET @x=N'<r><c>First Text</c></r><r><c>Other Text</c></r>' SELECT @x.query('fn:max(r/c)') But it returns nothing (apparently because it tries to convert xdt:untypedAtomic to numeric). How do I "cast" r/c to varchar? Something like SELECT @x.query('fn:max(«CAST(r/c «AS varchar(20))»)') Edit: When using nodes(), the MAX function is the T-SQL one, not fn:max. In this code: DECLARE @x xml; SET @x = ''; SELECT @x.query('fn:max((1, 2))'); SELECT @x.query('fn:max(("First Text", "Other Text"))'); both queries return the expected results: 2 and "Other Text". So fn:max can evaluate string expressions ad hoc, but the first query doesn't work. How do I force string arguments to fn:max?

    Read the article

  • Returning multiple aggregate functions as rows

    - by SDLFunTimes
    I need some help formulating a select statement. I need to select the total quantity shipped for parts of each distinct color, so the result should be one row per color, with the color name and the total. Here's my schema: create table s ( sno char(5) not null, sname char(20) not null, status smallint, city char(15), primary key (sno) ); create table p ( pno char(6) not null, pname char(20) not null, color char(6), weight smallint, city char(15), primary key (pno) ); create table sp ( sno char(5) not null, pno char(6) not null, qty integer not null, primary key (sno, pno) );

    Read the article

  • Linq order by aggregate in the select { }

    - by Joe Pitz
    Here is one I am working on: var fStep = from insp in sq.Inspections where insp.TestTimeStamp > dStartTime && insp.TestTimeStamp < dEndTime && insp.Model == "EP" && insp.TestResults != "P" group insp by new { insp.TestResults, insp.FailStep } into grp select new { FailedCount = (grp.Key.TestResults == "F" ? grp.Count() : 0), CancelCount = (grp.Key.TestResults == "C" ? grp.Count() : 0), grp.Key.TestResults, grp.Key.FailStep, PercentFailed = Convert.ToDecimal(1.0 * grp.Count() /tcount*100) } ; I would like to orderby one or more of the fields in the select projection.
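
    One possible way (names as in the projection above) is simply to order the projected sequence after the select, since the anonymous type's properties are in scope there:

        var ordered = fStep
            .OrderByDescending(r => r.PercentFailed)
            .ThenBy(r => r.FailStep);

    If the provider cannot translate ordering on the computed column, appending .AsEnumerable() to fStep before ordering performs the sort in memory instead.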

    Read the article

  • Using reshape + cast to aggregate over multiple columns

    - by DamonJW
    In R, I have a data frame with columns for Seat (factor), Party (factor) and Votes (numeric). I want to create a summary data frame with columns for Seat, Winning party, and Vote share. For example, from the data frame df <- data.frame(party=rep(c('Lab','C','LD'),times=4), votes=c(1,12,2,11,3,10,4,9,5,8,6,15), seat=rep(c('A','B','C','D'),each=3)) I want to get the output seat winner voteshare 1 A C 0.8000000 2 B Lab 0.4583333 3 C C 0.5000000 4 D LD 0.5172414 I can figure out how to achieve this. But I'm sure there must be a better way, probably a cunning one-liner using Hadley Wickham's reshape package. Any suggestions? For what it's worth, my solution uses a function from my package djwutils_2.10.zip and is invoked as follows. But there are all sorts of special cases it doesn't deal with, so I'd rather rely on someone else's code. aggregateList(df, by=list(seat=seat), FUN=list(winner=function(x) x$party[which.max(x$votes)], voteshare=function(x) max(x$votes)/sum(x$votes)))

    Read the article

  • SQL aggregate query question

    - by Phil
    Hi, can anyone help me with a SQL query in Apache Derby SQL to get a "simple" count? Given a table ABC that looks like this... id a b c 1 1 1 1 2 1 1 2 3 2 1 3 4 2 1 1 ** 5 2 1 2 ** ** 6 2 2 1 ** 7 3 1 2 8 3 1 3 9 3 1 1 How can I write a query to get a count of how many distinct values of 'a' have both (b=1 and c=2) AND (b=2 and c=1), to get the correct result of 1? (The two rows marked match the criteria and both have a value of a=2; there is only 1 distinct value of a in this table that matches the criteria.) The tricky bit is that (b=1 and c=2) AND (b=2 and c=1) are obviously mutually exclusive when applied to a single row... so how do I apply that expression across multiple rows of distinct values for a? These queries are wrong but illustrate what I'm trying to do... "SELECT DISTINCT COUNT(a) WHERE b=1 AND c=2 AND b=2 AND c=1 ..." .. (0) no go as mutually exclusive "SELECT DISTINCT COUNT(a) WHERE b=1 AND c=2 OR b=2 AND c=1 ..." .. (3) gets me the wrong result. SELECT COUNT(a) (CASE WHEN b=1 AND c=10 THEN 1 END) FROM ABC WHERE b=2 AND c=1 .. (0) no go as mutually exclusive Cheers, Phil.

    Read the article

  • Need help optimizing this Django aggregate query

    - by Chris Lawlor
    I have the following model class Plugin(models.Model): name = models.CharField(max_length=50) # more fields which represents a plugin that can be downloaded from my site. To track downloads, I have class Download(models.Model): plugin = models.ForeignKey(Plugin) timestamp = models.DateTimeField(auto_now=True) So to build a view showing plugins sorted by downloads, I have the following query: # pbd is plugins by download - commented here to prevent scrolling pbd = Plugin.objects.annotate(dl_total=Count('download')).order_by('-dl_total') Which works, but is very slow. With only 1,000 plugins, the avg. response is 3.6 - 3.9 seconds (devserver with local PostgreSQL db), where a similar view with a much simpler query (sorting by plugin release date) takes 160 ms or so. I'm looking for suggestions on how to optimize this query. I'd really prefer that the query return Plugin objects (as opposed to using values) since I'm sharing the same template for the other views (Plugins by rating, Plugins by release date, etc.), so the template is expecting Plugin objects - plus I'm not sure how I would get things like the absolute_url without a reference to the plugin object. Or, is my whole approach doomed to failure? Is there a better way to track downloads? I ultimately want to provide users some nice download statistics for the plugins they've uploaded - like downloads per day/week/month. Will I have to calculate and cache Downloads at some point? EDIT: In my test dataset, there are somewhere between 10-20 Download instances per Plugin - in production I expect this number would be much higher for many of the plugins.

    Read the article

  • need help in aggregate select

    - by eugeneK
    Hi, I have a problem with selecting some values from my DB. The DB is in the design stages, so I can redesign it a bit if needed. You can see the diagram on this image. Basically what I want to select is select c.campaignID, ct.campaignTypeName, c.campaignName, c.campaignDailyBudget, c.campaignTotalBudget, c.campaignCPC, c.date, cs.campaignStatusName ***impressions, ***clicks, ***cast(campaignTotalBudget-(clicks*campaignCPC) as decimal(18,1)) as remainingFunds from Campaigns as c left join CampaignTypes as ct on c.campaignTypeID=ct.campaignTypeID left join CampaignStatuses as cs on c.campaignStatusID=cs.campaignStatusID left join CampaignVariants as cv on c.campaignID=cv.campaignID left join CampaignVariants2Visitors as c2v on cv.campaignVariantID=c2v.campaignVariantID left join Visitors as v on c2v.visitorID=v.visitorID ..... order by c.campaignID desc The problem is that the Visitors table has a column named isClick, and I don't know how to separate impressions (isClick=false) from clicks (isClick=true) so I can show a nice form with all the campaign and visitor data... I don't think splitting Visitors into two tables like Impressions and Clicks is a good idea, because then I would again need two more tables alongside Visitors. Thanks

    Read the article

  • T-SQL Self Join in combination with aggregate function

    - by Nick
    Hi, I have the following table. CREATE TABLE [dbo].[Tree]( [AutoID] [int] IDENTITY(1,1) NOT NULL, [Category] [varchar](10) NULL, [Condition] [varchar](10) NULL, [Description] [varchar](50) NULL, CONSTRAINT [PK_Tree] PRIMARY KEY CLUSTERED ( [AutoID] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] ) ON [PRIMARY] GO The data looks like this: INSERT INTO [Test].[dbo].[Tree] ([Category] ,[Condition] ,[Description]) VALUES ('1','Alpha','Type 1') INSERT INTO [Test].[dbo].[Tree] ([Category] ,[Condition] ,[Description]) VALUES ('1','Alpha','Type 1') INSERT INTO [Test].[dbo].[Tree] ([Category] ,[Condition] ,[Description]) VALUES ('2','Alpha','Type 2') INSERT INTO [Test].[dbo].[Tree] ([Category] ,[Condition] ,[Description]) VALUES ('2','Alpha','Type 2') go Now I am trying to do the following: SELECT Category,COUNT(*) as CategoryCount FROM Tree where Condition = 'Alpha' group by Category but I also want to get the Description for each element. I have tried several subqueries, self joins, etc., but I always run into the problem that the subquery cannot return more than one record. The problem is caused by a poor database design which I cannot change, and I have run out of ideas for getting this done in a single query ;-(

    Read the article

  • Elegant way to aggregate multi-dimensional array by index key

    - by Stephen J. Fuhry
    How can I recursively find the total value of all children of an array that looks something like this? [0] => Array ( [value] => ? // 8590.25 + 200.5 + 22.4 [children] => Array ( [0] => Array ( [value] => ? // 8590.25 + 200.5 [children] => Array ( [0] => Array ( [value] => 8590.25 // leaf node ) [1] => Array ( [value] => 200.05 // leaf node ) ) ) [1] => Array ( [value] => 22.4 // leaf node ) ) )
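
    The question is about a PHP array, but the recursion it needs is language-agnostic. Sketched below in C# with a hypothetical Node type: a leaf's total is its own value, a branch's total is the sum of its children's totals, written back into the node so the whole tree ends up filled in.

        class Node
        {
            public decimal? Value;                         // set on leaf nodes
            public List<Node> Children = new List<Node>();
        }

        static decimal FillTotals(Node node)
        {
            // Leaf: its own value is the total.
            if (node.Children.Count == 0)
                return node.Value ?? 0m;

            // Branch: sum the children's totals and store the result.
            decimal total = 0m;
            foreach (var child in node.Children)
                total += FillTotals(child);
            node.Value = total;
            return total;
        }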

    Read the article

  • Aggregate SQL column values by time period

    - by user305688
    I have some numerical data that comes in every 5 minutes (i.e. 288 values per day, and quite a few days worth of data). I need to write a query that can return the sums of all values for each day. So currently the table looks like this: 03/30/2010 00:01:00 -- 553 03/30/2010 00:06:00 -- 558 03/30/2010 00:11:00 -- 565 03/30/2010 00:16:00 -- 565 03/30/2010 00:21:00 -- 558 03/30/2010 00:26:00 -- 566 03/30/2010 00:31:00 -- 553 ... And this goes on for 'x' number of days, I'd like the query to return 'x' number of rows, each of which containing the sum of all the values on each day. Something like this: 03/30/2010 -- <sum> 03/31/2010 -- <sum> 04/01/2010 -- <sum> The query will go inside a Dundas webpart, so unfortunately I can't write custom user functions to assist it. All the logic needs to be in just the one big query. Any help would be appreciated, thanks. I'm trying to get it to work using GROUP BY and DATEPART at the moment, not sure if it's the right way to go about it.

    Read the article

  • how to aggregate information on UIImage?

    - by user1582281
    On each drawing cycle I want to draw 1000 more lines on my UIImage; right now I do it like this: -(void)drawRect { for(int i=0;i<1000;i++) { UIGraphicsBeginImageContext(myImage.size); code to draw line on current context... draw previous info from myImage: [myImage drawInRect:myRect]; //store info from context back to myImage myImage=UIGraphicsGetImageFromCurrentImageContext(); UIGraphicsEndImageContext(); } //append the image on the right side of current context: [myImage drawInRect:myRightRect]; } The problem is that drawing the entire image each time, just for the few lines added, is very expensive; does anyone have any idea how to optimize it?

    Read the article

  • Fun with Aggregates

    - by Paul White
    There are interesting things to be learned from even the simplest queries.  For example, imagine you are given the task of writing a query to list AdventureWorks product names where the product has at least one entry in the transaction history table, but fewer than ten. One possible query to meet that specification is: SELECT p.Name FROM Production.Product AS p JOIN Production.TransactionHistory AS th ON p.ProductID = th.ProductID GROUP BY p.ProductID, p.Name HAVING COUNT_BIG(*) < 10; That query correctly returns 23 rows (execution plan and data sample shown below): The execution plan looks a bit different from the written form of the query: the base tables are accessed in reverse order, and the aggregation is performed before the join.  The general idea is to read all rows from the history table, compute the count of rows grouped by ProductID, merge join the results to the Product table on ProductID, and finally filter to only return rows where the count is less than ten. This ‘fully-optimized’ plan has an estimated cost of around 0.33 units.  The reason for the quote marks there is that this plan is not quite as optimal as it could be – surely it would make sense to push the Filter down past the join too?  To answer that, let’s look at some other ways to formulate this query.  This being SQL, there are any number of ways to write logically-equivalent query specifications, so we’ll just look at a couple of interesting ones.  The first query is an attempt to reverse-engineer T-SQL from the optimized query plan shown above.  It joins the result of pre-aggregating the history table to the Product table before filtering: SELECT p.Name FROM ( SELECT th.ProductID, cnt = COUNT_BIG(*) FROM Production.TransactionHistory AS th GROUP BY th.ProductID ) AS q1 JOIN Production.Product AS p ON p.ProductID = q1.ProductID WHERE q1.cnt < 10; Perhaps a little surprisingly, we get a slightly different execution plan: The results are the same (23 rows) but this time the Filter is pushed below the join!  The optimizer chooses nested loops for the join, because the cardinality estimate for rows passing the Filter is a bit low (estimate 1 versus 23 actual), though you can force a merge join with a hint and the Filter still appears below the join.  In yet another variation, the < 10 predicate can be ‘manually pushed’ by specifying it in a HAVING clause in the “q1” sub-query instead of in the WHERE clause as written above. The reason this predicate can be pushed past the join in this query form, but not in the original formulation is simply an optimizer limitation – it does make efforts (primarily during the simplification phase) to encourage logically-equivalent query specifications to produce the same execution plan, but the implementation is not completely comprehensive. Moving on to a second example, the following query specification results from phrasing the requirement as “list the products where there exists fewer than ten correlated rows in the history table”: SELECT p.Name FROM Production.Product AS p WHERE EXISTS ( SELECT * FROM Production.TransactionHistory AS th WHERE th.ProductID = p.ProductID HAVING COUNT_BIG(*) < 10 ); Unfortunately, this query produces an incorrect result (86 rows): The problem is that it lists products with no history rows, though the reasons are interesting.  The COUNT_BIG(*) in the EXISTS clause is a scalar aggregate (meaning there is no GROUP BY clause) and scalar aggregates always produce a value, even when the input is an empty set.  
In the case of the COUNT aggregate, the result of aggregating the empty set is zero (the other standard aggregates produce a NULL).  To make the point really clear, let’s look at product 709, which happens to be one for which no history rows exist: -- Scalar aggregate SELECT COUNT_BIG(*) FROM Production.TransactionHistory AS th WHERE th.ProductID = 709;   -- Vector aggregate SELECT COUNT_BIG(*) FROM Production.TransactionHistory AS th WHERE th.ProductID = 709 GROUP BY th.ProductID; The estimated execution plans for these two statements are almost identical: You might expect the Stream Aggregate to have a Group By for the second statement, but this is not the case.  The query includes an equality comparison to a constant value (709), so all qualified rows are guaranteed to have the same value for ProductID and the Group By is optimized away. In fact there are some minor differences between the two plans (the first is auto-parameterized and qualifies for trivial plan, whereas the second is not auto-parameterized and requires cost-based optimization), but there is nothing to indicate that one is a scalar aggregate and the other is a vector aggregate.  This is something I would like to see exposed in show plan so I suggested it on Connect.  Anyway, the results of running the two queries show the difference at runtime: The scalar aggregate (no GROUP BY) returns a result of zero, whereas the vector aggregate (with a GROUP BY clause) returns nothing at all.  Returning to our EXISTS query, we could ‘fix’ it by changing the HAVING clause to reject rows where the scalar aggregate returns zero: SELECT p.Name FROM Production.Product AS p WHERE EXISTS ( SELECT * FROM Production.TransactionHistory AS th WHERE th.ProductID = p.ProductID HAVING COUNT_BIG(*) BETWEEN 1 AND 9 ); The query now returns the correct 23 rows: Unfortunately, the execution plan is less efficient now – it has an estimated cost of 0.78 compared to 0.33 for the earlier plans.  Let’s try adding a redundant GROUP BY instead of changing the HAVING clause: SELECT p.Name FROM Production.Product AS p WHERE EXISTS ( SELECT * FROM Production.TransactionHistory AS th WHERE th.ProductID = p.ProductID GROUP BY th.ProductID HAVING COUNT_BIG(*) < 10 ); Not only do we now get correct results (23 rows), this is the execution plan: I like to compare that plan to quantum physics: if you don’t find it shocking, you haven’t understood it properly :)  The simple addition of a redundant GROUP BY has resulted in the EXISTS form of the query being transformed into exactly the same optimal plan we found earlier.  What’s more, in SQL Server 2008 and later, we can replace the odd-looking GROUP BY with an explicit GROUP BY on the empty set: SELECT p.Name FROM Production.Product AS p WHERE EXISTS ( SELECT * FROM Production.TransactionHistory AS th WHERE th.ProductID = p.ProductID GROUP BY () HAVING COUNT_BIG(*) < 10 ); I offer that as an alternative because some people find it more intuitive (and it perhaps has more geek value too).  Whichever way you prefer, it’s rather satisfying to note that the result of the sub-query does not exist for a particular correlated value where a vector aggregate is used (the scalar COUNT aggregate always returns a value, even if zero, so it always ‘EXISTS’ regardless which ProductID is logically being evaluated). 
The following query forms also produce the optimal plan and correct results, so long as a vector aggregate is used (you can probably find more equivalent query forms): WHERE Clause SELECT p.Name FROM Production.Product AS p WHERE ( SELECT COUNT_BIG(*) FROM Production.TransactionHistory AS th WHERE th.ProductID = p.ProductID GROUP BY () ) < 10; APPLY SELECT p.Name FROM Production.Product AS p CROSS APPLY ( SELECT NULL FROM Production.TransactionHistory AS th WHERE th.ProductID = p.ProductID GROUP BY () HAVING COUNT_BIG(*) < 10 ) AS ca (dummy); FROM Clause SELECT q1.Name FROM ( SELECT p.Name, cnt = ( SELECT COUNT_BIG(*) FROM Production.TransactionHistory AS th WHERE th.ProductID = p.ProductID GROUP BY () ) FROM Production.Product AS p ) AS q1 WHERE q1.cnt < 10; This last example uses SUM(1) instead of COUNT and does not require a vector aggregate…you should be able to work out why :) SELECT q.Name FROM ( SELECT p.Name, cnt = ( SELECT SUM(1) FROM Production.TransactionHistory AS th WHERE th.ProductID = p.ProductID ) FROM Production.Product AS p ) AS q WHERE q.cnt < 10; The semantics of SQL aggregates are rather odd in places.  It definitely pays to get to know the rules, and to be careful to check whether your queries are using scalar or vector aggregates.  As we have seen, query plans do not show in which ‘mode’ an aggregate is running and getting it wrong can cause poor performance, wrong results, or both. © 2012 Paul White Twitter: @SQL_Kiwi email: [email protected]

    Read the article

  • postgresql weighted average?

    - by milovanderlinden
    Say I have a postgresql table with the following values: id | value ---------- 1 | 4 2 | 8 3 | 100 4 | 5 5 | 7 If I use postgresql to calculate the average, it gives me 24.8, because the high value of 100 has a great impact on the calculation, while in fact I would like to find an average somewhere around 6 and eliminate the extreme(s). I am looking for a way to eliminate extremes and want to do this in a "statistically correct" way. The extremes cannot be fixed; I cannot say "if a value is over X, it has to be eliminated". I have been bending my head over the postgresql aggregate functions but cannot put my finger on which one is right for me to use. Any suggestions?

    Read the article

  • How to do Linq aggregates when there might be an empty set?

    - by Shaul
    I have a Linq collection of Things, where Thing has an Amount (decimal) property. I'm trying to do an aggregate on this for a certain subset of Things: var total = myThings.Sum(t => t.Amount); and that works nicely. But then I added a condition that left me with no Things in the result: var total = myThings.Where(t => t.OtherProperty == 123).Sum(t => t.Amount); And instead of getting total = 0 or null, I get an error: System.InvalidOperationException: The null value cannot be assigned to a member with type System.Decimal which is a non-nullable value type. That is really nasty, because I didn't expect that behavior. I would have expected total to be zero, maybe null - but certainly not to throw an exception! What am I doing wrong? What's the workaround/fix?
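
    A common workaround (assuming LINQ to SQL or a similar provider) is to sum a nullable projection, so that an empty set yields null instead of a failed NULL-into-decimal assignment, and then coalesce back to zero:

        decimal total = myThings
            .Where(t => t.OtherProperty == 123)
            .Sum(t => (decimal?)t.Amount) ?? 0m;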

    Read the article
