Performing Aggregate Functions on Multi-Million Row Tables

Posted by Daniel Short on Stack Overflow See other posts from Stack Overflow or by Daniel Short
Published on 2010-05-12T16:26:55Z Indexed on 2010/05/12 16:54 UTC
Read the original article Hit count: 338

I'm having some serious performance issues with a multi-million row table that I feel I should be able to get results from fairly quick. Here's a run down of what I have, how I'm querying it, and how long it's taking:

  • I'm running SQL Server 2008 Standard, so Partitioning isn't currently an option

  • I'm attempting to aggregate all views for all inventory for a specific account over the last 30 days.

  • All views are stored in the following table:

CREATE TABLE [dbo].[LogInvSearches_Daily](
    [ID] [bigint] IDENTITY(1,1) NOT NULL,
    [Inv_ID] [int] NOT NULL,
    [Site_ID] [int] NOT NULL,
    [LogCount] [int] NOT NULL,
    [LogDay] [smalldatetime] NOT NULL,
 CONSTRAINT [PK_LogInvSearches_Daily] PRIMARY KEY CLUSTERED 
(
    [ID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON, FILLFACTOR = 90) ON [PRIMARY]
) ON [PRIMARY]
  • This table has 132,000,000 records, and is over 4 gigs.

  • A sample of 10 rows from the table:

ID                   Inv_ID      Site_ID     LogCount    LogDay
-------------------- ----------- ----------- ----------- -----------------------
1                    486752      48          14          2009-07-21 00:00:00
2                    119314      51          16          2009-07-21 00:00:00
3                    313678      48          25          2009-07-21 00:00:00
4                    298863      0           1           2009-07-21 00:00:00
5                    119996      0           2           2009-07-21 00:00:00
6                    463777      534         7           2009-07-21 00:00:00
7                    339976      503         2           2009-07-21 00:00:00
8                    333501      570         4           2009-07-21 00:00:00
9                    453955      0           12          2009-07-21 00:00:00
10                   443291      0           4           2009-07-21 00:00:00

(10 row(s) affected)
  • I have the following index on LogInvSearches_Daily:
/****** Object:  Index [IX_LogInvSearches_Daily_LogDay]    Script Date: 05/12/2010 11:08:22 ******/
CREATE NONCLUSTERED INDEX [IX_LogInvSearches_Daily_LogDay] ON [dbo].[LogInvSearches_Daily] 
(
    [LogDay] ASC
)
INCLUDE ( [Inv_ID],
[LogCount]) WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
  • I need to pull inventory only from the Inventory for a specific account id. I have an index on the Inventory as well.

I'm using the following query to aggregate the data and give me the top 5 records. This query is currently taking 24 seconds to return the 5 rows:

StmtText
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SELECT TOP 5
    Sum(LogCount) AS Views
    , DENSE_RANK() OVER(ORDER BY Sum(LogCount) DESC, Inv_ID DESC) AS Rank
    , Inv_ID
FROM LogInvSearches_Daily D (NOLOCK)
WHERE 
    LogDay > DateAdd(d, -30, getdate())
    AND EXISTS(
        SELECT NULL FROM propertyControlCenter.dbo.Inventory (NOLOCK) WHERE Acct_ID = 18731 AND Inv_ID = D.Inv_ID
    )
GROUP BY Inv_ID


(1 row(s) affected)

StmtText
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  |--Top(TOP EXPRESSION:((5)))
       |--Sequence Project(DEFINE:([Expr1007]=dense_rank))
            |--Segment
                 |--Segment
                      |--Sort(ORDER BY:([Expr1006] DESC, [D].[Inv_ID] DESC))
                           |--Stream Aggregate(GROUP BY:([D].[Inv_ID]) DEFINE:([Expr1006]=SUM([LOALogs].[dbo].[LogInvSearches_Daily].[LogCount] as [D].[LogCount])))
                                |--Sort(ORDER BY:([D].[Inv_ID] ASC))
                                     |--Nested Loops(Inner Join, OUTER REFERENCES:([D].[Inv_ID]))
                                          |--Nested Loops(Inner Join, OUTER REFERENCES:([Expr1011], [Expr1012], [Expr1010]))
                                          |    |--Compute Scalar(DEFINE:(([Expr1011],[Expr1012],[Expr1010])=GetRangeWithMismatchedTypes(dateadd(day,(-30),getdate()),NULL,(6))))
                                          |    |    |--Constant Scan
                                          |    |--Index Seek(OBJECT:([LOALogs].[dbo].[LogInvSearches_Daily].[IX_LogInvSearches_Daily_LogDay] AS [D]), SEEK:([D].[LogDay] > [Expr1011] AND [D].[LogDay] < [Expr1012]) ORDERED FORWARD)
                                          |--Index Seek(OBJECT:([propertyControlCenter].[dbo].[Inventory].[IX_Inventory_Acct_ID]), SEEK:([propertyControlCenter].[dbo].[Inventory].[Acct_ID]=(18731) AND [propertyControlCenter].[dbo].[Inventory].[Inv_ID]=[LOA

(13 row(s) affected)

I tried using a CTE to pick up the rows first and aggregate them, but that didn't run any faster, and gives me essentially the same execution plan.


(1 row(s) affected)
StmtText
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--SET SHOWPLAN_TEXT ON;
WITH getSearches AS (
        SELECT
            LogCount
--          , DENSE_RANK() OVER(ORDER BY Sum(LogCount) DESC, Inv_ID DESC) AS Rank
            , D.Inv_ID
        FROM LogInvSearches_Daily D (NOLOCK)
            INNER JOIN propertyControlCenter.dbo.Inventory I (NOLOCK) ON Acct_ID = 18731 AND I.Inv_ID = D.Inv_ID
        WHERE 
            LogDay > DateAdd(d, -30, getdate())
--      GROUP BY Inv_ID
)

SELECT Sum(LogCount) AS Views, Inv_ID
FROM getSearches
GROUP BY Inv_ID


(1 row(s) affected)

StmtText
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  |--Stream Aggregate(GROUP BY:([D].[Inv_ID]) DEFINE:([Expr1004]=SUM([LOALogs].[dbo].[LogInvSearches_Daily].[LogCount] as [D].[LogCount])))
       |--Sort(ORDER BY:([D].[Inv_ID] ASC))
            |--Nested Loops(Inner Join, OUTER REFERENCES:([D].[Inv_ID]))
                 |--Nested Loops(Inner Join, OUTER REFERENCES:([Expr1008], [Expr1009], [Expr1007]))
                 |    |--Compute Scalar(DEFINE:(([Expr1008],[Expr1009],[Expr1007])=GetRangeWithMismatchedTypes(dateadd(day,(-30),getdate()),NULL,(6))))
                 |    |    |--Constant Scan
                 |    |--Index Seek(OBJECT:([LOALogs].[dbo].[LogInvSearches_Daily].[IX_LogInvSearches_Daily_LogDay] AS [D]), SEEK:([D].[LogDay] > [Expr1008] AND [D].[LogDay] < [Expr1009]) ORDERED FORWARD)
                 |--Index Seek(OBJECT:([propertyControlCenter].[dbo].[Inventory].[IX_Inventory_Acct_ID] AS [I]), SEEK:([I].[Acct_ID]=(18731) AND [I].[Inv_ID]=[LOALogs].[dbo].[LogInvSearches_Daily].[Inv_ID] as [D].[Inv_ID]) ORDERED FORWARD)

(8 row(s) affected)


(1 row(s) affected)

So given that I'm getting good Index Seeks in my execution plan, what can I do to get this running faster?

Thanks,

Dan

© Stack Overflow or respective owner

Related posts about sql-server-2008

Related posts about aggregate-functions