SQL SERVER – Introduction to PERCENTILE_DISC() – Analytic Functions Introduced in SQL Server 2012
- by pinaldave
SQL Server 2012 introduces new analytical function PERCENTILE_DISC().
The book online gives following definition of this function: Computes a specific percentile for sorted values in an entire rowset or within distinct partitions of a rowset in Microsoft SQL Server 2012 Release Candidate 0 (RC 0). For a given percentile value P, PERCENTILE_DISC sorts the values of the expression in the ORDER BY clause and returns the value with the smallest CUME_DIST value (with respect to the same sort specification) that is greater than or equal to P.
If you are clear with understanding of the function – no need to read further. If you got lost here is the same in simple words – find value of the column which is equal or more than CUME_DIST.
Before you continue reading this blog I strongly suggest you read about CUME_DIST function over here Introduction to CUME_DIST – Analytic Functions Introduced in SQL Server 2012.
Now let’s have fun following query:
USE AdventureWorks
GO
SELECT SalesOrderID, OrderQty, ProductID,
CUME_DIST() OVER(PARTITION BY SalesOrderID
ORDER BY ProductID ) AS CDist,
PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY ProductID)
OVER (PARTITION BY SalesOrderID) AS PercentileDisc
FROM Sales.SalesOrderDetail
WHERE SalesOrderID IN (43670, 43669, 43667, 43663)
ORDER BY SalesOrderID DESC
GO
The above query will give us the following result:
You can see that I have used PERCENTILE_DISC(0.5) in query, which is similar to finding median but not exactly. PERCENTILE_DISC() function takes a percentile as a passing parameters. It returns the value as answer which value is equal or great to the percentile value which is passed into the example. For example in above example we are passing 0.5 into the PERCENTILE_DISC() function. It will go through the resultset and identify which rows has values which are equal to or great than 0.5. In first example it found two rows which are equal to 0.5 and the value of ProductID of that row is the answer of PERCENTILE_DISC(). In some third windowed resultset there is only single row with the CUME_DIST() value as 1 and that is for sure higher than 0.5 making it as a answer.
To make sure that we are clear with this example properly. Here is one more example where I am passing 0.6 as a percentile.
Now let’s have fun following query:
USE AdventureWorks
GO
SELECT SalesOrderID, OrderQty, ProductID,
CUME_DIST() OVER(PARTITION BY SalesOrderID
ORDER BY ProductID ) AS CDist,
PERCENTILE_DISC(0.6) WITHIN GROUP (ORDER BY ProductID)
OVER (PARTITION BY SalesOrderID) AS PercentileDisc
FROM Sales.SalesOrderDetail
WHERE SalesOrderID IN (43670, 43669, 43667, 43663)
ORDER BY SalesOrderID DESC
GO
The above query will give us the following result:
The result of the PERCENTILE_DISC(0.6) is ProductID of which CUME_DIST() is more than 0.6. This means for SalesOrderID 43670 has row with CUME_DIST() 0.75 is the qualified row, resulting answer 773 for ProductID.
I hope this explanation makes it further clear.
Reference: Pinal Dave (http://blog.SQLAuthority.com)
Filed under: Pinal Dave, PostADay, SQL, SQL Authority, SQL Function, SQL Query, SQL Scripts, SQL Server, SQL Tips and Tricks, T SQL, Technology