Challenge 19 – An Explanation of a Query
- by Dave Ballantyne
I have received a number of requests for an explanation of my winning query of TSQL Challenge 19. This involved traversing a hierarchy of employees and rolling a count of orders from subordinates up to superiors. The first concept I shall address is the hierarchyId , which is constructed within the CTE called cteTree. cteTree is a recursive cte that will expand the parent-child hierarchy of the personnel in the table @emp. One useful feature with a recursive cte is that data can be ‘passed’ from the parent to the child data. The hierarchyId column is similar to the hierarchyId data type that was introduced in SQL Server 2008 and represents the position of the person within the organisation. Let us start with a simplistic example Albert manages Bob and Eddie. Bob manages Carl and Dave. The hierarchyId will represent each person’s position in this relationship in a single field. In this simple example we could append the userID together into a varchar field as detailed below. This will enable us to select a branch of the tree by filtering using Where hierarchyId ‘1,2%’ to select Bob and all his subordinates. Naturally, this is not comprehensive enough to provide a full solution, but as opposed to concatenating the Id’s together into a varchar datatyped column, we can apply the same theory to a varbinary. By CASTing the ID’s into a datatype of varbinary(4) ,4 is used as 4 bytes of data are used to store an integer and building a hierarchyId from those. For example: The important point to bear in mind for later in the query is that the binary data generated is 'byte order comparable'. ie We can ORDER a dataset with it and the resulting data, will be in the order required. Now, would probably be a good time to download the example file and, after the cte ‘cteTree’, uncomment the line ‘select * from cteTree’. Mark this and all prior code and execute. This will show you how this theory directly relates to the actual challenge data. The only deviation from the above, is that instead of using the ID of an employee, I have used the row_number() ranking function to order each level by LastName,Firstname. This enables me to order by the HierarchyId in the final result set so that the result set is in the required order. Your output should be something like the below. Notice also the ‘Level’ Column that contains the depth that the employee is within the tree. I would encourage you to ‘play’ with the query, change the order in the row_number() or the length of the cast in the hierarchyId to see how that effects the outcome. The next cte, ‘cteTreeWithOrderCount’, is a join between cteTree and the @ord table, and COUNT’s the number of orders per employee. A LEFT JOIN is employed here to account for the occasion where an employee has made no sales. Executing a ‘Select * from cteTreeWithOrderCount’ will return the result set as below. The order here is unimportant as this is only a staging point of the data and only the final result set in a cte chain needs an Order by clause, unless TOP is utilised. cteExplode joins the above result set to the tally table (Nums) for Level Occurances. So, if level is 2 then 2 rows are required. This is done to expand the dataset, to create a new column (PathInc), which is the (n+1) integers contained within the heirarchyid. For example, with the data for Robert King as given above, the below 3 rows will be returned. From this you can see that the pathinc column now contains the values for Andrew Fuller and Steven Buchanan who are Robert King’s superiors within the tree. Finally cteSumUp, sums the orders for each person and their subordinates using the PathInc generated above, and the final select does the final simple mathematics and filters to restrict the result set to only the ‘original’ row per employee.