Using Recursive SQL and XML trick to PIVOT(OK, concat) a "Document Folder Structure Relationship" table, works like MySQL GROUP_CONCAT
- by Kevin Shyr
I'm in the process of building out a Data Warehouse and encountered this issue along the way.In the environment, there is a table that stores all the folders with the individual level. For example, if a document is created here:{App Path}\Level 1\Level 2\Level 3\{document}, then the DocumentFolder table would look like this:IDID_ParentFolderName1NULLLevel 121Level 232Level 3To my understanding, the table was built so that:Each proposal can have multiple documents stored at various locationsDifferent users working on the proposal will have different access level to the folder; if one user is assigned access to a folder level, she/he can see all the sub folders and their content.Now we understand from an application point of view why this table was built this way. But you can quickly see the pain this causes the report writer to show a document link on the report. I wasn't surprised to find the report query had 5 self outer joins, which is at the mercy of nobody creating a document that is buried 6 levels deep, and not to mention the degradation in performance.With the help of 2 posts (at the end of this post), I was able to come up with this solution:Use recursive SQL to build out the folder pathUse SQL XML trick to concat the strings.Code (a reminder, I built this code in a stored procedure. If you copy the syntax into a simple query window and execute, you'll get an incorrect syntax error)
Normal
0
false
false
false
EN-US
X-NONE
X-NONE
MicrosoftInternetExplorer4
/* Style Definitions */
table.MsoNormalTable
{mso-style-name:"Table Normal";
mso-tstyle-rowband-size:0;
mso-tstyle-colband-size:0;
mso-style-noshow:yes;
mso-style-priority:99;
mso-style-qformat:yes;
mso-style-parent:"";
mso-padding-alt:0in 5.4pt 0in 5.4pt;
mso-para-margin-top:0in;
mso-para-margin-right:0in;
mso-para-margin-bottom:10.0pt;
mso-para-margin-left:0in;
line-height:115%;
mso-pagination:widow-orphan;
font-size:11.0pt;
font-family:"Calibri","sans-serif";
mso-ascii-font-family:Calibri;
mso-ascii-theme-font:minor-latin;
mso-fareast-font-family:"Times New Roman";
mso-fareast-theme-font:minor-fareast;
mso-hansi-font-family:Calibri;
mso-hansi-theme-font:minor-latin;
mso-bidi-font-family:"Times New Roman";
mso-bidi-theme-font:minor-bidi;}
-- Get all
folders and group them by the original DocumentFolderID in PTSDocument table;WITH
DocFoldersByDocFolderID(PTSDocumentFolderID_Original, PTSDocumentFolderID_Parent,
sDocumentFolder, nLevel)AS (-- first member SELECT 'PTSDocumentFolderID_Original'
= d1.PTSDocumentFolderID ,
PTSDocumentFolderID_Parent , 'sDocumentFolder' = sName , 'nLevel' = CONVERT(INT, 1000000) FROM (SELECT DISTINCT
PTSDocumentFolderID FROM dbo.PTSDocument_DY WITH(READPAST) ) AS d1 INNER JOIN dbo.PTSDocumentFolder_DY
AS df1 WITH(READPAST) ON d1.PTSDocumentFolderID =
df1.PTSDocumentFolderID UNION ALL -- recursive SELECT ddf1.PTSDocumentFolderID_Original , df1.PTSDocumentFolderID_Parent , 'sDocumentFolder' = df1.sName , 'nLevel' = ddf1.nLevel - 1 FROM dbo.PTSDocumentFolder_DY AS df1 WITH(READPAST) INNER JOIN DocFoldersByDocFolderID AS
ddf1 ON df1.PTSDocumentFolderID =
ddf1.PTSDocumentFolderID_Parent)-- Flatten out
folder path,
DocFolderSingleByDocFolderID(PTSDocumentFolderID_Original, sDocumentFolder)AS (SELECT dfbdf.PTSDocumentFolderID_Original , 'sDocumentFolder' = STUFF((SELECT '\' + sDocumentFolder FROM DocFoldersByDocFolderID WHERE (PTSDocumentFolderID_Original
= dfbdf.PTSDocumentFolderID_Original) ORDER BY
PTSDocumentFolderID_Original, nLevel FOR XML PATH ('')),1,1,'') FROM DocFoldersByDocFolderID AS dfbdf GROUP BY
dfbdf.PTSDocumentFolderID_Original)
And voila, I use the second CTE to join back to my original query (which is now a CTE for Source as we can now use MERGE to do INSERT and UPDATE at the same time).Each part of this solution would not solve the problem by itself because:If I don't use recursion, I cannot build out the path properly. If I use the XML trick only, then I don't have the originating folder ID info that I need to link to the document.If I don't use the XML trick, then I don't have one row per document to show in the report.I could conceivably do this in the report function, but I'd rather not deal with the beginning or ending backslash and how to attach the document name.PIVOT doesn't do strings and UNPIVOT runs into the same problem as the above.I'm excited that each version of SQL server provides us new tools to solve old problems and/or enables us to solve problems in a more elegant wayThe 2 posts that helped me along:Recursive Queries Using Common Table ExpressionHow to use GROUP BY to concatenate strings in SQL server?