Merge Join component sorted outputs [SSIS]
- by jamiet
One question that I have been asked a few times of late in regard to performance tuning SSIS data flows is this:
Why isn’t the Merge Join output sorted (i.e.IsSorted=True)?
This is a fair question. After all both of the Merge Join inputs are sorted, hence why wouldn’t the output be sorted as well? Well here’s a little secret, the Merge Join output IS sorted! There’s a caveat though – it is only under certain circumstances and SSIS itself doesn’t do a good job of informing you of it.
Let’s take a look at an example. Here we have a dataflow that consumes data from the [AdventureWorks2008].[Sales].[SalesOrderHeader] & [AdventureWorks2008].[Sales].[SalesOrderDetail] tables then joins them using a Merge Join component:
Let’s take a look inside the editor of the Merge Join:
We are joining on the [SalesOrderId] field (which is what the two inputs just happen to be sorted upon). We are also putting [SalesOrderHeader].[SalesOrderId] into the output. Believe it or not the output from this Merge Join component is sorted (i.e. has IsSorted=True) but unfortunately the Merge Join component does not have an Advanced Editor hence it is hidden away from us. There are a couple of ways to prove to you that is the case; I could open up the package XML inside the .dtsx file and show you the metadata but there is an easier way than that – I can attach a Sort component to the output. Take a look:
Notice that the Sort component is attempting to sort on the [SalesOrderId] column. This gives us the following warning:
Validation warning. DFT Get raw data: {992B7C9A-35AD-47B9-A0B0-637F7DDF93EB}: The data is already sorted as specified so the transform can be removed.
The warning proves that the output from the Merge Join is sorted!
It must be noted that the Merge Join output will only have IsSorted=True if at least one of the join columns is included in the output.
So there you go, the Merge Join component can indeed produce a sorted output and that’s very useful in order to avoid unnecessary expensive Sort operations downstream. Hope this is useful to someone out there!
@Jamiet
P.S. Thank you to Bob Bojanic on the SSIS product team who pointed this out to me!