How can parallelism affect number of results?
- by spender
I have a fairly complex query that looks something like this:
create table Items(SomeOtherTableID int,SomeField int)
create table SomeOtherTable(Id int,GroupID int)
with cte1 as
(
select
SomeOtherTableID,COUNT(*) SubItemCount
from
Items t
where
t.SomeField is not null
group by
SomeOtherTableID
),cte2 as
(
select
tc.SomeOtherTableID,ROW_NUMBER() over (partition by a.GroupID order by tc.SubItemCount desc) SubItemRank
from
Items t
inner join SomeOtherTable a on a.Id=t.SomeOtherTableID
inner join cte1 tc on tc.SomeOtherTableID=t.SomeOtherTableID
where
t.SomeField is not null
),cte3 as
(
select
SomeOtherTableID
from
cte2
where
SubItemRank=1
)
select
*
from
cte3 t1
inner join cte3 t2 on t1.SomeOtherTableID<t2.SomeOtherTableID
option (maxdop 1)
The query is such that cte3 is filled with 6222 distinct results. In the final select, I am performing a cross join on cte3 with itself, (so that I can compare every value in the table with every other value in the table at a later point). Notice the final line :
option (maxdop 1)
Apparently, this switches off parallelism.
So, with 6222 results rows in cte3, I would expect (6222*6221)/2, or 19353531 results in the subsequent cross joining select, and with the final maxdop line in place, that is indeed the case.
However, when I remove the maxdop line, the number of results jumps to 19380454. I have 4 cores on my dev box.
WTF? Can anyone explain why this is? Do I need to reconsider previous queries that cross join in this way?