(At first glance this may look like a duplicate of http://stackoverflow.com/questions/421275 or http://stackoverflow.com/questions/414336, but my actual question is a bit different.)
Alright, this one's had me stumped for a few hours. My example here is ridiculously abstracted, so I doubt it will be possible to recreate locally, but it provides context for my question. I'm running SQL Server 2005.
I have a stored procedure that basically creates a temp table, populates it with very few rows, and then queries a very large table joined against that temp table. It has multiple parameters, but the most relevant is a datetime, @MinDate. Essentially:

    create table #smallTable (ID int)

    insert into #smallTable
    select (a very small number of rows from some other table)

    select * from aGiantTable
    inner join #smallTable on #smallTable.ID = aGiantTable.ID
    inner join anotherTable on anotherTable.GiantID = aGiantTable.ID
    where aGiantTable.SomeDateField > @MinDate

If I just execute this as a normal query, by declaring @MinDate as a local variable and running that, it produces an optimal execution plan that executes very quickly (first joins on #smallTable and then only considers a very small subset of rows from aGiantTable while doing other operations). It seems to realize that #smallTable is tiny, so it would be efficient to start with it. This is good.
However, if I make that a stored procedure with @MinDate as a parameter, it produces a completely inefficient execution plan. (I am recompiling it each time, so it's not a bad cached plan...at least, I sure hope it's not)
But here's where it gets weird. If I change the proc to the following:

    declare @LocalMinDate datetime
    set @LocalMinDate = @MinDate --where @MinDate is still a parameter

    create table #smallTable (ID int)

    insert into #smallTable
    select (a very small number of rows from some other table)

    select * from aGiantTable
    inner join #smallTable on #smallTable.ID = aGiantTable.ID
    inner join anotherTable on anotherTable.GiantID = aGiantTable.ID
    where aGiantTable.SomeDateField > @LocalMinDate

Then it gives me the efficient plan!
So my theory is this: when executing as a plain query (not as a stored procedure), it waits to construct the execution plan for the expensive query until the last minute, so the query optimizer knows that #smallTable is small and uses that information to give the efficient plan.
But when executing as a stored procedure, it creates the entire execution plan at once, thus it can't use this bit of information to optimize the plan.
But why does using the locally declared variables change this? Why does that delay the creation of the execution plan? Is that actually what's happening? If so, is there a way to force delayed compilation (if that indeed is what's going on here) even when not using local variables in this way?
More generally, does anyone have sources on when the execution plan is created for each step of a stored procedure? Googling hasn't provided any helpful information, but I don't think I'm looking for the right thing. Or is my theory just completely unfounded?
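For anyone who wants to compare the plans directly, a minimal sketch of capturing the actual plan for one call is below. The procedure name dbo.usp_QueryGiantTable and the date literal are hypothetical stand-ins (the real names are not shown in this post), and `go` assumes a client such as Management Studio or sqlcmd that treats it as a batch separator.

```sql
-- Hypothetical procedure name and parameter value, for illustration only.
-- SET STATISTICS XML returns the actual execution plan of each statement
-- that runs while it is on, as an extra XML result set.
set statistics xml on
go

exec dbo.usp_QueryGiantTable @MinDate = '20080101'
go

set statistics xml off
go
```

Each plan includes both estimated and actual row counts per operator, so the estimate the optimizer used for #smallTable can be compared between the parameter version and the local-variable version.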
Edit: Since posting, I've learned of parameter sniffing, and I assume this is what's causing the execution plan to compile prematurely (unless stored procedures indeed compile all at once), so my question remains -- can you force the delay? Or disable the sniffing entirely?
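To make "force the delay" concrete, the kind of thing I have in mind is a statement-level query hint. The sketch below shows two hints that exist in SQL Server 2005, applied to the same abstracted query inside the procedure body. The date literal in the second variant is made up, and whether either hint actually produces the efficient plan in this case is not something this sketch establishes.

```sql
-- Sketch only: these statements belong inside the procedure body, after
-- #smallTable has been created and populated. Table names are the
-- abstracted placeholders used in this post.

-- Variant 1: compile this statement each time it executes
-- instead of reusing a cached plan.
select * from aGiantTable
inner join #smallTable on #smallTable.ID = aGiantTable.ID
inner join anotherTable on anotherTable.GiantID = aGiantTable.ID
where aGiantTable.SomeDateField > @MinDate
option (recompile)

-- Variant 2: optimize for a fixed value of @MinDate rather than
-- the value that was passed in. The literal here is hypothetical.
select * from aGiantTable
inner join #smallTable on #smallTable.ID = aGiantTable.ID
inner join anotherTable on anotherTable.GiantID = aGiantTable.ID
where aGiantTable.SomeDateField > @MinDate
option (optimize for (@MinDate = '20080101'))
```

Only one of the two variants would be used in practice. The first pays a compilation cost on every call; the second ties the plan to whatever value is hard-coded in the hint.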
The question is academic, since I can force a more efficient plan by replacing the select * from aGiantTable with

    select * from (select * from aGiantTable where ID in (select ID from #smallTable)) as aGiantTable

Or I could just suck it up and mask the parameters with local variables, but still, this inconsistency has me pretty curious.