I am curious on how exactly LINQ (not LINQ to SQL) is performing is joins behind the scenes in relation to how Sql Server performs joins.
Sql Server before executing a query, generates an Execution Plan. The Execution Plan is basically an Expression Tree on what it believes is the best way to execute the query. Each node provides information on whether to do a Sort, Scan, Select, Join, ect.
On a 'Join' node in our execution plan, we can see three possible algorithms; Hash Join, Merge Join, and Nested Loops Join. Sql Server will choose which algorithm to for each Join operation based on expected number of rows in Inner and Outer tables, what type of join we are doing (some algorithms don't support all types of joins), whether we need data ordered, and probably many other factors.
Join Algorithms:
Nested Loop Join:
Best for small inputs, can be optimized with ordered inner table.
Merge Join:
Best for medium to large inputs sorted inputs, or an output that needs to be ordered.
Hash Join:
Best for medium to large inputs, can be parallelized to scale linearly.
LINQ Query:
DataTable firstTable, secondTable;
...
var rows = from firstRow in firstTable.AsEnumerable ()
join secondRow in secondTable.AsEnumerable ()
on firstRow.Field<object> (randomObject.Property)
equals secondRow.Field<object> (randomObject.Property)
select new {firstRow, secondRow};
SQL Query:
SELECT *
FROM firstTable fT
INNER JOIN secondTable sT ON fT.Property = sT.Property
Sql Server might use a Nested Loop Join if it knows there are a small number of rows from each table, a merge join if it knows one of the tables has an index, and Hash join if it knows there are a lot of rows on either table and neither has an index.
Does Linq choose its algorithm for joins? or does it always use one?