Loading the last related record instantly for multiple parent records using Entity framework
- by Guillaume Schuermans
Does anyone know a good approach using Entity Framework for the problem described below?
I am trying for our next release to come up with a performant way to show the placed orders for the logged on customer.
Of course paging is always a good technique to use when a lot of data is available I would like to see an answer without any paging techniques.
Here's the story: a customer places an order which gets an orderstatus = PENDING. Depending on some strategy we move that order up the chain in order to get it APPROVED.
Every change of status is logged so we can see a trace for statusses and maybe even an extra line of comment per status which can provide some extra valuable information to whoever sees this order in an interface.
So an Order is linked to a Customer. One order can have multiple orderstatusses stored in OrderStatusHistory.
In my testscenario I am using a customer which has 100+ Orders each with about 5 records in the OrderStatusHistory-table.
I would for now like to see all orders in one page not using paging where for each Order I show the last relevant Status and the extra comment (if there is any for this last status; both fields coming from OrderStatusHistory; the record with the highest Id for the given OrderId).
There are multiple scenarios I have tried, but I would like to see any potential other solutions or comments on the things I have already tried.
Trying to do Include() when getting Orders but this still results in multiple queries launched on the database. Each order triggers an extra query to the database to get all orderstatusses in the history table. So all statusses are queried here instead of just returning the last relevant one, plus 100 extra queries are launched for 100 orders. You can imagine the problem when there are 100000+ orders in the database.
Having 2 computed columns on the database: LastStatus, LastStatusInformation and a regular Linq-Query which gets those columns which are available through the Entity-model.
The problem with this approach is the fact that those computed columns are determined using a scalar function which can not be changed without removing the formula from the computed column, etc...
In the end I am very familiar with SQL and Stored procedures, but since the rest of the data-layer uses Entity Framework I would like to stick to it as long as possible, even though I have my doubts about performance.
Using the SQL approach I would write something like this:
WITH cte (RN, OrderId, [Status], Information)
AS
(
SELECT ROW_NUMBER() OVER (PARTITION BY OrderId ORDER BY Id DESC), OrderId, [Status], Information
FROM OrderStatus
)
SELECT o.Id, cte.[Status], cte.Information AS StatusInformation, o.* FROM [Order] o
INNER JOIN cte ON o.Id = cte.OrderId AND cte.RN = 1
WHERE CustomerId = @CustomerId
ORDER BY 1 DESC;
which returns all orders for the customer with the statusinformation provided by the Common Table Expression.
Does anyone know a good approach using Entity Framework?