A friend of mine has found the following problem with Entity Framework 4: Two simple classes and one association between them (one to many): One condition to filter out soft-deleted entities (WHERE Deleted = 0): 100 records in the database; A simple query: 1: var l = ctx.Person.Include("Address").Where(x => (x.Address.Name == "317 Oak Blvd." && x.Address.Number == 926) || (x.Address.Name == "891 White Milton Drive" && x.Address.Number == 497));
Will produce the following SQL:
1: SELECT
2: [Extent1].[Id] AS [Id],
3: [Extent1].[FullName] AS [FullName],
4: [Extent1].[AddressId] AS [AddressId],
5: [Extent202].[Id] AS [Id1],
6: [Extent202].[Name] AS [Name],
7: [Extent202].[Number] AS [Number]
8: FROM [dbo].[Person] AS [Extent1]
9: LEFT OUTER JOIN [dbo].[Address] AS [Extent2] ON ([Extent2].[Deleted] = 0) AND ([Extent1].[AddressId] = [Extent2].[Id])
10: LEFT OUTER JOIN [dbo].[Address] AS [Extent3] ON ([Extent3].[Deleted] = 0) AND ([Extent1].[AddressId] = [Extent3].[Id])
11: LEFT OUTER JOIN [dbo].[Address] AS [Extent4] ON ([Extent4].[Deleted] = 0) AND ([Extent1].[AddressId] = [Extent4].[Id])
12: LEFT OUTER JOIN [dbo].[Address] AS [Extent5] ON ([Extent5].[Deleted] = 0) AND ([Extent1].[AddressId] = [Extent5].[Id])
13: LEFT OUTER JOIN [dbo].[Address] AS [Extent6] ON ([Extent6].[Deleted] = 0) AND ([Extent1].[AddressId] = [Extent6].[Id])
14: ...
15: WHERE ((N'317 Oak Blvd.' = [Extent2].[Name]) AND (926 = [Extent3].[Number]))
16: ...
And will result in 680 MB of memory being taken!
Now, Entity Framework has been historically known for producing less than optimal SQL, but 680 MB for 100 entities?!
According to Microsoft, the problem will be addressed in the following version, there is a Connect issue open. There is even a whitepaper, Performance Considerations for Entity Framework 5, which talks about some of the changes and optimizations coming on version 5, but by reading it, I got even more concerned:
“Once the cache contains a set number of entries (800), we start a timer that periodically (once-per-minute) sweeps the cache.”
Say what?! The next version of Entity Framework will spawn timer threads?!
When Code First came along, I thought it was a step in the right direction. Sure, it didn’t include some things that NHibernate did for quite some time – for example, different strategies for Id generation that do not rely on IDENTITY columns, which makes INSERT batching impossible, or support for enumerated types – but I thought these would come with the time. Now, enumerated types have, but so did… timer threads! I’m afraid Entity Framework is becoming a monster.