I have this SQL query that I want translated pretty much 1:1 to Entity Framework:

SELECT GroupId, ItemId, count(*) as total
  FROM [TESTDB].[dbo].[TestTable]
  WHERE GroupId = '64025'
  GROUP BY GroupId, ItemId
  ORDER BY GroupId, total DESC

This SQL query should sort by the number of occurrences of each ItemId (within that group).

I have this now:

from x in dataContext.TestTable.AsNoTracking()
where x.GroupId == 64025
group x by new {x.GroupId, x.ItemId}
into g
orderby g.Key.GroupId, g.Count() descending 
select new {g.Key.GroupId, g.Key.ItemId, Count = g.Count()};

But this generates the following SQL code:

SELECT 
  [GroupBy1].[K1] AS [GroupId], 
  [GroupBy1].[K2] AS [ItemId], 
  [GroupBy1].[A2] AS [C1]
FROM ( SELECT 
         [Extent1].[GroupId] AS [K1], 
         [Extent1].[ItemId] AS [K2], 
         COUNT(1) AS [A1], 
         COUNT(1) AS [A2]
       FROM [dbo].[TestTable] AS [Extent1]
       WHERE 64025 = [Extent1].[GroupId]
       GROUP BY [Extent1].[GroupId], [Extent1].[ItemId]
     )  AS [GroupBy1]
ORDER BY [GroupBy1].[K1] ASC, [GroupBy1].[A1] DESC

This also works, but it is about a factor of 2 slower than the SQL I wrote.

I've been fiddling around with the LINQ code for a while, but I haven't managed to produce something similar to my query.

Execution plan (only the last two items, the first two are identical):

FIRST:   |--Stream Aggregate(GROUP BY:([Extent1].[ItemId]) DEFINE:([Expr1006]=Count(*), [Extent1].[GroupId]=ANY([TESTDB].[dbo].[TestTable].[GroupId] as [Extent1].[GroupId])))
           |--Index Seek(OBJECT:([TESTDB].[dbo].[TestTable].[IX_Group]), SEEK:([TESTDB].[dbo].[TestTable].[GroupId]=(64034)) ORDERED FORWARD)

SECOND:  |--Stream Aggregate(GROUP BY:([TESTDB].[dbo].[TestTable].[ItemId]) DEFINE:([Expr1007]=Count(*), [TESTDB].[dbo].[TestTable].[GroupId]=ANY([TESTDB].[dbo].[TestTable].[GroupId])))
           |--Index Seek(OBJECT:([TESTDB].[dbo].[TestTable].[IX_Group] AS [Extent1]), SEEK:([Extent1].[GroupId]=(64034)) ORDERED FORWARD)
  • Why do you not use the TOP functionality in LINQ? In Lambdas at least it exists (top, skip functions). Your speed difference is possibly from pulling all elements instead of only the top 1000. EF version? They work on making better SQL. Commented Jan 6, 2013 at 10:44
  • @TomTom, I know it exists. I need all the data; the query with TOP was just a quick test of mine. I removed it from the question, so please ignore it :) Commented Jan 6, 2013 at 10:46
  • Hm, smells like bad SQL, which EF is known for. Not sure a lot can be done. Which EF version do you use? It gets better with 5 and maybe 6; otherwise you can ask on the EF site on CodePlex and the developers may jump in. Commented Jan 6, 2013 at 10:51
  • What are the execution plans? It's possible that bad stats lead to different plans for equivalent queries. Commented Jan 6, 2013 at 10:55
  • The execution plans are the same. Only the names are different. Commented Jan 6, 2013 at 15:46

1 Answer

The query that Entity Framework generates and your hand-crafted query are semantically the same and will give the same plan.

The derived table definition is inlined during query optimisation so the only difference might be some extremely minor additional overhead during parsing and compilation.
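
One way to check the equivalence claim is to run both query shapes against the same data and compare the rows. The sketch below is an illustration only: it uses Python's built-in sqlite3 module with an in-memory table as a stand-in for SQL Server, the sample rows are made up, and the extra ItemId tie-breaker in both ORDER BY clauses is an addition so the comparison is deterministic. It says nothing about SQL Server's plans; it only shows that the flat query and the derived-table query return the same result set.

```python
import sqlite3

# Hand-written shape from the question (flat GROUP BY).
# The trailing ItemId sort key is an added tie-breaker, not in the original.
FLAT_SQL = """
SELECT GroupId, ItemId, COUNT(*) AS total
FROM TestTable
WHERE GroupId = ?
GROUP BY GroupId, ItemId
ORDER BY GroupId, total DESC, ItemId
"""

# EF-style shape: the same aggregate wrapped in a derived table.
DERIVED_SQL = """
SELECT GroupBy1.K1 AS GroupId, GroupBy1.K2 AS ItemId, GroupBy1.A2 AS C1
FROM (SELECT Extent1.GroupId AS K1,
             Extent1.ItemId  AS K2,
             COUNT(1) AS A1,
             COUNT(1) AS A2
      FROM TestTable AS Extent1
      WHERE ? = Extent1.GroupId
      GROUP BY Extent1.GroupId, Extent1.ItemId) AS GroupBy1
ORDER BY GroupBy1.K1 ASC, GroupBy1.A1 DESC, GroupBy1.K2 ASC
"""


def build_sample_db():
    """Create an in-memory table with made-up rows (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE TestTable (SetId INTEGER, GroupId INTEGER, ItemId INTEGER)"
    )
    # Group 64025: ItemId 1 three times, 2 twice, 3 once.
    rows = [(s, 64025, i) for s, i in enumerate([1, 1, 1, 2, 2, 3])]
    # A second group that the WHERE clause must filter out.
    rows += [(s, 64034, 7) for s in range(100, 104)]
    conn.executemany("INSERT INTO TestTable VALUES (?, ?, ?)", rows)
    return conn


def run_both(conn, group_id):
    """Run both query shapes for one group and return their row lists."""
    flat = conn.execute(FLAT_SQL, (group_id,)).fetchall()
    derived = conn.execute(DERIVED_SQL, (group_id,)).fetchall()
    return flat, derived


conn = build_sample_db()
flat_rows, derived_rows = run_both(conn, 64025)
print(flat_rows == derived_rows)
```

On SQL Server itself, comparing the actual execution plans of the two statements remains the more direct check.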

The snippets of SHOWPLAN_TEXT you have posted are the same plan. The only difference is the aliases. It looks as though your table definition is something like:

CREATE TABLE [dbo].[TestTable] 
(
[GroupId] INT,
[ItemId] INT
)

CREATE NONCLUSTERED INDEX IX_Group ON  [dbo].[TestTable] ([GroupId], [ItemId]) 

And you are getting a plan like this:

[Image: execution plan]

To all intents and purposes the plans are the same. Your performance testing methodology is probably flawed. For example, your first query may have brought pages into cache that then benefited the second query.
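
A fairer comparison warms the cache first, alternates the order of the two variants so that neither systematically benefits from the other's page reads, and compares medians rather than one total. The following is a minimal sketch of such a harness, not the asker's actual test: it uses Python's built-in sqlite3 with an in-memory table as a stand-in for SQL Server, the helper names `time_once` and `compare` are hypothetical, and the table contents are made up.

```python
import sqlite3
import statistics
import time


def time_once(conn, sql, params):
    """Return the wall-clock seconds needed to run one query and fetch all rows."""
    start = time.perf_counter()
    conn.execute(sql, params).fetchall()
    return time.perf_counter() - start


def compare(conn, variants, params, warmup=5, rounds=50):
    """Time each named query variant and return the median seconds per variant."""
    names = list(variants)
    # Warm-up: run every variant a few times so caches are populated for all.
    for _ in range(warmup):
        for name in names:
            conn.execute(variants[name], params).fetchall()
    samples = {name: [] for name in names}
    for i in range(rounds):
        # Flip the execution order each round to cancel out ordering effects.
        order = names if i % 2 == 0 else names[::-1]
        for name in order:
            samples[name].append(time_once(conn, variants[name], params))
    return {name: statistics.median(vals) for name, vals in samples.items()}


# Made-up data: 30000 rows spread over three groups and fifty items.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TestTable (SetId INTEGER, GroupId INTEGER, ItemId INTEGER)")
conn.executemany(
    "INSERT INTO TestTable VALUES (?, ?, ?)",
    [(n, 64025 + n % 3, n % 50) for n in range(30000)],
)
conn.execute("CREATE INDEX IX_Group ON TestTable (GroupId, ItemId)")

variants = {
    "flat": (
        "SELECT GroupId, ItemId, COUNT(*) AS total FROM TestTable "
        "WHERE GroupId = ? GROUP BY GroupId, ItemId "
        "ORDER BY GroupId, total DESC"
    ),
    "derived": (
        "SELECT g.K1, g.K2, g.A1 FROM ("
        "SELECT GroupId AS K1, ItemId AS K2, COUNT(1) AS A1 FROM TestTable "
        "WHERE GroupId = ? GROUP BY GroupId, ItemId) AS g "
        "ORDER BY g.K1, g.A1 DESC"
    ),
}

medians = compare(conn, variants, (64025,))
for name, seconds in medians.items():
    print(f"{name}: {seconds * 1e6:.1f} us (median)")
```

The same structure (warm-up, alternating order, median) carries over to a C# harness that times the LINQ query and the raw SQL call; only the timing primitive and the query invocation change.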

3 Comments

Thank you, very interesting. I tested both variants several times with 10000 iterations, and my own SQL code was always nearly a factor of 2 faster. The TestTable only includes one other column (only integers, named SetId); I don't suspect that this could cause the difference? The PK is set on all three columns (SetId, GroupId, ItemId), and the nonclustered index is on GroupId, SetId. The execution plan graphics you've posted do indeed look similar to what I've got here. I tested this again because of your post and now I do see similar results. Maybe it was just a temporary difference somewhere after all.
@Areius - Were you testing the actual raw queries or the performance of EF vs a raw query sent from C#? EF has overhead for compiling the query except if you were to use compiled queries.
Both from EF; my own query used EF's 'SqlQuery' method.