
I searched the Internet and Stack Overflow but couldn't find a helpful resource. Perhaps I'm not using the right search terms. If I have missed an existing question for that reason, please let me know and I will take this question down.


I am dealing with a busy database, so I need to send fewer queries to it.

If I access different columns of the same LINQ query from different places in the code, is Entity Framework smart enough to foresee the required columns and fetch them all at once, or does it call the database twice?

For example:

var query = from t1 in table_1
            join t2 in table_2 on t1.col1 equals t2.col1
            where t1.EmployeeId == EmployeeId
            group new { t1, t2 } by t1.col2 into grouped
            orderby grouped.Count() descending
            select new { Column1 = grouped.Key, Column2 = grouped.Sum(g=>g.t2.col4) };

var records = query.Take(10);

// point x
var x = records.Select(a => a.Column1).ToArray();

var y = records.Select(a => a.Column2).ToArray();

Does EF query the database twice to produce x and y (sending one query for Column1 and then another for Column2), or is it smart enough to know that both columns need to be materialized and fetch them both at point x?

Added to clarify the intention of the question:

I understand I could simply add a greedy (eagerly executing) method such as ToList() to the end of query.Take(10) and be done, but I am trying to understand whether the approach above (which I find more elegant) works, and if not, what makes EF issue two queries.

  • I didn't understand what you mean in your update. Commented Aug 30, 2016 at 9:26
  • Take(10) will return 10 rows, not the column. Commented Aug 30, 2016 at 9:27
  • @SarveshMishra Take(10) will not immediately return 10 rows; it returns an IQueryable<T> that limits the result to at most 10 records when the query is finally executed. But yes, it does not change the selected columns in any way, correct. Commented Aug 30, 2016 at 9:29
  • This question may also interest you: stackoverflow.com/questions/32308095/… It's a similar question where you can see the big performance issues that can appear if you use Entity Framework wrong (e.g. not using ToList(), or using ToList() where you should not) :) Commented Aug 30, 2016 at 9:32
  • @Menol - Edited my answer. I think it explained this before, but maybe it is now clearer why it goes to the DB once or twice? Commented Aug 30, 2016 at 9:35

2 Answers


Yes, your current code will generate two queries that are executed against the database, because two different SQL statements are generated:

  1. The first is the top query, limited to 10 records, selecting only Column1
  2. The second is the top query, limited to 10 records, selecting only Column2

These are two queries because you call ToArray() on two different Select statements, and each Select generates different SQL. Most LINQ queries use deferred execution: they are executed only when you call something like ToArray(), ToList(), FirstOrDefault() and so on, the operators that actually give you concrete data. In your original code you call ToArray() twice on data that has not yet been retrieved, which means two queries (one for the first field and then one for the second).

The following code will result in a single query to the database:

var records = (from t1 in table_1
               join t2 in table_2 on t1.col1 equals t2.col1
               where t1.EmployeeId == EmployeeId
               group new { t1, t2 } by t1.col2 into grouped
               orderby grouped.Count() descending
               select new { Column1 = grouped.Key, Column2 = grouped.Sum(g=>g.t2.col4) })
              .Take(10).ToList();

var x = records.Select(a => a.Column1).ToArray();
var y = records.Select(a => a.Column2).ToArray();

In the solution above I added ToList() after limiting the query to only the data you need (Take(10)), and that is the point at which it executes against the database. After that, all the data is in memory and you can run any other LINQ operation over it without going to the database again.
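To see the difference without a database, here is a minimal sketch that uses LINQ to Objects as a stand-in for EF. The `Row`, `CountingSource<T>` and `DeferredVsMaterialized` names are hypothetical: `CountingSource<T>` only counts how many times its data is enumerated, which is roughly where EF would send a query, and `AsQueryable()` gives a deferred `IQueryable` that runs in memory instead of generating SQL.

```csharp
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row type standing in for the joined t1/t2 data.
public sealed class Row
{
    public Row(string col2, int col4) { Col2 = col2; Col4 = col4; }
    public string Col2 { get; }
    public int Col4 { get; }
}

// Hypothetical stand-in for a table: each enumeration counts as one "query".
public sealed class CountingSource<T> : IEnumerable<T>
{
    private readonly IReadOnlyList<T> _rows;
    public CountingSource(IReadOnlyList<T> rows) => _rows = rows;
    public int Executions { get; private set; }

    public IEnumerator<T> GetEnumerator()
    {
        Executions++;
        return _rows.GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

public static class DeferredVsMaterialized
{
    private static CountingSource<Row> NewTable() => new CountingSource<Row>(new[]
    {
        new Row("a", 1), new Row("b", 2), new Row("a", 3),
        new Row("c", 4), new Row("a", 5), new Row("b", 6),
    });

    // Returns how many times the source was enumerated, plus the x and y arrays.
    public static (int Executions, string[] X, int[] Y) Run(bool materializeFirst)
    {
        var table = NewTable();

        // Same shape as the question's query: group, order by group size, project two columns.
        var query = from r in table.AsQueryable()
                    group r by r.Col2 into grouped
                    orderby grouped.Count() descending
                    select new { Column1 = grouped.Key, Column2 = grouped.Sum(g => g.Col4) };

        string[] x;
        int[] y;
        if (materializeFirst)
        {
            var records = query.Take(10).ToList();          // executes once, right here
            x = records.Select(a => a.Column1).ToArray();   // LINQ to Objects over the List
            y = records.Select(a => a.Column2).ToArray();   // no further execution
        }
        else
        {
            var records = query.Take(10);                   // still deferred, nothing has run
            x = records.Select(a => a.Column1).ToArray();   // execution #1
            y = records.Select(a => a.Column2).ToArray();   // execution #2
        }
        return (table.Executions, x, y);
    }
}
```

With this sample data, `Run(false)` reports two executions and `Run(true)` reports one, while both produce the same `x` and `y` arrays. A real EF provider would translate each tree to SQL instead of running it in memory, but the number of executions follows the same rule.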


Add ToString() calls to your code so you can check the generated SQL at different points (in EF6, calling ToString() on a query returns the SQL it will send). Then you will understand what is executed, and when:

var query = from t1 in table_1
            join t2 in table_2 on t1.col1 equals t2.col1
            where t1.EmployeeId == EmployeeId
            group new { t1, t2 } by t1.col2 into grouped
            orderby grouped.Count() descending
            select new { Column1 = grouped.Key, Column2 = grouped.Sum(g=>g.t2.col4) };
var generatedSql = query.ToString(); // Here you will see a query that brings all records

var records = query.Take(10);
generatedSql = records.ToString(); // Here you will see it taking only 10 records


// point x
var xQuery = records.Select(a => a.Column1);
generatedSql = xQuery.ToString(); // Here you will see only 1 column in query

// Still nothing has been executed to DB at this point

var x = xQuery.ToArray(); // And that is what will be executed here

// Now you are before second execution

var yQuery = records.Select(a => a.Column2);
generatedSql = yQuery.ToString(); // Here you will see only the second column in query

// Finally, second execution, now with the other column

var y = yQuery.ToArray();
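The same thing can be observed without a database by inspecting the expression trees directly. This sketch assumes an in-memory `IQueryable` from `AsQueryable()` in place of EF, and the `Sale` and `ExpressionInspector` names are hypothetical. It shows that `xQuery` and `yQuery` are two separate trees that share the `records` subtree but end in different `Select` projections; each `ToArray()` hands the provider exactly one such tree.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical row type standing in for the joined t1/t2 data.
public sealed class Sale
{
    public Sale(string col2, int col4) { Col2 = col2; Col4 = col4; }
    public string Col2 { get; }
    public int Col4 { get; }
}

public static class ExpressionInspector
{
    // Returns the member projected by the outermost Select(...) call of a query.
    public static string OutermostProjection(IQueryable query)
    {
        // Queryable.Select(source, Quote(a => a.ColumnN))
        var call = (MethodCallExpression)query.Expression;
        // Arguments[1] is a Quote node wrapping the selector lambda.
        var lambda = (LambdaExpression)((UnaryExpression)call.Arguments[1]).Operand;
        return ((MemberExpression)lambda.Body).Member.Name;
    }

    public static (string X, string Y) Inspect()
    {
        var table = new[] { new Sale("a", 1), new Sale("b", 2), new Sale("a", 3) }.AsQueryable();

        var query = from r in table
                    group r by r.Col2 into grouped
                    orderby grouped.Count() descending
                    select new { Column1 = grouped.Key, Column2 = grouped.Sum(g => g.Col4) };

        var records = query.Take(10);

        // Two independent trees: same 'records' subtree, different final projection.
        var xQuery = records.Select(a => a.Column1);
        var yQuery = records.Select(a => a.Column2);

        // Prints the whole tree for x (an expression tree, not SQL: there is no SQL provider here).
        Console.WriteLine(xQuery.Expression);

        return (OutermostProjection(xQuery), OutermostProjection(yQuery));
    }
}
```

This is also why compile-time knowledge does not help: the compiler emits code that builds each tree at run time, and the provider only ever receives the tree passed to the call that executes it.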

7 Comments

Thanks. I am actually trying to understand what happens when I try the other method (which to me is more elegant). I am sorry for not being clearer before. I have updated the question now.
@Menol - The Take doesn't have anything to do with the "when it will go to the database and how many times"
Hey @Gilad Green, thanks for helping me. Can you help me understand why EF doesn't fetch both columns at point x? Surely it should be able to see at compile time, when the IL is generated, that the second column is needed.
@Menol - at point x nothing has been fetched yet, and its instructions at that point are to fetch both columns. However, you then add a Select and then ToArray. The Select changes the generated SQL to include only that one field, and ToArray executes it. I'll edit my answer to show you how to check what is going on so you can better understand it.
Thanks again, and if I may pick your brain a little more: as you know, C# isn't interpreted line by line, so why can't .NET and/or EF see the second query when it generates the IL that is actually executed later? In other words, why can't it work like this: "OK, I need to fetch Column1 now, but I see I also need the second column two lines later, so let me fetch it now to avoid another query"?

When you run a LINQ statement on an entity in EF, it only prepares the SELECT statement (that's why the type is IQueryable). Execution is deferred: the result is evaluated, through an enumerator, only when you try to use a value from that query.

So when you explicitly turn it into a collection (ToList() etc.), it has to get the data to populate the list, and hence the SQL command is sent.

It is designed this way to improve performance: if only a particular property of an entity is used in the projection, EF doesn't fetch the values of all the columns in that table.
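A minimal sketch of the deferred behaviour described in this answer, again with an in-memory `IQueryable` from `AsQueryable()` standing in for EF. `EnumerationCounter<T>` and `LazyDemo` are hypothetical names, and counting enumerations is only a proxy for when EF would send SQL. Composing `Where`/`OrderBy` does not touch the data, the first `ToList()` executes the query, and calling `ToList()` again on the same query executes it a second time.

```csharp
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical wrapper: counts enumerations, roughly where EF would send SQL.
public sealed class EnumerationCounter<T> : IEnumerable<T>
{
    private readonly IEnumerable<T> _items;
    public EnumerationCounter(IEnumerable<T> items) => _items = items;
    public int Enumerations { get; private set; }

    public IEnumerator<T> GetEnumerator()
    {
        Enumerations++;
        return _items.GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

public static class LazyDemo
{
    // Returns the enumeration count after each stage, plus the first result.
    public static (int[] Counts, List<int> Result) Stages()
    {
        var source = new EnumerationCounter<int>(new[] { 5, 1, 4, 2, 3 });

        // Composing operators only builds an expression tree; nothing runs yet.
        IQueryable<int> query = source.AsQueryable().Where(n => n > 2).OrderBy(n => n);
        int afterCompose = source.Enumerations;

        List<int> first = query.ToList();   // enumerating forces execution
        int afterFirst = source.Enumerations;

        query.ToList();                     // the same query runs again
        int afterSecond = source.Enumerations;

        return (new[] { afterCompose, afterFirst, afterSecond }, first);
    }
}
```

Here the counts come out as 0, 1 and 2, and the first result is 3, 4, 5.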
