
I've been testing Entity Framework to try to understand it better and see how it can be used effectively as a back-end device to query a database.

For reference, I know that Entity Framework uses lazy loading by default. For a back-end system like the one that I am trying to create, this is not useful.

int x = 0;
using (SandboxContext dbc = new SandboxContext()) {
    var customers = (from c in dbc.Customer orderby c.AcctNumber select new { c.CustomerKey, c.AcctNumber }).ToList();
    var products = (from p in dbc.Product orderby p.CustomerKey select new { p.CustomerKey }).ToList();
    foreach (var c in customers)
        foreach (var p in products.Where(s => s.CustomerKey == c.CustomerKey))
            ++x;
    dbc.Dispose();
}
return x;

This is code equivalent to what I am currently using.

Everything that I have tried seems only to worsen the performance of the method. For reference, this code takes around 5 seconds on my machine to return a count of around 22000 pieces of auto-generated data. This code, on the other hand, runs almost instantaneously for the same result:

SqlConnection sqlc = new SqlConnection(sqlConnectString);
SqlDataAdapter sqlda = new SqlDataAdapter("SELECT customerkey, acctnumber FROM customers", sqlc);

DataTable dtCustomers = new DataTable(), dtProducts = new DataTable();
sqlda.Fill(dtCustomers);
sqlda.SelectCommand.CommandText = "SELECT customerkey FROM product";
sqlda.Fill(dtProducts);
sqlda.Dispose();
sqlc.Close();

DataView dvCustomers = new DataView(dtCustomers) { Sort = "AcctNumber" };
DataView dvProducts = new DataView(dtProducts) { Sort = "CustomerKey" };

int x = 0;
for (int y = 0; y < 1000; y++)
    foreach (DataRowView drvCustomers in dvCustomers) {
        DataRowView[] drvaProducts = dvProducts.FindRows(drvCustomers["customerkey"].ToString());
        foreach (DataRowView drvProducts in drvaProducts)
            ++x;
    }
return x;

I far prefer the cleanliness and readability of the Entity Framework code, but I think that I'm missing some crucial piece of information that's significantly hurting the speed of my method. Any thoughts to improve the Entity Framework code to at least get close to the speed of the DataTable/DataView/DataRowView implementation?

  • What should the value of x represent? Commented Apr 15, 2016 at 15:28
  • If you have all the necessary reciprocated properties in your models (i.e. a Customer has a collection of Product), what you are trying to do should be as easy as dbc.Customer.SelectMany(c => c.Products).Count(). If your Customer doesn't have a collection of Product, then consider setting one up... you'll be missing out on a lot of EF goodness without this. Commented Apr 15, 2016 at 15:34
  • First, there is a warmup cost to EF where it builds the models used to build the SQL. msdn.microsoft.com/en-us/library/bb896240(v=vs.100).aspx Second, you're hitting the database twice. You could use a future query to prevent that. lostechies.com/jimmybogard/2014/03/11/… Commented Apr 15, 2016 at 15:38
  • @LightToTheEnd Great, so dbc.Customer.SelectMany(c => c.Products).Count() will give you a count of all products related to customers in a single query without having to ship all the join data locally. Commented Apr 15, 2016 at 15:52
  • @LightToTheEnd By using the Customer.Products (or whatever it's called) property, EF will do joins for you without having to be explicit. I'd consider using a join to be an anti-pattern when EF is set up to do it for you. Commented Apr 15, 2016 at 15:59

4 Answers

3

If your context is properly set up, you should find that a Customer has a collection of Product. For the sake of this answer, let's call that property Products.

By using the Products property, you are asking EF to do the join on your behalf, so you no longer have to explicitly write a join yourself. In fact, writing out that join "long-hand" is unnecessarily verbose and would be quite a strange thing to do, given that EF does it all for you.

So, now, you can select products that belong to customers as easily as:

dbc.Customer.Select(c => c.Products)

which will effectively give you a list of lists of products.

Now if we flatten that list of lists to a list with SelectMany instead, it's easy to do a count over the resultant list of products.

So:

using(var dbc = new SandboxContext())
{
    var customerProductsCount = dbc.Customer
                                   .SelectMany(c => c.Products)
                                   .Count();
} //note, no .Dispose... `using` took care of that for us.
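
To make the difference between Select and SelectMany concrete, here is a minimal in-memory sketch. It uses LINQ to Objects rather than EF, and the Customer and Product classes are hypothetical stand-ins for the real entities, so treat it as an illustration of the shape of the results only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the entities in the question.
public class Product { public int CustomerKey { get; set; } }

public class Customer
{
    public int CustomerKey { get; set; }
    public List<Product> Products { get; set; } = new List<Product>();
}

public static class FlattenDemo
{
    public static List<Customer> SampleCustomers()
    {
        return new List<Customer>
        {
            new Customer { CustomerKey = 1, Products = { new Product { CustomerKey = 1 }, new Product { CustomerKey = 1 } } },
            new Customer { CustomerKey = 2, Products = { new Product { CustomerKey = 2 }, new Product { CustomerKey = 2 } } },
            new Customer { CustomerKey = 3 } // a customer with no products
        };
    }

    // Select yields one inner list per customer, so this counts customers.
    public static int CountInnerLists(List<Customer> customers)
    {
        return customers.Select(c => c.Products).Count();
    }

    // SelectMany flattens the inner lists, so this counts products.
    public static int CountProducts(List<Customer> customers)
    {
        return customers.SelectMany(c => c.Products).Count();
    }

    public static void Main()
    {
        var customers = SampleCustomers();
        Console.WriteLine(CountInnerLists(customers)); // 3 (one list per customer)
        Console.WriteLine(CountProducts(customers));   // 4 (all products, flattened)
    }
}
```

Against a real context, the same SelectMany call is translated to SQL, so the flattening and the count happen in the database rather than in memory.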

1

You should not call Dispose on your context inside a using statement, because using does that for you.

Calling ToList executes the query immediately, which prevents you from building a more complex query and running it on the database side. ToList fetches the data from the database into memory and can reduce performance dramatically.

There is no need to order the results of the queries when you don't need the ordering. It just adds overhead and increases execution time.

At the end it seems like you can reduce the whole code just to a simple query (thanks JoaoFSA).

using (SandboxContext dbc = new SandboxContext())
{   
    return dbc.Customer.Join(dbc.Product,
                 c => c.CustomerKey,
                 p => p.CustomerKey,
                 (c, p) => new { Customer = c, Product = p})
              .Count();
}
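
The point about ToList forcing execution can be seen without a database. The following is a small LINQ to Objects sketch, not EF itself: DeferredDemo, Source, and ItemsRead are hypothetical names used only to show that a query does no work until something enumerates it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    // Counts how many items have actually been pulled from the source.
    public static int ItemsRead;

    // An iterator method: its body does not run until it is enumerated.
    public static IEnumerable<int> Source()
    {
        for (int i = 0; i < 5; i++)
        {
            ItemsRead++;
            yield return i;
        }
    }

    public static void Main()
    {
        ItemsRead = 0;

        // Building the query reads nothing.
        var query = Source().Where(n => n % 2 == 0);
        Console.WriteLine(ItemsRead);  // 0

        // ToList enumerates the whole source right away.
        var list = query.ToList();
        Console.WriteLine(ItemsRead);  // 5
        Console.WriteLine(list.Count); // 3 (the values 0, 2 and 4)
    }
}
```

With EF the same principle applies to IQueryable: operators added before ToList become part of the SQL, while anything added after it runs in memory on the rows already fetched.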


1

Are you interested in disabling lazy loading? You can disable it by adding

this.Configuration.LazyLoadingEnabled = false;

inside the constructor of your SandboxContext context.


1

If I'm reading it right, the EF version does the ordering in the database, while the code you are currently using does it in the application, so performance may differ.

But the biggest problem I see in your approach is that you are loading all customers and products from the DB into the app and then doing the join and count in the app. Performance would be much better if this were done in the DB, with something like this:

using (SandboxContext dbc = new SandboxContext()) {
    return (from c in dbc.Customer join p in dbc.Product
            on c.CustomerKey equals p.CustomerKey select p).Count();
}
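
As a sanity check that the join counts the same thing as the nested loops in the question, here is a small in-memory sketch. It runs on plain arrays of keys with LINQ to Objects, not against the database, and JoinCountDemo and the sample keys are made up for illustration.

```csharp
using System;
using System.Linq;

public static class JoinCountDemo
{
    // The nested-loop counting from the question, reduced to keys only.
    public static int CountWithLoops(int[] customerKeys, int[] productCustomerKeys)
    {
        int x = 0;
        foreach (var c in customerKeys)
            foreach (var p in productCustomerKeys.Where(k => k == c))
                ++x;
        return x;
    }

    // The same count expressed as an inner join.
    public static int CountWithJoin(int[] customerKeys, int[] productCustomerKeys)
    {
        return (from c in customerKeys
                join p in productCustomerKeys on c equals p
                select p).Count();
    }

    public static void Main()
    {
        int[] customerKeys = { 1, 2, 3 };
        int[] productCustomerKeys = { 1, 1, 2, 4 }; // key 4 has no matching customer

        Console.WriteLine(CountWithLoops(customerKeys, productCustomerKeys)); // 3
        Console.WriteLine(CountWithJoin(customerKeys, productCustomerKeys));  // 3
    }
}
```

The results match as long as customer keys are unique. The difference with EF is where the work happens: the join form lets the database return a single number instead of shipping every row to the application.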

2 Comments

As an FYI (I don't know if it's true for every version; the syntax is still relatively new to me), the given query won't compile unless there is a select or group clause at the end of the query block. It does very well once that's there, though.
Yeah, I wrote that on the fly and forgot that. Added.
