25

Is there a way to get the row count of a complex LINQ query over millions of records without hitting the database twice or writing two separate queries?

My own idea would be to write a stored procedure, but I know MySQL much better than MSSQL.

Any better suggestions would be great. Also, does anyone know whether Microsoft is working on adding this feature to Entity Framework?

6
  • 3
    I'm quite confident you cannot get the number of rows in your query without either hitting the database or writing a query separate from the one that actually returns those rows. Commented Apr 13, 2012 at 17:54
  • 1
    When you use .Count() in EF it does not select all rows; it only executes a select count(*) from table SQL statement. So while you do need 2 queries, one of them is very cheap. Commented Apr 16, 2012 at 23:02
  • @JK select count(*) is not cheap at all! It has almost the same complexity as actually fetching the data; the only difference is that instead of fetching rows it only counts them. It still has to perform all the scans, etc. Commented Apr 17, 2012 at 9:15
  • If I have a complex query that just counts the results, which could be 10,000+, and another that just grabs 20 of those results, will Count() tax the entire process? Commented Apr 18, 2012 at 15:53
  • Just to throw this out there: I timed the Count and then the Results execution, in milliseconds, and found this. Total records: 1,324,224. Count time (avg): 125. Results time for 20 items (avg): 2850. Commented Apr 18, 2012 at 17:41
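
To make the "count, then fetch one page" pattern from these comments concrete, here is a minimal sketch. The names CountThenPage and Fetch are hypothetical, and it only assumes an IQueryable<T> source; it still issues two statements, but the filter is written once and only one page is materialized.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical helper, not part of Entity Framework.
public static class CountThenPage
{
    // Runs the count and the page fetch against the same filtered query.
    public static (int Total, List<T> Items) Fetch<T, TKey>(
        IQueryable<T> filtered,
        Expression<Func<T, TKey>> orderBy,
        int page,
        int pageSize)
    {
        if (page < 1) throw new ArgumentOutOfRangeException(nameof(page));
        if (pageSize < 1) throw new ArgumentOutOfRangeException(nameof(pageSize));

        // First statement: with a database provider this is expected to become a COUNT query.
        int total = filtered.Count();

        // Second statement: only the requested page is materialized.
        List<T> items = filtered
            .OrderBy(orderBy)
            .Skip((page - 1) * pageSize)
            .Take(pageSize)
            .ToList();

        return (total, items);
    }
}
```

Calling CountThenPage.Fetch(query, x => x.Id, 2, 20) would return the total alongside rows 21 to 40, where Id stands in for whatever key you sort by.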

6 Answers

17

I'd suggest using the Take() function. It specifies the number of records to take from a LINQ query or a List. For example:

List<customers> _customers = (from a in db.customers select a).ToList();
var _dataToWebPage = _customers.Take(50);

I use a similar technique in an MVC app where I write the _customers list to the session and then use this list for further pagination queries when the user clicks on page 2, 3, etc. This saves multiple database hits. However, if your list is very large, then writing it to the session is probably not a good idea.

For pagination you can use the Skip() and Take() functions together. For example, to get page 2 of the data:

var _dataToWebPage = _customers.Skip(50).Take(50);
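
As a rough generalization of the Skip()/Take() lines above, the sketch below pages over a list that is already in memory (for example the one cached in the session) and works out how many pages there are. CachedListPager, PageCount and GetPage are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for paging over a list that is already cached in memory.
public static class CachedListPager
{
    // Number of pages needed for totalItems, rounding up.
    public static int PageCount(int totalItems, int pageSize)
    {
        if (pageSize < 1) throw new ArgumentOutOfRangeException(nameof(pageSize));
        return (totalItems + pageSize - 1) / pageSize;
    }

    // Returns the 1-based page; a page past the end comes back empty.
    public static List<T> GetPage<T>(IReadOnlyList<T> cached, int page, int pageSize)
    {
        if (page < 1) throw new ArgumentOutOfRangeException(nameof(page));
        if (pageSize < 1) throw new ArgumentOutOfRangeException(nameof(pageSize));
        return cached.Skip((page - 1) * pageSize).Take(pageSize).ToList();
    }
}
```

With a page size of 50, GetPage(list, 2, 50) corresponds to Skip(50).Take(50).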

4 Comments

Thanks @TimNewton. Of course, but if I have a million records, or even just a thousand records with very large data columns, you run into an out-of-memory exception.
Jason, you could consider writing the primary keys to a List in the session rather than the entire objects, then retrieving the details from the database by primary key each time you need to redisplay the data. This still requires multiple database reads, though. I don't think you can get away without multiple db reads if your dataset is so large.
You might be right about not being able to do this in one db read with C# and LINQ. I know this can be done; it's just that I want to use LINQ because it's strongly typed and fits with the rest of my clean code. I have pagination working perfectly on smaller tables that don't hold large data sets or large data.
Calling ToList() before Take(50) means you pull every record into your application. And then, after making the server do all that work, you ignore all but 50. Call Take(50) before ToList().
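
The last comment can be illustrated in memory. The sketch below uses a lazy sequence as a stand-in for the database (Rows and the two RowsRead... methods are hypothetical names) and counts how many rows are actually produced in each ordering. With a real provider the saving would come from the generated SQL limiting the rows, not from lazy enumeration, so treat this only as an analogy.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo: a lazy sequence plays the role of the table.
public static class TakeBeforeToList
{
    // Produces rows on demand and reports each row that is actually read.
    private static IEnumerable<int> Rows(int total, Action onRead)
    {
        for (int i = 0; i < total; i++)
        {
            onRead();
            yield return i;
        }
    }

    // ToList() first: every row is read, then all but `take` are thrown away.
    public static int RowsReadWithToListFirst(int total, int take)
    {
        int read = 0;
        _ = Rows(total, () => read++).ToList().Take(take).ToList();
        return read;
    }

    // Take() first: enumeration stops once `take` rows have been produced.
    public static int RowsReadWithTakeFirst(int total, int take)
    {
        int read = 0;
        _ = Rows(total, () => read++).Take(take).ToList();
        return read;
    }
}
```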
11

I was recently inspired by (copied from) this Code Project article Entity Framework pagination

The code:

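public async Task<IList<SavedSearch>> FindAllSavedSearches(int page, int limit)
{
    // Treat a missing or invalid page as the first page.
    if (page <= 0)
        page = 1;

    // Treat a missing or invalid limit as "no limit".
    if (limit <= 0)
        limit = int.MaxValue;

    // With no limit everything is on page 1; this also avoids overflowing (page - 1) * limit.
    var skip = limit == int.MaxValue ? 0 : (page - 1) * limit;

    // Skip/Take need a stable order. Id is an assumed key property; substitute your own.
    var savedSearches = _databaseContext.SavedSearches
        .OrderBy(x => x.Id)
        .Skip(skip)
        .Take(limit)
        .Include(x => x.Parameters);
    return await savedSearches.ToArrayAsync();
}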

I'm not experienced with Entity Framework and I've not tested it for performance, so use with caution :)

2 Comments

What is the meaning of .Include(x => x.Parameters)?
Include tells EF to load related data as well. Here it instructs EF to also load the rows of the "Parameters" table with matching keys into memory along with the SavedSearches.
3

The common way to show millions of records is simply not to display all pages. Think about it: if you have millions of records and show, say, 20 or even 100 items per page, you'll have tens of thousands of pages. It doesn't make sense to show them all. You can simply load the current page and provide a link to the next page, and that's it. Or you can load, say, 100-500 records but still show only one page, and use the loaded records to generate page links for the first several pages (so you know for sure how many further pages are available).
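
A minimal sketch of that look-ahead idea, assuming an already ordered IQueryable<T>: it fetches the current page plus a few pages' worth of extra rows in one query and reports how many further pages are known to exist, without ever counting the whole result. LookAheadPager, Fetch and lookAheadPages are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: pages without a total count by reading a little ahead.
public static class LookAheadPager
{
    public static (List<T> Items, int PagesAhead) Fetch<T>(
        IQueryable<T> ordered, int page, int pageSize, int lookAheadPages)
    {
        if (page < 1) throw new ArgumentOutOfRangeException(nameof(page));
        if (pageSize < 1) throw new ArgumentOutOfRangeException(nameof(pageSize));
        if (lookAheadPages < 0) throw new ArgumentOutOfRangeException(nameof(lookAheadPages));

        // One query: the current page plus up to lookAheadPages more pages of rows.
        int window = pageSize * (lookAheadPages + 1);
        List<T> rows = ordered.Skip((page - 1) * pageSize).Take(window).ToList();

        // Only the first pageSize rows are displayed.
        List<T> items = rows.Take(pageSize).ToList();

        // The leftover rows tell us how many further pages certainly exist (rounded up).
        int extra = rows.Count - items.Count;
        int pagesAhead = (extra + pageSize - 1) / pageSize;

        return (items, pagesAhead);
    }
}
```

PagesAhead is capped at lookAheadPages, so a pager built on it can render links only for pages it knows are there.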

7 Comments

You are right, I would not want to show hundreds or thousands of links or even make someone step through that many pages. I guess the trouble I'm having is that I want to write just one query. For instance: MySQL allowed you to use SQL_CALC_FOUND_ROWS within a query and then another query to pull just that result. Very efficient!!! Made Pagination wonderful! Thanks @VladimirPerevalov for your thoughts!
AFAIK there is no such support in LINQ or in MS SQL Server. Actually there are many things that MySQL does and MS SQL does not. E.g. SELECT BETWEEN ... is also very effective for pagination.
You can also provide a "Go To:" text box and a button to jump to a specific page directly.
Actually, it is quite common to show some sort of pager controls that indicate the total number of pages/items.
@Jonathan Wood look at results from search engines, for example. They show only ~10 pages, and an approximate number of total results (but that is quite another question).
3

If you need a quick solution you can use X.PagedList: https://github.com/dncuug/X.PagedList. X.PagedList is a library that lets you easily take an IEnumerable/IQueryable, chop it up into "pages", and grab a specific "page" by index. For example:

var products = await _context.Products.ToPagedListAsync(pageNumber, pageSize);


2

This is easy in SQL Server.

You can write this query:

select count(*) over(), table.* from table

The count(*) over() column returns the total number of rows in the result, repeated on every row, so you don't need to run two queries. Remember that you have to run this as raw SQL on your context or use Dapper, which returns the result as a view model.
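
On the C# side, rows from such a query each carry the same total, so the caller has to peel it off. The sketch below assumes a hypothetical ProductRow view model in which the count(*) over() column has been mapped to a property called TotalCount; neither name comes from the answer. One thing it makes visible: if the page comes back empty there is no row to read the total from.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical view model: one row of the query, with the window count mapped to TotalCount.
public sealed class ProductRow
{
    public int TotalCount { get; set; }
    public int Id { get; set; }
    public string Name { get; set; } = "";
}

public static class WindowCountResult
{
    // Splits the rows into the shared total and the items themselves.
    public static (int Total, List<ProductRow> Items) Split(IReadOnlyList<ProductRow> rows)
    {
        // Every row repeats the same total; an empty result has no row to read it from.
        int total = rows.Count > 0 ? rows[0].TotalCount : 0;
        return (total, rows.ToList());
    }
}
```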


0

I created a NuGet library that does pagination for you: https://github.com/wdunn001/EntityFrameworkPaginateCore

Add the NuGet package to your project:

Install-Package EntityFrameworkPaginateCore

Then add using EntityFrameworkPaginateCore; to your provider.

It has one method and two overloads of it; the overloads allow sorting and filtering. Use the sort and filter objects:

public async Task<Page<Example>> GetPaginatedExample(
    int pageSize = 10,
    int currentPage = 1,
    string searchText = "",
    int sortBy = 2)
{
    var filters = new Filters<Example>();
    filters.Add(!string.IsNullOrEmpty(searchText), x => x.Title.Contains(searchText));

    var sorts = new Sorts<Example>();
    sorts.Add(sortBy == 1, x => x.ExampleId);
    sorts.Add(sortBy == 2, x => x.Edited);
    sorts.Add(sortBy == 3, x => x.Title);

    try
    {
        return await _Context.EfExample.Select(e => _mapper.Map<Example>(e)).PaginateAsync(currentPage, pageSize, sorts, filters);
    }
    catch (Exception ex)
    {
        throw new KeyNotFoundException(ex.Message);
    }
}
