I want to remove duplicate records using Entity Framework.

This is what I've tried:

var result = _context.History
            .GroupBy(s => new
                    {
                        s.Date,
                        s.EventId
                    })
            .SelectMany(grp => grp.Skip(1)).ToList();

_context.History.RemoveRange(result);
await _context.SaveChangesAsync();

But I get an error:

System.InvalidOperationException: Processing of the LINQ expression 'grp => grp.Skip(1)' by 'NavigationExpandingExpressionVisitor' failed. This may indicate either a bug or a limitation in EF Core

I understand that this is a breaking change in Entity Framework, but I really don't know how to update my code.

  • Don't use EF Core in the first place. EF Core is an ORM, not a SQL replacement. There are no Objects here, and the easiest and most efficient way to remove duplicates involves a CTE with ROW_NUMBER() that would return all multiples, ranked by whatever sort order you want, allowing you to select which row to keep. Commented Nov 22, 2020 at 17:53
  • Eg with dups as (select *, row_number() over (partition by date,eventid order by id desc) rn from...) delete dups where rn>1 will delete all duplicates except the largest id. The CTE doesn't need to return all columns, just the key columns are enough. You can specify a different ORDER BY to select different rows to preserve Commented Nov 22, 2020 at 17:56
  • @PanagiotisKanavos Is this CTE database agnostic or just SqlServer specific? An ORM might not be a SQL replacement, but LINQ is supposed to be an abstraction and a database-agnostic language integrated query language, so why not use it? The fact that EF Core breaks the contract by not being willing to translate it doesn't mean the OP is doing something wrong. Commented Nov 22, 2020 at 18:02
  • @IvanStoev the operation isn't object-agnostic. There are no objects involved. LINQ wasn't meant to handle such situations. ORMs were never meant for reporting queries or fully replacing SQL. If you replace database agnostic with ANSI standard, yes, it's ANSI standard and even supported in MySQL after MySQL 8. All other major databases had ROW_NUMBER() already Commented Nov 22, 2020 at 18:04
  • @IvanStoev and SQLite added windowing functions in version 3.25. Besides, what the OP is trying to do doesn't make sense in SQL - that group isn't really grouping and there's no SKIP in SQL. This is trying to apply (somewhat inefficient) LINQ-to-Objects operation to a database hoping that EF Core can somehow translate this to SQL Commented Nov 22, 2020 at 18:07

2 Answers

It looks like Entity Framework doesn't know how to translate the Skip part of this LINQ query. Moreover, it cannot translate the GroupBy part either. Starting with EF Core 3, it throws an exception to let us know instead of silently evaluating the query on the client :)

So, a dirty but simple way is to add AsEnumerable almost at the beginning. However, this fetches the entire table and performs the grouping in memory:

var result = _context.History
            .AsEnumerable()
            .GroupBy(s => new { s.Date, s.EventId })
            .SelectMany(g => g.Skip(1))
            .ToList();

_context.History.RemoveRange(result);
await _context.SaveChangesAsync();
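
To see which rows the in-memory version selects, here is a small LINQ-to-Objects sketch with no database involved. The `History` class shape, the `Id` property, the `int` type of `EventId`, and the `DuplicateFinder` helper are assumptions made for illustration. Ordering each group by `Id` before `Skip(1)` is an addition to the query above: without it, the row that survives is whichever one happens to come first.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the History entity; Id is an assumed primary key.
public class History
{
    public int Id { get; set; }
    public DateTime Date { get; set; }
    public int EventId { get; set; }
}

// Hypothetical helper, not part of EF Core.
public static class DuplicateFinder
{
    // Returns every row except the lowest-Id row of each (Date, EventId) group.
    public static List<History> FindExtras(IEnumerable<History> rows)
    {
        return rows
            .GroupBy(h => new { h.Date, h.EventId })
            .SelectMany(g => g.OrderBy(h => h.Id).Skip(1))
            .ToList();
    }
}
```

Given three rows where Ids 1 and 2 share the same Date and EventId and Id 3 is unique, `DuplicateFinder.FindExtras(rows)` returns only the row with Id 2, which is the one that would be passed to RemoveRange.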

Since in most cases it's not acceptable to fetch everything, we can split the first request into two so that we download only the duplicated records.

The second answer to this question (stackoverflow.com/questions/58138556/…) might help. We can try something like this:

var keys = _context.History
                .GroupBy(s => new { s.Date, s.EventId })
                .Select(g => new { g.Key, Count = g.Count() })
                .Where(t => t.Count > 1)
                .Select(t => new { t.Key.Date, t.Key.EventId })
                .ToList();

var result = _context.History
    .Where(h => keys.Any(k => k.Date == h.Date && k.EventId == h.EventId))
    .AsEnumerable()
    .GroupBy(s => new { s.Date, s.EventId })
    .SelectMany(g => g.Skip(1))
    .ToList();

_context.History.RemoveRange(result);
await _context.SaveChangesAsync();

3 Comments

Hello, welcome to SO. This is the error I get: System.InvalidOperationException: Client side GroupBy is not supported.
Interesting, I didn't know it but looks like in EF Core 3 they added an explicit error since GroupBy is not being translated to SQL, second answer here is quite good: stackoverflow.com/questions/58138556/… The easiest solution is to move AsEnumerable() to the top right after _context.History. However, it will fetch all data from this table to the server and perform everything in memory. Is it acceptable in your case?
I've updated the answer so that it's easier for you to understand my previous comment. It might help :)

In this case you can group by both columns and keep only the groups that contain more than one row:

var duplicate = DB.History.GroupBy(x => new { x.Date, x.EventId})
                         .Where(x => x.Count() > 1)
                         .SelectMany(x => x.ToList());
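
As a quick LINQ-to-Objects check of what this query returns, the sketch below runs the same shape of query against an in-memory list. The `HistoryRow` class, its `Id` property, the `int` type of `EventId`, and the `DuplicateGroups` helper are assumptions for illustration only. It suggests that filtering on `Count() > 1` yields every row of each duplicated group, including the one you probably want to keep, so a `Skip(1)` per group is still needed before deleting.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the History entity; Id is an assumed primary key.
public class HistoryRow
{
    public int Id { get; set; }
    public DateTime Date { get; set; }
    public int EventId { get; set; }
}

// Hypothetical helper, not part of EF Core.
public static class DuplicateGroups
{
    // Every row that belongs to a (Date, EventId) group with more than one member.
    public static List<HistoryRow> AllRowsInDuplicatedGroups(IEnumerable<HistoryRow> rows) =>
        rows.GroupBy(x => new { x.Date, x.EventId })
            .Where(g => g.Count() > 1)
            .SelectMany(g => g)
            .ToList();

    // Same groups, but the lowest-Id row of each group is left out so it survives a delete.
    public static List<HistoryRow> RowsToDelete(IEnumerable<HistoryRow> rows) =>
        rows.GroupBy(x => new { x.Date, x.EventId })
            .Where(g => g.Count() > 1)
            .SelectMany(g => g.OrderBy(x => x.Id).Skip(1))
            .ToList();
}
```

For two rows sharing a Date and EventId plus one unique row, `AllRowsInDuplicatedGroups` returns both of the matching rows, while `RowsToDelete` returns only the one with the higher Id.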
