How to remove duplicates based on a certain column in SQL Server? [duplicate]

Question

If I have a table like this

fid name   date
---------------------
1   John1  2020-10-08
1   John2  2020-10-08
1   John3  2018-06-04
2   Tom1   2019-10-08

I want to keep, for each fid, the row that has the most recent date. However, if several rows tie for the most recent date, keep only one of them (any of them is fine). So the final result should end up like

fid name   date
---------------------
1   John1  2020-10-08
2   Tom1   2019-10-08

Does anyone know how to do this in SQL Server? I use v14 (2017) if that matters.

The problem is that if I group by fid and take MAX(date), I get one record per fid, but when I then left join on that result to get the other columns, I get two records back for fid 1, since the most recent date shows up twice.

First it sounds like you want to remove rows from your table ("remove duplicates"), then it sounds like you merely want to select rows without duplicates ("to get the other columns"). Which do you want? DELETE or SELECT?
– Thorsten Kettner, Commented Oct 22, 2021 at 6:53
Not delete original data, just get a query that doesn't include the ones I want gone. So I want to select.
– omega, Commented Oct 22, 2021 at 13:51

Tim Biegeleisen · Accepted Answer · 2021-10-22 06:22:49Z

We can use a deletable CTE along with ROW_NUMBER here:

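WITH cte AS (
    -- Number the rows within each fid: newest date first, ties broken by name
    SELECT *, ROW_NUMBER() OVER (PARTITION BY fid ORDER BY date DESC, name) AS rn
    FROM yourTable
)
-- Deleting through the CTE removes the matching rows from yourTable itself
DELETE
FROM cte
WHERE rn > 1;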

The above logic assigns rn = 1 to (i.e. spares) the record with the most recent date within each group of fid records. Should two records with the same fid also share the same latest date, the tie is broken by name, so the record whose name sorts first is spared.

3 Comments

omega Over a year ago

Note that I want to just select, not delete any actual data.

Thorsten Kettner Over a year ago

@omega: Then replace DELETE with SELECT * (or rather the columns you want to select).

Tim Biegeleisen Over a year ago

@omega Then use the same CTE but do SELECT * FROM cte WHERE rn = 1
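Putting the two comments above together, a SELECT-only version might look like the sketch below. It is only an illustration: the inline VALUES list stands in for yourTable (itself a placeholder name from the answer) so the query can be run as-is without touching real data, and it assumes the date column is of type date.

```sql
-- Sample rows from the question, built inline so no real table is read or modified.
WITH yourTable AS (
    SELECT fid, name, date
    FROM (VALUES
        (1, 'John1', CAST('2020-10-08' AS date)),
        (1, 'John2', CAST('2020-10-08' AS date)),
        (1, 'John3', CAST('2018-06-04' AS date)),
        (2, 'Tom1',  CAST('2019-10-08' AS date))
    ) AS v (fid, name, date)
),
cte AS (
    -- Same numbering as in the answer: newest date first within each fid, ties broken by name.
    SELECT fid, name, date,
           ROW_NUMBER() OVER (PARTITION BY fid ORDER BY date DESC, name) AS rn
    FROM yourTable
)
-- Keep only the first-ranked row per fid; nothing is deleted.
SELECT fid, name, date
FROM cte
WHERE rn = 1
ORDER BY fid;
```

With the sample data this returns the John1 row for fid 1 and the Tom1 row for fid 2, which matches the result asked for in the question. Against a real table, drop the first CTE and point the second one at the table directly.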
