C# - Remove rows with the same column value from a DataTable [duplicate]

Question

I have a DataTable which looks like this:

 ID   Name    DateBirth
.......................
 1     aa      1.1.11
 2     bb      2.3.11
 2     cc      1.2.12
 3     cd      2.3.12

What is the fastest way to remove the rows with a duplicate ID (keeping the first occurrence and deleting the later ones), to get something like this:

 ID   Name    DateBirth
.......................
 1     aa      1.1.11
 2     bb      2.3.11
 3     cd      2.3.12

I don't want to pass over the table rows twice, because the number of rows is large. I would like to use LINQ if possible, but I guess it will be a big query and I will have to use a comparer.

What have you tried? Is it just about the ID? Are the other fields irrelevant?
– DerApe, Commented Mar 27, 2013 at 16:07
The common way: two nested for loops that check every row's ID field and delete the row if it is a duplicate. But this is basic and performs poorly. And yes, the other fields are irrelevant. Only the ID is important.
– darkdante, Commented Mar 27, 2013 at 16:10

cuongle · Accepted Answer · 2013-03-27 16:11:22Z

11

You can use LINQ to DataSet. To get distinct rows based on the ID column, group by that column and then select the first row of each group:

  var result = dt.AsEnumerable()
                 .GroupBy(r => r.Field<int>("ID"))
                 .Select(g => g.First())
                 .CopyToDataTable();

answered Mar 27, 2013 at 16:11

cuongle


shA.t · Accepted Answer · 2015-04-27 04:49:48Z

3

I ran into the same situation, found it quite interesting, and would like to share my findings.

If rows are to be distinct based on all columns:

DataTable newDatatable = dt.DefaultView.ToTable(true, "ID", "Name", "DateBirth");

Only the columns you list here are returned in newDatatable.

If the distinct is based on one column and the column type is int, I would prefer a LINQ query:

  DataTable newDatatable = dt.AsEnumerable()
                           .GroupBy(dr => dr.Field<int>("ID"))
                           .Select(dg => dg.First())
                           .CopyToDataTable();

If the distinct is based on one column and the column type is string, I would prefer a loop:

// HashSet gives O(1) lookups; Add returns false if the value is already present.
var seenIds = new HashSet<string>();
for (int i = 0; i < dt.Rows.Count; i++)
{
    var idValue = (string)dt.Rows[i]["ID"];
    if (!seenIds.Add(idValue))
    {
        // Duplicate ID: remove the row and stay on the same index.
        dt.Rows.RemoveAt(i);
        i--;
    }
}

The third is my favorite.

I may have answered a few things that were not asked in the question. It was done with good intent, and with a little excitement as well.

Hope it helps.

edited Apr 27, 2015 at 4:49

shA.t

answered Jun 24, 2014 at 13:28

Sandy

1 Comment

Jogi Over a year ago

What if the distinct is based on two columns rather than one (i.e., ID in the case above)?
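
For the two-column case raised in this comment, one possible approach (a sketch, not taken from any of the answers) is to group by an anonymous type that holds both values. The class and method names `CompositeKeyDistinct` and `KeepFirstPerIdAndName` are hypothetical, and the choice of "ID" plus "Name" as the key, with "ID" typed as int, is an assumption for illustration.

```csharp
using System;
using System.Data;
using System.Linq;

static class CompositeKeyDistinct
{
    // Hypothetical helper: keeps the first row for each (ID, Name) pair.
    // Assumes "ID" is an int column and "Name" is a string column.
    public static DataTable KeepFirstPerIdAndName(DataTable table)
    {
        // Anonymous types compare by value, so two rows land in the same
        // group only when both ID and Name are equal.
        return table.AsEnumerable()
            .GroupBy(r => new { Id = r.Field<int>("ID"), Name = r.Field<string>("Name") })
            .Select(g => g.First())
            .CopyToDataTable(); // throws InvalidOperationException if there are no rows
    }

    static void Main()
    {
        var dt = new DataTable();
        dt.Columns.Add("ID", typeof(int));
        dt.Columns.Add("Name", typeof(string));
        dt.Columns.Add("DateBirth", typeof(string));
        dt.Rows.Add(1, "aa", "1.1.11");
        dt.Rows.Add(2, "bb", "2.3.11");
        dt.Rows.Add(2, "bb", "1.2.12"); // same ID and Name as the previous row: dropped
        dt.Rows.Add(2, "cc", "1.2.12"); // same ID, different Name: kept

        foreach (DataRow row in KeepFirstPerIdAndName(dt).Rows)
            Console.WriteLine($"{row["ID"]} {row["Name"]} {row["DateBirth"]}");
    }
}
```

With the sample rows above, three rows remain: (1, aa), (2, bb) with the earlier-inserted date, and (2, cc).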

COLD TOLD · Accepted Answer · 2013-03-27 16:10:24Z

2

You can try this:

DataTable uniqueCols = dt.DefaultView.ToTable(true, "ID");

answered Mar 27, 2013 at 16:10

COLD TOLD


Tim Schmelter · Accepted Answer · 2013-03-27 16:12:50Z

2

Not necessarily the most efficient approach, but maybe the most readable:

table = table.AsEnumerable()
    .GroupBy(row => row.Field<int>("ID"))
    .Select(rowGroup => rowGroup.First())
    .CopyToDataTable();

LINQ is also more powerful. For example, if you want to change the logic so that you select not the first (arbitrary) row of each ID group but the last one according to DateBirth:

table = table.AsEnumerable()
    .GroupBy(row => row.Field<int>("ID"))
    .Select(rowGroup => rowGroup
                          .OrderByDescending(r => r.Field<DateTime>("DateBirth"))
                          .First())
    .CopyToDataTable();
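
On the efficiency remark above: if only the first row per ID is needed, a possible alternative that avoids building groups is to filter with a HashSet while enumerating. This is a different technique from the GroupBy queries in this answer, shown only as a sketch. The names `SinglePassDistinct` and `KeepFirstPerId` are hypothetical, and "ID" is assumed to be a non-null int column.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

static class SinglePassDistinct
{
    // Hypothetical helper: keeps the first row seen for each ID, in original order.
    public static DataTable KeepFirstPerId(DataTable table)
    {
        var seen = new HashSet<int>();

        // HashSet<T>.Add returns false when the value is already present,
        // so only the first row for each ID passes the filter.
        return table.AsEnumerable()
            .Where(row => seen.Add(row.Field<int>("ID")))
            .CopyToDataTable(); // throws InvalidOperationException if there are no rows
    }

    static void Main()
    {
        var table = new DataTable();
        table.Columns.Add("ID", typeof(int));
        table.Columns.Add("Name", typeof(string));
        table.Rows.Add(1, "aa");
        table.Rows.Add(2, "bb");
        table.Rows.Add(2, "cc"); // duplicate ID: dropped
        table.Rows.Add(3, "cd");

        foreach (DataRow row in KeepFirstPerId(table).Rows)
            Console.WriteLine($"{row["ID"]} {row["Name"]}");
    }
}
```

The filter relies on the query being enumerated exactly once, which is the case here because CopyToDataTable consumes it immediately. Whether this is measurably faster than GroupBy on a given table would need to be benchmarked.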

answered Mar 27, 2013 at 16:12

Tim Schmelter


shA.t · Accepted Answer · 2015-04-27 04:56:54Z

2

Get a record count for each ID

var rowsToDelete = 
    (from row in dataTable.AsEnumerable()
    group row by row.Field<int>("ID") into g
    where g.Count() > 1

Determine which record to keep (I don't know your criteria; I will just sort by DoB, then Name, and keep the first record) and select the rest

select g.OrderBy( dr => dr.Field<DateTime>( "DateBirth" ) ).ThenBy( dr => dr.Field<string>( "Name" ) ).Skip(1))

Flatten

.SelectMany( g => g );

Delete rows

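// Materialize with ToList() first: IEnumerable<T> has no ForEach, and the table must not change while the query is being enumerated.
rowsToDelete.ToList().ForEach( dr => dr.Delete() );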

Accept changes

dataTable.AcceptChanges();
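
The steps above can be put together as one runnable sketch. The wrapper names `DuplicateRemover` and `RemoveDuplicateIds` are hypothetical, and the column types (int "ID", string "Name", DateTime "DateBirth") and the sample dates are assumptions, since the question does not state them.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

static class DuplicateRemover
{
    // Hypothetical helper: deletes, in place, all but one row per ID.
    // The row kept is the first one when sorted by DateBirth, then Name.
    public static void RemoveDuplicateIds(DataTable dataTable)
    {
        List<DataRow> rowsToDelete = dataTable.AsEnumerable()
            .GroupBy(row => row.Field<int>("ID"))
            .Where(g => g.Count() > 1)
            .SelectMany(g => g.OrderBy(dr => dr.Field<DateTime>("DateBirth"))
                              .ThenBy(dr => dr.Field<string>("Name"))
                              .Skip(1))
            .ToList(); // materialize before modifying the table

        foreach (DataRow dr in rowsToDelete)
            dr.Delete();

        dataTable.AcceptChanges();
    }

    static void Main()
    {
        var dataTable = new DataTable();
        dataTable.Columns.Add("ID", typeof(int));
        dataTable.Columns.Add("Name", typeof(string));
        dataTable.Columns.Add("DateBirth", typeof(DateTime));
        dataTable.Rows.Add(1, "aa", new DateTime(2011, 1, 1));
        dataTable.Rows.Add(2, "bb", new DateTime(2011, 3, 2));
        dataTable.Rows.Add(2, "cc", new DateTime(2012, 2, 1)); // later date: deleted
        dataTable.Rows.Add(3, "cd", new DateTime(2012, 3, 2));

        RemoveDuplicateIds(dataTable);

        foreach (DataRow row in dataTable.Rows)
            Console.WriteLine($"{row["ID"]} {row["Name"]}");
    }
}
```

Unlike the CopyToDataTable approaches, this modifies the existing table rather than creating a new one, which may matter if other code holds a reference to it.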

edited Apr 27, 2015 at 4:56

shA.t

answered Mar 27, 2013 at 16:37

Moho


Satinder singh · Accepted Answer · 2013-04-10 07:55:35Z

Here's a way to achieve this. All you need is the MoreLINQ library and its DistinctBy function.

Code:

protected void Page_Load(object sender, EventArgs e)
{
  var DistinctByIdColumn = getDT2().AsEnumerable()
                                   .DistinctBy(
                                   row => new { Id = row["Id"] });
  DataTable dtDistinctByIdColumn = DistinctByIdColumn.CopyToDataTable();
}


public DataTable getDT2()
{
   DataTable dt = new DataTable();
   dt.Columns.Add("Id", typeof(string));
   dt.Columns.Add("Name", typeof(string));
   dt.Columns.Add("Dob", typeof(string));
   dt.Rows.Add("1", "aa","1.1.11");
   dt.Rows.Add("2", "bb","2.3.11");
   dt.Rows.Add("2", "cc","1.2.12");
   dt.Rows.Add("3", "cd","2.3.12");
   return dt;
}

Output: as expected.

For MoreLINQ sample code, see my blog.
