4

I have a DataTable which looks like this:

 ID   Name    DateBirth
.......................
 1     aa      1.1.11
 2     bb      2.3.11
 2     cc      1.2.12
 3     cd      2.3.12

Which is the fastest way to remove the rows with the same ID, to get something like this (keep the first occurrence, delete the next ones):

 ID   Name    DateBirth
.......................
 1     aa      1.1.11
 2     bb      2.3.11
 3     cd      2.3.12

I don't want to double pass the table rows, because the row number is big. I want to use some LinQ if possible, but I guess it will be a big query and I have to use a comparer.

2
  • What have you tried? Is it just about the ID? The other fields are irrelevant? Commented Mar 27, 2013 at 16:07
  • The common way. 2 for's and verify every row's ID field. If it is duplicate , delete it. But this is basic and low performance. And yes, the other fields are irrelevant. Just ID is important. Commented Mar 27, 2013 at 16:10

6 Answers 6

11

You can use LINQ to DataTable, to distinct based on column ID, you can group by on this column, then do select first:

  var result = dt.AsEnumerable()
                 .GroupBy(r => r.Field<int>("ID"))
                 .Select(g => g.First())
                 .CopyToDataTable();
Sign up to request clarification or add additional context in comments.

Comments

3

I was solving the same situation and found it quite interesting and would like to share my finding.

  1. If rows are to be distinct based on ALL COLUMNS.
DataTable newDatatable = dt.DefaultView.ToTable(true, "ID", "Name", "DateBirth");

The columns you mention here, only those will be returned back in newDatatable.

  1. If distinct based on one column and column type is int then I would prefer LINQ query.
  DataTable newDatatable = dt.AsEnumerable()
                           .GroupBy(dr => dr.Field<int>("ID"))
                           .Select(dg => dg).Take(1)
                           .CopyToDataTable();
  1. If distinct based on one column and column type is string then I would prefer loop.
List<string> toExclude = new List<string>();
for (int i = 0; i < dt.Rows.Count; i++)
{
    var idValue = (string)dt.Rows[i]["ID"];
    if (toExclude.Contains(idValue))
    {
        dt.Rows.Remove(dt.Rows[i]);
        i--;
    }
    toExclude.Add(glAccount);
}

Third being my favorite.

I may have answered few things which are not asked in the question. It was done in good intent and with little excitement as well.

Hope it helps.

1 Comment

What if distinct is based on two columns? Not one (i.e., id in above case).
2

you can try this

DataTable uniqueCols = dt.DefaultView.ToTable(true, "ID");

Comments

2

Not necessarily the most efficient approach, but maybe the most readable:

table = table.AsEnumerable()
    .GroupBy(row => row.Field<int>("ID"))
    .Select(rowGroup => rowGroup.First())
    .CopyToDataTable();

Linq is also more powerful. For example, if you want to change the logic and not select the first (arbitrary) row of each id-group but the last according to DateBirth:

table = table.AsEnumerable()
    .GroupBy(row => row.Field<int>("ID"))
    .Select(rowGroup => rowGroup
                          .OrderByDescending(r => r.Field<DateTime>("DateBirth"))
                          .First())
    .CopyToDataTable();

Comments

2
  1. Get a record count for each ID
var rowsToDelete = 
    (from row in dataTable.AsEnumerable()
    group row by row.ID into g
    where g.Count() > 1
  1. Determine which record to keep (don't know your criteria; I will just sort by DoB then Name and keep first record) and select the rest
select g.OrderBy( dr => dr.Field<DateTime>( "DateBirth" ) ).ThenBy( dr => dr.Field<string>( "Name" ) ).Skip(1))
  1. Flatten
.SelectMany( g => g );
  1. Delete rows
rowsToDelete.ForEach( dr => dr.Delete() );
  1. Accept changes
dataTable.AcceptChanges();

Comments

1

Heres a way to achive this, All you need to use moreLinq library use its function DistinctBy

Code:

protected void Page_Load(object sender, EventArgs e)
{
  var DistinctByIdColumn = getDT2().AsEnumerable()
                                   .DistinctBy(
                                   row => new { Id = row["Id"] });
  DataTable dtDistinctByIdColumn = DistinctByIdColumn.CopyToDataTable();
}


public DataTable getDT2()
{
   DataTable dt = new DataTable();
   dt.Columns.Add("Id", typeof(string));
   dt.Columns.Add("Name", typeof(string));
   dt.Columns.Add("Dob", typeof(string));
   dt.Rows.Add("1", "aa","1.1.11");
   dt.Rows.Add("2", "bb","2.3.11");
   dt.Rows.Add("2", "cc","1.2.12");
   dt.Rows.Add("3", "cd","2.3.12");
   return dt;
}

OutPut: As what you expected

enter image description here

For moreLinq sample code view my blog

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.