Count duplicate rows in a datatable using vb.net

Question

I have a DataTable called dtstore with 4 columns called section, department, palletnumber and batchnumber. I am trying to make a new DataTable called dtmulti which has an extra column called multi that shows how many times each duplicated row occurs...

dtstore

section | department | palletnumber | batchnumber
--------|------------|--------------|------------
pipes   | 2012       | 1234         | 21
taps    | 2011       | 5678         | 345
pipes   | 2012       | 1234         | 21
taps    | 2011       | 5678         | 345
taps    | 2011       | 5678         | 345
plugs   | 2009       | 7643         | 63


dtmulti

section | department | palletnumber | batchnumber | multi
--------|------------|--------------|-------------|------
pipes   | 2012       | 1234         | 21          | 2
taps    | 2011       | 5678         | 345         | 3

I have tried lots of approaches, but my code always feels clumsy and bloated. Is there an efficient way to do this?

Here is the code I am using:

Dim dupmulti = dataTable.AsEnumerable().GroupBy(Function(i) i).Where(Function(g) g.Count() = 2).Select(Function(g) g.Key)

For Each row In dupmulti
    multirow("Section") = dup("Section")
    multirow("Department") = dup("Department")
    multirow("PalletNumber") = dup("PalletNumber")
    multirow("BatchNumber") = dup("BatchNumber")
    multirow("Multi") = 2
Next

This specific code is plainly wrong. Firstly, it does not compile (and you are not even using LINQ correctly with the DataRow type), and secondly, it doesn't even try to deliver what you want (why look for rows repeated exactly two times?). Anyway, I will post working code.
– user2480047, commented Sep 13, 2015 at 18:19

score 2 · Accepted Answer · 2015-09-13 18:31:32Z

Assumptions for the code below: the DataTable containing the original data is called dup. It may contain any number of duplicates, and a duplicate is identified by looking at the first column only.

'Needs Imports System.Data and Imports System.Linq; on .NET Framework, AsEnumerable
'also needs a reference to System.Data.DataSetExtensions

'Creating the final table from the columns in the original table
Dim multirow As New DataTable()

For Each col As DataColumn In dup.Columns
    multirow.Columns.Add(col.ColumnName, col.DataType)
Next
multirow.Columns.Add("multi", GetType(Integer))

'Looping through the grouped rows (= no duplicates), grouped on the first column
For Each groups In dup.AsEnumerable().GroupBy(Function(x) x(0))

    Dim newRow As DataRow = multirow.NewRow()

    'Copying every cell of the first row in the group
    For c As Integer = 0 To dup.Columns.Count - 1
        newRow(c) = groups.First()(c)
    Next

    'Filling the last cell (number of rows in the group)
    newRow("multi") = groups.Count()

    multirow.Rows.Add(newRow)

Next

edited Sep 13, 2015 at 18:31

answered Sep 13, 2015 at 18:10

user2480047

2 Comments

ThickAsAPlank Over a year ago

Duplicates cannot be defined by looking at just the first column; all columns need to match for a row to be a duplicate.

user2480047 Over a year ago

@ThickAsAPlank I think that writing this code from what you provided (bearing in mind that SO is not a custom code-writing service) is already quite a good answer; you should take it as a solid first step towards building your own code rather than keep requesting more. Additionally, in your sample data/code all the cells of the duplicate rows were identical. You can easily find references to multi-column grouping, for example: stackoverflow.com/questions/11121303/… Just update Function(x) x(0) to include all the columns you want.
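
For anyone wanting to see what "including all the columns" could look like, here is a minimal sketch, not the answerer's own code. It swaps the first-column key for a composite string key built from every cell in the row, and it keeps only groups that occur more than once, which matches the dtmulti sample in the question. The names CountDuplicateRows and BuildMulti are hypothetical, the column types in Main are assumed, and the separator is assumed never to appear in the data. Grouping on an anonymous type with Key properties is another option if you prefer typed keys.

```vbnet
Imports System
Imports System.Data
Imports System.Linq

Module CountDuplicateRows

    ' Hypothetical helper: returns the source columns plus "multi", with one row
    ' per group of fully identical rows that occurs more than once.
    Function BuildMulti(source As DataTable) As DataTable
        ' Clone copies the column structure but none of the rows
        Dim result As DataTable = source.Clone()
        result.Columns.Add("multi", GetType(Integer))

        ' Assumption: this control character never appears in any cell
        Dim separator As String = ChrW(31).ToString()

        ' Composite key = all cells of the row joined into one string
        Dim groups = source.AsEnumerable().
            GroupBy(Function(r) String.Join(separator, r.ItemArray)).
            Where(Function(g) g.Count() > 1)

        For Each g In groups
            Dim first As DataRow = g.First()
            Dim newRow As DataRow = result.NewRow()

            ' Copy the original cells, then store the group size in "multi"
            For c As Integer = 0 To source.Columns.Count - 1
                newRow(c) = first(c)
            Next
            newRow("multi") = g.Count()

            result.Rows.Add(newRow)
        Next

        Return result
    End Function

    Sub Main()
        ' Sample data from the question; the column types are an assumption
        Dim dtstore As New DataTable("dtstore")
        dtstore.Columns.Add("section", GetType(String))
        dtstore.Columns.Add("department", GetType(Integer))
        dtstore.Columns.Add("palletnumber", GetType(Integer))
        dtstore.Columns.Add("batchnumber", GetType(Integer))

        dtstore.Rows.Add("pipes", 2012, 1234, 21)
        dtstore.Rows.Add("taps", 2011, 5678, 345)
        dtstore.Rows.Add("pipes", 2012, 1234, 21)
        dtstore.Rows.Add("taps", 2011, 5678, 345)
        dtstore.Rows.Add("taps", 2011, 5678, 345)
        dtstore.Rows.Add("plugs", 2009, 7643, 63)

        Dim dtmulti As DataTable = BuildMulti(dtstore)

        ' Prints:
        ' pipes 2012 1234 21 2
        ' taps 2011 5678 345 3
        For Each row As DataRow In dtmulti.Rows
            Console.WriteLine(String.Join(" ", row.ItemArray))
        Next
    End Sub

End Module
```

The string key compares cells by their text, so values of different types that print the same would be treated as equal. If you also want rows that occur only once (the plugs row with multi = 1), drop the Where clause.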
