
I guess the question is in the title.

I have a CSV that looks something like

user,path,original_path

I'm trying to find duplicates on the original path, then output both the user and original_path line.

This is what I have so far.

$2 = Import-Csv 'Total 20_01_16.csv' |
    Group-Object -Property Original_path |
    Where-Object { $_.Count -ge 2 } |
    Format-List Group |
    Out-String -Width 500

This gives me the duplicates in Original_Path. I can see all the required information but I'll be danged if I know how to get to it or format it into something useful.

I did a bit of Googling and found this script:

$ROWS = Import-Csv -Path 'Total 20_01_16.csv'
$NAMES = @{}
$OUTPUT = foreach ( $ROW in $ROWS ) {
    if ( $NAMES.ContainsKey( $ROW.Original_path ) -and $NAMES[$ROW.Original_path] -lt 2 ) {
        $ROW
    }
    $NAMES[$ROW.Original_path] += 1
}

Write-Output $OUTPUT

I'm reluctant to use this because, well, first, I have no idea what it's doing. So little of it makes any sense to me, and I don't like using scripts I can't get my head around. Also, and this is the more important part, it's only giving me a single duplicate; it's not giving me both sets. I'm after both offending lines, so I can find both users with the same file.

If anyone could be so kind as to lend a hand I'd appreciate it. Thanks

  • What output do you need? The original csv-rows for duplicates? Commented Jan 21, 2016 at 11:42
  • Pretty much the whole thing. So if a duplicate is found in Original_Path, I want User,Path,Original_Path, but I need the output for both discoveries. So if my csv looks like this:

    user,path,original_path
    user1,\\compa\c$\program files\test.doc,\\server1\files\test1.doc
    user2,\\compb\c$\program files\test.doc,\\server1\files\test1.doc

    I'll need to know about both user1 and user2, not just user2, which is all I'm getting at the moment. Thanks Commented Jan 21, 2016 at 11:51

2 Answers


It depends on the output format you need, but to build on what you already have we can use this to show the records in the console:

Import-Csv 'Total 20_01_16.csv' |
    Group-Object -Property Original_path |
    Where-Object { $_.Count -ge 2 } |
    ForEach-Object { $_.Group } |
    Format-Table User, Path, Original_path -AutoSize

Alternatively, use this to save them in a new csv-file:

Import-Csv 'Total 20_01_16.csv' |
    Group-Object -Property Original_path |
    Where-Object { $_.Count -ge 2 } |
    ForEach-Object { $_.Group } |
    Select-Object User, Path, Original_path |
    Export-Csv -Path output.csv -NoTypeInformation
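
To see why this returns both rows: Group-Object collects every record sharing the same Original_path into one group, and $_.Group expands each surviving group back into its individual records. Here's the sample data from the comments above, built in-memory with ConvertFrom-Csv so you can try it without a file (the user3 row is a hypothetical extra unique row, added for contrast):

    # Hypothetical sample data, based on the example in the comments
    $csv = @'
    user,path,original_path
    user1,\\compa\c$\program files\test.doc,\\server1\files\test1.doc
    user2,\\compb\c$\program files\test.doc,\\server1\files\test1.doc
    user3,\\compc\c$\other\unique.doc,\\server2\files\unique.doc
    '@ | ConvertFrom-Csv

    $csv |
        Group-Object -Property original_path |
        Where-Object { $_.Count -ge 2 } |
        ForEach-Object { $_.Group } |
        Format-Table user, path, original_path -AutoSize

This prints the user1 and user2 rows (both members of the duplicate group) and drops user3.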

3 Comments

I'm surprised there's not an option like unix's "uniq -d" that prints out the duplicates. I also tried "sort -u propertyname" and then doing a diff with the original array, but it didn't work well.
Jeeze this almost killed my pc for a 430.000.000 line file (not exaggerating, and running 64GB ram). Isn't there something cheaper? The file is already sorted.
Thousands vs millions lines require different approaches for text reading in general. Look into System.IO.StreamReader + a dictionary/hashset/whatever to quickly lookup duplicates. Ex. of using StreamReader: stackoverflow.com/questions/35119112/…
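A sketch along the lines of that comment's suggestion. Since the file is already sorted on the duplicate column, you don't even need a hashset; comparing each line's key to the previous line's key is enough. This assumes original_path is the third column and that values contain no quoted commas (a plain string split would break on those), so adjust the index and parsing to match your csv:

    # Streaming approach for very large, sorted files: read line by line and
    # compare each row's key to the previous one instead of loading everything.
    $reader = [System.IO.StreamReader]::new('Total 20_01_16.csv')
    $writer = [System.IO.StreamWriter]::new('output.csv')
    try {
        $header = $reader.ReadLine()
        $writer.WriteLine($header)

        $prevLine    = $null
        $prevKey     = $null
        $prevWritten = $false

        while ($null -ne ($line = $reader.ReadLine())) {
            $key = ($line -split ',')[2]   # original_path assumed to be the 3rd column
            if ($key -eq $prevKey) {
                # Duplicate run: emit the first occurrence once, then every repeat.
                if (-not $prevWritten) { $writer.WriteLine($prevLine) }
                $writer.WriteLine($line)
                $prevWritten = $true
            } else {
                $prevWritten = $false
            }
            $prevLine = $line
            $prevKey  = $key
        }
    }
    finally {
        $reader.Dispose()
        $writer.Dispose()
    }

Memory use stays constant regardless of file size, because only the current and previous lines are held at any time.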

Note: if the field you group on is blank for every row in the file, the logic above skips it and doesn't output the duplicate records/fields.

