I have a CSV file containing detailed data, with columns A, B, C, D, etc. Columns A and B are categories and C is a timestamp.

I am trying to create a summary file with one row for each combination of A and B. For each combination, it should pick the row from the original data where C is the most recent date.

Below is my attempt at solving the problem.

Import-CSV InputData.csv |  `
Sort-Object -property @{Expression="ColumnA";Descending=$false}, `
@{Expression="ColumnB";Descending=$false}, `
@{Expression={[DateTime]::ParseExact($_.ColumnC,"dd-MM-yyyy HH:mm:ss",$null)};Descending=$true} | `
Sort-Object ColumnA, ColumnB -unique `
 | Export-CSV OutputData.csv -NoTypeInformation

First the file is read and sorted by all three columns, with ColumnC descending. The second Sort-Object call is then supposed to keep the first row of each A/B combination. However, Sort-Object with the -Unique switch seems to pick an arbitrary row rather than the first one. So I do get one row for each A/B combination, but not the one with the most recent C.

Any suggestions for improvements? The data set is very large, so going through the file line by line is awkward; I would prefer a PowerShell solution.

1 Answer

You should look into Group-Object. I didn't create a sample CSV (you should provide one :-) ), so I haven't tested this, but I think it should work:

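# Parse ColumnC into a real DateTime so rows sort chronologically, then group by A and B
Import-Csv InputData.csv |
Select-Object -Property *, @{Label="DateTime";Expression={[DateTime]::ParseExact($_.ColumnC,"dd-MM-yyyy HH:mm:ss",$null)}} |
Group-Object ColumnA, ColumnB |
ForEach-Object {
    # Total of ColumnD across every row in this A/B group
    $sum = ($_.Group | Measure-Object -Property ColumnD -Sum).Sum
    # Keep only the most recent row, add SumD, and drop the helper DateTime property
    $_.Group | Sort-Object -Property "DateTime" -Descending | Select-Object -First 1 -Property *, @{name="SumD";e={ $sum } } -ExcludeProperty DateTime
} | Export-Csv OutputData.csv -NoTypeInformation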

This returns the same columns as the input, plus SumD (the helper DateTime property is excluded from the output).
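
Since the question did not include sample data, here is a minimal sketch of the same grouping idea run against a few made-up rows. The inline rows, the `$rows`, `$culture` and `$summary` variable names, and the use of the invariant culture in place of `$null` are all assumptions for illustration; only the column names and the date format come from the question.

```powershell
# Hypothetical sample rows standing in for InputData.csv (not from the original post)
$rows = @'
ColumnA,ColumnB,ColumnC,ColumnD
x,1,01-02-2020 10:00:00,5
x,1,03-02-2020 09:30:00,7
x,2,02-02-2020 12:00:00,1
y,1,15-01-2020 08:00:00,4
y,1,14-01-2020 23:59:59,6
'@ | ConvertFrom-Csv

# Assumption: parse with the invariant culture so the result does not depend on local settings
$culture = [System.Globalization.CultureInfo]::InvariantCulture

$summary = $rows |
    Group-Object ColumnA, ColumnB |
    ForEach-Object {
        # Sum ColumnD over every row in the current A/B group
        $sum = ($_.Group | Measure-Object -Property ColumnD -Sum).Sum
        # Sort the group's rows newest first by the parsed timestamp and keep the first one
        $_.Group |
            Sort-Object -Property { [DateTime]::ParseExact($_.ColumnC, 'dd-MM-yyyy HH:mm:ss', $culture) } -Descending |
            Select-Object -First 1 -Property *, @{ Name = 'SumD'; Expression = { $sum } }
    }

$summary | Format-Table
```

Sorting on a script block avoids adding and then excluding a helper DateTime property. Swapping the here-string for Import-Csv InputData.csv and the final Format-Table for Export-Csv should give the same shape of result as the pipeline above.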

4 Comments

Seems close, but it only shows the three columns A, B and C, and none of the other columns. I am trying to get the row with the latest C (which it does) along with the columns D, E, etc.
Try the updated answer. If it doesn't work, can you provide a sample CSV with the exact number of columns and some sample data? :-)
That works fine. Many thanks. Can I ask for a tiny additional feature? If I wanted a subtotal for ColumnD, can that be added easily? The output would show ColumnA and ColumnB grouped, the latest date/time from ColumnC, all other columns, and as an additional field the total of ColumnD (summed wherever A and B match that particular group, i.e. the same subsets in which we look for the latest ColumnC).
Done. Added a column "SumD" per group that shows the sum of ColumnD for all "group members".
