
I'm working on a script that will need to process the same type of file, but with different content at different times. I have a CSV file that looks something like the example below. Not every field may contain a value.

record,title,creator,date,subject,location
0,Title1,Creator1,2018-08-17,Subject1,Location1
1,Title2,Creator2,2018-08-17,,Location1
2,Title3,Creator3,,Subject2,Location2

I need to convert this CSV from a data table to a list of key-value pairs, per record, ONLY if there is a value present. The header will be kind of generic, with field,value repeating for each key-value pair in the rows. For example:

record,field,value,field,value,field,value,field,value,field,value
0,title,Title1,creator,Creator1,date,2018-08-17,subject,Subject1,location,Location1
1,title,Title2,creator,Creator2,date,2018-08-17,location,Location1,,,
2,title,Title3,creator,Creator3,subject,Subject2,location,Location2,,,

I can read the CSV in with Import-Csv, but I'm having a hard time changing the structure. Every path I've tried leads nowhere, and searching for solutions hasn't helped either. At this point it seems like it might be easiest to build the new CSV manually, but that didn't feel right, so I thought I'd ask here. Can anyone point me in the right direction?

I can find a lot of CSV, hashtable, and key-value pair questions on Stack Overflow, but nothing quite like this.

  • You have an inconsistency in your second CSV example. You show key/value for everything except 'record'. Is that on purpose? Commented Aug 17, 2018 at 22:35
  • Yes, that's intentional. The record column is an identifier for a digital object and the key/values are metadata properties. I have to convert a data table exported from one system into a set of key/value pairs but still associated with one record. (And in CSV.) Commented Aug 18, 2018 at 2:48

1 Answer


I think you misunderstand how Import-Csv works. It does not create a hashtable; it creates an array of objects, each with a set of properties defined by the header. Because the data came from a CSV, every object is guaranteed to have the same properties (some may be empty, but the properties themselves exist and are identical). That lets us take the property names of the first object as a baseline, then loop through each record and build a string for it from that baseline. As you suggested, we end up building the CSV manually.

$DataIn = Import-Csv C:\Path\To\File.csv

# Use the property names of the first record as the baseline column list
$Props = $DataIn[0].psobject.properties.name

# Build the generic header (record, then field,value repeated once per non-record column),
# followed by one manually assembled line per record
$DataOut = ('record,' + ((2..$Props.Count | ForEach-Object { 'field,value' }) -join ',')),
    $(For($i = 0; $i -lt $DataIn.Count; $i++){
        [array]$tmpRecord = Switch($Props){
            # The record column passes through as-is, not as a key/value pair
            'Record' { $DataIn[$i].record; continue }
            # Skip any property that has no value
            { [string]::IsNullOrEmpty($DataIn[$i].$_) } { continue }
            # Everything else becomes a field,value pair
            default { '{0},{1}' -f $_, $DataIn[$i].$_ }
        }
        # Pad with trailing commas so every row has the same number of columns
        If(($tmpDiff = $Props.Count - $tmpRecord.Count) -gt 0){ $tmpRecord += ',' * ($tmpDiff * 2 - 1) }
        $tmpRecord -join ','
    })
$DataOut | Set-Content C:\Path\To\Output.csv

So that does exactly what I suggested, while retaining your example output of not doing key/value for the record column. The switch checks each potential property: if it is the 'record' property, it just outputs the record value and continues to the next property. For anything else it checks whether that property is blank and, if so, moves on to the next property. If it isn't blank, it outputs field,value. All of those outputs (record, plus any field/value combos) are then joined by commas into a single line per record, with extra trailing commas added for fields that were null. Each record's line is collected in $DataOut, along with a calculated header line.

Mind you, PowerShell will not want to read that file back in with Import-Csv, because the header row is mostly 'field,value' repeated over and over and Import-Csv rejects duplicate column names. I assume you are saving in this format for some external program that needs it as input.
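If the file does have to come back into PowerShell later, one workaround (a sketch only, assuming five field/value pairs as in the example output; adjust the 1..5 range to match your column count) is to skip the duplicate header line and supply unique column names via ConvertFrom-Csv -Header:

```powershell
# Sample output lines standing in for Get-Content C:\Path\To\Output.csv
$lines = @(
    'record,field,value,field,value,field,value,field,value,field,value'
    '0,title,Title1,creator,Creator1,date,2018-08-17,subject,Subject1,location,Location1'
)

# Build unique column names: record, field1, value1, ... field5, value5
$header = @('record') + (1..5 | ForEach-Object { "field$_"; "value$_" })

# Drop the duplicate-name header row and re-parse with the unique names
$reimported = $lines | Select-Object -Skip 1 | ConvertFrom-Csv -Header $header
$reimported.field1   # title
$reimported.value1   # Title1
```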


4 Comments

Thank you. I misspoke when I used hashtable before and should have said data table. I wasn't referring to how import-csv parses the data, but how the data is modeled in the original CSV itself. This solution sounds like it will work, thank you. I look forward to testing it. The CSV will be used for another program and the repeated header may be problematic, but I'm hoping it will just get skipped over. I'll report back.
Huzzah, it works as described! I didn't need to add 'record,' + into $DataOut since that column came in with $Props.
I do have a secondary problem though. Some of the values are double-quoted and contain spaces and commas, so the splitting into key/value pairs is incorrect. For example, the key/value of Title/"Chapter 4, Page 1" is coming out as Title/Chapter 4 and Page 1/Format. (Format/book is the next key/value.)
Got it, [char]34 +!
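For later readers, a sketch of the quoting fix that last comment alludes to (the $field and $value names here are hypothetical, just to make the snippet self-contained): wrap each value in literal double quotes via [char]34 so an embedded comma stays inside one CSV field.

```powershell
$field = 'title'
$value = 'Chapter 4, Page 1'   # a value containing an embedded comma

# [char]34 is the double-quote character; quoting the value keeps the comma
# inside a single CSV field when the line is later parsed
$pair = '{0},{1}{2}{1}' -f $field, [char]34, $value
$pair   # title,"Chapter 4, Page 1"
```

In the answer's code, the same pattern would go in the switch's default case in place of '{0},{1}'.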
