0

I have a comma-delimited CSV file in which the last field on each line is double-quoted and may contain commas within the quotes. I need to replace every comma with a pipe (|) EXCEPT for the ones inside the quoted field at the end of each line.

Example of a line from the file:

2,1,24,Bourne,Jason,,06-01-1973,M,Ned,,Grove,,College Rd,72,1,01-10-2012,Null,85,S,"notes go here, and may contain commas."

I've run the following PowerShell script, but it replaces even the commas within the quotes at the end of the line:

(Get-Content c:\input.csv) |
    % {$_ -replace ',', "|"} |
    Out-File -FilePath c:\output.csv -Force -Encoding ascii

I've been struggling for a couple of hours trying to come up with a regex that replaces only the first 19 commas, without much success so far. My experience with regex is very limited, so this is a learning experience for me. Any help is greatly appreciated!

1
  • What about grabbing everything on each line up to the first " and then running your replace on that? Commented May 16, 2014 at 17:03

3 Answers

2

I would say don't bother with a regex and just use PowerShell's built-in CSV import/export features. The Export-Csv command lets you choose the delimiter:

Import-Csv C:\Input.csv | Export-Csv -Delimiter "|" -Path c:\updated.csv -NoTypeInformation

2 Comments

This method adds quotes to every field in the file. This would require another operation to remove quotes from all fields except for the last field somehow.
@user3645507 Yup, it will. Note that with the quotes, it's still a valid csv file. But if that's a problem for you, then this probably isn't the best approach.
0

Use a regex to split off that last quoted field, then do the replace on the first part only, like this:

GC C:\Input.csv |?{$_ -match "(.*?)(`"[^`"]*?`")"}|%{"$($Matches[1].replace(",","|"))$($Matches[2])"}| out-file -FilePath c:\output.csv -Force -Encoding ascii
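
For readability, the same idea can be unpacked into a small function. This is only a sketch, not the answerer's code: the name Convert-LeadingCommas is hypothetical, the pattern is anchored to the end of the line (so trailing whitespace is dropped), and it assumes the quoted field contains no embedded double quotes. Unlike the one-liner, which skips lines that do not match, it passes lines without a quoted field through with every comma replaced.

```powershell
# Hypothetical helper: replaces commas outside the trailing quoted field.
function Convert-LeadingCommas {
    param([string]$Line)

    # Group 1: everything before the quoted field (lazy match).
    # Group 2: the quoted field itself, quotes included.
    if ($Line -match '^(.*?)("[^"]*")\s*$') {
        # Replace commas only in the unquoted prefix, then re-attach the quoted field.
        return $Matches[1].Replace(',', '|') + $Matches[2]
    }

    # Assumption: a line with no trailing quoted field has only delimiter commas.
    return $Line.Replace(',', '|')
}

$sample = '2,1,24,Bourne,Jason,,06-01-1973,M,Ned,,Grove,,College Rd,72,1,01-10-2012,Null,85,S,"notes go here, and may contain commas."'
Convert-LeadingCommas -Line $sample
```

To run it over the file, something like `Get-Content c:\input.csv | ForEach-Object { Convert-LeadingCommas -Line $_ }` piped into the same Out-File call should work.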

1 Comment

Had to choose this as the correct answer to the problem, as it was the one-liner that did the trick with no modifications.
0

Another option is to split at the ," and then do the replace in the first element, then rejoin with |"

$text = 
'2,1,24,Bourne,Jason,,06-01-1973,M,Ned,,Grove,,College Rd,72,1,01-10-2012,Null,85,S,"notes go here, and may contain commas."'

($text -split ',"')[0].Replace(',','|'),($text -split ',"')[1] -join '|"' 

2|1|24|Bourne|Jason||06-01-1973|M|Ned||Grove||College Rd|72|1|01-10-2012|Null|85|S|"notes go here, and may contain commas."

Or just split it at the quotes, and re-assemble with a format string:

'{0}"{1}"' -f $text.split('"')[0].replace(',','|'),$text.split('"')[1] 

2|1|24|Bourne|Jason||06-01-1973|M|Ned||Grove||College Rd|72|1|01-10-2012|Null|85|S|"notes go here, and may contain commas."
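
Both snippets above work on a single string. A rough sketch of applying the split-at-the-quote idea to several lines follows; the function name Convert-LineBySplit and the two sample lines are made up for illustration. It uses the -split operator with a maximum of two substrings, so only the first double quote acts as the split point.

```powershell
# Hypothetical helper: splits at the first double quote only.
function Convert-LineBySplit {
    param([string]$Line)

    # Limit of 2 keeps everything after the first quote in one piece.
    $parts = @($Line -split '"', 2)

    # Assumption: a line without any quote has only delimiter commas.
    if ($parts.Count -lt 2) { return $Line.Replace(',', '|') }

    # Replace commas before the quote, then restore the quote that -split consumed.
    return $parts[0].Replace(',', '|') + '"' + $parts[1]
}

# Illustrative sample lines, not taken from the original file.
$lines = @(
    '1,Bourne,"notes, with commas."',
    '2,Smith,"no commas here"'
)
$lines | ForEach-Object { Convert-LineBySplit -Line $_ }
# 1|Bourne|"notes, with commas."
# 2|Smith|"no commas here"
```

For the real file, $lines could be replaced with Get-Content c:\input.csv and the result piped to Out-File as in the question.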
