I'm trying to format large text files (~300MB) between 0 to 3 columns :
12345|123 Main St, New York|91110
23456|234 Main St, New York
34567|345 Main St, New York|91110
And the output should be:
000000000012345,"123 Main St, New York",91110,,,,,,,,,,,,
000000000023456,"234 Main St, New York",,,,,,,,,,,,,
000000000034567,"345 Main St, New York",91110,,,,,,,,,,,,
I'm new to powershell, but I've read that I should avoid Get-Content so I am using StreamReader. It is still much too slow:
function append-comma{} #helper function to append the correct amount of commas to each line
$separator = '|'
$infile = "\large_data.csv"
$outfile = "new_file.csv"
$target_file_in = New-Object System.IO.StreamReader -Arg $infile
If ($header -eq 'TRUE') {
$firstline = $target_file_in.ReadLine() #skip header if exists
}
while (!$target_file_in.EndOfStream ) {
$line = $target_file_in.ReadLine()
$a = $line.split($separator)[0].trim()
$b = ""
$c = ""
if ($dataType -eq 'ECN'){$a = $a.padleft(15,'0')}
if ($line.split($separator)[1].length -gt 0){$b = $line.split($separator)[1].trim()}
if ($line.split($separator)[2].length -gt 0){$c = $line.split($separator)[2].trim()}
$line = $a +',"'+$b+'","'+$c +'"'
$line -replace '(?m)"([^,]*?)"(?=,|$)', '$1' |append-comma >> $outfile
}
$target_file_in.close()
I am building this for other people on my team and wanted to add a gui using this guide: http://blogs.technet.com/b/heyscriptingguy/archive/2014/08/01/i-39-ve-got-a-powershell-secret-adding-a-gui-to-scripts.aspx
Is there a faster way to do this in Powershell? I wrote a script using Linux bash(Cygwin64 on Windows) and a separate one in Python. Both ran much faster, but I am trying to script something that would be "approved" on a Windows Platform.