1

I am currently working on a small PowerShell script that is supposed to split a CSV file with translations into individual files for the respective language. For this I use the Import-Csv cmdlet to import the source file, which has about this format:

ResourceId;GermanTranslation;EnglishTranslation;Check
0;Deutscher Text;English text;OK
1;  mit Leerzeichen  ;  with spaces  ;OK

The goal is to get a line-by-line representation of the translations in the format ResourceId|EnglishTranslation|. For this I have built the following script:

Set-Variable SOURCE_FILE -Option Constant -Value ".\sourceFile.csv"
Set-Variable RESULT_FILE -Option Constant -Value ".\resultFile.csv"

foreach ($row in (Import-Csv -Path $SOURCE_FILE -Delimiter ";")) {
    Add-Content -Path $RESULT_FILE -Value ($row.RessourceId + "|" + $row.EnglishTranslation + "|")
}

Basically, everything works as desired, but when I examined the results, I noticed that the spaces with which some elements begin were no longer present in the result:

0|English text|
1|with spaces  |

Unfortunately I didn't find a parameter in the MS documentation that addresses this problem, so I was unsure at first. After that, I took a look at RFC 4180, which describes the CSV file format more or less exactly. It states that spaces should be considered as part of a field and not ignored. Should means not must, so it may well be that there really is no option for this.

Is there a possibility to preserve the spaces without me having to parse the whole file on my own?

3
  • You may try the Import-Csv from the PowerShell version 7 as this has some more options. Another option could be to treat your input csv before using it to enclose cell content in quotes to preserve the spaces as needed. ;-) Commented Apr 9, 2021 at 14:36
  • If you have control of the upstream file (which you obviously might not :-S) you'd ideally generate it with strings quoted - e.g. 1;" mit Leerzeichen ";" with spaces ";OK which does get preserved with Import-Csv. Commented Apr 9, 2021 at 14:43
  • Even in PS 7 there is no parameter to control whitespace behaviour of Import-Csv. You may try a .NET class instead, see stackoverflow.com/a/33796861/7571258 Commented Apr 9, 2021 at 14:52

3 Answers 3

3

Here is a solution that should be more robust (and propably faster) than replacing characters from the CSV file.

It uses the .NET TextFieldParser class from Microsoft.VisualBasic assembly. It has a TrimWhiteSpace attribute which, when set to $false, preserves any leading and trailing whitespace of each field, even if the field is not enclosed in double quotation marks.

I've encapsulated the .NET code in a function named Import-CustomCsv. It supports some additional options of TextFieldParser through various parameters.

Function Import-CustomCsv {
   [CmdletBinding()]
   param (
      [Parameter(Mandatory, ValueFromPipeline)] [String] $Path,
      [String[]] $Delimiter = ',',
      [String[]] $CommentTokens = '#',
      [switch] $TrimWhiteSpace,
      [switch] $HasFieldsEnclosedInQuotes
   )

   # Load Visual Basic assembly if necessary. Seems to be required only for PS 5.
   if (-not ([System.Management.Automation.PSTypeName]'Microsoft.VisualBasic.FileIO.TextFieldParser').Type) {
      Add-Type -AssemblyName 'Microsoft.VisualBasic, Version=10.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a'
   }   

   # Create a CSV parser
   $csvParser = New-Object Microsoft.VisualBasic.FileIO.TextFieldParser $Path

   try {
      # Set CSV parser options
      $csvParser.SetDelimiters( $Delimiter )
      $csvParser.CommentTokens = $CommentTokens
      $csvParser.TrimWhiteSpace = $TrimWhiteSpace
      $csvParser.HasFieldsEnclosedInQuotes = $HasFieldsEnclosedInQuotes

      # Read the header
      $header = $csvParser.ReadFields()

      while( -not $csvParser.EndOfData ) {
         # Read current line fields, pointer moves to the next line.
         $fields = $csvParser.ReadFields()

         # Associate each field with its name from the header by storing it in an
         # ordered hashtable.
         $namedFields = [ordered]@{}
         for( $i = 0; $i -lt $header.Count; $i++ ) {
            $namedFields[ $header[ $i ] ] = $fields[ $i ]
         }

         # Finally convert fields to PSCustomObject and output (implicitly)
         [PSCustomObject] $namedFields
      }
   }
   finally {
      # Make sure to close the file even in case of an exception.
      $csvParser.Close()
   }
}

Usage example: Parse CSV, preserving whitespace:

Import-CustomCsv test.csv -Delimiter ';'

Output:

ResourceId GermanTranslation   EnglishTranslation Check
---------- -----------------   ------------------ -----
0          Deutscher Text      English text       OK
1            mit Leerzeichen     with spaces      OK

Usage example: Parse CSV, trimming whitespace (like Import-Csv):

Import-CustomCsv test.csv -Delimiter ';' -TrimWhiteSpace

Output:

ResourceId GermanTranslation EnglishTranslation Check
---------- ----------------- ------------------ -----
0          Deutscher Text    English text       OK   
1          mit Leerzeichen   with spaces        OK   

Note:

The above two samples keep field-enclosing double-quotation marks (if any) in the output. To remove double-quotation marks around fields, pass parameter -HasFieldsEnclosedInQuotes.

Sign up to request clarification or add additional context in comments.

2 Comments

This is great +1 for nice improvement over native import-csv, should be tagged as resolved already. Sadly, the one asking the question doesn't seem to be much appreciative.
Wow, a really great solution with pleasantly detailed examples! This actually works. (Sorry I didn't notice the answers until now - it's now marked as solved).
2

One way to to avoid the default trimming of text would be to convert spaces to another character first before treating it as a CSV. Not sure which character would be the best to use but here is an example using underscore:

Get-Content $SOURCE_FILE | %{
    $_ -replace ' ', '_'
} | ConvertFrom-Csv -Delimiter ';' | %{
    '{0}|{1}|' -f $_.ResourceId, $_.EnglishTranslation
} | % {$_ -replace '_', ' '} | Out-File $RESULT_FILE 

Comments

1

This should work as you intend. Try it and let me know.

Basically this is doing what @mclayton recommended in the comments.

$content=Get-Content ./test.csv|%{
    '"{0}"' -f ($_ -replace ';','";"')
}|ConvertFrom-Csv -Delimiter ';'


PS /~> $content


ResourceId GermanTranslation   EnglishTranslation Check
---------- -----------------   ------------------ -----
0          Deutscher Text      English text       OK
1            mit Leerzeichen     with spaces      OK

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.