3

I get a bunch of tab-delimited files and I want to convert them to comma-delimited files. I only need two columns from each file, "Date" and "Value1", so I drop the rest of the columns.

Date    Value1  Value2  Value3  Value4  Sensor
08.07.2010  115,28  115,45  115,45  115,28  100
07.07.2010  115,34  115,32  115,34  115,25  85
06.07.2010  115,23  115,74  115,74  115,20  203

This works as follows:

(get-ChildItem -Path '*.txt').name | ForEach-Object { 
    Import-Csv -Path $_ -Delimiter "`t" | 
    Select-Object -Property Date,Value1 | 
    Export-Csv "out\$_" 
}

Unfortunately, the source file uses a comma as the decimal separator, so I get files like this, where Value1 is interpreted as a string:

"Date","Value1"
"24.02.2017","30,18"
"23.02.2017","30,20"
"22.02.2017","30,18"
"21.02.2017","30,18"
"20.02.2017","30,17"

How can I set the data type of the column to numeric? During Import-CSV?

2
  • can you share the initial CSV? Commented May 19, 2017 at 18:31
  • Added. I had some issues inserting the tabs ... but now it seems to display correctly. Commented May 19, 2017 at 18:36

2 Answers

2

Export-CSV always adding "..." around the output field values is not (in itself) a problem:

  • The CSV data format has no intrinsic concept of a data type, and double-quoting is solely used to enclose a field value (which is a syntactical requirement for field values with embedded , chars., for instance).

  • Excel, for instance, infers a column's data type solely from its content, whether that content is enclosed in double quotes or not.

    • The content interpretation is culture-aware (locale-aware), however.
  • By contrast, PowerShell's Import-Csv never interprets the data and returns all field values as strings.
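The last point can be checked with a minimal sketch. It uses ConvertFrom-Csv on an inline sample instead of a file; the sample row and the variable names are illustrative assumptions, not part of the original data set. Every field comes back as a [string], and turning it into a number is a separate, explicit step that names the source culture.

```powershell
# Inline tab-delimited sample standing in for one input file (illustrative data).
$sample = "Date`tValue1", "08.07.2010`t115,28"

# ConvertFrom-Csv parses the same way Import-Csv does, but from strings.
$row = $sample | ConvertFrom-Csv -Delimiter "`t"

# The field is a string, not a number.
$typeName = $row.Value1.GetType().Name

# Explicit conversion: parse with a culture that uses ',' as the decimal mark.
$de = [cultureinfo]::GetCultureInfo('de-DE')
$number = [double]::Parse($row.Value1, $de)
```

So there is no switch on Import-Csv that sets a column type; any typing has to happen after the import.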

Depending on the active culture in the environment the CSV will be processed in, you have two options:

  • For a culture such as de-DE (Germany) in which , (comma) rather than . (dot, period) is used as the decimal mark, and ; rather than , is used as the list separator:

    • With the target culture in effect (verify with Get-Culture) use Export-Csv -UseCulture
      (-UseCulture has been available since at least v2 [1]). Thanks, LotPings.
    • Alternatively, use something like Export-Csv -Delimiter ([cultureinfo]::GetCultureInfo('de-DE').TextInfo.ListSeparator)
  • For an English-language culture (or any other culture that uses . as the decimal mark):

    • Custom-process the file and replace the , instances in numeric fields with . - see below.
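As a rough sketch of the first option, the snippet below looks up the list separator of a comma-decimal culture and uses it as the field delimiter, so the decimal comma stays unambiguous. The row is made-up sample data, ConvertTo-Csv stands in for Export-Csv so that nothing is written to disk, and the separator is expected to be ; for de-DE, though that depends on the culture data of the machine.

```powershell
# Look up the list separator for a comma-decimal culture (de-DE as an example).
$sep = [cultureinfo]::GetCultureInfo('de-DE').TextInfo.ListSeparator

# Illustrative row; Value1 keeps its decimal comma.
$row = [pscustomobject]@{ Date = '08.07.2010'; Value1 = '115,28' }

# Use that separator as the field delimiter instead of ','.
$lines = $row | ConvertTo-Csv -Delimiter $sep -NoTypeInformation
```

$lines then holds a header line and one data line, both using the culture's separator between the quoted fields.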

To replace the , instances with ., the following command, which uses string manipulation only, will do:

Get-ChildItem -Path *.txt | ForEach-Object {
  Get-Content -LiteralPath $_.FullName | ForEach-Object {
    ($_ -split '\t')[0,1] -replace ',', '.' -join ','
  } |
    Set-Content -Encoding utf8 "out/$($_.Name)"
}

Note: For simplicity, the command assumes that the Date column values contain no instances of , and that none of the values require "..."-enclosing - which is reasonable in this specific scenario.

  • I've chosen UTF-8 as the output encoding above, because Set-Content defaults to the legacy, culture-specific "ANSI" code page - adjust as needed.
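If you would rather stay with objects than with raw strings, a calculated property can do the conversion. This is only a sketch under stated assumptions: the inline sample and variable names are illustrative, the source numbers are assumed to follow de-DE conventions, and the value is formatted back to a string with the invariant culture so that the result does not depend on how the CSV cmdlets stringify numbers.

```powershell
$de  = [cultureinfo]::GetCultureInfo('de-DE')   # assumed culture of the source numbers
$inv = [cultureinfo]::InvariantCulture          # always uses '.' as the decimal mark

# Inline tab-delimited sample standing in for one input file (illustrative data).
$sample = "Date`tValue1`tValue2", "08.07.2010`t115,28`t115,45"

$csv = $sample | ConvertFrom-Csv -Delimiter "`t" |
  Select-Object -Property Date, @{
    Name       = 'Value1'
    # Parse the decimal-comma string, then format it with a decimal point.
    Expression = { [double]::Parse($_.Value1, $de).ToString($inv) }
  } |
  ConvertTo-Csv -NoTypeInformation
```

For real files, Import-Csv -LiteralPath and Export-Csv -NoTypeInformation would take the place of ConvertFrom-Csv and ConvertTo-Csv. Unlike the string-only command, this version fails with an error on a value that is not a valid number instead of passing it through.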

[1] Generally, if a parameter description in a cmdlet's help topic doesn't mention a specific version in which it was introduced, it implies that it's been around since v2. You can now browse older versions of the documentation - down to v3 - on GitHub - just type T or click Find file and start typing a cmdlet / conceptual help-topic name.

0

Export-Csv adds the quotes regardless of the type. Probably the only way to get rid of them is to load the exported file again and use a regex to remove the quotes and swap the decimal comma for a dot:

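(Get-ChildItem -Path '*.txt').Name | ForEach-Object { 
    Import-Csv -Path $_ -Delimiter "`t" | 
    Select-Object -Property Date,Value1 | 
    Export-Csv "out\$_" 
    # Read the exported file back (not the input file), then unquote the last
    # field and replace its decimal comma with a dot.
    (Get-Content "out\$_") -replace ',"(\d+),(\d+)"$', ',$1.$2'  | Set-Content "out\$_"
}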

Note: You may want to set the -Encoding for the Set-Content cmdlet!

4 Comments

this gives me lines like this "22.07.2016",8,47 ... not what I had expected, because 8,47 will be interpreted as two columns
you should also change the Get-Content/Set-Content $_ to out\$_ ... because I just wiped my input files on my PC :) ... and now I'll have to wait till Monday to get a new copy in the office.
uppps, sorry about that :p
As an aside, if that folder was indexed just right click on it and choose Restore Previous Versions.
