
I am having a little trouble creating/populating a CSV file with PowerShell. I am new to PowerShell and may be missing some obvious things, so please go easy on me. Here is the situation:

First I am creating an array(?) to act as my table

#Create output table with headers
$output = @()
$row = New-Object System.Object
$row | Add-Member -MemberType NoteProperty -Name "Example Header 1" -Value $null
$row | Add-Member -MemberType NoteProperty -Name "Example Header 2" -Value $null
$row | Add-Member -MemberType NoteProperty -Name "Example Header 3" -Value $null
$output += $row

I am writing it to a file using $output | Export-Csv new.csv -NoTypeInformation

This appears to make a CSV file with the headers that I want. If there is a better way to do this, please let me know. The next step is where I am running into problems. I now need to programmatically populate the table with data. When importing existing CSV files, I am able to access/modify data in the table like an array (i.e. $output[rowIndex]."Header Name" = "new data").

So I tried to add data to my newly created table. I wrote $output[0]."Example Header 1" = "Test Data". This works as I expected and populates the first row of the column with the specified header with "Test Data". However, I can ONLY access [0]. $output[1] and so on cause errors because I guess they do not exist. I tried using $output += $row again to add more rows, but it does not work at all and causes some strange errors to happen (if I write to a row, it writes to all rows, probably because it's all the same object).

So basically my question is, how can I create a CSV file from scratch, add some headers to it, and then start writing to all the (unknown/variable number of) rows? I am sure there is a better way to do it, but like I said, I am very new to PowerShell. Ideally I would like to be able to access rows by index (0, 1, 2, etc.), but I am open to whatever.


Basic solution (adapted from Martin Brandl's answer)

This basically reads data from one CSV file and inserts it into another with newly specified headers.

$csv = Import-Csv "MyCsv.csv"
$newCsv = @()
foreach($row in $csv) {
    $newCsv += [PSCustomObject]@{
        "New Column Header1" = $row."Original Column Header1"
        "New Column Header2" = $row."Original Column Header2"
    }
}
# Export the remapped rows (the output filename here is just an example).
$newCsv | Export-Csv "NewCsv.csv" -NoTypeInformation
1 Comment

Don't write headers and data as separate manual steps. Create an array of objects representing each row, then pipe all of it to Export-Csv at once; it'll automatically extract property names and create the header row for you. Show us the input CSV samples and desired output format if you want help with rewriting existing CSV data. Commented May 22, 2017 at 13:13

2 Answers


To complement Martin Brandl's helpful answer with an explanation of your symptoms (emphasis added):

I tried using $output += $row again to add more rows, but it does not work at all and causes some strange errors to happen (if I write to a row, it writes to all rows, probably because it's all the same object).

Indeed, that is what happened: In .NET terms, type (class) [pscustomobject] is a reference type rather than a value type - as evidenced by [pscustomobject].IsValueType returning $false.

If you add a given instance (object) of a reference type to an array multiple times, all such elements point to the very same instance.

Here's a brief demonstration.

$obj = [PSCustomObject] @{
    'Example Header 1' = $null
    'Example Header 2' = $null
}

$array = @()
foreach ($ndx in 1..2) {
  # By working with the original $obj every time, you
  # keep modifying the same instance's property values.
  $obj.'Example Header 1' = "h1-$ndx"
  $obj.'Example Header 2' = "h2-$ndx"
  # Adding $obj to an array does NOT create a COPY of $obj
  # but stores a REFERENCE directly to $obj in the array
  # (similar to storing a pointer in unmanaged languages such as C++).
  $array += $obj
}

# Output the array.
$array

This yields the following:

Example Header 1 Example Header 2
---------------- ----------------
h1-2             h2-2
h1-2             h2-2

As you can see, only the last values assigned to .Example Header 1 and .Example Header 2 took effect, because both array elements reference the very same object.

Martin's approach is the simplest way to solve this problem: create a new instance of the custom object in every iteration (casting a hashtable literal to [pscustomobject], as now shown in the question itself: $array += [pscustomobject] @{ ... }).
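
For example, the demonstration above behaves as intended once a new [pscustomobject] instance is created inside the loop:

$array = @()
foreach ($ndx in 1..2) {
  # A new [pscustomobject] instance is created in every iteration,
  # so each array element references a distinct object.
  $array += [pscustomobject] @{
    'Example Header 1' = "h1-$ndx"
    'Example Header 2' = "h2-$ndx"
  }
}

# Output the array: the first row now shows h1-1 / h2-1,
# the second h1-2 / h2-2.
$array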

If you don't want to or cannot recreate your instances from scratch inside the loop, you have two basic choices:

  • Clone a template custom object in every loop iteration (a minimal sketch follows this list), or simply use [pscustomobject] @{ ... } object creation inside the loop, which implicitly creates a new instance every time.

  • PSv5+ alternative: Define a custom class and instantiate it in every loop iteration - see below.
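
Here's a minimal sketch of the first option, cloning a template object with the .psobject.Copy() method so that every iteration gets its own instance:

# Template object whose property values are overwritten in each iteration.
$template = [pscustomobject] @{
  'Example Header 1' = $null
  'Example Header 2' = $null
}

$array = @()
foreach ($ndx in 1..2) {
  # .psobject.Copy() creates a (shallow) copy of the template,
  # so each array element is independent of $template and of
  # the copies made in other iterations.
  $row = $template.psobject.Copy()
  $row.'Example Header 1' = "h1-$ndx"
  $row.'Example Header 2' = "h2-$ndx"
  $array += $row
}

# Output the array.
$array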


Class-based Solution

In PSv5+, a custom class allows for an elegant solution that also performs better than creating instances in-loop using literal syntax.

# Define a custom class that represents the rows of the
# output CSV.
# Note: [object] is being used here as the properties' type.
#       In real life, you'd use more specific types such as [string]
#       or [int].
class CsvRow {
  [object] ${Example Header 1}
  [object] ${Example Header 2}
}

$array = @()
foreach ($ndx in 1..2) {
  # Instantiate the custom class.
  $rowObj = [CsvRow]::new()
  # Set the values.
  $rowObj.'Example Header 1' = "h1-$ndx"
  $rowObj.'Example Header 2' = "h2-$ndx"
  # Add the instance to the array.
  $array += $rowObj
}

# Output the array.
$array

Performance Considerations

Two factors determine performance:

  • How quickly the array is extended in each loop iteration:

    • Extending an array element by element with $array += ... is very convenient, but is slow and inefficient, because a new array must be created every time (arrays are fixed-size collections and cannot be directly extended).

    • For small iteration counts that may not matter, but the higher the number, the more performance will suffer, and at some point this approach becomes infeasible.

    • The next best solution is to use a [System.Collections.Generic.List[object]] instance to build the array instead - such lists are designed to be efficiently extended (a minimal sketch follows this list).

    • The best and simplest solution, however, is to let PowerShell collect the outputs from a loop-like statement in an array for you, simply by assigning the statement to a variable - see below.

  • How quickly the new object is instantiated in each loop iteration:

    • Instantiating the custom class is faster than creating an instance via a hashtable literal, but only if [CsvRow]::new() is used for instantiation; the functionally equivalent New-Object CsvRow is much slower, because it involves a cmdlet call.
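
As a sketch of the list-based approach mentioned above (reusing the CsvRow class defined earlier):

# Create an efficiently extensible list instead of an array.
$list = [System.Collections.Generic.List[object]]::new()
foreach ($ndx in 1..1000) {
  $rowObj = [CsvRow]::new()
  $rowObj.'Example Header 1' = "h1-$ndx"
  $rowObj.'Example Header 2' = "h2-$ndx"
  # .Add() appends in place; no new collection is allocated per iteration.
  $list.Add($rowObj)
}

# The list can be exported just like an array.
$list | Export-Csv new.csv -NoTypeInformation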

The following variant of the custom-class solution uses implicit array creation to ensure acceptable performance even with higher iteration counts:

# Define the custom class.
class CsvRow {
  [object] ${Example Header 1}
  [object] ${Example Header 2}
}

# Determine the iteration count.
$count = 1000

# Loop and let PowerShell collect the outputs
# from all iterations implicitly in variable $array
[array] $array = foreach ($ndx in 1..$count) {
  # Instantiate the custom class.
  $rowObj = [CsvRow]::new()
  # Set the values.
  $rowObj.'Example Header 1' = "h1-$ndx"
  $rowObj.'Example Header 2' = "h2-$ndx"
  # Simply output the row object
  $rowObj
}

# Output the array.
$array

Note: The [array] type constraint is only needed if you need to ensure that $array is always an array; without it, if there happened to be just a single loop iteration and therefore a single output object, $array would store that output object as-is, not wrapped in an array (this behavior is fundamental to PowerShell's pipeline).
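
If you want to verify these relative timings on your own machine, here's a rough Measure-Command sketch (it assumes the CsvRow class defined above; absolute numbers will vary by machine and PowerShell version):

$count = 10000

# Factor 1: growing an array with += in every iteration ...
(Measure-Command {
  $a = @()
  foreach ($i in 1..$count) { $a += [CsvRow]::new() }
}).TotalMilliseconds

# ... vs. letting PowerShell collect the loop output implicitly.
(Measure-Command {
  [array] $a = foreach ($i in 1..$count) { [CsvRow]::new() }
}).TotalMilliseconds

# Factor 2: [CsvRow]::new() above vs. the slower New-Object cmdlet call.
(Measure-Command {
  [array] $a = foreach ($i in 1..$count) { New-Object CsvRow }
}).TotalMilliseconds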


2 Comments

It's interesting to see a use case for a class here. Why is the class more performant / preferable than a PSCustomObject, e.g., $array = foreach ($ndx in 1..$count) { $rowObj = [PSCustomObject]@{ 'example header 1' = "h1-$ndx"; 'example header 2' = "h2-$ndx" }; $rowObj } ? Or is that better asked in a new question? Edit: Measure-Command gives me 16 ms for PSCustomObject and 10 ms for the class using a count of 1000. It would be interesting to know more about this, or if there's a link I'll happily read it.
@Blaisem, in my benchmarks a custom class with per-property initialization is much faster than [pscustomobject], and even using hashtable-based initialization with the custom class is faster, though not by as much. It's hard to explain it briefly, but [pscustomobject]s basically simulate arbitrary properties at runtime, which makes them both slow and more memory-hungry. If you have further questions, yes, please ask a new question.

As Mathias mentioned, you shouldn't create the CSV first containing only the headers. Instead, populate your CSV with the actual rows you want and export it:

[PSCustomObject]@{
    'Example Header 1' = "a"
    'Example Header 2' = "b"
    'Example Header 3' = "c"
}, [PSCustomObject]@{
    'Example Header 1' = "a2"
    'Example Header 2' = "b2"
    'Example Header 3' = "c2"
}, [PSCustomObject]@{
    'Example Header 1' = "a3"
    'Example Header 2' = "b4"
    'Example Header 3' = "c5"
} | Export-Csv new.csv -NoTypeInformation

Output:

"Example Header 1","Example Header 2","Example Header 3"
"a","b","c"
"a2","b2","c2"
"a3","b4","c5"
