
I am having a little trouble creating/populating a CSV file with PowerShell. I am new to PowerShell and may be missing some obvious things, so please go easy on me. Here is the situation:

First I am creating an array(?) to act as my table

#Create output table with headers
$output = @()
$row = New-Object System.Object
$row | Add-Member -MemberType NoteProperty -Name "Example Header 1" -Value $null
$row | Add-Member -MemberType NoteProperty -Name "Example Header 2" -Value $null
$row | Add-Member -MemberType NoteProperty -Name "Example Header 3" -Value $null
$output += $row

I am writing it to a file using $output | Export-Csv new.csv -NoTypeInformation

This appears to make a CSV file with the headers that I want. If there is a better way to do this, please let me know. The next step is where I am running into problems. I now need to programmatically populate the table with data. When importing existing CSV files, I am able to access/modify data in the table like an array (i.e. $output[rowIndex]."Header Name" = "new data").

So I tried to add data to my newly created table. I wrote $output[0]."Example Header 1" = "Test Data". This works as I expected and populates the first row of the column with the specified header with "Test Data". However, I can ONLY access [0]. $output[1] and so on cause errors because I guess they do not exist. I tried using $output += $row again to add more rows, but it does not work at all and causes some strange errors to happen (if I write to a row, it writes to all rows, probably because it's all the same object).

So basically my question is, how can I create a CSV file from scratch, add some headers to it, and then start writing to all the (unknown/variable number of) rows? I am sure there is a better way to do it, but like I said, I am very new to PowerShell. Ideally I would like to be able to access rows by index (0, 1, 2, etc.), but I am open to whatever.


Basic solution (adapted from Martin Brandl's answer)

This basically reads data from one CSV file and inserts it into another with newly specified headers.

$csv = Import-Csv "MyCsv.csv"
$newCsv = @()
foreach($row in $csv) {
    $newCsv += [PSCustomObject]@{
        "New Column Header1" = $row."Original Column Header1"
        "New Column Header2" = $row."Original Column Header2"
    }
}
# Export the remapped rows (the output filename here is just an example).
$newCsv | Export-Csv "NewCsv.csv" -NoTypeInformation
1 Comment

Don't write headers and data as separate manual steps. Create an array of objects representing each row, then pipe all of it to Export-Csv at once; it'll automatically extract property names and create the header row for you. Show us the input CSV samples and desired output format if you want help with rewriting existing CSV data. Commented May 22, 2017 at 13:13

2 Answers


To complement Martin Brandl's helpful answer with an explanation of your symptoms (emphasis added):

I tried using $output += $row again to add more rows, but it does not work at all and causes some strange errors to happen (if I write to a row, it writes to all rows, probably because it's all the same object).

Indeed, that is what happened: In .NET terms, type (class) [pscustomobject] is a reference type rather than a value type - as evidenced by [pscustomobject].IsValueType returning $false.

If you add a given instance (object) of a reference type to an array multiple times, all such elements point to the very same instance.

Here's a brief demonstration.

$obj = [PSCustomObject] @{
    'Example Header 1' = $null
    'Example Header 2' = $null
}

$array = @()
foreach ($ndx in 1..2) {
  # By working with the original $obj every time, you
  # keep modifying the same instance's property values.
  $obj.'Example Header 1' = "h1-$ndx"
  $obj.'Example Header 2' = "h2-$ndx"
  # Adding $obj to an array does NOT create a COPY of $obj
  # but stores a REFERENCE directly to $obj in the array
  # (similar to storing a pointer in unmanaged languages such as C++).
  $array += $obj
}

# Output the array.
$array

This yields the following:

Example Header 1 Example Header 2
---------------- ----------------
h1-2             h2-2
h1-2             h2-2

As you can see, only the last values assigned to .Example Header 1 and .Example Header 2 took effect, because both array elements reference the very same object.

Martin's approach is the simplest way to solve this problem: create a new instance of the custom object in every iteration (casting a hashtable literal to [pscustomobject], as now shown in the question itself: $array += [pscustomobject] @{ ... }).
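
For example, the demonstration above behaves as intended once a new [pscustomobject] instance is created inside the loop:

$array = @()
foreach ($ndx in 1..2) {
  # A new [pscustomobject] instance is created in every iteration,
  # so each array element references a distinct object.
  $array += [pscustomobject] @{
    'Example Header 1' = "h1-$ndx"
    'Example Header 2' = "h2-$ndx"
  }
}

# Output the array: the first row now shows h1-1 / h2-1,
# the second h1-2 / h2-2.
$array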

If you don't want to or cannot recreate your instances from scratch inside the loop, you have two basic choices:

  • Clone a template custom object in every loop iteration (a minimal sketch follows this list), or simply use [pscustomobject] @{ ... } object creation inside the loop, which implicitly creates a new instance every time.

  • PSv5+ alternative: Define a custom class and instantiate it in every loop iteration - see below.
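
Here's a minimal sketch of the first option, cloning a template object with the .psobject.Copy() method so that every iteration gets its own instance:

# Template object whose property values are overwritten in each iteration.
$template = [pscustomobject] @{
  'Example Header 1' = $null
  'Example Header 2' = $null
}

$array = @()
foreach ($ndx in 1..2) {
  # .psobject.Copy() creates a (shallow) copy of the template,
  # so each array element is independent of $template and of
  # the copies made in other iterations.
  $row = $template.psobject.Copy()
  $row.'Example Header 1' = "h1-$ndx"
  $row.'Example Header 2' = "h2-$ndx"
  $array += $row
}

# Output the array.
$array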


Class-based Solution

In PSv5+, a custom class allows for an elegant solution that also performs better than creating instances in-loop using literal syntax.

# Define a custom class that represents the rows of the
# output CSV.
# Note: [object] is being used here as the properties' type.
#       In real life, you'd use more specific types such as [string]
#       or [int].
class CsvRow {
  [object] ${Example Header 1}
  [object] ${Example Header 2}
}

$array = @()
foreach ($ndx in 1..2) {
  # Instantiate the custom class.
  $rowObj = [CsvRow]::new()
  # Set the values.
  $rowObj.'Example Header 1' = "h1-$ndx"
  $rowObj.'Example Header 2' = "h2-$ndx"
  # Add the instance to the array.
  $array += $rowObj
}

# Output the array.
$array

Performance Considerations

Two factors determine performance:

  • How quickly the array is extended in each loop iteration:

    • Extending an array element by element with $array += ... is very convenient, but is slow and inefficient, because a new array must be created every time (arrays are fixed-size collections and cannot be directly extended).

    • For small iteration counts that may not matter, but the higher the number, the more performance will suffer, and at some point this approach becomes infeasible.

    • The next best solution is to use a [System.Collections.Generic.List[object]] instance to build the array instead - such lists are designed to be efficiently extended (a minimal sketch follows this list).

    • The best and simplest solution, however, is to let PowerShell collect the outputs from a loop-like statement in an array for you, simply by assigning the statement to a variable - see below.

  • How quickly the new object is instantiated in each loop iteration:

    • Instantiating the custom class is faster than creating an instance via a hashtable literal, but only if [CsvRow]::new() is used for instantiation; the functionally equivalent New-Object CsvRow is much slower, because it involves a cmdlet call.
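
As a sketch of the list-based approach mentioned above (reusing the CsvRow class defined earlier):

# Create an efficiently extensible list instead of an array.
$list = [System.Collections.Generic.List[object]]::new()
foreach ($ndx in 1..1000) {
  $rowObj = [CsvRow]::new()
  $rowObj.'Example Header 1' = "h1-$ndx"
  $rowObj.'Example Header 2' = "h2-$ndx"
  # .Add() appends in place; no new collection is allocated per iteration.
  $list.Add($rowObj)
}

# The list can be exported just like an array.
$list | Export-Csv new.csv -NoTypeInformation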

The following variant of the custom-class solution uses implicit array creation to ensure acceptable performance even with higher iteration counts:

# Define the custom class.
class CsvRow {
  [object] ${Example Header 1}
  [object] ${Example Header 2}
}

# Determine the iteration count.
$count = 1000

# Loop and let PowerShell collect the outputs
# from all iterations implicitly in variable $array
[array] $array = foreach ($ndx in 1..$count) {
  # Instantiate the custom class.
  $rowObj = [CsvRow]::new()
  # Set the values.
  $rowObj.'Example Header 1' = "h1-$ndx"
  $rowObj.'Example Header 2' = "h2-$ndx"
  # Simply output the row object
  $rowObj
}

# Output the array.
$array

Note: The [array] type constraint is only needed if you need to ensure that $array is always an array; without it, if there happened to be just a single loop iteration and therefore a single output object, $array would store that output object as-is, not wrapped in an array (this behavior is fundamental to PowerShell's pipeline).
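
If you want to verify these relative timings on your own machine, here's a rough Measure-Command sketch (it assumes the CsvRow class defined above; absolute numbers will vary by machine and PowerShell version):

$count = 10000

# Factor 1: growing an array with += in every iteration ...
(Measure-Command {
  $a = @()
  foreach ($i in 1..$count) { $a += [CsvRow]::new() }
}).TotalMilliseconds

# ... vs. letting PowerShell collect the loop output implicitly.
(Measure-Command {
  [array] $a = foreach ($i in 1..$count) { [CsvRow]::new() }
}).TotalMilliseconds

# Factor 2: [CsvRow]::new() above vs. the slower New-Object cmdlet call.
(Measure-Command {
  [array] $a = foreach ($i in 1..$count) { New-Object CsvRow }
}).TotalMilliseconds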


2 Comments

It's interesting to see a use case for a class here. Why is the class more performant / preferable than a PSCustomObject, e.g., $array = foreach ($ndx in 1..$count) { $rowObj = [PSCustomObject]@{ 'example header 1' = "h1-$ndx"; 'example header 2' = "h2-$ndx" }; $rowObj } ? Or is that better asked in a new question? Edit: Measure-Command gives me 16 ms for PSCustomObject and 10 ms for the class using a count of 1000. It would be interesting to know more about this, or if there's a link I'll happily read it.
@Blaisem, in my benchmarks a custom class with per-property initialization is much faster than [pscustomobject], and even using hashtable-based initialization with the custom class is faster, though not by as much. It's hard to explain it briefly, but [pscustomobject]s basically simulate arbitrary properties at runtime, which makes them both slow and more memory-hungry. If you have further questions, yes, please ask a new question.

As Mathias mentioned, you shouldn't create the CSV first containing only the headers. Instead, populate your CSV with the actual rows you want and export it:

[PSCustomObject]@{
    'Example Header 1' = "a"
    'Example Header 2' = "b"
    'Example Header 3' = "c"
}, [PSCustomObject]@{
    'Example Header 1' = "a2"
    'Example Header 2' = "b2"
    'Example Header 3' = "c2"
}, [PSCustomObject]@{
    'Example Header 1' = "a3"
    'Example Header 2' = "b4"
    'Example Header 3' = "c5"
} | Export-Csv new.csv -NoTypeInformation

Output:

"Example Header 1","Example Header 2","Example Header 3"
"a","b","c"
"a2","b2","c2"
"a3","b4","c5"
