I'm a PowerShell and XPath beginner struggling to efficiently parse through some XML and build up an array of objects for further processing (e.g. CSV output, SQL Server load). A sample of the XML is included below as well as the code snippet that I'm currently using. In this schema, each object-array represents a single row in the desired output. I'm parsing the MetaData children to get the proper names of the columns, then building a collection of PSObjects where each object in the array represents a single row. The MetaData information is used to find the column names (PSObject properties).
This works fine for files with 10K rows or so, but bogs down horribly when run against my largest files with over 500K rows. In these cases each row is taking around 3-4 seconds to process. At 500K rows, that's a looong time to run. Is there some magic around XPath or PS variable assignment that I can use to speed this up?
The immediate need to is translate this XML into a CSV (currently performed via export-csv), but I'd prefer to have this portion of the script generate a collection of objects as I'll next be looking to either load this data into a SQL Server instance or do other processing.
Thanks for the help!
David
Sample XML
<Report>
<Data>
<Columns>
<MetaData>
<Index>0</Index>
<Name>Column1</Name>
<Index>1</Index>
<Name>Column2</Name>
<Index>2</Index>
<Name>Column3</Name>
</MetaData>
</Columns>
<Rows>
<object-array>
<string>column1 value</string>
<int>column2 value</string>
<string>column3 value</string>
</object-array>
</Rows>
</Data>
</Report>
Sample Code
#extract the column headers
[string[]]$ColumnHeaders = @()
$obj.SelectNodes("/Report/Data/Columns/MetaData") |% {$ColumnHeaders += $_.name}
$collection = @()
$rowint = 0
$rowcount = $obj.Report.Data.Rows."object-array".count
#unwind the rows
do {
$hash=@{}
#loop through each element in the row parent element and add it to the hash
$columnint = 0
$columncount = (Select-Xml -xPath "Report/Data/Rows/object-array[$rowint]/node()" $obj).count
do {
$hash.Add($columnheaders[$columnint], (Select-Xml -xPath "Report/Data/Rows/object-array[$rowint]/descendant::text()[$columnint]" $obj).Node.Value)
$columnint++
} while ($columnint -lt $columncount)
$thisrow = New-Object PSObject -Property $hash
#add this new row to the collection
$collection += $thisrow
$rowint++
} while ($rowint -lt $rowcount)