
I'm exporting about 1,000,000 rows via an Invoke-Sqlcmd script, and it exports into a CSV of around 180 MB.

What I would like to do is export it instead into ten 100,000-row files. How do I go about doing this? It doesn't seem to be a feature of Export-Csv or Invoke-Sqlcmd that I can find.

Current code:

$dataSubset = "mydata"
$localFile = ("{0}.csv" -f $dataSubset)
$localPath = "my\path"
$serverInstance = "myserver.domain.com"
$database = "mydatabase"

$Query = @"
my crazy ass query
"@
$Results = Invoke-Sqlcmd -QueryTimeout 0 -ServerInstance $serverInstance -Database $database -Query $Query
$Results | Export-Csv $localPath/$localFile -NoTypeInformation

Instead of having a single mydata.csv I want to have mydata_1.csv, mydata_2.csv, etc.

3 Answers


Use the Select-Object cmdlet (whose built-in alias is select); it has -Skip and -First parameters:

for ($i = 0; $i -lt 10; $i++) {
    # Build the per-chunk file name: mydata_0.csv, mydata_1.csv, ...
    $localFile = ("{0}_{1}.csv" -f $dataSubset, $i)
    # Skip the rows already written, then take the next 100,000.
    $Results | Select-Object -Skip ($i * 100000) -First 100000 | Export-Csv $localPath/$localFile -NoTypeInformation
}

1 Comment

Good pointers, though you probably don't want to send the entire 1-million-object array through the pipeline for each and every batch; ideally, you'd avoid collecting all 1 million objects in memory to begin with.

Unfortunately, as of PowerShell 7.0, PowerShell has no built-in batching (chunking) mechanism.

  • This GitHub issue suggests adding a -ReadCount parameter to Select-Object to enable batching.

For now, you'll have to implement your own:

# ... 

$batchSize = 10000
$fileNdx = 0

# The list to hold a batch.
$batch = [Collections.Generic.List[object]]::new($batchSize)

# The script block for exporting a single batch to a CSV.
# Exports to .../mydata_1.csv, .../mydata_2.csv, ... 
$sb = { 
        ++$fileNdx; 
        $batch | Export-Csv -NoTypeInformation "$localPath/mydata_$fileNdx.csv" 
      }

Invoke-Sqlcmd -QueryTimeout 0 -ServerInstance $serverInstance -Database $database -Query $Query |
  ForEach-Object -Process {
    $batch.Add($_) # add object to current batch
    if ($batch.Count -eq $batchSize) { # batch is complete
      . $sb # export
      $batch.Clear() # start new batch
    }
  } -End { 
    if ($batch.Count) { # final, incomplete batch
      . $sb # export
    }
  }

Note that you should generally avoid $Results = Invoke-Sqlcmd ... with large queries, because it invariably collects the entire result set in memory as a whole. Instead, use the pipeline, with its object-by-object processing, as shown above.
That said, if you have enough memory, capturing all input at once can situationally improve performance.



I don't think that PowerShell is the way to do this. Can't you export directly from SQL Server via command-line integration?
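
For reference, a direct export along those lines could use the bcp command-line utility. This is just a sketch (the query, server name, and output path are placeholders, and note that bcp by itself writes neither a header row nor quoted fields):

bcp "SELECT * FROM mydatabase.dbo.mytable" queryout "my\path\mydata.csv" -c -t, -S myserver.domain.com -T

Here -c exports character data, -t, sets a comma as the field terminator, -S names the server instance, and -T uses Windows (trusted) authentication.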

Anyway, concerning your question, another alternative is:

$length = $Results.Length

$items = 10
$stepWidth = [Math]::Ceiling($length / $items)

for ($i = 1; $i -le $items; $i++) {
    $firstItem = ($i - 1) * $stepWidth
    $lastItem = ($i * $stepWidth) - 1
    # Clamp the last index for the final, possibly shorter, chunk.
    if ($lastItem -ge $length) { $lastItem = $length - 1 }
    echo ("FI: " + $firstItem + " LI: " + $lastItem)
    $localFile = ("{0}_{1}.csv" -f $dataSubset, $i)
    $Results[$firstItem..$lastItem] | Export-Csv $localPath/$localFile -NoTypeInformation
}

(not tested, but in the end similar to Mark Arends' solution)

Another alternative would be to use the .NET method System.Array.Copy: https://learn.microsoft.com/de-de/dotnet/api/system.array.copy?view=netcore-3.1
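
A minimal sketch of that approach, assuming the $Results, $localPath, and $dataSubset variables from the question ($chunkSize, $fileNdx, and $chunk are illustrative names):

$chunkSize = 100000
$fileNdx = 0
for ($offset = 0; $offset -lt $Results.Count; $offset += $chunkSize) {
    $fileNdx++
    # The last chunk may hold fewer than $chunkSize rows.
    $count = [Math]::Min($chunkSize, $Results.Count - $offset)
    $chunk = New-Object object[] $count
    # Copy $count elements starting at $offset into the chunk array.
    [Array]::Copy($Results, $offset, $chunk, 0, $count)
    $chunk | Export-Csv "$localPath/${dataSubset}_$fileNdx.csv" -NoTypeInformation
}

Like the indexing approach above, this still requires the full result set in memory; it just avoids rebuilding each slice through the pipeline.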

