6

I am trying to gather data from several servers using ForEach-Object -Parallel. The variable I use is being populated within the loop, but when the loop finishes the variable is empty.

$DBDetails = "SELECT @@VERSION"

$VMs = ("vm1", "vm2", "vm3", "vm4", "vm5", "vm6", "vm7")
$DBInventory = @()

$scriptBlock = {
    $vm = $_
    $result = Invoke-Sqlcmd -ServerInstance $vm -Query $using:DBDetails
    $DBInventory += $result
    Write-Host "Added $($result.Count) rows from $($vm)"
}

$VMs | ForEach-Object -Parallel $scriptBlock
Write-Host "Number of elements in DBInventory: $($DBInventory.Count)"

I expect the last line to return the number of elements gathered within the loop that executed on the previous line. There should be a total of 7 elements, but I am left with none.

My result looks like this:

Added 1 rows from vm1
Added 1 rows from vm2
Added 1 rows from vm3
Added 1 rows from vm4
Added 1 rows from vm5
Added 1 rows from vm6
Added 1 rows from vm7
Number of elements in DBInventory: 0

2 Answers 2

3

ForEach-Object -Parallel causes execution of the loop body in a separate runspace, meaning you don't have direct access to the variables defined in the calling scope.

To work around this, make two changes to your code:

  • Use a collection type other than a resizable array (below I've use a generic [List[psobject]])
  • Reference the variable from the caller's scope with the using: scope modifier and assign to a local inside the block

The resulting local variable will then reference the same list-object in memory, and changes made to that list via its methods (Add(), Remove(), AddRange(), etc.) will be reflected anywhere else its referenced (including the original $DBInventory variable from your calling scope).

$DBDetails = "SELECT @@VERSION"

$VMs = ("vm1", "vm2", "vm3", "vm4", "vm5", "vm6", "vm7")
$DBInventory = [System.Collections.Generic.List[psobject]]::new()

$scriptBlock = {
    $vm = $_
    $inventory = $using:DBInventory
    
    $result = Invoke-Sqlcmd -ServerInstance $vm -Query $using:DBDetails
    $inventory.AddRange([psobject[]]$result)
    Write-Host "Added $($result.Count) rows from $($vm)"
}

$VMs | ForEach-Object -Parallel $scriptBlock
Write-Host "Number of elements in DBInventory: $($DBInventory.Count)"

As mklement0 notes, [List[psobject]] is not thread-safe - for production code you'll definitely want to pick a collection type that is, like for example a [System.Collections.Concurrent.ConcurrenBag[psobject]] - essentially an unordered list:

$DBInventory = [System.Collections.Concurrent.ConcurrentBag[psobject]]::new()

Beware that the ConcurrentBag type, as the name might suggest, does not preserve insertion order. If this is a problem, you may want to consider using a [ConcurrentDictionary[string,psobject[]]] - this way you can tie the query output back to the orignal input string:

$DBInventory = [System.Collections.Concurrent.ConcurrentDictionary[string,psobject[]]]::new()

Since another thread may (hypothetically) have added an entry for the same key since you dispatched your call to Add(), the ConcurrentDictionary type requires us to use it slightly differently than a regular dictionary or hashtable:

$scriptBlock = {
    $vm = $_
    $inventory = $using:DBInventory
    
    $result = Invoke-Sqlcmd -ServerInstance $vm -Query $using:DBDetails
    $adder = $updater = { return Write-Output $result -NoEnumerate }
    $inventory.AddOrUpdate($vm, $adder, $updater)
    Write-Host "Added $($result.Count) rows from $($vm)"
}

Here, the concurrent dictionary will execute the $adder function on our behalf if the key doesn't already exist (otherwise it'll run the $updater), and the result will be assigned as the entry value.

You can subsequently access the entry values the same way you would a hashtable:

$DBInventory[$vms[-1]] # returns array containing the query results from the last VM in the list
Sign up to request clarification or add additional context in comments.

Comments

3

tl;dr

  • Use the $using: scope to refer to the value of variables defined in the caller's scope, as you're partially already doing.

  • You cannot directly modify variables in the caller's scope ($using:DBInventory += $result would not work), but you don't need to: let PowerShell collect the output objects in an array for you:

$DBDetails = "SELECT @@VERSION"

$VMs = ("vm1", "vm2", "vm3", "vm4", "vm5", "vm6", "vm7")
 = @()

$scriptBlock = {
    $vm = $_
    $result = Invoke-Sqlcmd -ServerInstance $vm -Query $using:DBDetails
    Write-Host "Outputting $($result.Count) rows from $($vm)"
    $result # Simply output the objects 
}

# Let PowerShell collect all output objects from the ForEach-Object -Parallel call 
# in an array.
# Note: The [array] type constraint ensures that $DBInventory is an array
#       even if there happens to be only *one* output object.
[array] $DBInventory = $VMs | ForEach-Object -Parallel $scriptBlock

Write-Host "Number of elements in DBInventory: $($DBInventory.Count)"

$DBInventory will contain a regular PowerShell array ([object[]]).


Background information:

  • Your code already partially shows awareness that you need the $using: scope inside a script block that runs in a different runspace (such as those in the threads that ForEach-Object -Parallel creates) in order to refer to variable values from the caller's scope.

    • This therefore in principle applies to your caller-side $DBInventory variable as well, however:
      • A $using: reference is a reference to a variable value, not to a variable itself, so you cannot assign to $using: references.
      • That is, $using:DBInventory += $result would not work, leaving aside the general point that using += to "grow" arrays is best avoided due to its inefficiency - see this answer.
  • While you could initialize $DBInventory to an efficiently extensible list type, you'd have to ensure that it is grown in thread-safe manner, given that you're using thread-based parallelism via ForEach-Object -Parallel:

    • Notably, the commonly used list types [System.Collections.Generic.List[object]] and System.Collections.ArrayList are not thread-safe.

    • You'd either have to:

      • add manual synchronization code to your script block, using .NET APIs, which is nontrivial.
      • pick a different, concurrent (thread-safe) collection type from the System.Collections.Concurrent namespace (there is none built in for lists as of .NET 8; this answer offers a custom implementation)
      • use a thread-safe wrapper, e.g. $DBInventory = [System.Collections.ArrayList]::Synchronized([System.Collections.Generic.List[object]] @()), which returns a non-generic [System.Collections.IList] implementation.
        Note, however, that this is inefficient, due to the whole-list locking mechanism used - see Thread-safe collections; additionally, it doesn't expose an .AddRange() method for efficient appending of multiple elements, and its .Add() method returns a (usually unwanted) value, which you'll have to discard with $null = ($using:DBInventory).Add(...)
    • Note that the reason that growing lists via $using: does work - as opposed to via += - is that you're adding elements via methods (.Add(), .AddRange()) of the object that is the value of the variable being referenced with $using:. That is, you're directly modifying the variable value, not the variable itself (which isn't supported).

  • Fortunately, there's a simpler solution: Rely on PowerShell's ability to automatically collect all output objects emitted by a pipeline in an array, which is both more concise and more efficient than manual growth of a list, and also works with ForEach-Object -Parallel, as shown at the top - again, see this answer for background information.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.