PowerShell: Why is my variable empty after using ForEach-Object -Parallel?
Question
I am trying to gather data from several servers using ForEach-Object -Parallel. The variable I use is being populated within the loop, but when the loop finishes the variable is empty.
$DBDetails = "SELECT @@VERSION"
$VMs = ("vm1", "vm2", "vm3", "vm4", "vm5", "vm6", "vm7")
$DBInventory = @()
$scriptBlock = {
$vm = $_
$result = Invoke-Sqlcmd -ServerInstance $vm -Query $using:DBDetails
$DBInventory += $result
Write-Host "Added $($result.Count) rows from $($vm)"
}
$VMs | ForEach-Object -Parallel $scriptBlock
Write-Host "Number of elements in DBInventory: $($DBInventory.Count)"
I expect the last line to return the number of elements gathered within the loop that executed on the previous line. There should be a total of 7 elements, but I am left with none.
My result looks like this:
Added 1 rows from vm1
Added 1 rows from vm2
Added 1 rows from vm3
Added 1 rows from vm4
Added 1 rows from vm5
Added 1 rows from vm6
Added 1 rows from vm7
Number of elements in DBInventory: 0
Answer 1
Score: 3
ForEach-Object -Parallel causes execution of the loop body in a separate runspace, meaning you don't have direct access to the variables defined in the calling scope.
To work around this, make two changes to your code:
- Use a resizable collection type rather than an array (below I've used a generic [List[psobject]])
- Reference the variable from the caller's scope with the using: scope modifier and assign it to a local variable inside the block
The resulting local variable will then reference the same list object in memory, and changes made to that list via its methods (Add(), Remove(), AddRange(), etc.) will be reflected anywhere else it's referenced (including the original $DBInventory variable from your calling scope).
$DBDetails = "SELECT @@VERSION"
$VMs = ("vm1", "vm2", "vm3", "vm4", "vm5", "vm6", "vm7")
$DBInventory = [System.Collections.Generic.List[psobject]]::new()
$scriptBlock = {
$vm = $_
$inventory = $using:DBInventory
$result = Invoke-Sqlcmd -ServerInstance $vm -Query $using:DBDetails
$inventory.AddRange([psobject[]]$result)
Write-Host "Added $($result.Count) rows from $($vm)"
}
$VMs | ForEach-Object -Parallel $scriptBlock
Write-Host "Number of elements in DBInventory: $($DBInventory.Count)"
As mklement0 notes, [List[psobject]] is not thread-safe - for production code you'll definitely want to pick a collection type that is, such as [System.Collections.Concurrent.ConcurrentBag[psobject]] - essentially an unordered list:
$DBInventory = [System.Collections.Concurrent.ConcurrentBag[psobject]]::new()
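A fuller sketch of the ConcurrentBag variant may help here, because ConcurrentBag has no AddRange() method, so rows have to be added one at a time with Add(). To keep the example runnable without a database, a hypothetical stand-in object replaces the Invoke-Sqlcmd call; the Server and Version property names are assumptions made for illustration. It requires PowerShell 7 or later.

```powershell
# Thread-safe, unordered collection shared by all parallel runspaces
$DBInventory = [System.Collections.Concurrent.ConcurrentBag[psobject]]::new()
$VMs = "vm1", "vm2", "vm3", "vm4", "vm5", "vm6", "vm7"

$VMs | ForEach-Object -Parallel {
    $vm = $_
    $inventory = $using:DBInventory
    # Hypothetical stand-in for: Invoke-Sqlcmd -ServerInstance $vm -Query $using:DBDetails
    $result = @([pscustomobject]@{ Server = $vm; Version = "placeholder" })
    # ConcurrentBag exposes Add() but not AddRange(), so add each row individually
    foreach ($row in $result) { $inventory.Add($row) }
}

Write-Host "Number of elements in DBInventory: $($DBInventory.Count)"
```

With the real query in place of the stand-in, the count would be the total number of rows returned across all servers.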
Beware that the ConcurrentBag type, as the name might suggest, does not preserve insertion order. If this is a problem, you may want to consider using a [ConcurrentDictionary[string,psobject[]]] - this way you can tie the query output back to the original input string:
$DBInventory = [System.Collections.Concurrent.ConcurrentDictionary[string,psobject[]]]::new()
Since another thread may (hypothetically) have added an entry for the same key since you dispatched your call to Add(), the ConcurrentDictionary type requires us to use it slightly differently than a regular dictionary or hashtable:
$scriptBlock = {
$vm = $_
$inventory = $using:DBInventory
$result = Invoke-Sqlcmd -ServerInstance $vm -Query $using:DBDetails
$adder = $updater = { return Write-Output $result -NoEnumerate }
$inventory.AddOrUpdate($vm, $adder, $updater)
Write-Host "Added $($result.Count) rows from $($vm)"
}
Here, the concurrent dictionary will execute the $adder function on our behalf if the key doesn't already exist (otherwise it'll run the $updater), and the result will be assigned as the entry value.
You can subsequently access the entry values the same way you would a hashtable:
$DBInventory[$vms[-1]] # returns array containing the query results from the last VM in the list
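Because the dictionary is keyed by server name, the results can be read back in the original input order regardless of which thread finished first. The following is a minimal sketch, assuming PowerShell 7 or later. The query result is a hypothetical stand-in object, and TryAdd() is swapped in for AddOrUpdate() purely for brevity, since each key is written only once here.

```powershell
$VMs = "vm1", "vm2", "vm3"
$DBInventory = [System.Collections.Concurrent.ConcurrentDictionary[string,psobject[]]]::new()

$VMs | ForEach-Object -Parallel {
    $vm = $_
    $inventory = $using:DBInventory
    # Hypothetical stand-in for the Invoke-Sqlcmd call
    $result = @([pscustomobject]@{ Server = $vm; Version = "placeholder" })
    # TryAdd() returns a Boolean; discard it so it doesn't become pipeline output
    $null = $inventory.TryAdd($vm, [psobject[]]$result)
}

# Walk the input list to report results in input order
$ordered = foreach ($vm in $VMs) {
    [pscustomobject]@{ Server = $vm; RowCount = $DBInventory[$vm].Count }
}
$ordered
```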
Answer 2
Score: 2
tl;dr
- Use the $using: scope to refer to the value of variables defined in the caller's scope, as you're partially already doing.
- You cannot directly modify variables in the caller's scope ($using:DBInventory += $result would not work), but you don't need to: let PowerShell collect the output objects in an array for you:
$DBDetails = "SELECT @@VERSION"
$VMs = ("vm1", "vm2", "vm3", "vm4", "vm5", "vm6", "vm7")
$DBInventory = @()
$scriptBlock = {
$vm = $_
$result = Invoke-Sqlcmd -ServerInstance $vm -Query $using:DBDetails
Write-Host "Outputting $($result.Count) rows from $($vm)"
$result # Simply output the objects
}
# Let PowerShell collect all output objects from the ForEach-Object -Parallel call
# in an array.
# Note: The [array] type constraint ensures that $DBInventory is an array
# even if there happens to be only *one* output object.
[array] $DBInventory = $VMs | ForEach-Object -Parallel $scriptBlock
Write-Host "Number of elements in DBInventory: $($DBInventory.Count)"
$DBInventory will contain a regular PowerShell array ([object[]]).
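The effect of the [array] type constraint is easiest to see with a single input item. This is a minimal sketch, assuming PowerShell 7 or later; the variable names $single and $wrapped are illustrative only.

```powershell
# Without the constraint, a single output object is stored as a scalar
$single = "vm1" | ForEach-Object -Parallel { "result from $_" }
$single -is [array]        # False

# With the constraint, the same output is wrapped in a one-element array
[array] $wrapped = "vm1" | ForEach-Object -Parallel { "result from $_" }
$wrapped -is [array]       # True
$wrapped.Count             # 1
```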
Background information:
- Your code already partially shows awareness that you need the $using: scope inside a script block that runs in a different runspace (such as those in the threads that ForEach-Object -Parallel creates) in order to refer to variable values from the caller's scope.
  - This therefore in principle applies to your caller-side $DBInventory variable as well, however:
    - A $using: reference is a reference to a variable value, not to a variable itself, so you cannot assign to $using: references.
    - That is, $using:DBInventory += $result would not work, leaving aside the general point that using += to "grow" arrays is best avoided due to its inefficiency - see this answer.
- While you could initialize $DBInventory to an efficiently extensible list type, you'd have to ensure that it is grown in a thread-safe manner, given that you're using thread-based parallelism via ForEach-Object -Parallel:
  - Notably, the commonly used list types [System.Collections.Generic.List[object]] and System.Collections.ArrayList are not thread-safe.
  - You'd either have to:
    - add manual synchronization code to your script block, using .NET APIs, which is nontrivial.
    - pick a different, concurrent (thread-safe) list type (there is none built in for generic lists).
    - use a thread-safe wrapper, e.g. $DBInventory = [System.Collections.ArrayList]::Synchronized([System.Collections.Generic.List[object]] @()), which returns a non-generic [System.Collections.IList] implementation.
      Note, however, that this can be inefficient with generic lists with value-type elements, doesn't expose an .AddRange() method for efficient appending of multiple elements, and its .Add() method returns a (usually unwanted) value, which you'll have to discard with $null = ($using:DBInventory).Add(...)
  - Note that the reason that growing lists via $using: does work - as opposed to via += - is that you're adding elements via methods (.Add(), .AddRange()) of the object that is the value of the variable being referenced with $using:. That is, you're directly modifying the variable value, not the variable itself (which isn't supported).
- Fortunately, there's a simpler solution: Rely on PowerShell's ability to automatically collect all output objects emitted by a pipeline in an array, which is both more concise and more efficient than manual growth of a list, and also works with ForEach-Object -Parallel, as shown at the top - again, see this answer for background information.