Is this a good "pattern" for processing collection-based parameters which belong to parameter sets of a function?
Question
I've been writing advanced functions for many years now and have even written quite a few modules at this point. But there's one question for which I have never really been able to find an answer.
Let's look at a cmdlet that Microsoft provides in the MSMQ module, as an example, and "re-implement" it as an advanced PowerShell function: `Send-MsmqQueue`. But this function will be a bit different from the one provided by the MSMQ module in that not only will it accept multiple MSMQ queues for the `$InputObject` parameter, but also multiple MSMQ queue names for the `$Name` parameter, where these two parameters belong to different parameter sets. (The cmdlet version of this function normally only accepts a single string value for the `$Name` parameter.) I won't be showing a complete re-implementation, just enough to illustrate what I, at times, find myself doing when this situation arises. (NOTE: one other slight difference is that I will be using the classes from the `System.Messaging` namespace instead of the PowerShell-provided ones in the `Microsoft.Msmq.PowerShell.Commands` namespace. So assume that implicitly, somewhere, `Add-Type -AssemblyName System.Messaging` has been executed.)

    function Send-MsmqQueue {
        [CmdletBinding(DefaultParameterSetName = 'Name')]
        [OutputType([Messaging.Message])]
        Param (
            [Parameter(
                Mandatory,
                ValueFromPipeline,
                ParameterSetName = 'InputObject')
            ]
            [Messaging.MessageQueue[]] $InputObject,

            [Parameter(
                Mandatory,
                ValueFromPipeline,
                ParameterSetName = 'Name')
            ]
            [string[]] $Name,

            # Below is the original parameter name, not mine ;)
            [Messaging.Message] $MessageObject

            # All other normal Send-MsmqQueue parameters elided as they are not
            # needed to illustrate the premise of my question.
        )

        Process {
            # When I have parameters defined as above, the first thing I do in my
            # Process block is "homogenize" the data so I don't have to implement
            # two foreach loops or do the branching on each foreach loop iteration
            # which can obscure the main logic that is being executed, i.e., I get
            # this done all "up-front".
            #
            # One aspect of my question is, from purely a PowerShell perspective,
            # is this hurting performance in any meaningful way? (I know that when it
            # comes to specific implementation details, there are INFINITE ways to
            # write non-performant code, so from purely a PowerShell perspective,
            # as far as the language design/inner-workings, is this hurting
            # performance?)
            #
            # NOTE: I don't normally need the wrapping "force this thing to be an
            # array" construct (,<array_items>), BUT, in this case, the C#
            # System.Messaging.MessageQueue class implements IEnumerable,
            # which PowerShell (unhelpfully) iterates over automatically, and results
            # in the messages in the queues being iterated over instead of the queues
            # themselves, so this is an implementation detail specific to this
            # particular function.
            $Queues = (,@(
                if ($PSCmdlet.ParameterSetName -ieq 'Name') {
                    # Handle when the parameter is NOT passed by the pipeline...
                    foreach ($n in $Name) { [Messaging.MessageQueue]::new($n) }
                } else {
                    $InputObject
                }
            ))

            # I like using 'foreach (...) { ... }' instead of ForEach-Object because
            # oftentimes, I will need to break or continue based on implementation
            # details, and using ForEach-Object in combination with break/continue
            # causes the pipeline to prematurely exit.
            foreach ($q in $Queues) {
                $q.Send($MessageObject)

                # Normally, I wouldn't return this, especially since it wasn't
                # modified, but this is a re-implementation of MSFT's Send-MsmqQueue,
                # and it returns the sent message.
                $MessageObject
            }
        }
    }

As I stated in the introduction to this question, I have written many functions which take varying collection-based parameters belonging to different parameter sets which can be piped into the function, and this is the pattern that I use. I'm hoping someone can either confirm that this is OK from a PowerShell language/style perspective and/or help me understand why I should not do this and what I ought to consider instead.
Thank you!
# Answer 1
**Score**: 2
<details>
<summary>English:</summary>
<!-- language-all: sh -->
A fundamental performance decision is whether you want to **optimize for _argument-passing_ vs. _pipeline input_**:
* Declaring your parameters _as arrays_ (e.g. `[string[]] $Name`) allows efficient passing of _multiple_ input objects by _argument_ (parameter value).
* However, doing so _hurts pipeline performance_, because a single-element array is then created for each pipeline input object, as the following example demonstrates: It outputs `String[]` for _each_ of the scalar string elements of the array passed via the pipeline:

      'one', 'two' |
        & {
          param(
            [Parameter(Mandatory, ValueFromPipeline)]
            [string[]] $Name
          )
          process {
            $Name.GetType().Name # -> 'String[]' *for each* input string
          }
        }

  * **Note**: For brevity, the example above as well as all others in this answer use a [script block](https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.core/about/about_Script_Blocks) in lieu of a `function` definition. That is, a function declaration (`function foo { ... }`) followed by its invocation (`... | foo`) is shortened to the functionally equivalent `... | & { ... }`.

See [GitHub issue #4242](https://github.com/PowerShell/PowerShell/issues/4242) for a related discussion.
---
With _array_ parameters, you indeed need to ensure element-by-element processing yourself, notably inside the `process` block if they're also _pipeline-binding_.
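
To make the element-by-element point concrete, here is a minimal sketch. `Test-ArrayBinding` is a hypothetical function name used only for illustration; it reports how many elements each `process` invocation receives, so the two calling styles can be compared side by side.

```powershell
# Hypothetical helper (not part of any module): reports what each
# process-block invocation receives in the array-typed parameter.
function Test-ArrayBinding {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory, ValueFromPipeline)]
        [string[]] $Name
    )
    process {
        [pscustomobject]@{
            ElementCount = $Name.Count
            Elements     = $Name -join ','
        }
    }
}

# Passing by argument: the whole array is bound in one go.
$byArgument = @(Test-ArrayBinding -Name 'one', 'two', 'three')

# Passing via the pipeline: each input string is bound separately,
# wrapped in its own single-element array.
$byPipeline = @('one', 'two', 'three' | Test-ArrayBinding)

"By argument: $($byArgument.Count) process call(s)"
"By pipeline: $($byPipeline.Count) process call(s)"
```

Because `$Name` is always an array inside `process`, a single `foreach ($n in $Name)` loop there covers both calling styles.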
As for **"homogenizing" parameter values of different types** so that only _one_ processing loop is required, two fundamental optimizations are possible:
* **Declare only a _single_ parameter** and rely either on PowerShell to _automatically_ convert values of other types to that parameter's type, or implement an automatically applied _custom conversion_, which obviates the need for "homogenizing" altogether:
  * The **conversion is _automatic_** if the parameter type has a public, single-parameter constructor that accepts an instance of the other type as its (only) argument or, in case the other type is `[string]`, if the type has a static `::Parse()` method with a single `[string]` parameter; e.g.:

        # Sample class with a single-parameter
        # public constructor that accepts [int] values.
        class Foo {
          [int] $n
          Foo([int] $val) {
            $this.n = $val
          }
        }

        # [int] values (whether provided via the pipeline or as an argument)
        # auto-convert to [Foo] instances
        42, 43 | & {
          [CmdletBinding()]
          param(
            [Parameter(ValueFromPipeline)]
            [Foo[]] $Foo
          )
          process {
            $Foo # Diagnostic output.
          }
        }

    * In your case, `[Messaging.MessageQueue]` _does_ have a public single-parameter constructor that accepts a string (as evidenced by your `[Messaging.MessageQueue]::new($n)` call), so you could simply _omit_ the `$Name` parameter declaration, and rely on the automatic conversion of `[string]` inputs.
    * A _general caveat_:
      * This automatic conversion, which also happens with _casts_ (e.g., `[Foo[]] (0x2a, 43)`, see below) and the (rarely used) type-conversion form of the [intrinsic `.ForEach()`](https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.core/about/about_Arrays#foreach) (e.g., `(0x2a, 43).ForEach([Foo])`), is _stricter_ than calling a single-parameter constructor directly with respect to matching the constructor's parameter type.
      * I'm unclear on the exact rules, but using a `[double]` value, for instance, succeeds with `[Foo]::new(42.1)` (that is, conversion to `[int]` is automatically performed), but *fails* with both `[Foo] 42.1` and `(42.1).ForEach([Foo])` (the latter currently produces an obscure error message).

  * If the conversion _isn't_ automatic, **implement a _custom_ conversion** that PowerShell then applies automatically, by way of decorating your parameter with a custom attribute that derives from the abstract [`ArgumentTransformationAttribute`](https://docs.microsoft.com/en-US/dotnet/api/System.Management.Automation.ArgumentTransformationAttribute) class; e.g.:

        using namespace System.Management.Automation

        # Sample class with a single-parameter
        # public constructor that accepts [int] values.
        class Foo {
          [int] $n
          Foo([int] $val) {
            $this.n = $val
          }
        }

        # A sample argument-conversion (transformation) attribute class that
        # converts strings that can be interpreted as [int] to [Foo] instances.
        class CustomTransformationAttribute : ArgumentTransformationAttribute {
          [object] Transform([EngineIntrinsics] $engineIntrinsics, [object] $inputData) {
            # Note: If the inputs were passed as an *array argument*, $inputData is an array.
            return $(foreach ($o in $inputData) {
              if ($null -ne ($int = $o -as [int])) { [Foo]::new($int) }
              else { $o }
            })
          }
        }

        # [string] values (whether provided via the pipeline or as an argument)
        # that can be interpreted as [int] now auto-convert to [Foo] instances,
        # thanks to the custom [ArgumentTransformationAttribute]-derived attribute.
        '0x2a', '43' | & {
          [CmdletBinding()]
          param(
            [Parameter(ValueFromPipeline)]
            [CustomTransformation()] # This implements the custom transformation.
            [Foo[]] $Foo
          )
          process {
            $Foo # Diagnostic output.
          }
        }

* If you _do_ want **_separate_ parameters, optimize the conversion process**:
  * The auto type-conversion rules described above also apply to _explicit casts_ (including support for _arrays_ of values), so you can simplify your code as follows:

        if ($PSCmdlet.ParameterSetName -eq 'Name') {
          # Simply use an array cast.
          $Queues = [Messaging.MessageQueue[]] $Name
        } else {
          $Queues = $InputObject
        }

  * In cases where element-by-element construction to effect conversion is required:

        if ($PSCmdlet.ParameterSetName -eq 'Name') {
          # Note the ","
          $Queues = foreach ($n in $Name) { , [Messaging.MessageQueue]::new($n) }
        } else {
          $Queues = $InputObject
        }

    * Note the use of the unary form of `,`, the [array constructor ("comma") operator](https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.core/about/about_Operators#comma-operator-), as in your attempt, albeit:
      * _inside_ the `foreach` loop, and
      * _without_ `@(...)` enclosure of the object to wrap in a single-element array, as `@(...)` _itself_ would trigger enumeration.
    * While `Write-Output -NoEnumerate ([Messaging.MessageQueue]::new($n))`, as shown in Mathias' answer, works too, it is _slower_. It comes down to a tradeoff between performance / concision vs. readability / signaling the intent explicitly.
    * The need to wrap _each_ [`[System.Messaging.MessageQueue]`](https://learn.microsoft.com/en-us/dotnet/api/system.messaging.messagequeue) instance in an auxiliary single-element wrapper with unary `,`, or to use `Write-Output -NoEnumerate`, stems from the fact that this type implements the [`System.Collections.IEnumerable`](https://learn.microsoft.com/en-US/dotnet/api/System.Collections.IEnumerable) interface, which means that PowerShell automatically _enumerates_ instances of the type by default.<sup>[1]</sup> Applying either technique ensures that the `[System.Messaging.MessageQueue]` is output _as a whole_ to the pipeline (for details, see [this answer](https://stackoverflow.com/a/48360724/45375)).
      * Note that this is _not_ necessary in the first snippet, because `$Queues = [Messaging.MessageQueue[]] $Name` is an _expression_, to which automatic enumeration does _not_ apply.
    * The above also implies that you need the same technique if you want to pass a _single_ `[System.Messaging.MessageQueue]` instance or a *single-element* array containing such an instance _via the pipeline_; e.g.:

          # !! Without `,` this command would *break*, because
          # !! PowerShell would try to enumerate the elements of the queue
          # !! which fails with an empty one.
          , [System.Messaging.MessageQueue]::new('foo') | Get-Member

  * By *not* using an `if` statement as a single *assignment expression* (`$Queues = if ...`) and instead assigning to `$Queues` in the _branches_ of the `if` statement, you additionally prevent subjecting `$InputObject` to unnecessary enumeration.
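
The enumeration behavior behind the unary `,` can be reproduced without MSMQ installed. The sketch below uses `FakeQueue`, a hypothetical stand-in class (not part of `System.Messaging`) that implements `IEnumerable` much as `MessageQueue` does; its hard-coded messages are made up for illustration.

```powershell
# Hypothetical stand-in for [System.Messaging.MessageQueue]: implementing
# IEnumerable is what makes PowerShell enumerate instances on output.
class FakeQueue : System.Collections.IEnumerable {
    [string] $Path
    [string[]] $Messages = @('msg1', 'msg2')

    FakeQueue([string] $queuePath) { $this.Path = $queuePath }

    [System.Collections.IEnumerator] GetEnumerator() {
        return $this.Messages.GetEnumerator()
    }
}

$names = 'queueA', 'queueB'

# Without the unary comma, each FakeQueue is enumerated when it is output,
# so the collected result holds the messages rather than the queues.
$enumerated = foreach ($n in $names) { [FakeQueue]::new($n) }

# With the unary comma, each instance travels inside a one-element wrapper
# array; only the wrapper is unwrapped, so the queue object arrives whole.
$preserved = foreach ($n in $names) { , [FakeQueue]::new($n) }

"Without ',': $($enumerated.Count) item(s) of type $($enumerated[0].GetType().Name)"
"With ',':    $($preserved.Count) item(s) of type $($preserved[0].GetType().Name)"
```

The same wrapper is what keeps a single such instance intact when it is sent through the pipeline.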
---
<sup>[1] There are some exceptions, notably strings and dictionaries. See the bottom section of [this answer](https://stackoverflow.com/a/65530467/45375) for details.</sup>
</details>
# Answer 2
**Score**: 1
This pattern ("homogenizing" the input entities based on the chosen parameter set) is perfectly valid, and constitutes, in my personal opinion at least, good parameter design.
That being said, you might want to use `Write-Output -NoEnumerate` to avoid the clunky `,@(...)` unwrapped-wrapped-array unpacking trick:
```powershell
if ($PSCmdlet.ParameterSetName -ieq 'Name') {
    # Handle when the parameter is NOT passed by the pipeline...
    $Queues = foreach ($n in $Name) {
        $queue = [Messaging.MessageQueue]::new($n)
        Write-Output $queue -NoEnumerate
    }
}
else {
    # Input is already [MessageQueue[]], avoid pipeline boundaries entirely
    $Queues = $InputObject
}
```
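
`Write-Output -NoEnumerate` can also be tried in isolation with any enumerable type. This sketch assumes `System.Collections.Generic.Queue[string]` as a stand-in for `MessageQueue` (the variable names and sample strings are made up) and compares implicit output with `-NoEnumerate` output.

```powershell
# Two sample collections standing in for MessageQueue instances.
$q1 = [System.Collections.Generic.Queue[string]]::new()
$q1.Enqueue('a1'); $q1.Enqueue('a2')
$q2 = [System.Collections.Generic.Queue[string]]::new()
$q2.Enqueue('b1'); $q2.Enqueue('b2')

# Building an array with the binary comma is an expression,
# so the queues themselves are not enumerated here.
$source = $q1, $q2

# Implicit output enumerates each queue into its strings.
$flattened = foreach ($q in $source) { $q }

# -NoEnumerate emits each queue as a single object.
$whole = foreach ($q in $source) { Write-Output $q -NoEnumerate }

"Implicit output: $($flattened.Count) item(s)"
"-NoEnumerate:    $($whole.Count) item(s)"
```

This reads more explicitly than the unary `,`, at the cost of a cmdlet call per item.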