Is this a good "pattern" for processing collection-based parameters which belong to parameter sets of a function?

Question

I've been writing advanced functions for many years now and have even written quite a few modules at this point. But there's one question for which I have never really been able to find an answer.

Let's look at a cmdlet that Microsoft provides in the MSMQ module, as an example, and "re-implement" it as an advanced PowerShell function: `Send-MsmqQueue`. This function will differ from the one provided by the MSMQ module in that it will accept not only multiple MSMQ queues for the `$InputObject` parameter, but also multiple MSMQ queue names for the `$Name` parameter, where these two parameters belong to different parameter sets. (The cmdlet version normally accepts only a single string value for the `$Name` parameter.) I won't be showing a complete re-implementation, just enough to illustrate what I, at times, find myself doing when this situation arises. (NOTE: one other slight difference is that I will be using the classes from the `System.Messaging` namespace instead of the PowerShell-provided ones in the `Microsoft.Msmq.PowerShell.Commands` namespace. So assume that, somewhere, `Add-Type -AssemblyName System.Messaging` has already been executed.)

function Send-MsmqQueue {
    [CmdletBinding(DefaultParameterSetName = 'Name')]
    [OutputType([Messaging.Message])]
    Param (
        [Parameter(
            Mandatory,
            ValueFromPipeline,
            ParameterSetName = 'InputObject')
        ]
        [Messaging.MessageQueue[]] $InputObject,

        [Parameter(
            Mandatory,
            ValueFromPipeline,
            ParameterSetName = 'Name')
        ]
        [string[]] $Name,

        # Below is the original parameter name, not mine ;)
        [Messaging.Message] $MessageObject

        # All other normal Send-MsmqQueue parameters elided as they are not
        # needed to illustrate the premise of my question.
    )

    Process {
        # When I have parameters defined as above, the first thing I do in my
        # Process block is "homogenize" the data so I don't have to implement
        # two foreach loops or do the branching on each foreach loop iteration
        # which can obscure the main logic that is being executed, i.e., I get
        # this done all "up-front".
        #
        # One aspect of my question is, from purely a PowerShell perspective,
        # is this hurting performance in any meaningful way? (I know that when it
        # comes to specific implementation details, there are INFINITE ways to
        # write non-performant code, so from purely a PowerShell perspective,
        # as far as the language design/inner-workings, is this hurting
        # performance?
        #
        # NOTE: I don't normally need the wrapping "force this thing to be an
        # array" construct (,<array_items>), BUT, in this case, the C#
        # System.Messaging.MessageQueue class implements IEnumerable,
        # which PowerShell (unhelpfully) iterates over automatically, and results
        # in the messages in the queues being iterated over instead of the queues
        # themselves, so this is an implementation detail specific to this
        # particular function.
        $Queues = (,@(
            if ($PSCmdlet.ParameterSetName -ieq 'Name') {
                # Handle when the parameter is NOT passed by the pipeline...
                foreach ($n in $Name) { [Messaging.MessageQueue]::new($n) }
            } else {
                $InputObject
            }
        ))

        # I like using 'foreach (...) { ... }' instead of ForEach-Object because
        # oftentimes, I will need to break or continue based on implementation
        # details, and using ForEach-Object in combination with break/continue
        # causes the pipeline to prematurely exit.
        foreach ($q in $Queues) {
            $q.Send($MessageObject)
            # Normally, I wouldn't return this, especially since it wasn't
            # modified, but this is a re-implementation of MSFT's Send-MsmqQueue,
            # and it returns the sent message.
            $MessageObject
        }
    }
}

As I stated in the introduction to this question, I have written many functions which take varying collection-based parameters belonging to different parameter sets which can be piped into the function, and this is the pattern that I use. I'm hoping someone can either confirm that this is OK from a PowerShell language/style perspective and/or help me understand why I should not do this and what I ought to consider instead.

Thank you!

Answer 1

Score: 2

<details>
<summary>English:</summary>

A fundamental performance decision is whether you want to **optimize for _argument-passing_ vs. _pipeline input_**:

* Declaring your parameters _as arrays_ (e.g. `[string[]] $Name`) allows efficient passing of _multiple_ input objects by _argument_ (parameter value).

* However, doing so _hurts pipeline performance_, because a single-element array is then created for every pipeline input object, as the following example demonstrates: it outputs `String[]` for _each_ of the scalar string elements of the array passed via the pipeline:

      'one', 'two' |
        & {
          param(
            [Parameter(Mandatory, ValueFromPipeline)]
            [string[]] $Name
          )
          process {
            $Name.GetType().Name # -> 'String[]' *for each* input string
          }
        }

  * **Note**: For brevity, the example above, as well as all others in this answer, uses a [script block](https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.core/about/about_Script_Blocks) in lieu of a `function` definition. That is, a function declaration (`function foo { ... }`) followed by its invocation (`... | foo`) is shortened to the functionally equivalent `... | & { ... }`.

See [GitHub issue #4242](https://github.com/PowerShell/PowerShell/issues/4242) for a related discussion.
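
To make the tradeoff above concrete, here is a minimal sketch that counts how many elements each `process` invocation receives. The function name `Test-Binding` is hypothetical and exists only for this illustration; it is not part of the original answer.

```powershell
# Hypothetical helper: reports how many elements $Name holds
# in each invocation of the process block.
function Test-Binding {
    param(
        [Parameter(Mandatory, ValueFromPipeline)]
        [string[]] $Name
    )
    process {
        $Name.Count
    }
}

# Argument passing: the process block runs once with the whole array.
Test-Binding -Name 'one', 'two', 'three'    # -> 3

# Pipeline input: the process block runs once per input object,
# each time with a freshly created single-element array.
'one', 'two', 'three' | Test-Binding        # -> 1, 1, 1
```

With argument passing, the loop over `$Name` inside `process` does the per-element work; with pipeline input, the pipeline itself does, at the cost of one small array allocation per object.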

---

With _array_ parameters, you indeed need to ensure element-by-element processing yourself, notably inside the `process` block if they're also _pipeline-binding_.

As for **"homogenizing" parameter values of different types** so that only _one_ processing loop is required, two fundamental optimizations are possible:

* **Declare only a _single_ parameter** and rely either on PowerShell to _automatically_ convert values of other types to that parameter's type, or implement an automatically applied _custom conversion_, which obviates the need for "homogenizing" altogether:

  * The **conversion is _automatic_** if the parameter type has a public, single-parameter constructor that accepts an instance of the other type as its (only) argument or, in case the other type is `[string]`, if the parameter type has a static `::Parse()` method with a single `[string]` parameter; e.g.:

        # Sample class with a single-parameter
        # public constructor that accepts [int] values.
        class Foo {
          [int] $n
          Foo([int] $val) {
            $this.n = $val
          }
        }

        # [int] values (whether provided via the pipeline or as an argument)
        # auto-convert to [Foo] instances
        42, 43 | & {
          [CmdletBinding()]
          param(
            [Parameter(ValueFromPipeline)]
            [Foo[]] $Foo
          )
          process {
            $Foo # Diagnostic output.
          }
        }

    * In your case, `[Messaging.MessageQueue]` _does_ have a public single-parameter constructor that accepts a string (as evidenced by your `[Messaging.MessageQueue]::new($n)` call), so you could simply _omit_ the `$Name` parameter declaration, and rely on the automatic conversion of `[string]` inputs.

    * A _general caveat_:

      * This automatic conversion, which also happens with _casts_ (e.g., `[Foo[]] (0x2a, 43)`, see below) and the (rarely used) type-conversion form of the [intrinsic `.ForEach()`](https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.core/about/about_Arrays#foreach) (e.g., `(0x2a, 43).ForEach([Foo])`), is _stricter_ than calling a single-parameter constructor with respect to matching the constructor's parameter type.
      * I'm unclear on the exact rules, but using a `[double]` value, for instance, succeeds with `[Foo]::new(42.1)` (that is, conversion to `[int]` is automatically performed), but *fails* with both `[Foo] 42.1` and `(42.1).ForEach([Foo])` (the latter currently produces an obscure error message).

  * If the conversion _isn't_ automatic, **implement a _custom_ conversion** that PowerShell then applies automatically, by way of decorating your parameter with a custom attribute that derives from the abstract [`ArgumentTransformationAttribute`](https://docs.microsoft.com/en-US/dotnet/api/System.Management.Automation.ArgumentTransformationAttribute) class; e.g.:

        using namespace System.Management.Automation
        
        # Sample class with a single-parameter
        # public constructor that accepts [int] values.
        class Foo {
          [int] $n
          Foo([int] $val) {
            $this.n = $val
          }
        }

        # A sample argument-conversion (transformation) attribute class that
        # converts strings that can be interpreted as [int] to [Foo] instances.
        class CustomTransformationAttribute : ArgumentTransformationAttribute  {
          [object] Transform([EngineIntrinsics] $engineIntrinsics, [object] $inputData) {            
            # Note: If the inputs were passed as an *array argument*, $inputData is an array.
            return $(foreach ($o in $inputData) {
              if ($null -ne ($int = $o -as [int])) { [Foo]::new($int) }
              else                                 { $o }
            })
          }
        }
        
        # [string] values (whether provided via the pipeline or as an argument)
        # that can be interpreted as [int] now auto-convert to [Foo] instances,
        # thanks to the custom [ArgumentTransformationAttribute]-derived attribute.
        '0x2a', '43' | & {
          [CmdletBinding()]
          param(
            [Parameter(ValueFromPipeline)]
            [CustomTransformation()] # This implements the custom transformation.
            [Foo[]] $Foo
          )
          process {
            $Foo # Diagnostic output.
          }
        }


* If you *do* want ***separate* parameters, optimize the conversion process**:

  * The auto type-conversion rules described above also apply to _explicit casts_ (including support for _arrays_ of values), so you can simplify your code as follows:


        if ($PSCmdlet.ParameterSetName -eq 'Name') {
          # Simply use an array cast.
          $Queues = [Messaging.MessageQueue[]] $Name
        } else {
          $Queues = $InputObject
        }

  * In cases where element-by-element construction to effect conversion is required:

        if ($PSCmdlet.ParameterSetName -eq 'Name') {
          # Note the ","
          $Queues = foreach ($n in $Name) { , [Messaging.MessageQueue]::new($n) }
        } else {
          $Queues = $InputObject
        }

    * Note the use of the unary form of `,`, the [array constructor ("comma") operator](https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.core/about/about_Operators#comma-operator-), as in your attempt, albeit:
      * _inside_ the `foreach` loop, and
      * _without_ `@(...)` enclosure of the object to wrap in a single-element array, as `@(...)` _itself_ would trigger enumeration.

      * While `Write-Output -NoEnumerate ([Messaging.MessageQueue]::new($n))`, as shown in Mathias' answer, works too, it is _slower_. It comes down to a tradeoff between performance and concision on the one hand and readability and explicit signaling of intent on the other.

    * The need to wrap _each_ [`[System.Messaging.MessageQueue]`](https://learn.microsoft.com/en-us/dotnet/api/system.messaging.messagequeue) instance in an auxiliary single-element array with unary `,` (or to use `Write-Output -NoEnumerate`) stems from the fact that this type implements the [`System.Collections.IEnumerable`](https://learn.microsoft.com/en-US/dotnet/api/System.Collections.IEnumerable) interface, which means that PowerShell automatically _enumerates_ instances of the type by default.<sup>[1]</sup> Applying either technique ensures that the `[System.Messaging.MessageQueue]` instance is output _as a whole_ to the pipeline (for details, see [this answer](https://stackoverflow.com/a/48360724/45375)).

       * Note that this is _not_ necessary in the first snippet, because `$Queues = [Messaging.MessageQueue[]] $Name` is an _expression_, to which automatic enumeration does _not_ apply.

       * The above also implies that you need the same technique if you want to pass a _single_ `[System.Messaging.MessageQueue]` instance or a *single-element* array containing such an instance _via the pipeline_; e.g.:

             # !! Without `,` this command would *break*, because
             # !! PowerShell would try to enumerate the elements of the queue
             # !! which fails with an empty one.
             , [System.Messaging.MessageQueue]::new('foo') | Get-Member

    * By *not* using an `if` statement as a single *assignment expression* (`$Queues = if ...`) and instead assigning to `$Queues` in the _branches_ of the `if` statement, you additionally avoid subjecting `$InputObject` to unnecessary enumeration.
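
Putting the separate-parameters advice together, the following is a minimal sketch rather than a drop-in replacement. It assumes a hypothetical function `Test-QueueTarget`, and it uses `[uri]` as a stand-in for `[Messaging.MessageQueue]` so that it runs without MSMQ installed. Like the queue type, `[uri]` has a public single-parameter constructor that accepts a string; unlike it, `[uri]` does not implement `IEnumerable`, so no wrapping with `,` is needed here.

```powershell
# Hypothetical stand-in for Send-MsmqQueue; [uri] plays the role of
# [Messaging.MessageQueue] purely for illustration.
function Test-QueueTarget {
    [CmdletBinding(DefaultParameterSetName = 'Name')]
    param(
        [Parameter(Mandatory, ParameterSetName = 'InputObject')]
        [uri[]] $InputObject,

        [Parameter(Mandatory, ParameterSetName = 'Name')]
        [string[]] $Name
    )

    # Assign inside the branches (not "$targets = if ...") so that
    # $InputObject is passed through without being enumerated again.
    if ($PSCmdlet.ParameterSetName -eq 'Name') {
        # Array cast: each string is converted via the [uri] constructor.
        $targets = [uri[]] $Name
    } else {
        $targets = $InputObject
    }

    # Single processing loop over the homogenized input.
    foreach ($t in $targets) {
        $t.Host
    }
}

Test-QueueTarget -Name 'http://a.example/x', 'http://b.example/y'   # -> a.example, b.example
Test-QueueTarget -InputObject ([uri] 'http://c.example/z')          # -> c.example
```

Pipeline binding is deliberately left out of this sketch to keep the focus on the homogenizing step.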

---

<sup>[1] There are some exceptions, notably strings and dictionaries. See the bottom section of [this answer](https://stackoverflow.com/a/65530467/45375) for details.</sup>

</details>



# Answer 2
**Score**: 1

This pattern ("homogenizing" the input entities based on the chosen parameter set) is perfectly valid, and constitutes, in my personal opinion at least, good parameter design.

That being said, you might want to use `Write-Output -NoEnumerate` to avoid the clunky `,@(...)` array-wrapping trick:

```powershell
if ($PSCmdlet.ParameterSetName -ieq 'Name') {
    # Handle when the parameter is NOT passed by the pipeline...
    $Queues = foreach ($n in $Name) {
        $queue = [Messaging.MessageQueue]::new($n)
        Write-Output $queue -NoEnumerate
    }
}
else {
    # Input is already [MessageQueue[]], avoid pipeline boundaries entirely
    $Queues = $InputObject
}
```
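
The enumeration behavior that both answers work around can be observed without MSMQ. The sketch below uses a generic list as a stand-in for any `IEnumerable` type (an assumption made for illustration; the variable names are hypothetical), and the `Write-Output -NoEnumerate` line assumes PowerShell 7 or later.

```powershell
# A generic list stands in for an enumerable object such as a message queue.
$list = [System.Collections.Generic.List[int]] @(1, 2, 3)

# Plain output from a statement: PowerShell enumerates the list,
# so the variable receives its three elements as an Object[].
$enumerated = foreach ($i in 1) { $list }
$enumerated.GetType().Name     # -> Object[]

# Unary comma wraps the list in a one-element array; that wrapper is
# what gets enumerated, so the list itself arrives intact.
$wrapped = foreach ($i in 1) { , $list }
$wrapped.GetType().Name        # -> List`1

# Write-Output -NoEnumerate is the more explicit alternative.
$viaCmdlet = foreach ($i in 1) { Write-Output -NoEnumerate $list }
```

Which form to prefer is the tradeoff described above: the unary comma is terse and avoids a cmdlet call, while `Write-Output -NoEnumerate` states the intent in words.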

Published on June 29, 2023 at 22:21:25
Original link: https://go.coder-hub.com/76581956.html