Can you use Get-ChildItem -Filter to filter by filename length?

Question

Is something like the below example code possible?

    Get-ChildItem $copyFilePath -Filter $_.Basename.Length -ne 22 | forEach-Object{
        
        Copy-Item -path "$($copyFilePath)$($_.Fullname)"
    }

I am trying to find a way to remove files of such name length from my code without having to loop through the entire file list. By doing so it will reduce my code's run time by an expected 80-85%.

Answer 1

Score: 3

No, the -Filter parameter only matches file system names against a wildcard pattern; you'll need the Where-Object cmdlet to filter on anything else:

Get-ChildItem $copyFilePath -File | Where-Object { $_.BaseName.Length -ne 22 } | ForEach-Object {
    # ...
}

Beware that the expression "$($copyFilePath)\$($_.Fullname)" inside the ForEach-Object body will result in an invalid file path, as the FullName property contains an already-rooted path.
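
A minimal sketch of the point about FullName, assuming a hypothetical destination folder `$destinationDir` that is not part of the original question: since FullName is already rooted, it is used as-is for the source, and the target path is built from the file's Name instead. The scratch folders and file names exist only to make the example self-contained.

```powershell
# Hypothetical scratch folders so the sketch can run on its own
$root           = Join-Path ([System.IO.Path]::GetTempPath()) ([System.Guid]::NewGuid().ToString())
$copyFilePath   = Join-Path $root 'source'
$destinationDir = Join-Path $root 'destination'
$null = New-Item -ItemType Directory -Path $copyFilePath, $destinationDir

# One file whose base name is exactly 22 characters long, and one shorter one
$null = New-Item -ItemType File -Path (Join-Path $copyFilePath (('a' * 22) + '.txt'))
$null = New-Item -ItemType File -Path (Join-Path $copyFilePath 'short.txt')

Get-ChildItem -LiteralPath $copyFilePath -File |
    Where-Object { $_.BaseName.Length -ne 22 } |
    ForEach-Object {
        # FullName is already a rooted path, so it is the source as-is;
        # the target is the destination folder plus the file's Name
        Copy-Item -LiteralPath $_.FullName -Destination (Join-Path $destinationDir $_.Name)
    }

# Names of the files that ended up in the destination folder
$copied = @((Get-ChildItem -LiteralPath $destinationDir -File).Name)
```

Only short.txt should arrive in the destination folder, because the other file's base name is 22 characters long.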

Answer 2

Score: 2

Mathias's answer is the correct PowerShell way to do what you're looking for, though it is definitely not the fastest. If you want something faster, you can rely on .NET API calls instead of PowerShell cmdlets.

$queue = [System.Collections.Generic.Queue[System.IO.DirectoryInfo]]::new()
$copyFilePath = Get-Item 'absolute\path\to\initialDir'
$queue.Enqueue($copyFilePath)

while($queue.Count) {
    $dir = $queue.Dequeue()
    try {
        $enum = $dir.EnumerateFileSystemInfos()
    }
    catch {
        # Use `$_` here for error handling if needed
        # if we can't enumerate this Directory (permissions, etc), go next
        continue
    }

    foreach($item in $enum) {
        if($item -is [System.IO.DirectoryInfo]) {
            $queue.Enqueue($item)
            continue
        }

        # `$item` is a `FileInfo` here, check the length of its base name
        if($item.BaseName.Length -eq 22) {
            # skip this file if condition is met
            continue
        }

        # here you can use `.CopyTo` instead of `Copy-Item`:
        # public FileInfo CopyTo(string destFileName);
        # public FileInfo CopyTo(string destFileName, bool overwrite);

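        try {
            # Assumes `$destination` was defined beforehand as the absolute path of an existing directory;
            # if the folder structure needs to be preserved you also need to handle the folder creation here
            $item.CopyTo([System.IO.Path]::Combine($destination, $item.Name))
        }
        catch {
            # error handling here, e.g. report the failure and keep going
            Write-Warning "Could not copy '$($item.FullName)': $_"
        }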
    }
}
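
If the folder structure has to be preserved, as the comment in the loop above mentions, one option is to derive each target path from the file's path relative to the source root. The following is a sketch under assumptions rather than part of the original answer: `$sourceRoot`, `$destinationRoot`, and the scratch files are hypothetical, the relative path is obtained simply by trimming the root prefix, and the error handling is left out for brevity.

```powershell
# Hypothetical scratch folders so the sketch can run on its own
$scratch         = Join-Path ([System.IO.Path]::GetTempPath()) ([System.Guid]::NewGuid().ToString())
$sourceRoot      = (New-Item -ItemType Directory -Path (Join-Path $scratch 'source')).FullName
$destinationRoot = (New-Item -ItemType Directory -Path (Join-Path $scratch 'destination')).FullName
$subDir          = Join-Path $sourceRoot 'sub'
$null = New-Item -ItemType Directory -Path $subDir
$null = New-Item -ItemType File -Path (Join-Path $subDir 'keep.txt')
$null = New-Item -ItemType File -Path (Join-Path $subDir (('b' * 22) + '.txt'))

$queue = [System.Collections.Generic.Queue[System.IO.DirectoryInfo]]::new()
$queue.Enqueue([System.IO.DirectoryInfo]::new($sourceRoot))

while ($queue.Count) {
    $dir = $queue.Dequeue()
    foreach ($item in $dir.EnumerateFileSystemInfos()) {
        if ($item -is [System.IO.DirectoryInfo]) {
            $queue.Enqueue($item)
            continue
        }
        # Skip files whose base name is exactly 22 characters long
        if ($item.BaseName.Length -eq 22) { continue }

        # Path of the file relative to the source root (here: 'sub' plus 'keep.txt')
        $relative = $item.FullName.Substring($sourceRoot.Length).TrimStart([char[]]'\/')
        $target   = [System.IO.Path]::Combine($destinationRoot, $relative)

        # CreateDirectory does nothing if the folder already exists
        $null = [System.IO.Directory]::CreateDirectory([System.IO.Path]::GetDirectoryName($target))
        $null = $item.CopyTo($target)
    }
}

# All files below the destination root, for inspection
$copiedFiles = @([System.IO.Directory]::GetFiles($destinationRoot, '*', [System.IO.SearchOption]::AllDirectories))
```

Creating the target folder right before each copy keeps the sketch short; for large trees it may be cheaper to create each folder once, when its directory is dequeued.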

Answer 3

Score: 1

To complement Mathias' helpful answer and Santiago's helpful answer with some background information and performance considerations:

  • It is indeed advisable to use a -Filter argument if feasible, as it filters at the source and returns only the objects of interest, which is much faster than returning all objects and performing filtering after the fact, in PowerShell.

    • Each PowerShell provider determines what kind of filters, if any, -Filter supports; any such filter is invariably a string (though it is up to the provider how to interpret the string).

    • In the case of the FileSystem provider, -Filter supports only a single, wildcard-based name pattern (e.g., '*.txt'), which, via .NET APIs, is ultimately passed through to platform-native APIs.

    • Notably, the wildcard "language" supported by these APIs is (a) less powerful than PowerShell's own wildcards available via the -Include parameter, for instance (they lack [...] to express character ranges and sets) and (b), on Windows, riddled with legacy quirks - see this answer for the gory details.

  • The .NET APIs that PowerShell uses under the covers also do not support open-ended filters such as by file size; that is, performing your desired filtering at the source is fundamentally unsupported.

    • Still, direct .NET API calls do offer a performance benefit:

      • PowerShell's cmdlets and pipeline generally incur overhead compared to direct .NET API calls; provider cmdlets are slowed down notably because each output object gets decorated with instance-level ETS properties such as .PSPath, which contain provider metadata.

      • Potentially speeding this up (and reducing memory load) in the future, by defining these properties at the type level via CodeProperty members rather than per-instance NoteProperty members, is the subject of GitHub issue #7501.

    • Alternatively, there are things you can do on the PowerShell side to improve performance as well, as discussed next.
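
To illustrate the points above: -Filter can still narrow the candidates by name pattern at the source, while the length check has to happen afterwards in PowerShell. This is only a sketch; the scratch folder, the file names, and the '*.txt' pattern are hypothetical and serve to make the example runnable.

```powershell
# Hypothetical scratch folder with three files
$dir = Join-Path ([System.IO.Path]::GetTempPath()) ([System.Guid]::NewGuid().ToString())
$null = New-Item -ItemType Directory -Path $dir
foreach ($name in 'report.txt', (('c' * 22) + '.txt'), 'image.png') {
    $null = New-Item -ItemType File -Path (Join-Path $dir $name)
}

# -Filter takes a single wildcard name pattern and is applied at the source ...
$byPattern = @(Get-ChildItem -LiteralPath $dir -File -Filter '*.txt')

# ... whereas anything beyond a name pattern, such as the base-name length,
# has to be tested in PowerShell on the objects that come back
$result = @($byPattern | Where-Object { $_.BaseName.Length -ne 22 } | ForEach-Object Name)
```

Here the .png file never reaches PowerShell's own filtering, and of the two .txt files only report.txt passes the length test.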


Improving the performance of your PowerShell code:

  • Avoiding the pipeline and per-input-object cmdlet calls is key.

  • E.g., replacing a Where-Object call in a pipeline with the intrinsic .Where() method speeds up processing, albeit at the expense of memory consumption.

  • Instead of calling Copy-Item once for each input object, pipe directly to it; if you need to determine the destination location on a per-input-object basis, you can use a delay-bind script block:

    (Get-ChildItem $copyFilePath).Where({ $_.BaseName.Length -ne 22 }) |
      Copy-Item -Destination $destination
    
  • If no per-input-object variation in the destination path is needed, you can further speed up processing by taking advantage of the fact that many file-processing cmdlets accept an array of input paths:

    Copy-Item `
      -LiteralPath (Get-ChildItem $copyFilePath).Where({ $_.BaseName.Length -ne 22 }).FullName `
      -Destination $destination
    
    • Note the use of member-access enumeration to directly obtain the .FullName property values from the individual elements of the collection returned by .Where().

    • In PowerShell (Core) 7+, you wouldn't even need ((...).FullName) anymore, because [System.IO.FileInfo] and [System.IO.DirectoryInfo] instances are now consistently represented by their .FullName property when stringified (see this answer for background information).
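
The delay-bind script block mentioned above does not appear in the snippets, so here is a minimal sketch of one. It assumes a hypothetical rule that sends each file to a subfolder named after its extension; the scratch folders, the file names, and the `$destinationRoot` variable are illustrative assumptions, not part of the original answer.

```powershell
# Hypothetical scratch folders and files so the sketch can run on its own
$scratch         = Join-Path ([System.IO.Path]::GetTempPath()) ([System.Guid]::NewGuid().ToString())
$copyFilePath    = Join-Path $scratch 'source'
$destinationRoot = Join-Path $scratch 'destination'
$null = New-Item -ItemType Directory -Path $copyFilePath, (Join-Path $destinationRoot 'txt'), (Join-Path $destinationRoot 'log')
foreach ($name in 'a.txt', 'b.log', (('d' * 22) + '.txt')) {
    $null = New-Item -ItemType File -Path (Join-Path $copyFilePath $name)
}

# The script block passed to -Destination is evaluated once per pipeline input object,
# with $_ bound to the current file, so each file can get its own target folder
(Get-ChildItem -LiteralPath $copyFilePath -File).Where({ $_.BaseName.Length -ne 22 }) |
    Copy-Item -Destination { Join-Path $destinationRoot $_.Extension.TrimStart('.') }

# Names of all files that were copied, regardless of subfolder
$copiedNames = @((Get-ChildItem -LiteralPath $destinationRoot -File -Recurse).Name)
```

Copy-Item is still invoked only once for the whole pipeline; just the destination argument is computed per input object.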

huangapple
  • Published on June 14, 2023, 23:51:20
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76475408.html