Why is this script so slow when creating a random list?
Question
I am trying to generate a random directory structure with tens of thousands of directories, but this script takes far too long. Is the recursive call to CreateDir causing the slowness? I am using a file to store the directory list so that I can create all the directories at once, and because I could not figure out how to use an array. For example, it took 3 minutes for 1200 directories.
<!-- language-all: lang-sh -->

    Clear-Host
    $ParentDir = 'c:\a\'
    $MinNumberOfLeaves = 2
    $MaxNumberOfLeaves = 10
    $MaxDepth = 5
    $MinDepth = 2

    Function CreateDirName{
        $fRandomNames = [System.Collections.Arraylist]@()
        # Get-Random -Count ([int]::MaxValue)) Randomizes the entire list of commands
        $fRandomNames += (((Get-Command) | Select-Object -Property Name).Name | `
            Where-Object {($_ -inotmatch ":") -and ($_ -inotlike "")} )
        $fRandomName = ($fRandomNames | Get-Random -Count 1) -replace("-","")
        Return $fRandomName
    }

    Function CreateDir{
        Param(
            $fParentDir,
            $fMinNumberOfLeaves = 2,
            $fMaxNumberOfLeaves = 3,
            $fMaxDepth = 3,
            $fMinDepth = 2,
            $fRandomDepth = 2
        )
        For($d=1;$d -le ($fRandomDepth);$d++)
        {
            $fNumOfLeaves = Get-Random -Minimum $fMinNumberOfLeaves -Maximum $fMaxNumberOfLeaves
            #$fNumOfLeaves = 4
            For($l=1;$l -le $fNumOfLeaves;$l++)
            {
                $fSubDirName = CreateDirName
                $fFullDirPath = $fParentDir + '\' + $fSubDirName
                $fFullDirPath | Out-File -Append -FilePath c:\a\Paths.txt -Encoding ascii
                #New-Item -Path $fFullDirPath -ItemType Directory
                $SubDirs = CreateDir -fParentDir $fFullDirPath -fRandomDepth ($fRandomDepth-1)
                Out-Null
            }
            Out-Null
        }
        Out-Null
    }

    "Dirs"| Set-content c:\a\Paths.txt -Force -Encoding ascii
    $RandomDepth = Get-Random -Minimum $MinDepth -Maximum $MaxDepth
    #$RandomDepth = 3
    CreateDir -fParentDir $ParentDir -fRandomDepth $RandomDepth
    Out-Null

Answer 1
Score: 1
<details>
<summary>English:</summary>
OK, many aspects of your approach make this script slow.
I think this is a great learning opportunity.
First, let's address your 'CreateDirName' function.
The current workflow is:
- Create a new ArrayList
- Get a list of ALL commands, select the property 'Name', then select 'Name' again with '().Name', and keep only the strings that don't match ':' and aren't like ''.
- Pipe the result to 'Get-Random' and use '-replace' to remove '-'.
First, remove the unnecessary 'Select-Object'. You can expand the property 'Name' like this: '(Get-Command).Name'.
Then let's look at your filter. 'Where-Object' is costly by itself, but you make it worse by using '-inotmatch' and '-inotlike'.
The '-match' operators use regex.
The '-like' operators work with wildcards, which is also costly.
Since a string is a sequence of characters, we can use '.Contains()' to filter ':', and to avoid empty or null values we can use '[string]::IsNullOrEmpty()'.
'-replace' also uses regex; we can replace it with '.Replace()'. (Pun not intended.)
Suddenly, our code looks like this:

    $fRandomNames = [System.Collections.Arraylist]@()
    $fRandomNames += ((Get-Command).Name | Where-Object { !$_.Contains(':') -and ![string]::IsNullOrEmpty($_) })
    $fRandomName = ($fRandomNames | Get-Random -Count 1).Replace('-', '')

Now, still on this function: arrays have a fixed size, so the '+=' operator creates a new array and copies every element into it each time it is used.
To improve that, we can use ArrayList.AddRange().
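
A minimal sketch of the difference, using a small made-up list of names (`$sample` and the other variable names are hypothetical, not from the original script). In PowerShell, `+=` on a variable holding an ArrayList produces a new `Object[]`, while `AddRange()` appends to the existing list in place:

```powershell
# Hypothetical sample data standing in for the command names.
$sample = 'Get-Item', 'Set-Item', 'New-Item'

# += builds a brand-new collection and copies every element into it.
$viaPlusEquals = [System.Collections.ArrayList]@()
$viaPlusEquals += $sample

# AddRange() appends to the existing ArrayList without replacing it.
$viaAddRange = [System.Collections.ArrayList]@()
$viaAddRange.AddRange($sample)

$viaPlusEquals.GetType().Name   # the ArrayList has been replaced by a plain array
$viaAddRange.GetType().Name     # still an ArrayList
```

Both variables end up holding the same three names; only the container type and the amount of copying differ.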
And since we are talking performance, using the pipeline always adds cost, so let's borrow LINQ from C#, and change this 'Where-Object'.
And now we have:

    $fRandomNames = [System.Collections.Arraylist]@()
    $fRandomNames.AddRange([System.Linq.Enumerable]::Where((Get-Command).Name, [Func[object, bool]] { param($c) !$c.Contains(':') -and ![string]::IsNullOrEmpty($c) }).ToList())
    $fRandomName = (Get-Random -InputObject $fRandomNames -Count 1).Replace('-', '')

Getting better. I ran some tests with these options, and we got a 36% improvement.
[![enter image description here][1]][1]
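
As a rough sketch of how such a comparison could be run, the snippet below times two filtering styles with `Measure-Command`. The synthetic `$names` list and its size are assumptions for illustration, and a plain `foreach` loop stands in for the LINQ call. Timings depend on the machine and PowerShell version, so no numbers are claimed here.

```powershell
# Assumed test data: a synthetic list of command-like names.
$names = foreach ($i in 1..5000) { "Get-Thing$i" }
$names += 'C:', 'D:'   # a few entries that the filter should drop

# Original style: pipeline plus regex/wildcard operators.
$slow = Measure-Command {
    $names | Where-Object { ($_ -inotmatch ':') -and ($_ -inotlike '') }
}

# Alternative style: plain .NET string methods inside a foreach loop.
$fast = Measure-Command {
    foreach ($n in $names) {
        if (-not ([string]::IsNullOrEmpty($n)) -and -not ($n.Contains(':'))) { $n }
    }
}

'Where-Object: {0:N1} ms' -f $slow.TotalMilliseconds
'foreach:      {0:N1} ms' -f $fast.TotalMilliseconds
```

Running each variant several times and averaging gives a steadier picture than a single measurement.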
Now let's attack the main function body.
You don't need to call 'Get-Command' every time; you can get the list once and re-use it for further operations.
With this, we can ditch 'CreateDirName' completely, which avoids an extra function call per directory and contributes to our performance.
We can also avoid using 'for' loops, especially if you don't need the index number.
Let's use a 'do-while' loop instead.
Before I paste the next code section, let's look at how you write your data into the file.
You are calling 'Out-File' in every single operation.
That involves:
- Getting the string and passing it to the pipeline.
- Checking whether the file and directory exist.
- Opening a file stream to write data.
- Writing the data.
- Closing the stream.
- Disposing of unmanaged objects.
We can store the results in another ArrayList and write everything at the end.
And now we have:

    $fRandomNames = [System.Collections.Arraylist]@()
    $fRandomNames.AddRange([System.Linq.Enumerable]::Where((Get-Command).Name, [Func[object, bool]] { param($c) !$c.Contains(':') -and ![string]::IsNullOrEmpty($c) }).ToList())
    $newPathList = [System.Collections.Arraylist]@()

    $currentDepth = 0
    do {
        $leafNumber = 0
        do {
            # Build the path once so it can be recorded and passed to the recursive call.
            $fFullDirPath = "$fParentDir\$((Get-Random -InputObject $fRandomNames -Count 1).Replace('-', ''))"
            $newPathList.Add($fFullDirPath)
            New-CustomDirectory -fParentDir $fFullDirPath -fRandomDepth ($fRandomDepth - 1)
            $leafNumber++
        } while ($leafNumber -lt (Get-Random -Minimum $fMinNumberOfLeaves -Maximum $fMaxNumberOfLeaves))
        $currentDepth++
    } while ($currentDepth -lt $fRandomDepth)

The last thing I want to talk about is 'Out-File' and 'Out-Null'.
We know the steps for writing a file, so why don't we use pure .NET instead?
And what's with all the 'Out-Null' calls?
Every 'Out-Null' implies a cost. You should consider never using 'Out-Null'.
Instead, use $null = $result.DoWork(), or [void]$result.DoWork().
And you only need to do it once, at the main function call.
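
A short sketch of the alternatives mentioned above, using `ArrayList.Add()` because it returns a value (the new element's index) that usually needs discarding. The variable names are illustrative only:

```powershell
$list = [System.Collections.ArrayList]@()

# ArrayList.Add() returns the new element's index, which would otherwise leak into the output.
$list.Add('first') | Out-Null    # works, but sends the value through the pipeline
[void]$list.Add('second')        # cast the result away
$null = $list.Add('third')       # or assign it to $null

$list.Count
```

All three lines discard the return value; the last two do so without involving the pipeline.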
Let's tackle the file writing first.
I ran another sample to show you the benefits of using pure .NET:
[![enter image description here][2]][2]
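
One way to sketch the idea of collecting the paths first and writing them in a single .NET call is shown below. The sample paths and the temporary output file are assumptions for illustration (the original script writes to c:\a\Paths.txt), and `[System.IO.File]::WriteAllLines` is used here as one possible option rather than the exact call from the measurement:

```powershell
# Assumed inputs: a few illustrative paths and a temp file instead of c:\a\Paths.txt.
$paths = [System.Collections.Generic.List[string]]::new()
foreach ($i in 1..5) { $paths.Add("c:\a\Dir$i") }

$outFile = Join-Path ([System.IO.Path]::GetTempPath()) 'Paths-example.txt'

# One call opens the file, writes every line, and closes it again.
[System.IO.File]::WriteAllLines($outFile, $paths)

(Get-Content $outFile).Count
```

Compared with calling 'Out-File -Append' once per path, the file is opened and closed only once regardless of how many paths were collected.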
And at the end, we have something like this:

    $ParentDir = 'c:\a\'
    $MaxDepth = 5
    $MinDepth = 2

    # Build the name pool and the result list once, outside the recursive function.
    $fRandomNames = [System.Collections.Arraylist]@()
    $fRandomNames.AddRange([System.Linq.Enumerable]::Where((Get-Command).Name, [Func[object, bool]] { param($c) !$c.Contains(':') -and ![string]::IsNullOrEmpty($c) }).ToList())
    $newPathList = [System.Collections.Arraylist]@()

    function New-CustomDirectory {
        Param(
            $fParentDir,
            [int]$fMinNumberOfLeaves = 2,
            [int]$fMaxNumberOfLeaves = 10,
            [int]$fMaxDepth = 3,
            [int]$fMinDepth = 2,
            [int]$fRandomDepth = 2
        )

        # A do-while body always runs at least once, so stop explicitly when the depth is used up.
        if ($fRandomDepth -le 0) { return }

        $currentDepth = 0
        do {
            $leafNumber = 0
            do {
                # Build the path once so it can be recorded and passed to the recursive call.
                $fFullDirPath = "$fParentDir\$((Get-Random -InputObject $fRandomNames -Count 1).Replace('-', ''))"
                $newPathList.Add($fFullDirPath)
                New-CustomDirectory -fParentDir $fFullDirPath -fRandomDepth ($fRandomDepth - 1)
                $leafNumber++
            } while ($leafNumber -lt (Get-Random -Minimum $fMinNumberOfLeaves -Maximum $fMaxNumberOfLeaves))
            $currentDepth++
        } while ($currentDepth -lt $fRandomDepth)
    }

    "Dirs"| Set-content c:\a\Paths.txt -Force -Encoding ascii
    $RandomDepth = Get-Random -Minimum $MinDepth -Maximum $MaxDepth
    [void](New-CustomDirectory -fParentDir $ParentDir -fRandomDepth $RandomDepth)

    # Write every collected path in one go, one path per line.
    $stream = [System.IO.File]::AppendText('C:\a\Paths.txt')
    foreach ($path in $newPathList) { $stream.WriteLine($path) }
    $stream.Dispose()

**DISCLAIMER!** I didn't study what your script does and didn't test my version's output. This was only a block performance study.
You might need to change it to suit your needs.
Source:
- [PowerShell scripting performance considerations][3]
- [High performance PowerShell with LINQ][4]
- [Under the stairs: Performance with PowerShell][5]
Hope it helps!
Happy scripting!
[1]: https://i.stack.imgur.com/RLK7L.png
[2]: https://i.stack.imgur.com/ksr5Y.png
[3]: https://learn.microsoft.com/en-us/powershell/scripting/dev-cross-plat/performance/script-authoring-considerations?view=powershell-7.3
[4]: https://www.red-gate.com/simple-talk/development/dotnet-development/high-performance-powershell-linq/
[5]: https://tfl09.blogspot.com/2011/11/performance-with-powershell.html
</details>
# Answer 2
**Score**: 0
The comments are correct in pointing out multiple issues. But the first and primary needed change is removing these lines from the CreateDirName function and placing them near the top of the script:

    $fRandomNames = [System.Collections.Arraylist]@()
    # Get-Random -Count ([int]::MaxValue)) Randomizes the entire list of commands
    $fRandomNames += (((Get-Command) | Select-Object -Property Name).Name | `
        Where-Object {($_ -inotmatch ":") -and ($_ -inotlike "")} )

The (Get-Command) call takes about 11 seconds on my computer, and calling it over and over again is insane! Build the $fRandomNames ArrayList when the script starts, and use CreateDirName only to pick a random name from the already created $fRandomNames ArrayList.
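
A minimal sketch of that change, keeping the original filter and function name. It assumes the function can read `$fRandomNames` from the enclosing script scope, which is how PowerShell resolves variables that are not defined locally:

```powershell
# Build the name pool once, at script start (Get-Command is the expensive part).
$fRandomNames = (Get-Command).Name |
    Where-Object { ($_ -inotmatch ':') -and ($_ -inotlike '') }

Function CreateDirName {
    # Only pick from the pool that already exists; no Get-Command call here.
    Return ($fRandomNames | Get-Random -Count 1) -replace '-', ''
}

CreateDirName
```

With this layout the cost of Get-Command is paid once per script run instead of once per generated directory name.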