Tabulate All Folder Contents Recursively
Background
I am comparing a local and a OneDrive version of the same drive, in order to identify discrepancies in the sync. There are nearly 20,000 files in total, among deeply nested folders.
I have tried other solutions, but I prefer to keep everything in Excel, for reasons too numerous to detail here. As such, I am using PowerQuery to list the contents of each drive, and I will then use various Table.*Join() functions to compare those contents.
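A minimal sketch of that comparison step, assuming two hypothetical root folders and a helper (AddRelativePath) and key column ("Relative Path") that are illustrative names, not part of the original setup. It also assumes each [Folder Path] value begins with the root text exactly as typed, including the trailing backslash. Folder.Files() stands in for whichever listing function ends up being used.

```powerquery
let
    // Hypothetical roots; replace with the real local and OneDrive locations.
    LocalRoot = "C:\Data\",
    CloudRoot = "C:\Users\me\OneDrive\Data\",

    // Adds a root-independent key so rows from the two listings can be matched.
    AddRelativePath = (listing as table, root as text) as table =>
        Table.AddColumn(
            listing,
            "Relative Path",
            each Text.Range([Folder Path] & [Name], Text.Length(root)),
            type text
        ),

    Local = AddRelativePath(Folder.Files(LocalRoot), LocalRoot),
    Cloud = AddRelativePath(Folder.Files(CloudRoot), CloudRoot),

    // Anti join: keeps rows that exist locally but have no match in the OneDrive copy.
    OnlyLocal = Table.NestedJoin(
        Local, {"Relative Path"},
        Cloud, {"Relative Path"},
        "Cloud",
        JoinKind.LeftAnti
    )
in
    OnlyLocal
```

Swapping JoinKind.LeftAnti for JoinKind.RightAnti would list the items found only in the OneDrive copy.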
Problem
Unfortunately, no native option meets my needs. While Folder.Files() does list all files recursively, it fails to include the folders. By contrast, Folder.Contents() does include the folders, but it fails to list the contents recursively; rather, it shows only the "first level".
As such, I created the custom function Folder_FullContents() to recursively list both files and folders:
let Folder_FullContents = (
    path as text,
    optional options as nullable record
) as any =>
    let
        // Files and folders directly under path (first level only).
        contents = Folder.Contents(path, options),
        // Full paths of the immediate subfolders.
        subfolders = Table.AddColumn(
            Table.SelectRows(contents, each [Attributes][Directory]),
            "File Path",
            each Text.Combine({[Folder Path], [Name]}),
            type text
        )[File Path],
        // Append the listing of each subfolder; @ marks the recursive self-reference.
        result = Table.Combine(List.Combine({{contents},
            List.Transform(subfolders, @Folder_FullContents)
        }))
    in result
in Folder_FullContents
While Folder_FullContents() does technically work, it is prohibitively slow for 20,000 files.
Question
Is there a reliable solution in PowerQuery M that
- recursively lists
- all files and folders beneath a directory, with
- performance no worse than .Files() or .Contents() at scale?
Note
There are some empty folders on these drives. As such, it is insufficient to simply append the .Distinct() set of [Folder Path]s from Folder.Files() to the dataset of the files themselves.
Doing so would omit the empty folders, which do not appear in any filepaths.
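For reference, the shortcut ruled out here might look roughly like the sketch below. The root folder is hypothetical, and List.Distinct() is one reading of ".Distinct()" applied to the [Folder Path] column.

```powerquery
let
    // Hypothetical root folder.
    Root = "C:\Data",
    // Recursive listing of files only; folders get no rows of their own.
    Files = Folder.Files(Root),
    // Folder paths recovered from the files: only folders that directly hold a file appear.
    FolderPaths = List.Distinct(Files[Folder Path])
in
    FolderPaths
```

A folder that contains no files contributes no row to Files, so its path never reaches FolderPaths.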
Answer 1
Score: 2
Buffering usually solves problems like this. I have tried the following, which returns 121k rows in about 6 seconds.
let Folder_FullContents = (path as text, optional options as nullable record) as any =>
    let
        contents = Table.Buffer(Folder.Contents(path, options)),
        subfolders = Table.AddColumn(
            Table.SelectRows(contents, each [Attributes][Directory]),
            "File Path",
            each Text.Combine({[Folder Path], [Name]}),
            type text
        )[File Path],
        result = Table.Combine(List.Combine({{contents},
            List.Transform(subfolders, @Folder_FullContents)
        }))
    in result
in Folder_FullContents
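One possible way to call it, as a sketch only: this assumes the function is saved as a query named Folder_FullContents, uses a hypothetical root folder, and adds an "Is Folder" column whose name is purely illustrative. Because List.Transform passes only the path to the recursive calls, any options record applies to the top-level folder only.

```powerquery
let
    // Hypothetical root folder; replace with the drive being audited.
    Root = "C:\Data",
    // Assumes the function is available as a query named Folder_FullContents.
    Everything = Folder_FullContents(Root),
    // Flag folders so they can be told apart from files in later joins.
    Flagged = Table.AddColumn(Everything, "Is Folder", each [Attributes][Directory], type logical),
    // Keep only the columns needed to compare the two drives.
    Result = Table.SelectColumns(Flagged, {"Folder Path", "Name", "Is Folder", "Date modified"})
in
    Result
```

If the options need to reach every level, replacing the function reference in List.Transform with each @Folder_FullContents(_, options) would forward them.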