How fast is fs.readdirSync and can I speed it up?

Question
I have a function that gets all the files in a directory recursively using fs.readdirSync.
It works well with the small directory I ran it through as a test, but now that I am running this on a directory that is over 100GB large, it is taking a very long time to complete. Any ideas on how I can speed this up or if there's a better way of doing this? I'm eventually going to have to run this over some directories with Terabytes of data.
// Recursive function to get files
function getFiles(dir, files = []) {
  // Get an array of all files and directories in the passed directory using fs.readdirSync
  const fileList = fs.readdirSync(dir);
  for (const file of fileList) {
    // Create the full path of the file/directory by concatenating the passed directory and file/directory name
    const name = `${dir}/${file}`;
    // Check if the current file/directory is a directory using fs.statSync
    if (fs.statSync(name).isDirectory()) {
      // If it is a directory, recursively call the getFiles function with the directory path and the files array
      getFiles(name, files);
    } else {
      // If it is a file, push the full path to the files array
      files.push(name);
    }
  }
  return files;
}
Answer 1

Score: 2

Unfortunately, going async is slower, so we need to optimize your code instead. You can do that with the { withFileTypes: true } option, which makes it about 2x faster.

I also tried Node v20's { recursive: true } option, but it was even slower than your solution, and it didn't work with withFileTypes.

Maybe a better SSD with a high read speed would help, though I'd guess file entries are read from a file-system index, so I'm not sure how much hardware affects this.
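For comparison, the answer doesn't show the async version it benchmarked; a minimal sketch of what "going async" might look like (assuming the same traversal, just with fs/promises, and a throwaway temp-dir tree so the snippet is self-contained):

```javascript
import fs from 'fs/promises';
import os from 'os';
import path from 'path';

// Hypothetical async variant of getFiles, using fs.promises.readdir
// with withFileTypes so no extra stat call is needed per entry.
async function getFilesAsync(dir, files = []) {
  const entries = await fs.readdir(dir, { withFileTypes: true });
  for (const entry of entries) {
    const name = `${dir}/${entry.name}`;
    if (entry.isDirectory()) {
      await getFilesAsync(name, files);
    } else {
      files.push(name);
    }
  }
  return files;
}

// Build a tiny demo tree: root/a.txt and root/sub/b.txt
const root = await fs.mkdtemp(path.join(os.tmpdir(), 'async-demo-'));
await fs.mkdir(path.join(root, 'sub'));
await fs.writeFile(path.join(root, 'a.txt'), '');
await fs.writeFile(path.join(root, 'sub', 'b.txt'), '');

const files = await getFilesAsync(root);
console.log(files.length); // 2
```

The per-entry await adds promise and event-loop overhead on top of the same underlying syscalls, which is a plausible reason the async version benchmarks slower here.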
import fs from 'fs';

const DIR = '/bytex';

// Original version, kept for comparison
function getFiles(dir, files = []) {
  // Get an array of all files and directories in the passed directory using fs.readdirSync
  const fileList = fs.readdirSync(dir);
  for (const file of fileList) {
    // Create the full path by concatenating the passed directory and the entry name
    const name = `${dir}/${file}`;
    // Check if the current entry is a directory using fs.statSync
    if (fs.statSync(name).isDirectory()) {
      // If it is a directory, recurse with the directory path and the files array
      getFiles(name, files);
    } else {
      // If it is a file, push the full path to the files array
      files.push(name);
    }
  }
  return files;
}

// Optimized version: withFileTypes returns Dirent objects,
// so no extra statSync call is needed per entry
function getFiles2(dir, files = []) {
  const fileList = fs.readdirSync(dir, { withFileTypes: true });
  fileList.forEach(file =>
    file.isDirectory()
      ? getFiles2(`${dir}/${file.name}`, files)
      : files.push(`${dir}/${file.name}`)
  );
  return files;
}

let start = performance.now();
let files = getFiles(DIR);
console.log(performance.now() - start);
console.log(files.length);

start = performance.now();
files = getFiles2(DIR);
console.log(performance.now() - start);
console.log(files.length);
The output:
171.66947209835052
64508
68.24071204662323
64508
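For reference, the { recursive: true } option mentioned above (Node 20+) collapses the whole walk into a single readdirSync call; whether it combines with withFileTypes varies by Node version, so this sketch filters with statSync instead and builds a throwaway temp tree to stay self-contained:

```javascript
import fs from 'fs';
import os from 'os';
import path from 'path';

// Build a tiny demo tree: root/a.txt and root/sub/b.txt
const root = fs.mkdtempSync(path.join(os.tmpdir(), 'readdir-demo-'));
fs.mkdirSync(path.join(root, 'sub'));
fs.writeFileSync(path.join(root, 'a.txt'), '');
fs.writeFileSync(path.join(root, 'sub', 'b.txt'), '');

// One call walks the whole tree. Entries come back as paths
// relative to root (directories included), so prefix root and
// filter out the directory entries.
const entries = fs.readdirSync(root, { recursive: true });
const files = entries
  .map(e => path.join(root, e.toString()))
  .filter(p => fs.statSync(p).isFile());

console.log(files.length); // 2
```

As the answer notes, convenience here doesn't imply speed: in its benchmark this option was slower than the hand-rolled recursion.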