How fast is fs.readdirSync and can I speed it up?

Question

I have a function that recursively collects every file in a directory using fs.readdirSync.
It works well on the small directory I used as a test, but on a directory holding more than 100 GB it takes a very long time to complete. Is there a way to speed this up, or a better approach altogether? Eventually I will have to run it over directories holding terabytes of data.

import fs from 'fs';

// Recursive function to get files
function getFiles(dir, files = []) {
    // Get an array of all files and directories in the passed directory using fs.readdirSync
    const fileList = fs.readdirSync(dir);
    for (const file of fileList) {
        // Create the full path of the file/directory by concatenating the passed directory and file/directory name
        const name = `${dir}/${file}`;
        // Check if the current file/directory is a directory using fs.statSync
        if (fs.statSync(name).isDirectory()) {
            // If it is a directory, recursively call the getFiles function with the directory path and the files array
            getFiles(name, files);
        } else {
            // If it is a file, push the full path to the files array
            files.push(name);
        }
    }
    return files;
}

Answer 1

Score: 2

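Unfortunately, going async is slower, so the code itself needs optimizing. You can do that with the {withFileTypes: true} option, which makes readdirSync return entries that already know their type, so the fs.statSync call per entry is no longer needed. In the test below it was more than 2x faster (about 172 ms versus 68 ms for 64,508 files).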

I also tried Node v20's {recursive: true} option, but it was slower than even your original solution, and it did not work together with withFileTypes when I tried it.
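
For reference, a minimal sketch of what a call with the {recursive: true} option can look like, assuming a Node version that supports it. The helper name listAllEntries is hypothetical and is not the code that was benchmarked, and the sketch uses require so that it runs as a plain script. Note that the result includes directories as well as files, so it is not a drop-in replacement for getFiles.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// listAllEntries is a hypothetical helper name used only for this sketch.
function listAllEntries(dir) {
    // With recursive: true, readdirSync returns paths relative to dir,
    // covering files AND directories at every depth.
    const relativePaths = fs.readdirSync(dir, { recursive: true });
    // Prefix each entry with dir to get paths comparable to getFiles' output.
    return relativePaths.map((relativePath) => path.join(dir, relativePath));
}
```

Filtering out the directories would still require a type check per entry, which may be part of why this route did not come out ahead.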

A faster SSD with a higher read speed might help, though I am not sure how much hardware matters here: I assume the directory entries are read from a file system index rather than from the file contents.

import fs from 'fs';

const DIR = '/bytex';

function getFiles(dir, files = []) {
    // Get an array of all files and directories in the passed directory using fs.readdirSync
    const fileList = fs.readdirSync(dir);
    // Create the full path of the file/directory by concatenating the passed directory and file/directory name
    for (const file of fileList) {
        const name = `${dir}/${file}`;
        // Check if the current file/directory is a directory using fs.statSync
        if (fs.statSync(name).isDirectory()) {
            // If it is a directory, recursively call the getFiles function with the directory path and the files array
            getFiles(name, files);
        } else {
            // If it is a file, push the full path to the files array
            files.push(name);
        }
    }
    return files;
}

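// Same traversal, but withFileTypes makes readdirSync return fs.Dirent objects,
// so the directory check needs no extra fs.statSync call per entry
function getFiles2(dir, files = []) {
    const fileList = fs.readdirSync(dir, { withFileTypes: true });
    for (const file of fileList) {
        const name = `${dir}/${file.name}`;
        if (file.isDirectory()) {
            getFiles2(name, files);
        } else {
            files.push(name);
        }
    }
    return files;
}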

let start = performance.now();
let files = getFiles(DIR);
console.log(performance.now() - start);
console.log(files.length);

start = performance.now();
files = getFiles2(DIR);
console.log(performance.now() - start);
console.log(files.length);

The output (elapsed milliseconds, then the file count, first for getFiles and then for getFiles2):

171.66947209835052
64508
68.24071204662323
64508
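
For the terabyte-scale directories mentioned in the question, one possible variation on getFiles2 is sketched below. It keeps the withFileTypes idea but replaces recursion with an explicit stack and yields paths one at a time instead of collecting them all in an array. The name walkFiles is hypothetical, the sketch uses require so that it runs as a plain script, and it has not been benchmarked here, so treat any speed benefit as unverified; the point is mainly lower memory use.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// walkFiles is a hypothetical name; it is not part of the benchmark above.
function* walkFiles(root) {
    // Directories that still have to be read
    const pending = [root];
    while (pending.length > 0) {
        const dir = pending.pop();
        // withFileTypes returns fs.Dirent objects, so no statSync per entry
        for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
            const fullPath = path.join(dir, entry.name);
            if (entry.isDirectory()) {
                // Visit this subdirectory later
                pending.push(fullPath);
            } else {
                // Hand the path to the caller without storing it
                yield fullPath;
            }
        }
    }
}
```

A caller can then loop with for (const file of walkFiles(DIR)) and process each path as it arrives. One behavioral difference to be aware of: fs.statSync follows symbolic links, while Dirent.isDirectory() does not, so a symlink that points to a directory is yielded as a plain entry here rather than being descended into.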

huangapple
  • Posted on July 7, 2023, 02:43:03
  • If you repost this article, please keep this link: https://go.coder-hub.com/76631710.html