Efficiently listing files in a directory having very many entries

Question


I need to recursively read a directory structure, but I also need to perform an additional step once I have read through all entries for each directory. Therefore, I need to write my own recursion logic (and can't use the simplistic filepath.Walk routine). However, the ioutil.ReadDir and filepath.Glob routines only return slices. What if I'm pushing the limits of ext4 or xfs and have a directory with files numbering into the billions? I would expect golang to have a function that returns an unsorted series of os.FileInfo (or, even better, raw strings) over a channel rather than a sorted slice. How do we efficiently read file entries in this case?

All of the functions cited above seem to rely on readdirnames in os/dir_unix.go, and, for some reason, it just builds a slice when it seems like it would have been easy to spawn a goroutine and push the values into a channel. There might have been sound logic behind this, but it's not clear what it is. I'm new to Go, so I could also easily have missed some principle that's obvious to everyone else.

This is the source code, for convenience:

func (f *File) readdirnames(n int) (names []string, err error) {
	// If this file has no dirinfo, create one.
	if f.dirinfo == nil {
		f.dirinfo = new(dirInfo)
		// The buffer must be at least a block long.
		f.dirinfo.buf = make([]byte, blockSize)
	}
	d := f.dirinfo

	size := n
	if size <= 0 {
		size = 100
		n = -1
	}

	names = make([]string, 0, size) // Empty with room to grow.
	for n != 0 {
		// Refill the buffer if necessary
		if d.bufp >= d.nbuf {
			d.bufp = 0
			var errno error
			d.nbuf, errno = fixCount(syscall.ReadDirent(f.fd, d.buf))
			if errno != nil {
				return names, NewSyscallError("readdirent", errno)
			}
			if d.nbuf <= 0 {
				break // EOF
			}
		}

		// Drain the buffer
		var nb, nc int
		nb, nc, names = syscall.ParseDirent(d.buf[d.bufp:d.nbuf], n, names)
		d.bufp += nb
		n -= nc
	}
	if n >= 0 && len(names) == 0 {
		return names, io.EOF
	}
	return names, nil
}

Answer 1

Score: 5


ioutil.ReadDir and filepath.Glob are just convenience functions around reading directory entries.

You can read directory entries in batches by directly using the Readdir or Readdirnames methods, if you supply an n argument > 0.
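
As a minimal sketch of the batched approach, the following reads a directory a few names at a time with Readdirnames and hands each name to a callback. The names forEachName and makeDemoDir and the batch sizes are hypothetical, chosen only for illustration, and the demo setup assumes Go 1.16 or later for os.MkdirTemp and os.WriteFile.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"path/filepath"
)

// forEachName (hypothetical helper) reads the entry names of dir in batches
// of at most batchSize and calls fn for each one. Names arrive in directory
// order, unsorted, and only one batch is held in memory at a time.
func forEachName(dir string, batchSize int, fn func(name string) error) error {
	if batchSize <= 0 {
		// With n <= 0, Readdirnames returns everything in one slice,
		// which is exactly what this sketch is trying to avoid.
		return fmt.Errorf("batchSize must be > 0, got %d", batchSize)
	}
	f, err := os.Open(dir)
	if err != nil {
		return err
	}
	defer f.Close()

	for {
		names, err := f.Readdirnames(batchSize)
		for _, name := range names {
			if ferr := fn(name); ferr != nil {
				return ferr
			}
		}
		if err == io.EOF {
			return nil // end of directory
		}
		if err != nil {
			return err
		}
	}
}

// makeDemoDir (hypothetical, demo only) creates a temporary directory
// holding n empty files and returns a function that removes it again.
func makeDemoDir(n int) (dir string, cleanup func(), err error) {
	dir, err = os.MkdirTemp("", "batchdemo")
	if err != nil {
		return "", nil, err
	}
	cleanup = func() { os.RemoveAll(dir) }
	for i := 0; i < n; i++ {
		name := filepath.Join(dir, fmt.Sprintf("file%03d", i))
		if err := os.WriteFile(name, nil, 0o644); err != nil {
			cleanup()
			return "", nil, err
		}
	}
	return dir, cleanup, nil
}

func main() {
	dir, cleanup, err := makeDemoDir(5)
	if err != nil {
		fmt.Println("setup failed:", err)
		return
	}
	defer cleanup()

	count := 0
	err = forEachName(dir, 2, func(name string) error {
		count++
		fmt.Println(name)
		return nil
	})
	if err != nil {
		fmt.Println("read failed:", err)
		return
	}
	fmt.Println("entries:", count)
}
```

Because forEachName returns only after the last batch, any per-directory step that must run once all entries have been seen can simply follow the call.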

For something as basic as reading directory entries, there's no need to add the overhead of a goroutine and a channel, which would also require an alternate way to return the error. You can always wrap the batched calls in your own goroutine-and-channel pattern if you wish.
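
One possible shape for such a wrapper is sketched below. It is not part of the standard library: streamNames, countNames and makeDemoDir are hypothetical names, the second channel is just one way to report the error, and the context is an added assumption so the producer goroutine can stop if the consumer gives up early. The demo setup assumes Go 1.16 or later.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"os"
	"path/filepath"
)

// streamNames (hypothetical helper) sends the entry names of dir on the
// first returned channel, reading them in batches of batchSize. The second
// channel carries at most one error; both are closed when reading stops.
func streamNames(ctx context.Context, dir string, batchSize int) (<-chan string, <-chan error) {
	names := make(chan string)
	errc := make(chan error, 1) // buffered so the producer never blocks on it
	go func() {
		defer close(names)
		defer close(errc)
		if batchSize <= 0 {
			errc <- fmt.Errorf("batchSize must be > 0, got %d", batchSize)
			return
		}
		f, err := os.Open(dir)
		if err != nil {
			errc <- err
			return
		}
		defer f.Close()
		for {
			batch, err := f.Readdirnames(batchSize)
			for _, name := range batch {
				select {
				case names <- name:
				case <-ctx.Done(): // consumer went away
					errc <- ctx.Err()
					return
				}
			}
			if err == io.EOF {
				return // end of directory
			}
			if err != nil {
				errc <- err
				return
			}
		}
	}()
	return names, errc
}

// countNames drains the stream, then checks the error channel.
func countNames(dir string, batchSize int) (int, error) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	names, errc := streamNames(ctx, dir, batchSize)
	count := 0
	for range names {
		count++
	}
	return count, <-errc
}

// makeDemoDir (hypothetical, demo only) creates a temporary directory
// holding n empty files and returns a function that removes it again.
func makeDemoDir(n int) (dir string, cleanup func(), err error) {
	dir, err = os.MkdirTemp("", "streamdemo")
	if err != nil {
		return "", nil, err
	}
	cleanup = func() { os.RemoveAll(dir) }
	for i := 0; i < n; i++ {
		name := filepath.Join(dir, fmt.Sprintf("file%03d", i))
		if err := os.WriteFile(name, nil, 0o644); err != nil {
			cleanup()
			return "", nil, err
		}
	}
	return dir, cleanup, nil
}

func main() {
	dir, cleanup, err := makeDemoDir(7)
	if err != nil {
		fmt.Println("setup failed:", err)
		return
	}
	defer cleanup()
	n, err := countNames(dir, 3)
	fmt.Println("entries:", n, "error:", err)
}
```

The consumer has to check the error channel after the names channel closes, which is the "alternate way to return the error" the answer refers to and part of why the plain batched loop is usually simpler.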

huangapple
  • Published on December 29, 2015 at 22:51:27
  • When reposting, please keep the link to this article: https://go.coder-hub.com/34513460.html