What is the best optimized way in Java to get the latest n files (descending last-modified) without loading all files of a large directory?


Question


The aim is to get the latest 100 files.
Currently this is done by scanning all files, building a list of them, and then applying sort + limit.

This is very slow when the directory is very large. Is there any approach or API that does this without loading the full file list?

Currently, the following three approaches do not give satisfactory performance when the file count is in the range of a few thousand:

  • File.listFiles - Java 1.2
  • DirectoryStream - Java 1.7
  • Files.walk - Java 1.8

Answer 1

Score: 3


You have to look at the attributes of each file to find its age, and you have to look at them all to find the N newest.

Your only freedom of choice is in how you do the looking. There's no need to read the file contents, for example.

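I'd consider using Files.find(). Judging from its documentation, it does the minimum work required: its matcher receives each file's BasicFileAttributes, so the last-modified time is available without a separate lookup per file.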
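A minimal sketch of that idea, assuming only the direct children of one directory matter (maxDepth 1). The class name ModifiedTimeScan and the method scan are hypothetical names chosen for illustration. The matcher records each regular file's last-modified time from the attributes it is handed, so no second attribute read is issued per file. Recording inside the matcher is a shortcut that relies on the stream being sequential; it is not the only way to use Files.find().

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.HashMap;
import java.util.Map;
import java.util.stream.Stream;

/** Hypothetical helper: collects last-modified times in one pass over a directory. */
final class ModifiedTimeScan {

    /** Returns the last-modified time of every regular file directly inside dir. */
    static Map<Path, FileTime> scan(Path dir) throws IOException {
        Map<Path, FileTime> times = new HashMap<>();
        // maxDepth 1 visits dir itself and its direct children only.
        try (Stream<Path> stream = Files.find(dir, 1, (path, attrs) -> {
            if (attrs.isRegularFile()) {
                // attrs was already read during traversal; reuse it here.
                times.put(path, attrs.lastModifiedTime());
            }
            return false; // nothing needs to flow downstream
        })) {
            stream.forEach(p -> { }); // drain the stream to drive the traversal
        }
        return times;
    }
}
```

This still visits every directory entry, as the answer says it must; it only avoids reading file contents and re-reading attributes.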

You don't need to save all files. Track the oldest of the newest 100 seen. If the 'next' file is older than that, you don't need to keep it. Otherwise you have to figure out which of the 100 to discard. This trades the overhead of keeping an entire list for the overhead of deciding what to discard. It could work in your favour if the number of files is much larger than 100.
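One way to sketch this bookkeeping is a bounded min-heap keyed on modification time: the head of the heap is the oldest of the newest n seen so far, so a rejected file costs a single comparison and an accepted one costs O(log n). The class NewestN and its methods are hypothetical names for illustration, and the answer does not prescribe a heap specifically; it is one reasonable choice for deciding which entry to discard.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical helper: keeps only the n newest entries offered to it. */
final class NewestN {

    /** A path paired with its last-modified time in epoch milliseconds. */
    private static final class Entry {
        final Path path;
        final long modifiedMillis;

        Entry(Path path, long modifiedMillis) {
            this.path = path;
            this.modifiedMillis = modifiedMillis;
        }
    }

    private final int capacity;
    // Min-heap by modification time: the head is the oldest entry still kept.
    private final PriorityQueue<Entry> heap =
            new PriorityQueue<>(Comparator.comparingLong((Entry e) -> e.modifiedMillis));

    NewestN(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    /** Offers one file; it is kept only if it is among the newest seen so far. */
    void offer(Path path, long modifiedMillis) {
        if (heap.size() < capacity) {
            heap.add(new Entry(path, modifiedMillis));
        } else if (modifiedMillis > heap.peek().modifiedMillis) {
            heap.poll(); // discard the oldest of the kept entries
            heap.add(new Entry(path, modifiedMillis));
        }
        // Otherwise the file is older than everything kept and is dropped.
    }

    /** Returns the kept paths, newest first. */
    List<Path> newestFirst() {
        List<Entry> sorted = new ArrayList<>(heap);
        sorted.sort(Comparator.comparingLong((Entry e) -> e.modifiedMillis).reversed());
        List<Path> result = new ArrayList<>(sorted.size());
        for (Entry e : sorted) {
            result.add(e.path);
        }
        return result;
    }
}
```

In use, one would create new NewestN(100), call offer(path, attrs.lastModifiedTime().toMillis()) for each file encountered during the directory scan, and read newestFirst() at the end. Memory stays proportional to n rather than to the size of the directory.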

To some extent the overhead is file-system dependent. If the last-modified time is stored in the directory entry then there's no need to look at the inode to get it. That's not under your control, of course.

huangapple
  • Posted on 2020-10-20 19:38:52
  • When reposting, please keep the link to this article: https://go.coder-hub.com/64444368.html