Fast file reading in golang

Question

I have a very large file and I need to process each line (each line of the file is independent). How can I use goroutines (or should I not use them?) to read the file in the fastest way?

Answer 1

Score: 14

As long as your hard disk is orders of magnitude slower than your CPU, which is still quite a common situation, you cannot magically make file reading (domain: from a single HD) any faster by throwing more CPU cycles at it. (This assumes cold file caches and/or a file much bigger than all available file cache memory.)

Answer 2

Score: 4

Since disk I/O is the limiting factor in pretty much all cases, not CPU cycles, you will not gain any pure read throughput by using goroutines.

Instead, check whether you can use concurrency one step later, after reading a line. If handling a line takes some computation or waiting (maybe you analyse it, or send it somewhere else?), concurrency may be useful: pass the line to one or more other goroutines so that reading can continue in this goroutine.

Answer 3

Score: 1

You should also try to read data in memory-page-sized chunks to maximize throughput (reading two half pages is slower than reading one full page). The page size depends on your OS/kernel configuration.

huangapple
  • Posted on October 16, 2012, 20:19:19
  • When reposting, please keep the link to this article: https://go.coder-hub.com/12914567.html