Question

I have a very large file and I need to process each line (each line of the file is independent). How can I use goroutines (or should I not use them at all?) to read the file as fast as possible?

Answer 1

Score: 14

As long as your hard disk is orders of magnitude slower than your CPU, which is still quite a common situation, you cannot magically make reading the file (from a single hard disk) any faster by throwing more CPU cycles at it. (This assumes cold file caches and/or a file much bigger than all the memory available for file caching.)

Answer 2

Score: 4

Since in pretty much all cases disk I/O, not CPU cycles, is the limiting factor, you will not gain anything in raw reading throughput by using goroutines.

Instead, check whether you can use concurrency one step later, after a line has been read. If processing a line involves noticeable computation or waiting (maybe you analyse it, or send it somewhere else?), concurrency may be useful: hand the line off to one or more other goroutines so that reading can continue in the current goroutine.

Answer 3

Score: 1

You should also try to read the data in chunks the size of a memory page to maximize throughput (reading two half pages is slower than reading one full page). The page size depends on your OS/kernel configuration.
