Use goroutines in combination with buffered reading to optimize reading a large file
Question

Given a requirement where a large CSV file (lines roughly 300 bytes long, ending in \n) needs to be processed in a typical ETL (Extract, Transform, Load) fashion: each line is read, split, and composed into a JSON document that is inserted into a DB. Would it be beneficial to spawn one (or more) goroutines that work together processing the file? What would need to be done to create a bufio.Scanner that starts reading from a random position in the file?
Answer 1

Score: 3

> Would it be beneficial to spawn one (or more) goroutines?

Yes, absolutely. In general, you could have three concurrent goroutines, one each for E, T, and L, and have them coordinate via channels.

For more insight, check out this excellent talk from Rob Pike himself:

Concurrency is not Parallelism: https://goo.gl/cp8xgF
Talk slides: http://talks.golang.org/2012/waza.slide#1