Use goroutines in combination with buffered reading to optimize reading a large file

Question

I have a requirement to process a large CSV file (lines about 300 bytes long, each ending in \n) in typical ETL (Extract, Transform, Load) fashion: each line is read and split, and a JSON document composed from it is inserted into a DB. Would it be beneficial to spawn one or more goroutines that work together to process the file?
What would need to be done to create a bufio.Scanner that starts reading from a random position in the file?

Answer 1

Score: 3

> Would it be beneficial to spawn one (or more) goroutines?

Yes, absolutely. In general, you could run three concurrent goroutines, one for each of the E, T, and L stages, and coordinate them via channels.

For more insights, check out this awesome talk from Rob Pike himself:

Concurrency is not Parallelism: https://goo.gl/cp8xgF
Talk slides: http://talks.golang.org/2012/waza.slide#1

huangapple
  • Posted on February 5, 2016, 12:49:48
  • When reposting, please keep the link to this article: https://go.coder-hub.com/35216475.html