Use goroutines in combination with buffered reading to optimize reading a large file
Question

Given a requirement where a large CSV file (lines roughly 300 bytes long, ending in \n) needs to be processed in a typical ETL (Extract, Transform, Load) fashion: each line is read, split, and composed into a JSON document that is inserted into a DB. Would it be beneficial to spawn one (or more) goroutines that work together processing the file? What would need to be done to create a bufio.Scanner that starts reading from a random position in the file?
Answer 1

Score: 3

> Would it be beneficial to spawn one (or more) goroutines?

Yes, absolutely. In general, you could have three concurrent goroutines, one each for E, T, and L, and have them coordinate via channels.

For more insight, check out this excellent talk from Rob Pike himself:

Concurrency is not Parallelism: https://goo.gl/cp8xgF
Talk slides: http://talks.golang.org/2012/waza.slide#1