Go read text with goroutine
Question
I want to read a text file with goroutines. The order in which the text is read from the file does not matter. How do I read a file concurrently?
```go
// file is an already opened *os.File; lines is a []string.
scanner := bufio.NewScanner(file)
for scanner.Scan() {
    lines = append(lines, scanner.Text())
}
```
For example, if the text file contains `I like Go`, I want to read this file without concern for the order. The result could be `[]string{"Go", "like", "I"}`.
Answer 1
Score: 3
First of all, if you're reading from an io.Reader, think of it as reading from a stream. It's a single input source, which you can't "read in parallel" because of its nature: under the hood you get one byte, wait for the next one, get another, and so on. Tokenizing it into words comes later, in the buffer.
Second, I hope you're not trying to use goroutines as a "silver bullet", in a "let's add goroutines and everything will just speed up" manner. Just because Go gives you such an easy way to use concurrency doesn't mean you should use it everywhere.
And finally, if you really need to split a huge file into words in parallel, and you think the splitting part will be the bottleneck (I don't know your case, but I really doubt it), then you have to invent your own algorithm and use the os package to Seek()/Read() parts of the file, each part processed by its own goroutine, while tracking somehow which parts have already been processed.