英文:
Parallel distinct word count in Go
问题
Jakob Østergaard提出了这个挑战:
编写一个程序,从标准输入读取文本,并返回(打印)在文本中找到的不同单词的总数。
我们如何使用并行编程来满足这个挑战(最好使用Go语言,但英文描述也可以)?
英文:
Jakob Østergaard presented this challenge:
> Write a program that reads text from standard-input, and returns (prints) the total number of distinct words found in the text.
How can we meet this challenge with parallel programming (preferably in Go, but a description in English will suffice)?
答案1
得分: 3
有几种可能性,但我猜你的意思是“高效地”?
一般的想法是将文本分成可管理的块,将这些块堆叠到一个队列中,然后由多个消费者处理这些块。
对我来说,这看起来像是一个典型的Map/Reduce应用程序:
_ Worker_
/ \
/ \
Splitter--- Worker ---Aggregator
\ /
\_ Worker _/
理想情况下,“多个”队列应该是一个具有多个消费者的单一队列,这样如果一个工作线程变慢,整个过程的速度不会减慢太多。
我还会使用一个从Splitter到Workers的信号,让它们知道输入已经完全消耗完,并且可以开始将结果发送给Aggregator。
英文:
There are several possibilities, but I guess you mean "efficiently" ?
The general idea would be to split the text into manageable chunks, pile those chunks into a queue, and have multiple consumers handle the chunks.
This looks like a typical Map/Reduce application to me:
_ Worker_
/ \
/ \
Splitter--- Worker ---Aggregator
\ /
\_ Worker _/
Ideally the "multiple" queues would be a single one with multiple consumers, so that if one worker slows down the whole process does not slows as much.
I would also use a signal from the Splitter to the Workers to let them know the input has been fully consumed and they can start send their results to the Aggregator.
答案2
得分: 1
这是解决Jakob Østergaard的问题的Go代码。
/*
问题:编写一个程序,从标准输入中读取文本,并返回(打印)文本中找到的不同单词的总数。
C与C ++,Jakob Østergaard,2004年2月24日
http://unthought.net/c++/c_vs_c++.html
*/
package main
import (
"bufio"
"fmt"
"os"
"unicode"
)
func main() {
rdr := bufio.NewReader(os.Stdin)
words := make(map[string]bool, 1024*1024)
word := make([]int, 0, 256)
for {
r, n, _ := rdr.ReadRune()
if unicode.IsSpace(r) || n == 0 {
if len(word) > 0 {
words[string(word)] = true
word = word[:0]
}
if n == 0 {
break
}
} else {
word = append(word, r)
}
}
fmt.Println(len(words))
}
将短语“并行编程”添加到这个问题和大多数其他问题中,并期望有一些神奇的改进是天真的。从流中顺序读取输入并执行微不足道的计算不提供任何重要的并行计算机会。
英文:
Here's the solution, in Go, to Jakob Østergaard's problem.
/*
The problem: Write a program that reads text from
standard-input, and returns (prints) the total
number of distinct words found in the text.
C versus C++, Jakob Østergaard, February 24th, 2004
http://unthought.net/c++/c_vs_c++.html
*/
package main
import (
"bufio"
"fmt"
"os"
"unicode"
)
func main() {
rdr := bufio.NewReader(os.Stdin)
words := make(map[string]bool, 1024*1024)
word := make([]int, 0, 256)
for {
r, n, _ := rdr.ReadRune()
if unicode.IsSpace(r) || n == 0 {
if len(word) > 0 {
words[string(word)] = true
word = word[:0]
}
if n == 0 {
break
}
} else {
word = append(word, r)
}
}
fmt.Println(len(words))
}
It's naive to add the phrase "parallel programming" to this and most other problems and expect some magical improvement. Reading input sequentially from a stream and performing trivial computation provides no significant opportunities for parallel computing.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论