在Go中并行计算不重复单词的数量

huangapple go评论68阅读模式
英文:

Parallel distinct word count in Go

问题

Jakob Østergaard提出了这个挑战:

编写一个程序,从标准输入读取文本,并返回(打印)在文本中找到的不同单词的总数。

我们如何使用并行编程来满足这个挑战(最好使用Go语言,但英文描述也可以)?

英文:

Jakob Østergaard presented this challenge:

> Write a program that reads text from standard-input, and returns (prints) the total number of distinct words found in the text.

How can we meet this challenge with parallel programming (preferably in Go, but a description in English will suffice)?

答案1

得分: 3

有几种可能性,但我猜你的意思是“高效地”?

一般的想法是将文本分成可管理的块,将这些块堆叠到一个队列中,然后由多个消费者处理这些块。

对我来说,这看起来像是一个典型的Map/Reduce应用程序:

          _ Worker_
         /          \
        /            \
Splitter--- Worker ---Aggregator
        \            /
         \_ Worker _/

理想情况下,“多个”队列应该是一个具有多个消费者的单一队列,这样如果一个工作线程变慢,整个过程的速度不会减慢太多。

我还会使用一个从Splitter到Workers的信号,让它们知道输入已经完全消耗完,并且可以开始将结果发送给Aggregator。

英文:

There are several possibilities, but I guess you mean "efficiently" ?

The general idea would be to split the text into manageable chunks, pile those chunks into a queue, and have multiple consumers handle the chunks.

This looks like a typical Map/Reduce application to me:

          _ Worker_
         /          \
        /            \
Splitter--- Worker ---Aggregator
        \            /
         \_ Worker _/

Ideally the "multiple" queues would be a single one with multiple consumers, so that if one worker slows down the whole process does not slows as much.

I would also use a signal from the Splitter to the Workers to let them know the input has been fully consumed and they can start send their results to the Aggregator.

答案2

得分: 1

这是解决Jakob Østergaard的问题的Go代码。

/*
    问题:编写一个程序,从标准输入中读取文本,并返回(打印)文本中找到的不同单词的总数。

    C与C ++,Jakob Østergaard,2004年2月24日
    http://unthought.net/c++/c_vs_c++.html
*/

package main

import (
    "bufio"
    "fmt"
    "os"
    "unicode"
)

func main() {
    rdr := bufio.NewReader(os.Stdin)
    words := make(map[string]bool, 1024*1024)
    word := make([]int, 0, 256)
    for {
        r, n, _ := rdr.ReadRune()
        if unicode.IsSpace(r) || n == 0 {
            if len(word) > 0 {
                words[string(word)] = true
                word = word[:0]
            }
            if n == 0 {
                break
            }
        } else {
            word = append(word, r)
        }
    }
    fmt.Println(len(words))
}

将短语“并行编程”添加到这个问题和大多数其他问题中,并期望有一些神奇的改进是天真的。从流中顺序读取输入并执行微不足道的计算不提供任何重要的并行计算机会。

英文:

Here's the solution, in Go, to Jakob Østergaard's problem.

/*
	The problem: Write a program that reads text from 
	standard-input, and returns (prints) the total 
	number of distinct words found in the text. 

	C versus C++, Jakob Østergaard, February 24th, 2004
	http://unthought.net/c++/c_vs_c++.html
*/

package main

import (
	"bufio"
	"fmt"
	"os"
	"unicode"
)

func main() {
	rdr := bufio.NewReader(os.Stdin)
	words := make(map[string]bool, 1024*1024)
	word := make([]int, 0, 256)
	for {
		r, n, _ := rdr.ReadRune()
		if unicode.IsSpace(r) || n == 0 {
			if len(word) > 0 {
				words[string(word)] = true
				word = word[:0]
			}
			if n == 0 {
				break
			}
		} else {
			word = append(word, r)
		}
	}
	fmt.Println(len(words))
}

It's naive to add the phrase "parallel programming" to this and most other problems and expect some magical improvement. Reading input sequentially from a stream and performing trivial computation provides no significant opportunities for parallel computing.

huangapple
  • 本文由 发表于 2011年6月22日 15:10:20
  • 转载请务必保留本文链接:https://go.coder-hub.com/6436202.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定