2011年6月22日 15:10:20go评论68阅读模式

英文:

Parallel distinct word count in Go

问题

Jakob Østergaard提出了这个挑战：

编写一个程序，从标准输入读取文本，并返回（打印）在文本中找到的不同单词的总数。

我们如何使用并行编程来满足这个挑战（最好使用Go语言，但英文描述也可以）？

英文:

Jakob Østergaard presented this challenge:

> Write a program that reads text from standard-input, and returns (prints) the total number of distinct words found in the text.

How can we meet this challenge with parallel programming (preferably in Go, but a description in English will suffice)?

答案1

得分: 3

有几种可能性，但我猜你的意思是“高效地”？

一般的想法是将文本分成可管理的块，将这些块堆叠到一个队列中，然后由多个消费者处理这些块。

对我来说，这看起来像是一个典型的Map/Reduce应用程序：

          _ Worker_
         /          \
        /            \
Splitter--- Worker ---Aggregator
        \            /
         \_ Worker _/

理想情况下，“多个”队列应该是一个具有多个消费者的单一队列，这样如果一个工作线程变慢，整个过程的速度不会减慢太多。

我还会使用一个从Splitter到Workers的信号，让它们知道输入已经完全消耗完，并且可以开始将结果发送给Aggregator。

英文:

There are several possibilities, but I guess you mean "efficiently" ?

The general idea would be to split the text into manageable chunks, pile those chunks into a queue, and have multiple consumers handle the chunks.

This looks like a typical Map/Reduce application to me:

          _ Worker_
         /          \
        /            \
Splitter--- Worker ---Aggregator
        \            /
         \_ Worker _/

Ideally the "multiple" queues would be a single one with multiple consumers, so that if one worker slows down the whole process does not slows as much.

I would also use a signal from the Splitter to the Workers to let them know the input has been fully consumed and they can start send their results to the Aggregator.

答案2

得分: 1

这是解决Jakob Østergaard的问题的Go代码。

/*
    问题：编写一个程序，从标准输入中读取文本，并返回（打印）文本中找到的不同单词的总数。

    C与C ++，Jakob Østergaard，2004年2月24日
    http://unthought.net/c++/c_vs_c++.html
*/

package main

import (
    "bufio"
    "fmt"
    "os"
    "unicode"
)

func main() {
    rdr := bufio.NewReader(os.Stdin)
    words := make(map[string]bool, 1024*1024)
    word := make([]int, 0, 256)
    for {
        r, n, _ := rdr.ReadRune()
        if unicode.IsSpace(r) || n == 0 {
            if len(word) > 0 {
                words[string(word)] = true
                word = word[:0]
            }
            if n == 0 {
                break
            }
        } else {
            word = append(word, r)
        }
    }
    fmt.Println(len(words))
}

将短语“并行编程”添加到这个问题和大多数其他问题中，并期望有一些神奇的改进是天真的。从流中顺序读取输入并执行微不足道的计算不提供任何重要的并行计算机会。

英文:

Here's the solution, in Go, to Jakob Østergaard's problem.

/*
	The problem: Write a program that reads text from 
	standard-input, and returns (prints) the total 
	number of distinct words found in the text. 

	C versus C++, Jakob &#216;stergaard, February 24th, 2004
	http://unthought.net/c++/c_vs_c++.html
*/

package main

import (
	&quot;bufio&quot;
	&quot;fmt&quot;
	&quot;os&quot;
	&quot;unicode&quot;
)

func main() {
	rdr := bufio.NewReader(os.Stdin)
	words := make(map[string]bool, 1024*1024)
	word := make([]int, 0, 256)
	for {
		r, n, _ := rdr.ReadRune()
		if unicode.IsSpace(r) || n == 0 {
			if len(word) &gt; 0 {
				words[string(word)] = true
				word = word[:0]
			}
			if n == 0 {
				break
			}
		} else {
			word = append(word, r)
		}
	}
	fmt.Println(len(words))
}

It's naive to add the phrase "parallel programming" to this and most other problems and expect some magical improvement. Reading input sequentially from a stream and performing trivial computation provides no significant opportunities for parallel computing.

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

在Go中并行计算不重复单词的数量

问题

答案1

答案2

没有名称的Go函数/方法

在正则表达式中，如何在特定点过滤输入的数量？

如何在golang中从一个rune字符串中获取子字符串？

合并两个索引编号

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

发表评论