How to write Map/Reduce tasks in Golang?
Question
I would like to write Hadoop Map/Reduce jobs in Go (and not with the Streaming API!).
I tried to get a grasp of hortonworks/gohadoop and colinmarc/hdfs, but I still don't see how to actually write a job. I searched GitHub for code importing these modules, but apparently there is nothing relevant.
Is there a WordCount.go example somewhere?
Answer 1
Score: 2
This GitHub repository, https://github.com/vistarmedia/gossamr, is a good starting point for running a Go job on Hadoop.
Gist:
package main

import (
	"log"
	"strings"

	"github.com/vistarmedia/gossamr"
)

type WordCount struct{}

// Map emits (word, 1) for every whitespace-separated token in the line.
// p is the input key and is unused here.
func (wc *WordCount) Map(p int64, line string, c gossamr.Collector) error {
	for _, word := range strings.Fields(line) {
		c.Collect(strings.ToLower(word), int64(1))
	}
	return nil
}

// Reduce sums the counts received for a word.
func (wc *WordCount) Reduce(word string, counts chan int64, c gossamr.Collector) error {
	var sum int64
	for v := range counts {
		sum += v
	}
	// Note the order: the total is emitted as the key and the word as the value.
	c.Collect(sum, word)
	return nil
}

func main() {
	wordcount := gossamr.NewTask(&WordCount{})

	err := gossamr.Run(wordcount)
	if err != nil {
		log.Fatal(err)
	}
}
Launching the job:
./bin/hadoop jar ./contrib/streaming/hadoop-streaming-1.2.1.jar \
  -input /mytext.txt \
  -output /output.15 \
  -mapper "gossamr -task 0 -phase map" \
  -reducer "gossamr -task 0 -phase reduce" \
  -io typedbytes \
  -file ./wordcount \
  -numReduceTasks 6
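Before submitting a job to a cluster, it can help to check the word-count logic locally. The following is a minimal sketch that runs the same three phases (map, group by key, reduce) in a single process using only the standard library. It does not use gossamr or Hadoop; the names `pair`, `mapLine`, `shuffle`, `reduceCounts`, and `wordCount` are made up for this illustration.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// pair is a hypothetical key/value record emitted by the map phase.
type pair struct {
	word  string
	count int64
}

// mapLine emits one (word, 1) pair per whitespace-separated token.
func mapLine(line string) []pair {
	var out []pair
	for _, w := range strings.Fields(line) {
		out = append(out, pair{strings.ToLower(w), 1})
	}
	return out
}

// shuffle groups the emitted counts by word, standing in for the
// grouping that Hadoop performs between the two phases.
func shuffle(pairs []pair) map[string][]int64 {
	groups := make(map[string][]int64)
	for _, p := range pairs {
		groups[p.word] = append(groups[p.word], p.count)
	}
	return groups
}

// reduceCounts drains a channel of counts and returns their sum.
func reduceCounts(counts <-chan int64) int64 {
	var sum int64
	for v := range counts {
		sum += v
	}
	return sum
}

// wordCount runs map, shuffle, and reduce in-process over the given lines.
func wordCount(lines []string) map[string]int64 {
	var pairs []pair
	for _, line := range lines {
		pairs = append(pairs, mapLine(line)...)
	}
	result := make(map[string]int64)
	for word, counts := range shuffle(pairs) {
		ch := make(chan int64, len(counts))
		for _, c := range counts {
			ch <- c
		}
		close(ch)
		result[word] = reduceCounts(ch)
	}
	return result
}

func main() {
	counts := wordCount([]string{"the quick brown fox", "The lazy dog", "the end"})

	// Sort the words so the output order is stable.
	words := make([]string, 0, len(counts))
	for w := range counts {
		words = append(words, w)
	}
	sort.Strings(words)
	for _, w := range words {
		fmt.Printf("%s\t%d\n", w, counts[w])
	}
}
```

Feeding the reducer through a channel keeps its shape close to a Reduce method that receives `chan int64`, so the summing loop can be moved into a real job with little change.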
Answer 2
Score: 0
Here's a simple implementation of Map/Reduce written in Go (available on GitHub):
https://github.com/dbravender/go_mapreduce
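For a sense of what an in-memory Map/Reduce in Go can look like, here is a minimal sketch built on goroutines and channels. It is not the code from the repository linked in this answer, and it does not talk to Hadoop. The names `KV`, `MapFunc`, `ReduceFunc`, and `MapReduce` are assumptions made up for this example.

```go
package main

import (
	"fmt"
	"sync"
)

// KV is a hypothetical key/value record emitted by a mapper.
type KV struct {
	Key   string
	Value int
}

// MapFunc turns one input into zero or more key/value records.
type MapFunc func(input string) []KV

// ReduceFunc folds all values collected for a key into one result.
type ReduceFunc func(key string, values []int) int

// MapReduce runs mapper over inputs with the given number of worker
// goroutines, groups the emitted values by key, and reduces each group.
func MapReduce(inputs []string, mapper MapFunc, reducer ReduceFunc, workers int) map[string]int {
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan string)
	emitted := make(chan KV)

	// Map phase: each worker pulls inputs and emits records.
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for in := range jobs {
				for _, kv := range mapper(in) {
					emitted <- kv
				}
			}
		}()
	}
	go func() {
		for _, in := range inputs {
			jobs <- in
		}
		close(jobs)
	}()
	go func() {
		wg.Wait()
		close(emitted)
	}()

	// Shuffle phase: group values by key in the calling goroutine.
	groups := make(map[string][]int)
	for kv := range emitted {
		groups[kv.Key] = append(groups[kv.Key], kv.Value)
	}

	// Reduce phase: one goroutine per key, results guarded by a mutex.
	result := make(map[string]int, len(groups))
	var mu sync.Mutex
	var rwg sync.WaitGroup
	for key, values := range groups {
		rwg.Add(1)
		go func(key string, values []int) {
			defer rwg.Done()
			v := reducer(key, values)
			mu.Lock()
			result[key] = v
			mu.Unlock()
		}(key, values)
	}
	rwg.Wait()
	return result
}

func main() {
	// Example: count how many input words have each length.
	inputs := []string{"go", "hadoop", "map", "reduce", "hdfs", "is"}
	byLength := MapReduce(inputs,
		func(w string) []KV { return []KV{{Key: fmt.Sprint(len(w)), Value: 1}} },
		func(_ string, values []int) int {
			total := 0
			for _, v := range values {
				total += v
			}
			return total
		},
		3)
	fmt.Println(byLength)
}
```

This only parallelizes work inside one process. Distributing it across machines, which is what Hadoop provides, would also need input splitting, data transfer between nodes, and failure handling.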