How to write Map/Reduce tasks in Golang?

Question

I would like to write Hadoop Map/Reduce jobs in Go (and not with the Streaming API!).

I tried to get a grasp of hortonworks/gohadoop and colinmarc/hdfs, but I still don't see how to actually write jobs. I have searched GitHub for code that imports these modules, but apparently there is nothing relevant.

Is there a WordCount.go example anywhere?

Answer 1

Score: 2

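The GitHub project https://github.com/vistarmedia/gossamr is a good starting point for running a Go job on Hadoop. Note that gossamr still submits the job through Hadoop Streaming (the launch command uses the hadoop-streaming jar with -io typedbytes); it hides the Streaming plumbing behind ordinary Go Map and Reduce methods rather than avoiding it.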

The gist:

package main

import (
  "log"
  "strings"

  "github.com/vistarmedia/gossamr"
)

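// WordCount implements both phases; main wraps it with gossamr.NewTask.
type WordCount struct{}

// Map is called for each input record and emits (lowercased word, 1) for
// every whitespace-separated word in the line. The input key p is not used.
func (wc *WordCount) Map(p int64, line string, c gossamr.Collector) error {
  for _, word := range strings.Fields(line) {
    c.Collect(strings.ToLower(word), int64(1))
  }
  return nil
}

// Reduce receives every count emitted for one word on the counts channel and
// emits the total. Note the output order: the sum is the key, the word the value.
func (wc *WordCount) Reduce(word string, counts chan int64, c gossamr.Collector) error {
  var sum int64
  for v := range counts {
    sum += v
  }
  c.Collect(sum, word)
  return nil
}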

func main() {
  wordcount := gossamr.NewTask(&WordCount{})

  err := gossamr.Run(wordcount)
  if err != nil {
    log.Fatal(err)
  }
}
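Because Map and Reduce are ordinary methods, their logic can be exercised without a Hadoop cluster. The sketch below is a hypothetical local harness, not part of gossamr: it defines its own collector interface as a stand-in for gossamr.Collector (whose exact definition is not shown here), repeats the WordCount logic above against that interface, and simulates the shuffle by grouping map output by key and feeding each group to Reduce through a closed channel. The runLocal and pairCollector names are illustrative only.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// collector is a hypothetical stand-in for gossamr.Collector, defined here
// only so this sketch compiles without gossamr.
type collector interface {
	Collect(k, v interface{})
}

// pairCollector records every emitted (key, value) pair in order.
type pairCollector struct {
	pairs [][2]interface{}
}

func (pc *pairCollector) Collect(k, v interface{}) {
	pc.pairs = append(pc.pairs, [2]interface{}{k, v})
}

// WordCount repeats the logic of the gossamr example against the stand-in interface.
type WordCount struct{}

func (wc *WordCount) Map(p int64, line string, c collector) error {
	for _, word := range strings.Fields(line) {
		c.Collect(strings.ToLower(word), int64(1))
	}
	return nil
}

func (wc *WordCount) Reduce(word string, counts chan int64, c collector) error {
	var sum int64
	for v := range counts {
		sum += v
	}
	c.Collect(sum, word)
	return nil
}

// runLocal simulates map -> shuffle -> reduce in one process. The line index
// stands in for whatever input key the real input format would supply.
func runLocal(wc *WordCount, lines []string) ([][2]interface{}, error) {
	mapped := &pairCollector{}
	for i, line := range lines {
		if err := wc.Map(int64(i), line, mapped); err != nil {
			return nil, err
		}
	}

	// Shuffle: group map output values by key, then sort the keys.
	groups := make(map[string][]int64)
	for _, kv := range mapped.pairs {
		key := kv[0].(string)
		groups[key] = append(groups[key], kv[1].(int64))
	}
	keys := make([]string, 0, len(groups))
	for k := range groups {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	// Reduce: feed each group through a buffered, closed channel so that
	// Reduce's range loop terminates.
	reduced := &pairCollector{}
	for _, k := range keys {
		ch := make(chan int64, len(groups[k]))
		for _, v := range groups[k] {
			ch <- v
		}
		close(ch)
		if err := wc.Reduce(k, ch, reduced); err != nil {
			return nil, err
		}
	}
	return reduced.pairs, nil
}

func main() {
	out, err := runLocal(&WordCount{}, []string{"the cat", "The dog"})
	if err != nil {
		panic(err)
	}
	for _, kv := range out {
		fmt.Printf("%v\t%v\n", kv[0], kv[1])
	}
}
```

Run as-is, it prints "1 cat", "1 dog" and "2 the" (tab-separated), with the count first because Reduce emits (sum, word).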

Launching the job:

./bin/hadoop jar ./contrib/streaming/hadoop-streaming-1.2.1.jar \
  -input /mytext.txt \
  -output /output.15 \
  -mapper "wordcount -task 0 -phase map" \
  -reducer "wordcount -task 0 -phase reduce" \
  -io typedbytes \
  -file ./wordcount \
  -numReduceTasks 6

Answer 2

Score: 0

Here's a simple implementation of Map/Reduce written in Go (available on GitHub):

https://github.com/dbravender/go_mapreduce

huangapple
  • Published on August 5, 2015 at 20:15:57
  • When reposting, please keep the link to this article: https://go.coder-hub.com/31832266.html