Go client for Hadoop Streaming

Question

Is there a well-known client for the Go programming language that supports Hadoop Streaming? I have searched around but was unable to find anything of value.

Answer 1

Score: 3

You can write your Hadoop Streaming jobs directly in Go. I've heard of people doing it, and here is an example, taken from a blog, that does word count in Go. Here is the mapper:

<pre><code>
package main

import (
        "bufio"
        "fmt"
        "io"
        "os"
        "regexp"
)

func main() {
        /* Word regular expression. */
        re := regexp.MustCompile("[a-zA-Z0-9]+")
        reader := bufio.NewReader(os.Stdin)

        for {
                /* ReadLine returns at most one buffer's worth of a line;
                   isPrefix is ignored, so very long lines arrive in chunks. */
                line, _, err := reader.ReadLine()
                if err != nil {
                        if err != io.EOF {
                                fmt.Fprintf(os.Stderr, "error: can't read - %s\n", err)
                        }
                        break
                }
                /* Emit one "word<TAB>1" record per word on the line. */
                matches := re.FindAll(line, -1)
                for _, word := range matches {
                        fmt.Printf("%s\t1\n", word)
                }
        }
}
</code></pre>

And here is the reducer:
<pre><code>
package main

import (
        "bufio"
        "bytes"
        "fmt"
        "io"
        "os"
        "strconv"
)

func main() {
        counts := make(map[string]uint64)
        reader := bufio.NewReader(os.Stdin)

        for {
                line, _, err := reader.ReadLine()
                if err != nil {
                        if err != io.EOF {
                                fmt.Fprintf(os.Stderr, "error: can't read - %s\n", err)
                        }
                        break
                }
                /* Each input record is "word<TAB>count". */
                i := bytes.IndexByte(line, '\t')
                if i == -1 {
                        fmt.Fprintln(os.Stderr, "error: can't find tab")
                        continue
                }
                word := string(line[0:i])
                count, err := strconv.ParseUint(string(line[i+1:]), 10, 64)
                if err != nil {
                        fmt.Fprintf(os.Stderr, "error: bad number - %s\n", err)
                        continue
                }

                counts[word] = counts[word] + count
        }

        /* Output aggregated counts. */
        for word, count := range counts {
                fmt.Printf("%s\t%d\n", word, count)
        }
}
</code></pre>

Alternatively, you could use dmrgo to make your streaming jobs easier to write. It comes with a wordcount example.

I also saw another library called gomrjob, but it doesn't look well maintained and is still very alpha. You could give it a try if you feel adventurous.
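Whichever route you take, a streaming job only needs a mapper and a reducer that read lines from stdin and write tab-separated key/value lines to stdout, so the logic can be tried locally before it goes anywhere near a cluster. Below is a rough sketch that chains a mapper, a sort step, and a reducer in memory. The names `mapWords`, `reduceCounts`, and `runLocal` are hypothetical helpers, not part of Hadoop, dmrgo, or gomrjob, and the in-memory sort is only a stand-in for Hadoop's shuffle.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
	"regexp"
	"sort"
	"strconv"
	"strings"
)

var wordRe = regexp.MustCompile("[a-zA-Z0-9]+")

// mapWords emits one "word\t1" line for every word found in r.
func mapWords(r io.Reader, w io.Writer) error {
	scanner := bufio.NewScanner(r)
	for scanner.Scan() {
		for _, word := range wordRe.FindAllString(scanner.Text(), -1) {
			fmt.Fprintf(w, "%s\t1\n", word)
		}
	}
	return scanner.Err()
}

// reduceCounts sums the counts per word. Totals are written in sorted
// word order only so that the output is deterministic.
func reduceCounts(r io.Reader, w io.Writer) error {
	counts := make(map[string]uint64)
	scanner := bufio.NewScanner(r)
	for scanner.Scan() {
		line := scanner.Text()
		i := strings.IndexByte(line, '\t')
		if i == -1 {
			continue // skip malformed records
		}
		n, err := strconv.ParseUint(line[i+1:], 10, 64)
		if err != nil {
			continue // skip records with a bad count
		}
		counts[line[:i]] += n
	}
	words := make([]string, 0, len(counts))
	for word := range counts {
		words = append(words, word)
	}
	sort.Strings(words)
	for _, word := range words {
		fmt.Fprintf(w, "%s\t%d\n", word, counts[word])
	}
	return scanner.Err()
}

// runLocal runs mapper, sort, and reducer in memory on the given input.
func runLocal(input string) (string, error) {
	var mapped strings.Builder
	if err := mapWords(strings.NewReader(input), &mapped); err != nil {
		return "", err
	}
	// Stand-in for the shuffle: sort the mapper's output lines by key.
	lines := strings.Split(strings.TrimSuffix(mapped.String(), "\n"), "\n")
	sort.Strings(lines)
	var reduced strings.Builder
	if err := reduceCounts(strings.NewReader(strings.Join(lines, "\n")), &reduced); err != nil {
		return "", err
	}
	return reduced.String(), nil
}

func main() {
	out, err := runLocal("the quick fox\nthe lazy dog\n")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Print(out)
}
```

Keeping the map and reduce steps as functions over `io.Reader` and `io.Writer` means the same code can be wired to `os.Stdin` and `os.Stdout` for the real job and to strings in a test.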

huangapple
  • Posted on May 23, 2013, 02:12:11
  • Please keep this link when reposting: https://go.coder-hub.com/16698825.html