Go client for Hadoop Streaming
Question
Is there a well known client for the Go programming language that supports Hadoop Streaming? I have searched around and was unable to find anything of value.
Answer 1
Score: 3
You can run your Hadoop Streaming jobs directly with Go. I've heard of people doing it, and here is an example, taken from a blog post, that does word count in Go. Here is the mapper:
<pre><code>
package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
	"regexp"
)

func main() {
	/* Word regular expression. */
	re := regexp.MustCompile("[a-zA-Z0-9]+")
	reader := bufio.NewReader(os.Stdin)
	for {
		line, _, err := reader.ReadLine()
		if err != nil {
			if err != io.EOF {
				fmt.Fprintf(os.Stderr, "error: can't read - %s\n", err)
			}
			break
		}
		/* Emit one "word\t1" pair per match. */
		matches := re.FindAll(line, -1)
		for _, word := range matches {
			fmt.Printf("%s\t1\n", word)
		}
	}
}
</code></pre>
And here is the reducer:
<pre><code>
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
	"os"
	"strconv"
)

func main() {
	counts := make(map[string]uint64)
	reader := bufio.NewReader(os.Stdin)
	for {
		line, _, err := reader.ReadLine()
		if err != nil {
			if err != io.EOF {
				fmt.Fprintf(os.Stderr, "error: can't read - %s\n", err)
			}
			break
		}
		/* Each input line is a "word\tcount" pair from the mapper. */
		i := bytes.IndexByte(line, '\t')
		if i == -1 {
			fmt.Fprintln(os.Stderr, "error: can't find tab")
			continue
		}
		word := string(line[:i])
		count, err := strconv.ParseUint(string(line[i+1:]), 10, 64)
		if err != nil {
			fmt.Fprintf(os.Stderr, "error: bad number - %s\n", err)
			continue
		}
		counts[word] += count
	}
	/* Output aggregated counts. */
	for word, count := range counts {
		fmt.Printf("%s\t%d\n", word, count)
	}
}
</code></pre>
Alternatively, you could also use dmrgo to make it easier to write your streaming jobs. They have a wordcount example available here.
I saw another library called gomrjob, but it doesn't look very well maintained and is still quite alpha; you could give it a try if you feel adventurous.
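Whichever approach you take, a streaming mapper and reducer are just stdin/stdout programs, so you can test the pipeline locally with a shell pipe and then submit the compiled binaries with the standard streaming jar. A sketch (file names and the jar path are illustrative; the path varies by Hadoop distribution):

```shell
# Build the two programs above.
go build -o wc_mapper mapper.go
go build -o wc_reducer reducer.go

# Local test: this pipe mirrors what Hadoop Streaming does (map, sort, reduce).
cat input.txt | ./wc_mapper | sort | ./wc_reducer

# Submit to the cluster, shipping the binaries to the task nodes with -file.
hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
  -input /data/input \
  -output /data/output \
  -mapper wc_mapper \
  -reducer wc_reducer \
  -file wc_mapper -file wc_reducer
```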
Comments