Go client for Hadoop Streaming
Question
Is there a well known client for the Go programming language that supports Hadoop Streaming? I have searched around and was unable to find anything of value.
Answer 1
Score: 3
You can run your Hadoop Streaming jobs directly with Go. I've heard of people doing it, and here is an example, taken from a blog post, that does word count in Go. Here is the mapper:
<pre><code>
package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
	"regexp"
)

func main() {
	/* Word regular expression. */
	re := regexp.MustCompile("[a-zA-Z0-9]+")
	reader := bufio.NewReader(os.Stdin)
	for {
		line, _, err := reader.ReadLine()
		if err != nil {
			if err != io.EOF {
				fmt.Fprintf(os.Stderr, "error: can't read - %s\n", err)
			}
			break
		}
		/* Emit one "word\t1" pair per match. */
		matches := re.FindAll(line, -1)
		for _, word := range matches {
			fmt.Printf("%s\t1\n", word)
		}
	}
}
</code></pre>
And here is the reducer:
<pre><code>
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
	"os"
	"strconv"
)

func main() {
	counts := make(map[string]uint64)
	reader := bufio.NewReader(os.Stdin)
	for {
		line, _, err := reader.ReadLine()
		if err != nil {
			if err != io.EOF {
				fmt.Fprintf(os.Stderr, "error: can't read - %s\n", err)
			}
			break
		}
		/* Each input line is a "word\tcount" pair from the mapper. */
		i := bytes.IndexByte(line, '\t')
		if i == -1 {
			fmt.Fprintln(os.Stderr, "error: can't find tab")
			continue
		}
		word := string(line[:i])
		count, err := strconv.ParseUint(string(line[i+1:]), 10, 64)
		if err != nil {
			fmt.Fprintf(os.Stderr, "error: bad number - %s\n", err)
			continue
		}
		counts[word] += count
	}
	/* Output aggregated counts. */
	for word, count := range counts {
		fmt.Printf("%s\t%d\n", word, count)
	}
}
</code></pre>
Alternatively, you could also use dmrgo to make it easier to write your streaming jobs. They have a wordcount example available here.
I saw another library called gomrjob, but it doesn't look very well maintained and is still quite alpha; you could give it a try if you feel adventurous.
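Whichever approach you take, a streaming mapper and reducer are just stdin/stdout programs, so you can test the pipeline locally with a shell pipe and then submit the compiled binaries with the standard streaming jar. A sketch (file names and the jar path are illustrative; the path varies by Hadoop distribution):

```shell
# Build the two programs above.
go build -o wc_mapper mapper.go
go build -o wc_reducer reducer.go

# Local test: this pipe mirrors what Hadoop Streaming does (map, sort, reduce).
cat input.txt | ./wc_mapper | sort | ./wc_reducer

# Submit to the cluster, shipping the binaries to the task nodes with -file.
hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
  -input /data/input \
  -output /data/output \
  -mapper wc_mapper \
  -reducer wc_reducer \
  -file wc_mapper -file wc_reducer
```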
Comments