2017年7月20日 14:35:29go评论98阅读模式

英文:

Simple mapReduce operation on strings

问题

我有一个字符串列表：

elems := [n]string{...}

我想执行一个简单的mapReduce操作，使得：

将每个字符串映射到一个不同的字符串，比如 string -> $string
将所有字符串缩减为一个带有分隔符的字符串，例如 {s1, s2, s3} -> s1@s2@s3

总之：{s1, s2, s3} -> $s1@$s2@$s3

最佳的方法是什么？

我希望既高效又易读。

如果这个方法足够通用，不仅适用于字符串，那就更好了。

英文:

I have a list of strings

elems := [n]string{...}

I want to perform a simple mapReduce operation, such that I

Map every string to a different string, let's say string -> $string
Reduce all the strings to one string with a separator, e.g. {s1, s2, s3} -> s1@s2@s3

all in all: {s1, s2, s3} -> $s1@$s2@$s3

What's the best way to do this?

I'm looking for efficiency and readability

Bonus points if it's generic enough to work not only on strings

答案1

得分: 2

对于仅映射一个列表，你没有太多选择，只能遍历每个字符串。如果转换算法耗时且你需要速度，可以考虑拆分任务并使用go routine。最后，你可以使用strings.Join函数，该函数有一个选项可以指定分隔符，通常可以高效地执行reduce操作。数据集的大小也是一个考虑因素，对于较大的列表，你可能希望将strings.Join和自定义算法的性能进行比较，并决定是否使用多个go routine/通道来实现你想要的结果。

英文:

For mapping just a list, you won't have much choice other than to go over each string. If the transform algo is time-consuming and you need speed, you can consider splitting the job and use a go routine. Finally you can use the strings.Join function which has an option to specify a separator, this normally performs the reduce part efficiently. The size of the dataset can also be a consideration, and for larger sized lists you may want to compare performance with strings.Join and your own customized algo and see if you want to use multiple go routines/channels to achieve what you want to.

答案2

得分: 1

如果您不需要分别执行这两个操作，只需使用strings.Join()函数即可实现最终结果：

package main

import (
	"fmt"
	"strings"
)

func main() {
	a := []string{"a", "b", "c"}
	p := "$"
	fmt.Println(p + strings.Join(a[:], "@"+p))
}

输出结果为$a@$b@$c

playground

英文:

If you don't need to do the 2 things separately, the end result can be achieved simply by using strings.Join():

package main

import (
	&quot;fmt&quot;
	&quot;strings&quot;
)

func main() {
	a := []string{&quot;a&quot;, &quot;b&quot;, &quot;c&quot;}
	p := &quot;$&quot;
	fmt.Println(p + strings.Join(a[:], &quot;@&quot;+p))
}

prints $a@$b@$c

playground

答案3

得分: 0

Go不是一种显式的函数式编程语言。

你可以使用for循环来进行映射和归约。

a := []string{"a", "b", "c"}
result := "initvalue"
for n, i := range a {
  result += i + string(n)
}

请注意，以上是翻译的内容，不包括代码部分。

英文:

Go is explicitly NOT a functional programming language.

You map and reduce using a for loop.

a := []string{&quot;a&quot;, &quot;b&quot;, &quot;c&quot;}
result := &quot;initvalue&quot;
for n, i := range a {
  result += i + string(n)
}

答案4

得分: 0

如果您的map函数不执行任何IO操作（即它们只进行一些计算），将其并发化肯定会使其变慢，即使您正在执行一些IO操作，您也应该进行基准测试。并发不一定会使事情变得更快，有时会增加不必要的复杂性。在许多情况下，一个简单的for循环就足够了。

如果这里的map函数是IO绑定的，或者正在执行一些计算密集型的计算，并且从并发中受益，解决方案可能会有所不同。例如，可以使用NATS来超越一台机器并分发工作负载。

这是一个相对简单的示例。减少阶段不是多阶段的，是阻塞的：

import (
	"fmt"
	"strings"
	"sync"
	"testing"

	"github.com/stretchr/testify/assert"
)

type elem struct {
	index int
	value interface{}
}

func feed(elems []interface{}) <-chan elem {
	result := make(chan elem)
	go func() {
		for k, v := range elems {
			e := elem{
				index: k,
				value: v,
			}
			result <- e
		}
		close(result)
	}()
	return result
}

func mapf(
	input <-chan elem,
	mapFunc func(elem) elem) <-chan elem {
	result := make(chan elem)
	go func() {
		for e := range input {
			eres := mapFunc(e)
			result <- eres
		}
		close(result)
	}()
	return result
}

// is blocking
func reducef(
	input <-chan elem,
	reduceFunc func([]interface{}) interface{}) interface{} {
	buffer := make(map[int]interface{})
	l := 0
	for v := range input {
		buffer[v.index] = v.value
		if v.index > l {
			l = v.index
		}
	}
	data := make([]interface{}, l+1)
	for k, v := range buffer {
		data[k] = v
	}

	return reduceFunc(data)
}

func fanOutIn(
	elemFeed <-chan elem,
	mapFunc func(elem) elem, mapCount int,
	reduceFunc func([]interface{}) interface{}) interface{} {
	MR := make(chan elem)
	wg := &sync.WaitGroup{}
	for i := 0; i < mapCount; i++ {
		mapResult := mapf(elemFeed, mapFunc)

		wg.Add(1)
		go func() {
			defer wg.Done()
			for v := range mapResult {
				MR <- v
			}
		}()
	}
	go func() {
		wg.Wait()
		close(MR)
	}()
	return reducef(MR, reduceFunc)
}

func Test01(t *testing.T) {
	elemFeed := feed([]interface{}{1, 2, 3})
	finalResult := fanOutIn(
		elemFeed,
		func(e elem) elem {
			return elem{
				index: e.index,
				value: fmt.Sprintf("[%v]", e.value),
			}
		},
		3,
		func(sl []interface{}) interface{} {
			strRes := make([]string, len(sl))
			for k, v := range sl {
				strRes[k] = v.(string)
			}
			return strings.Join(strRes, ":")
		})
	assert.Equal(t, "[1]:[2]:[3]", finalResult)
}

由于它使用interface{}作为元素类型，因此可以进行泛化。

英文:

If you are not going to perform any sort of IO operations inside your map functions (means they are doing just some computations), making it concurrent would make it slower for sure and even if you are doing some IO, you should benchmark. Concurrency would not make things faster necessarily and some times add unnecessary complications. In many cases just a simple for loop is sufficient.

If the map functions here are IO bound or are doing some sort of computation heavy calculations that do benefit from going concurrent, solutions can vary. For example NATS can be used to go beyond one machine and distribute the workload.

This is a relatively simple sample. Reduce phase is not multistage and is blocking:

import (
&quot;fmt&quot;
&quot;strings&quot;
&quot;sync&quot;
&quot;testing&quot;
&quot;github.com/stretchr/testify/assert&quot;
)
type elem struct {
index int
value interface{}
}
func feed(elems []interface{}) &lt;-chan elem {
result := make(chan elem)
go func() {
for k, v := range elems {
e := elem{
index: k,
value: v,
}
result &lt;- e
}
close(result)
}()
return result
}
func mapf(
input &lt;-chan elem,
mapFunc func(elem) elem) &lt;-chan elem {
result := make(chan elem)
go func() {
for e := range input {
eres := mapFunc(e)
result &lt;- eres
}
close(result)
}()
return result
}
// is blocking
func reducef(
input &lt;-chan elem,
reduceFunc func([]interface{}) interface{}) interface{} {
buffer := make(map[int]interface{})
l := 0
for v := range input {
buffer[v.index] = v.value
if v.index &gt; l {
l = v.index
}
}
data := make([]interface{}, l+1)
for k, v := range buffer {
data[k] = v
}
return reduceFunc(data)
}
func fanOutIn(
elemFeed &lt;-chan elem,
mapFunc func(elem) elem, mapCount int,
reduceFunc func([]interface{}) interface{}) interface{} {
MR := make(chan elem)
wg := &amp;sync.WaitGroup{}
for i := 0; i &lt; mapCount; i++ {
mapResult := mapf(elemFeed, mapFunc)
wg.Add(1)
go func() {
defer wg.Done()
for v := range mapResult {
MR &lt;- v
}
}()
}
go func() {
wg.Wait()
close(MR)
}()
return reducef(MR, reduceFunc)
}
func Test01(t *testing.T) {
elemFeed := feed([]interface{}{1, 2, 3})
finalResult := fanOutIn(
elemFeed,
func(e elem) elem {
return elem{
index: e.index,
value: fmt.Sprintf(&quot;[%v]&quot;, e.value),
}
},
3,
func(sl []interface{}) interface{} {
strRes := make([]string, len(sl))
for k, v := range sl {
strRes[k] = v.(string)
}
return strings.Join(strRes, &quot;:&quot;)
})
assert.Equal(t, &quot;[1]:[2]:[3]&quot;, finalResult)
}

And since it uses interface{} as the element type, it can get generalized.

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

简单的字符串MapReduce操作

问题

答案1

答案2

答案3

答案4

防止 json.Marshal 移除时间戳后面的零

GO是一个复杂的嵌套结构。

如何在Beg ORM中获取最终的SQL字符串

嵌套模板路径作为变量

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

发表评论