代码块在 goroutine 中产生了奇怪的错误结果。

huangapple go评论85阅读模式
英文:

code block in goroutine produces strange wrong results

问题

我有一个大的N*1名称数组。
我目前正在使用goroutine来计算每个名称之间的编辑距离。

问题是[B] [C]处的结果不同,可能是这样的:

ABC BCD 7
ABC BCD 3

名称中有20000个记录。

将名称分成两个块:

var names []string

通道:

ch := make(chan int)
var wg sync.WaitGroup

for i := 0; i < procs; i++ { //创建两个goroutine
start := i * chunkSize
end := (i+1)*chunkSize - 1
fmt.Println(start, end) //获取切片的起始和结束位置
wg.Add(1)
go func(slices []string, allnames []string) {
for _, slice := range slices {
minDistance = 256
distance := 0
sum := 0
for _, name := range allnames {
distance = calcEditDist(slice, name) //获取LD [A]
sum += 1
if distance > 0 && distance < minDistance {
minDistance = distance
fmt.Println(slice, name, distance) //[B]
fmt.Println(slice, name, calcEditDist(slice, name)) //[C]
} else if distance == minDistance {
fmt.Println(slice, name, distance)
fmt.Println(slice, name, calcEditDist(slice, name))
}
}
ch <- sum
break
}
wg.Done()
}(names[start:end], names)
}

我将calcEditDist放在https://github.com/copywrite/keyboardDistance/blob/master/parallel.go中。

PS:
如果我在calcEditDist中将var dp [max][max]int声明为局部变量而不是全局变量,结果是正确的,但速度非常慢。

更新1:
感谢大家的建议,我按照以下三个步骤进行了改进:
1)我将dp缩小到一个更合理的大小,比如100甚至更小,已完成。
2)我在每个goroutine中放置了dp的声明,并按照Nick的建议传递其指针,已完成。
3)稍后我将尝试动态分配dp,以后再说。

性能大幅提升,╰(°▽°)╯

英文:

i have a big N*1 name array
i am currently using goroutine to calculate the edit distance of a name among each other

the question is the results at [B] [C] are different, maybe like

ABC BCD 7
ABC BCD 3

there are 20000 records in names

var names []string

divide names into two chunks

nameCount := len(names)  
procs := 2  
chunkSize := nameCount / procs 

channel

ch := make(chan int)  
var wg sync.WaitGroup


for i := 0; i &lt; procs; i++ { //create two goroutines
	start := i * chunkSize
	end := (i+1)*chunkSize - 1
	fmt.Println(start, end) //get slice start and end
	wg.Add(1)
	go func(slices []string, allnames []string) {
		for _, slice := range slices {
			minDistance = 256
			distance := 0
			sum := 0
			for _, name := range allnames {
				distance = calcEditDist(slice, name) //get the LD [A]
				sum += 1
				if distance &gt; 0 &amp;&amp; distance &lt; minDistance {
					minDistance = distance
					fmt.Println(slice, name, distance) //[B]
					fmt.Println(slice, name, calcEditDist(slice, name)) //[C]
				} else if distance == minDistance {
					fmt.Println(slice, name, distance)
					fmt.Println(slice, name, calcEditDist(slice, name))
				}
			}
			// for _, name := range allnames {
			// 	fmt.Println(slice, name)
			// }
			ch &lt;- sum
			// fmt.Println(len(allnames), slice)
			break
		}
		wg.Done()
	}(names[start:end], names)
}

i placed the calcEditDist @https://github.com/copywrite/keyboardDistance/blob/master/parallel.go

PS:
if i declare

var dp [max][max]int

in calcEditDist as local variable instead of global, the results are right, but is incredibly slow

UPDATE 1
Thanks all buddy,i take the great advice below in three steps

  1. i shrinked the dp to a much reasonable size, like 100 or even smaller, DONE
  2. i put the dp declaration in each goroutine and pass its pointer as Nick said, DONE
  3. later i will try to dynamically alloc dp, LATER

the performance improved steeply, ╰(°▽°)╯

答案1

得分: 1

正如您在帖子中指出的那样,将dp作为全局变量是问题所在。

CalcEditDistance中每次分配它都太慢了。

您有两个可能的解决方案。

1)每个go例程只需要一个dp数组,因此在for循环中分配它,并传递一个指向它的指针(不要直接传递数组,因为数组按值传递会涉及大量的复制!)

for i := 0; i < procs; i++ { //创建两个goroutine
    start := i * chunkSize
    end := (i+1)*chunkSize - 1
    fmt.Println(start, end) //获取切片的起始和结束位置
    wg.Add(1)
    go func(slices []string, allnames []string) {
        var dp [max][max]int // 分配
        for _, slice := range slices {
            minDistance = 256
            distance := 0
            sum := 0
            for _, name := range allnames {
                distance = calcEditDist(slice, name, &dp) // 在这里传递dp指针

calcEditDist更改为接受dp参数

func CalcEditDist(A string, B string, dp *[max][max]int) int {
    lenA := len(A)
    lenB := len(B)

2)重写您的calcEditDistance,使其不需要庞大的O(N^2)的dp数组。

如果仔细研究该函数,您会发现它只会访问上一行和左边的一列,因此您实际上只需要一个先前的行和一个先前的列,您可以以非常小的代价动态分配它们。这样它就可以适应任意长度的字符串。

不过,这需要一些仔细的思考!

英文:

As you've identified in your posting, having dp as a global variable is the problem.

Allocating it each time in CalcEditDistance is too slow.

You have two possible solutions.

  1. you only need 1 dp array per go-routine, so allocate it in the for loop loop and pass a pointer to it (don't pass the array directly as arrays pass by value which will involve a lot of copying!)

    for i := 0; i < procs; i++ { //create two goroutines
    start := i * chunkSize
    end := (i+1)*chunkSize - 1
    fmt.Println(start, end) //get slice start and end
    wg.Add(1)
    go func(slices []string, allnames []string) {
    var dp [max][max]int // allocate
    for _, slice := range slices {
    minDistance = 256
    distance := 0
    sum := 0
    for _, name := range allnames {
    distance = calcEditDist(slice, name, &dp) // pass dp pointer here

Change calcEditDist to take the dp

func CalcEditDist(A string, B string, dp *[max][max]int) int {
        lenA := len(A)
        lenB := len(B)
  1. Re-write your calcEditDistance so it doesn't need the massive O(N^2) dp array.

If you study the function carefully it only ever accesses a row up and a column to the left, so all the storage you actually need is a previous row and a previous columns which you could allocate dynamically at very little cost. This would make it scale to any length of string too.

That would need a bit of careful thought though!

huangapple
  • 本文由 发表于 2013年12月4日 14:39:30
  • 转载请务必保留本文链接:https://go.coder-hub.com/20368711.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定