英文:
code block in goroutine produces strange wrong results
问题
我有一个大的N*1名称数组。
我目前正在使用goroutine来计算每个名称之间的编辑距离。
问题是[B] [C]处的结果不同,可能是这样的:
ABC BCD 7
ABC BCD 3
名称中有20000个记录。
将名称分成两个块:
var names []string
通道:
ch := make(chan int)
var wg sync.WaitGroup
for i := 0; i < procs; i++ { //创建两个goroutine
start := i * chunkSize
end := (i+1)*chunkSize - 1
fmt.Println(start, end) //获取切片的起始和结束位置
wg.Add(1)
go func(slices []string, allnames []string) {
for _, slice := range slices {
minDistance = 256
distance := 0
sum := 0
for _, name := range allnames {
distance = calcEditDist(slice, name) //获取LD [A]
sum += 1
if distance > 0 && distance < minDistance {
minDistance = distance
fmt.Println(slice, name, distance) //[B]
fmt.Println(slice, name, calcEditDist(slice, name)) //[C]
} else if distance == minDistance {
fmt.Println(slice, name, distance)
fmt.Println(slice, name, calcEditDist(slice, name))
}
}
ch <- sum
break
}
wg.Done()
}(names[start:end], names)
}
我将calcEditDist放在https://github.com/copywrite/keyboardDistance/blob/master/parallel.go中。
PS:
如果我在calcEditDist中将var dp [max][max]int声明为局部变量而不是全局变量,结果是正确的,但速度非常慢。
更新1:
感谢大家的建议,我按照以下三个步骤进行了改进:
1)我将dp缩小到一个更合理的大小,比如100甚至更小,已完成。
2)我在每个goroutine中放置了dp的声明,并按照Nick的建议传递其指针,已完成。
3)稍后我将尝试动态分配dp,以后再说。
性能大幅提升,╰(°▽°)╯
英文:
i have a big N*1 name array
i am currently using goroutine to calculate the edit distance of a name among each other
the question is the results at [B] [C] are different, maybe like
ABC BCD 7
ABC BCD 3
there are 20000 records in names
var names []string
divide names into two chunks
nameCount := len(names)
procs := 2
chunkSize := nameCount / procs
channel
ch := make(chan int)
var wg sync.WaitGroup
for i := 0; i < procs; i++ { //create two goroutines
start := i * chunkSize
end := (i+1)*chunkSize - 1
fmt.Println(start, end) //get slice start and end
wg.Add(1)
go func(slices []string, allnames []string) {
for _, slice := range slices {
minDistance = 256
distance := 0
sum := 0
for _, name := range allnames {
distance = calcEditDist(slice, name) //get the LD [A]
sum += 1
if distance > 0 && distance < minDistance {
minDistance = distance
fmt.Println(slice, name, distance) //[B]
fmt.Println(slice, name, calcEditDist(slice, name)) //[C]
} else if distance == minDistance {
fmt.Println(slice, name, distance)
fmt.Println(slice, name, calcEditDist(slice, name))
}
}
// for _, name := range allnames {
// fmt.Println(slice, name)
// }
ch <- sum
// fmt.Println(len(allnames), slice)
break
}
wg.Done()
}(names[start:end], names)
}
i placed the calcEditDist @https://github.com/copywrite/keyboardDistance/blob/master/parallel.go
PS:
if i declare
var dp [max][max]int
in calcEditDist as local variable instead of global, the results are right, but is incredibly slow
UPDATE 1
Thanks all buddy,i take the great advice below in three steps
- i shrinked the dp to a much reasonable size, like 100 or even smaller, DONE
- i put the dp declaration in each goroutine and pass its pointer as Nick said, DONE
- later i will try to dynamically alloc dp, LATER
the performance improved steeply, ╰(°▽°)╯
答案1
得分: 1
正如您在帖子中指出的那样,将dp
作为全局变量是问题所在。
在CalcEditDistance
中每次分配它都太慢了。
您有两个可能的解决方案。
1)每个go例程只需要一个dp
数组,因此在for循环中分配它,并传递一个指向它的指针(不要直接传递数组,因为数组按值传递会涉及大量的复制!)
for i := 0; i < procs; i++ { //创建两个goroutine
start := i * chunkSize
end := (i+1)*chunkSize - 1
fmt.Println(start, end) //获取切片的起始和结束位置
wg.Add(1)
go func(slices []string, allnames []string) {
var dp [max][max]int // 分配
for _, slice := range slices {
minDistance = 256
distance := 0
sum := 0
for _, name := range allnames {
distance = calcEditDist(slice, name, &dp) // 在这里传递dp指针
将calcEditDist
更改为接受dp
参数
func CalcEditDist(A string, B string, dp *[max][max]int) int {
lenA := len(A)
lenB := len(B)
2)重写您的calcEditDistance
,使其不需要庞大的O(N^2)的dp
数组。
如果仔细研究该函数,您会发现它只会访问上一行和左边的一列,因此您实际上只需要一个先前的行和一个先前的列,您可以以非常小的代价动态分配它们。这样它就可以适应任意长度的字符串。
不过,这需要一些仔细的思考!
英文:
As you've identified in your posting, having dp
as a global variable is the problem.
Allocating it each time in CalcEditDistance
is too slow.
You have two possible solutions.
-
you only need 1
dp
array per go-routine, so allocate it in the for loop loop and pass a pointer to it (don't pass the array directly as arrays pass by value which will involve a lot of copying!)for i := 0; i < procs; i++ { //create two goroutines
start := i * chunkSize
end := (i+1)*chunkSize - 1
fmt.Println(start, end) //get slice start and end
wg.Add(1)
go func(slices []string, allnames []string) {
var dp [max][max]int // allocate
for _, slice := range slices {
minDistance = 256
distance := 0
sum := 0
for _, name := range allnames {
distance = calcEditDist(slice, name, &dp) // pass dp pointer here
Change calcEditDist to take the dp
func CalcEditDist(A string, B string, dp *[max][max]int) int {
lenA := len(A)
lenB := len(B)
- Re-write your
calcEditDistance
so it doesn't need the massive O(N^2) dp array.
If you study the function carefully it only ever accesses a row up and a column to the left, so all the storage you actually need is a previous row and a previous columns which you could allocate dynamically at very little cost. This would make it scale to any length of string too.
That would need a bit of careful thought though!
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论