Go并发:Chudnovky算法,比sync慢。

huangapple go评论128阅读模式
英文:

Go Concurrency: Chudnovky's algorithm, slower than sync

问题

我最近在朋友的推荐下开始学习Go语言。到目前为止,我非常喜欢它,但是我写了一个(我认为会是)轻量级并发能力的完美示例,结果却出乎意料...所以我怀疑我做错了什么,或者我对goroutine的开销理解有误。我希望这里的一些Go语言专家能够提供一些见解。

我使用goroutine和简单的同步执行来编写了Chudnovsky算法的Go语言版本。我假设,由于每个计算都是相互独立的,使用并发运行至少会稍微快一点。

注意:我在一台第五代i7上运行这个程序,所以如果goroutine像我被告知的那样被复用到线程上,这应该是并发和并行的。

  1. package main
  2. import (
  3. "fmt"
  4. "math"
  5. "strconv"
  6. "time"
  7. )
  8. func main() {
  9. var input string
  10. var sum float64
  11. var pi float64
  12. c := make(chan float64)
  13. fmt.Print("How many iterations? ")
  14. fmt.Scanln(&input)
  15. max,err := strconv.Atoi(input)
  16. if err != nil {
  17. panic("You did not enter a valid integer")
  18. }
  19. start := time.Now() //start timing execution of concurrent routine
  20. for i := 0; i < max; i++ {
  21. go chudnovskyConcurrent(i,c)
  22. }
  23. for i := 0; i < max; i++ {
  24. sum += <-c
  25. }
  26. end := time.Now() //end of concurrent routine
  27. fmt.Println("Duration of concurrent calculation: ",end.Sub(start))
  28. pi = 1/(12*sum)
  29. fmt.Println(pi)
  30. start = time.Now() //start timing execution of syncronous routine
  31. sum = 0
  32. for i := 0; i < max; i++ {
  33. sum += chudnovskySync(i)
  34. }
  35. end = time.Now() //end of syncronous routine
  36. fmt.Println("Duration of synchronous calculation: ",end.Sub(start))
  37. pi = 1/(12*sum)
  38. fmt.Println(pi)
  39. }
  40. func chudnovskyConcurrent(i int, c chan<- float64) {
  41. var numerator float64
  42. var denominator float64
  43. ifloat := float64(i)
  44. iun := uint64(i)
  45. numerator = math.Pow(-1, ifloat) * float64(factorial(6*iun)) * (545140134*ifloat+13591409)
  46. denominator = float64(factorial(3*iun)) * math.Pow(float64(factorial(iun)),3) * math.Pow(math.Pow(640320,3),ifloat+0.5)
  47. c <- numerator/denominator
  48. }
  49. func chudnovskySync(i int) (r float64) {
  50. var numerator float64
  51. var denominator float64
  52. ifloat := float64(i)
  53. iun := uint64(i)
  54. numerator = math.Pow(-1, ifloat) * float64(factorial(6*iun)) * (545140134*ifloat+13591409)
  55. denominator = float64(factorial(3*iun)) * math.Pow(float64(factorial(iun)),3) * math.Pow(math.Pow(640320,3),ifloat+0.5)
  56. r = numerator/denominator
  57. return
  58. }
  59. func factorial(n uint64) (res uint64) {
  60. if ( n > 0 ) {
  61. res = n * factorial(n-1)
  62. return res
  63. }
  64. return 1
  65. }

这是我的结果:

  1. How many iterations? 20
  2. Duration of concurrent calculation: 573.944μs
  3. 3.1415926535897936
  4. Duration of synchronous calculation: 63.056μs
  5. 3.1415926535897936
英文:

I just recently started learning go at the recommendation of a friend. So far, I am loving it, but I wrote (what I thought would be) the perfect example of the power of lightweight concurrency, and got a surprising result... so I suspect I'm doing something wrong, or I'm misunderstanding how expensive goroutines are. I'm hoping some gophers here can provide insight.

I wrote Chudnovsky's algorithm in Go using both goroutines and simple synchronous execution. I assumed, with each calculation being independent of the others, it'd be at least a little faster running concurrently.

note: I am running this on a 5th gen i7, so if goroutines are multiplexed onto threads as I was told, this should be both concurrent and parallel.

  1. package main
  2. import (
  3. &quot;fmt&quot;
  4. &quot;math&quot;
  5. &quot;strconv&quot;
  6. &quot;time&quot;
  7. )
  8. func main() {
  9. var input string
  10. var sum float64
  11. var pi float64
  12. c := make(chan float64)
  13. fmt.Print(&quot;How many iterations? &quot;)
  14. fmt.Scanln(&amp;input)
  15. max,err := strconv.Atoi(input)
  16. if err != nil {
  17. panic(&quot;You did not enter a valid integer&quot;)
  18. }
  19. start := time.Now() //start timing execution of concurrent routine
  20. for i := 0; i &lt; max; i++ {
  21. go chudnovskyConcurrent(i,c)
  22. }
  23. for i := 0; i &lt; max; i++ {
  24. sum += &lt;-c
  25. }
  26. end := time.Now() //end of concurrent routine
  27. fmt.Println(&quot;Duration of concurrent calculation: &quot;,end.Sub(start))
  28. pi = 1/(12*sum)
  29. fmt.Println(pi)
  30. start = time.Now() //start timing execution of syncronous routine
  31. sum = 0
  32. for i := 0; i &lt; max; i++ {
  33. sum += chudnovskySync(i)
  34. }
  35. end = time.Now() //end of syncronous routine
  36. fmt.Println(&quot;Duration of synchronous calculation: &quot;,end.Sub(start))
  37. pi = 1/(12*sum)
  38. fmt.Println(pi)
  39. }
  40. func chudnovskyConcurrent(i int, c chan&lt;- float64) {
  41. var numerator float64
  42. var denominator float64
  43. ifloat := float64(i)
  44. iun := uint64(i)
  45. numerator = math.Pow(-1, ifloat) * float64(factorial(6*iun)) * (545140134*ifloat+13591409)
  46. denominator = float64(factorial(3*iun)) * math.Pow(float64(factorial(iun)),3) * math.Pow(math.Pow(640320,3),ifloat+0.5)
  47. c &lt;- numerator/denominator
  48. }
  49. func chudnovskySync(i int) (r float64) {
  50. var numerator float64
  51. var denominator float64
  52. ifloat := float64(i)
  53. iun := uint64(i)
  54. numerator = math.Pow(-1, ifloat) * float64(factorial(6*iun)) * (545140134*ifloat+13591409)
  55. denominator = float64(factorial(3*iun)) * math.Pow(float64(factorial(iun)),3) * math.Pow(math.Pow(640320,3),ifloat+0.5)
  56. r = numerator/denominator
  57. return
  58. }
  59. func factorial(n uint64) (res uint64) {
  60. if ( n &gt; 0 ) {
  61. res = n * factorial(n-1)
  62. return res
  63. }
  64. return 1
  65. }

And here are my results:

  1. How many iterations? 20
  2. Duration of concurrent calculation: 573.944&#181;s
  3. 3.1415926535897936
  4. Duration of synchronous calculation: 63.056&#181;s
  5. 3.1415926535897936

答案1

得分: 3

你正在做的计算太简单了,每个计算都在单独的goroutine中执行是没有必要的。你在运行时花费的时间(创建goroutines、多路复用、调度等)比实际计算时间还要多。你所尝试的做法更适合使用GPU,例如,在那里你有大量的并行执行单元,可以在瞬间完成这些简单的计算。但是你需要其他的语言和API来实现这一点。

你可以为每个硬件线程创建一个软件执行线程。你想要将max变量分成大块,并并行执行它们。这里有一个非常简单的示例来说明这个想法:

  1. package main
  2. import (
  3. "fmt"
  4. "math"
  5. "strconv"
  6. "time"
  7. "runtime"
  8. )
  9. func main() {
  10. var input string
  11. var sum float64
  12. var pi float64
  13. c := make(chan float64, runtime.GOMAXPROCS(-1))
  14. fmt.Print("How many iterations? ")
  15. fmt.Scanln(&input)
  16. max,err := strconv.Atoi(input)
  17. if err != nil {
  18. panic("You did not enter a valid integer")
  19. }
  20. start := time.Now() //start timing execution of concurrent routine
  21. for i := 0; i < runtime.GOMAXPROCS(-1); i++ {
  22. go func(i int){
  23. var sum float64
  24. for j := 0; j < max/runtime.GOMAXPROCS(-1); j++ {
  25. sum += chudnovskySync(j + i*max/runtime.GOMAXPROCS(-1))
  26. }
  27. c <- sum
  28. }(i)
  29. }
  30. for i := 0; i < runtime.GOMAXPROCS(-1); i++ {
  31. sum += <-c
  32. }
  33. end := time.Now() //end of concurrent routine
  34. fmt.Println("Duration of concurrent calculation: ",end.Sub(start))
  35. pi = 1/(12*sum)
  36. fmt.Println(pi)
  37. start = time.Now() //start timing execution of syncronous routine
  38. sum = 0
  39. for i := 0; i < max; i++ {
  40. sum += chudnovskySync(i)
  41. }
  42. end = time.Now() //end of syncronous routine
  43. fmt.Println("Duration of synchronous calculation: ",end.Sub(start))
  44. pi = 1/(12*sum)
  45. fmt.Println(pi)
  46. }
  47. func chudnovskySync(i int) (r float64) {
  48. var numerator float64
  49. var denominator float64
  50. ifloat := float64(i)
  51. iun := uint64(i)
  52. numerator = math.Pow(-1, ifloat) * float64(factorial(6*iun)) * (545140134*ifloat+13591409)
  53. denominator = float64(factorial(3*iun)) * math.Pow(float64(factorial(iun)),3) * math.Pow(math.Pow(640320,3),ifloat+0.5)
  54. r = numerator/denominator
  55. return
  56. }
  57. func factorial(n uint64) (res uint64) {
  58. if ( n > 0 ) {
  59. res = n * factorial(n-1)
  60. return res
  61. }
  62. return 1
  63. }

这是结果:

  1. $ go version
  2. go version go1.5.2 windows/amd64
  3. $ go run main.go
  4. GOMAXPROCS = 4
  5. How many iterations? 10000
  6. Duration of concurrent calculation: 932.8916ms
  7. NaN
  8. Duration of synchronous calculation: 2.0639744s
  9. NaN
英文:

The calculations you're doing are too simple to do each one of them in a separate goroutine. You're loosing more time in the runtime (creating goroutines, multiplexing, scheduling etc) than on actual calculations. What you're trying to do is more suited for GPU, for example, where you have massive number of parallel execution units that can do this simple calculations all at ones in an instant. But you would need other languages and APIs to do that.

What you can do is to create software thread of execution for every hardware thread of execution. You want to split your max variable into big chunks and execute them in parallel. Here's very simple take on it just to illustrate the idea:

  1. package main
  2. import (
  3. &quot;fmt&quot;
  4. &quot;math&quot;
  5. &quot;strconv&quot;
  6. &quot;time&quot;
  7. &quot;runtime&quot;
  8. )
  9. func main() {
  10. var input string
  11. var sum float64
  12. var pi float64
  13. c := make(chan float64, runtime.GOMAXPROCS(-1))
  14. fmt.Print(&quot;How many iterations? &quot;)
  15. fmt.Scanln(&amp;input)
  16. max,err := strconv.Atoi(input)
  17. if err != nil {
  18. panic(&quot;You did not enter a valid integer&quot;)
  19. }
  20. start := time.Now() //start timing execution of concurrent routine
  21. for i := 0; i &lt; runtime.GOMAXPROCS(-1); i++ {
  22. go func(i int){
  23. var sum float64
  24. for j := 0; j &lt; max/runtime.GOMAXPROCS(-1); j++ {
  25. sum += chudnovskySync(j + i*max/runtime.GOMAXPROCS(-1))
  26. }
  27. c &lt;- sum
  28. }(i)
  29. }
  30. for i := 0; i &lt; runtime.GOMAXPROCS(-1); i++ {
  31. sum += &lt;-c
  32. }
  33. end := time.Now() //end of concurrent routine
  34. fmt.Println(&quot;Duration of concurrent calculation: &quot;,end.Sub(start))
  35. pi = 1/(12*sum)
  36. fmt.Println(pi)
  37. start = time.Now() //start timing execution of syncronous routine
  38. sum = 0
  39. for i := 0; i &lt; max; i++ {
  40. sum += chudnovskySync(i)
  41. }
  42. end = time.Now() //end of syncronous routine
  43. fmt.Println(&quot;Duration of synchronous calculation: &quot;,end.Sub(start))
  44. pi = 1/(12*sum)
  45. fmt.Println(pi)
  46. }
  47. func chudnovskySync(i int) (r float64) {
  48. var numerator float64
  49. var denominator float64
  50. ifloat := float64(i)
  51. iun := uint64(i)
  52. numerator = math.Pow(-1, ifloat) * float64(factorial(6*iun)) * (545140134*ifloat+13591409)
  53. denominator = float64(factorial(3*iun)) * math.Pow(float64(factorial(iun)),3) * math.Pow(math.Pow(640320,3),ifloat+0.5)
  54. r = numerator/denominator
  55. return
  56. }
  57. func factorial(n uint64) (res uint64) {
  58. if ( n &gt; 0 ) {
  59. res = n * factorial(n-1)
  60. return res
  61. }
  62. return 1
  63. }

And here's the results

  1. $ go version
  2. go version go1.5.2 windows/amd64
  3. $ go run main.go
  4. GOMAXPROCS = 4
  5. How many iterations? 10000
  6. Duration of concurrent calculation: 932.8916ms
  7. NaN
  8. Duration of synchronous calculation: 2.0639744s
  9. NaN

答案2

得分: 0

我同意,你的计算量不足以弥补使用多个goroutine所带来的开销。只是为了好玩,我修改了你的代码,在返回结果之前多次进行计算(1000次、10000次、100000次、1000000次)。我在运行这个代码(进行了20次迭代)时使用的是运行在四核Xeon上的Mac OS X Yosemite,正如你所预期的那样,同步版本所需的时间大约是并行版本的四倍。

有趣的是,我注意到,当重复次数很大时,同步版本实际上所需的时间超过了并行版本的四倍。我猜测这可能与英特尔的超线程架构有关,该架构允许在每个核心内部进行一定程度的并行处理,但我对此并不确定。

英文:

I agree, your calculations don't do enough processing to overcome the overhead of having multiple goroutines. Just for fun, I modified your code to do the calculation many times (1000, 10000, 100000, 1000000) before returning the result. I ran this (with 20 iterations) under Mac OS X Yosemite running on a quad core Xeon, and, as you might expect, the synchronous version takes in the neighborhood of four times as long as the parallel version.

One interesting thing I noticed is that, with a large number of repetitions, the synchronous version actually takes more than four times as long as the parallel version. I'm guessing this has something to do with Intel's hyperthreading architecture which allows some level of parallelism within each core, but I'm not sure about that.

答案3

得分: 0

你可能需要在每个线程中执行多个序列项,以提高速度,因为生成线程的开销较大。以下是使用较少线程和每个线程更多项的代码示例(但使用了不同的算法):

  1. package main
  2. import (
  3. "fmt"
  4. "math/rand"
  5. "runtime"
  6. "time"
  7. )
  8. var precision int = 2147483647
  9. var inside int = 0
  10. var size int = 2147483647
  11. var threads int = runtime.NumCPU()
  12. func getPi(total int) float64 {
  13. return (float64(inside) / float64(total)) * 4
  14. }
  15. func addpoints(times int, out chan int) {
  16. r1 := rand.New(rand.NewSource(time.Now().UnixNano()))
  17. var x, y, ret int
  18. ret = 0
  19. for i := 0; i < times; i++ {
  20. x = r1.Intn(size)
  21. y = r1.Intn(size)
  22. if ((x * x) + (y * y)) < (size * size) {
  23. ret += 1
  24. }
  25. }
  26. out <- ret
  27. }
  28. func main() {
  29. fmt.Println("running on " + fmt.Sprint(threads) + " threads")
  30. start := time.Now()
  31. results := make(chan int, threads)
  32. for i := 0; i < threads; i++ {
  33. go addpoints(precision/threads, results)
  34. }
  35. for i := 0; i < threads; i++ {
  36. inside += <-results
  37. }
  38. duration := time.Since(start)
  39. fmt.Println(getPi(precision))
  40. fmt.Println("compute time: " + fmt.Sprint(duration))
  41. }

希望对你有帮助!

英文:

You probably need to do multiple terms of the sequence per thread to be faster due to overhead spawning threads.
such as the following code which uses less threads and more terms per thread (but with a different algorithm)

  1. package main
  2. import (
  3. &quot;fmt&quot;
  4. &quot;math/rand&quot;
  5. &quot;runtime&quot;
  6. &quot;time&quot;
  7. )
  8. var precision int = 2147483647
  9. var inside int = 0
  10. var size int = 2147483647
  11. var threads int = runtime.NumCPU()
  12. func getPi(total int) float64 {
  13. return (float64(inside) / float64(total)) * 4
  14. }
  15. func addpoints(times int, out chan int) {
  16. r1 := rand.New(rand.NewSource(time.Now().UnixNano()))
  17. var x, y, ret int
  18. ret = 0
  19. for i := 0; i &lt; times; i++ {
  20. x = r1.Intn(size)
  21. y = r1.Intn(size)
  22. if ((x * x) + (y * y)) &lt; (size * size) {
  23. ret += 1
  24. }
  25. }
  26. out &lt;- ret
  27. }
  28. func main() {
  29. fmt.Println(&quot;running on &quot; + fmt.Sprint(threads) + &quot; threads&quot;)
  30. start := time.Now()
  31. results := make(chan int, threads)
  32. for i := 0; i &lt; threads; i++ {
  33. go addpoints(precision/threads, results)
  34. }
  35. for i := 0; i &lt; threads; i++ {
  36. inside += &lt;-results
  37. }
  38. duration := time.Since(start)
  39. fmt.Println(getPi(precision))
  40. fmt.Println(&quot;compute time: &quot; + fmt.Sprint(duration))
  41. }

huangapple
  • 本文由 发表于 2016年3月20日 00:14:09
  • 转载请务必保留本文链接:https://go.coder-hub.com/36104038.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定