并发代码在并行问题上比顺序代码慢吗?

huangapple go评论84阅读模式
英文:

Concurrent code slower than sequential code on parallel problem?

问题

我写了一些代码来执行蒙特卡洛模拟。我首先写的是这个顺序版本:

func simulationSequential(experiment func() bool, numTrials int) float64 {
    ocurrencesEvent := 0

    for trial := 0; trial < numTrials; trial++ {
        eventHappend := experiment()
        if eventHappend {
            ocurrencesEvent++
        }
    }

    return float64(ocurrencesEvent) / float64(numTrials)
}

然后,我想到可以并发地运行一些实验,并利用我的笔记本电脑的多个核心来更快地得到结果。所以,我写了以下版本:

func simulationConcurrent(experiment func() bool, numTrials, nGoroutines int) float64 {
    ch := make(chan int)
    var wg sync.WaitGroup

    // 启动多个 goroutine 进行工作
    for i := 0; i < nGoroutines; i++ {
        wg.Add(1)
        go func() {
            localOcurrences := 0
            for j := 0; j < numTrials/nGoroutines; j++ {
                eventHappend := experiment()
                if eventHappend {
                    localOcurrences++
                }
            }
            ch <- localOcurrences
            wg.Done()
        }()
    }

    // 当所有 goroutine 完成时关闭通道
    go func() {
        wg.Wait()
        close(ch)
    }()

    // 累积每个实验的结果
    ocurrencesEvent := 0
    for localOcurrences := range ch {
        ocurrencesEvent += localOcurrences
    }

    return float64(ocurrencesEvent) / float64(numTrials)
}

令我惊讶的是,当我对这两个版本进行基准测试时,我发现顺序版本比并发版本更快,而且随着我减少 goroutine 的数量,并发版本的性能反而变得更好。为什么会这样呢?我原以为并发版本会更快,因为这是一个高度可并行化的问题。

这是我的基准测试代码:

func tossEqualToSix() bool {
    // 模拟掷一个六面骰子
    roll := rand.Intn(6) + 1

    if roll != 6 {
        return false
    }
    return true
}

const (
    numsSimBenchmark       = 1_000_000
    numGoroutinesBenckmark = 10
)

func BenchmarkSimulationSequential(b *testing.B) {
    for i := 0; i < b.N; i++ {
        simulationSequential(tossEqualToSix, numsSimBenchmark)
    }
}

func BenchmarkSimulationConcurrent(b *testing.B) {
    for i := 0; i < b.N; i++ {
        simulationConcurrent(tossEqualToSix, numsSimBenchmark, numGoroutinesBenckmark)
    }
}

以下是结果:

goos: linux
goarch: amd64
pkg: github.com/jpuriol/montecarlo
cpu: Intel(R) Core(TM) i7-10510U CPU @ 1.80GHz
BenchmarkSimulationSequential-8               36          30453588 ns/op
BenchmarkSimulationConcurrent-8                9         117462720 ns/op
PASS
ok      github.com/jpuriol/montecarlo   2.478s

你可以从Github上下载我的代码。

英文:

I wrote some code to execute Monte Carlo simulations. The first thing I wrote was this sequential version:

 func simulationSequential(experiment func() bool, numTrials int) float64 {
	ocurrencesEvent := 0

	for trial := 0; trial &lt; numTrials; trial++ {
		eventHappend := experiment()
		if eventHappend {
			ocurrencesEvent++
		}
	}

	return float64(ocurrencesEvent) / float64(numTrials)
}

Then, I figured I could run some of the experiments concurrently and get a result faster using my laptop's multiple cores. So, I wrote the following version:

func simulationConcurrent(experiment func() bool, numTrials, nGoroutines int) float64 {
	ch := make(chan int)
	var wg sync.WaitGroup

	// Launch work in multiple goroutines
	for i := 0; i &lt; nGoroutines; i++ {
		wg.Add(1)
		go func() {
			localOcurrences := 0
			for j := 0; j &lt; numTrials/nGoroutines; j++ {
				eventHappend := experiment()
				if eventHappend {
					localOcurrences++
				}
			}
			ch &lt;- localOcurrences
			wg.Done()
		}()
	}

	// Close the channel when all the goroutines are done
	go func() {
		wg.Wait()
		close(ch)
	}()

	// Acummulate the results of each experiment
	ocurrencesEvent := 0
	for localOcurrences := range ch {
		ocurrencesEvent += localOcurrences
	}

	return float64(ocurrencesEvent) / float64(numTrials)
}

To my surprise, when I run benchmarks on the two versions, I get that the sequential is faster than the concurrent one, with the concurrent version getting better as I decrease the number of goroutines. Why does this happen? I thought the concurrent version will be faster since this is a highly parallelizable problem.

Here is my benchmark's code:

func tossEqualToSix() bool {
	// Simulate the toss of a six-sided die
	roll := rand.Intn(6) + 1

	if roll != 6 {
		return false
	}
	return true
}

const (
	numsSimBenchmark       = 1_000_000
	numGoroutinesBenckmark = 10
)

func BenchmarkSimulationSequential(b *testing.B) {
	for i := 0; i &lt; b.N; i++ {
		simulationSequential(tossEqualToSix, numsSimBenchmark)
	}
}

func BenchmarkSimulationConcurrent(b *testing.B) {
	for i := 0; i &lt; b.N; i++ {
		simulationConcurrent(tossEqualToSix, numsSimBenchmark, numGoroutinesBenckmark)
	}
}

And the results:

goos: linux
goarch: amd64
pkg: github.com/jpuriol/montecarlo
cpu: Intel(R) Core(TM) i7-10510U CPU @ 1.80GHz
BenchmarkSimulationSequential-8               36          30453588 ns/op
BenchmarkSimulationConcurrent-8                9         117462720 ns/op
PASS
ok      github.com/jpuriol/montecarlo   2.478s

You can download my code from Github.

答案1

得分: 3

我想详细说明一下我的评论,并将其与代码和基准结果一起发布。

Examine函数使用rand包中的包级别rand函数。这些函数在底层使用rand.RandglobalRand实例。例如func Intn(n int) int { return globalRand.Intn(n) }。由于随机数生成器不是线程安全的,因此globalRand以以下方式实例化:

/*
 * Top-level convenience functions
 */

var globalRand = New(&lockedSource{src: NewSource(1).(*rngSource)})

type lockedSource struct {
	lk  sync.Mutex
	src *rngSource
}

func (r *lockedSource) Int63() (n int64) {
	r.lk.Lock()
	n = r.src.Int63()
	r.lk.Unlock()
	return
}
...

这意味着所有对rand.Intn的调用都受到全局锁的保护。结果是examine函数"按顺序工作",因为有锁的存在。更具体地说,每次调用rand.Intn都不会在前一次调用完成之前开始生成随机数。

这是重新设计的代码。每个examine函数都有自己的随机生成器。假设单个examine函数由一个goroutine使用,因此不需要锁保护。

package main

import (
	"math/rand"
	"sync"
	"testing"
	"time"
)

func simulationSequential(experimentFuncFactory func() func() bool, numTrials int) float64 {
	experiment := experimentFuncFactory()
	ocurrencesEvent := 0

	for trial := 0; trial < numTrials; trial++ {
		eventHappend := experiment()
		if eventHappend {
			ocurrencesEvent++
		}
	}

	return float64(ocurrencesEvent) / float64(numTrials)
}

func simulationConcurrent(experimentFuncFactory func() func() bool, numTrials, nGoroutines int) float64 {
	ch := make(chan int)
	var wg sync.WaitGroup

	// Launch work in multiple goroutines
	for i := 0; i < nGoroutines; i++ {
		wg.Add(1)
		go func() {
			experiment := experimentFuncFactory()
			localOcurrences := 0
			for j := 0; j < numTrials/nGoroutines; j++ {
				eventHappend := experiment()
				if eventHappend {
					localOcurrences++
				}
			}
			ch <- localOcurrences
			wg.Done()
		}()
	}

	// Close the channel when all the goroutines are done
	go func() {
		wg.Wait()
		close(ch)
	}()

	// Acummulate the results of each experiment
	ocurrencesEvent := 0
	for localOcurrences := range ch {
		ocurrencesEvent += localOcurrences
	}

	return float64(ocurrencesEvent) / float64(numTrials)
}

func tossEqualToSix() func() bool {
	prng := rand.New(rand.NewSource(time.Now().UnixNano()))
	return func() bool {
		// Simulate the toss of a six-sided die
		roll := prng.Intn(6) + 1

		if roll != 6 {
			return false
		}
		return true
	}
}

const (
	numsSimBenchmark       = 5_000_000
	numGoroutinesBenchmark = 10
)

func BenchmarkSimulationSequential(b *testing.B) {
	for i := 0; i < b.N; i++ {
		simulationSequential(tossEqualToSix, numsSimBenchmark)
	}
}

func BenchmarkSimulationConcurrent(b *testing.B) {
	for i := 0; i < b.N; i++ {
		simulationConcurrent(tossEqualToSix, numsSimBenchmark, numGoroutinesBenchmark)
	}
}

基准测试结果如下:

goos: darwin
goarch: arm64
pkg: scratchpad
BenchmarkSimulationSequential-8   	      20	  55142896 ns/op
BenchmarkSimulationConcurrent-8   	      82	  12944360 ns/op
英文:

I thought I would elaborate on my comment and post it with code and benchmark result.

Examine function uses package level rand functions from rand package. These functions under the hood uses globalRand instance of rand.Rand. For example func Intn(n int) int { return globalRand.Intn(n) }. As the random number generator is not thread safe, the globalRand is instantiated in following way:

/*
* Top-level convenience functions
*/
var globalRand = New(&amp;lockedSource{src: NewSource(1).(*rngSource)})
type lockedSource struct {
lk  sync.Mutex
src *rngSource
}
func (r *lockedSource) Int63() (n int64) {
r.lk.Lock()
n = r.src.Int63()
r.lk.Unlock()
return
}
...

This means that all invocations of rand.Intn are guarded by the global lock. The consequence is that examine function "works sequentially", because of the lock. More specifically, each call to rand.Intn will not start generating a random number before the previous call completes.

Here is redesigned code. Each examine function has its own random generator. The assumption is that single examine function is used by one goroutine, so it does not require lock protection.

package main
import (
&quot;math/rand&quot;
&quot;sync&quot;
&quot;testing&quot;
&quot;time&quot;
)
func simulationSequential(experimentFuncFactory func() func() bool, numTrials int) float64 {
experiment := experimentFuncFactory()
ocurrencesEvent := 0
for trial := 0; trial &lt; numTrials; trial++ {
eventHappend := experiment()
if eventHappend {
ocurrencesEvent++
}
}
return float64(ocurrencesEvent) / float64(numTrials)
}
func simulationConcurrent(experimentFuncFactory func() func() bool, numTrials, nGoroutines int) float64 {
ch := make(chan int)
var wg sync.WaitGroup
// Launch work in multiple goroutines
for i := 0; i &lt; nGoroutines; i++ {
wg.Add(1)
go func() {
experiment := experimentFuncFactory()
localOcurrences := 0
for j := 0; j &lt; numTrials/nGoroutines; j++ {
eventHappend := experiment()
if eventHappend {
localOcurrences++
}
}
ch &lt;- localOcurrences
wg.Done()
}()
}
// Close the channel when all the goroutines are done
go func() {
wg.Wait()
close(ch)
}()
// Acummulate the results of each experiment
ocurrencesEvent := 0
for localOcurrences := range ch {
ocurrencesEvent += localOcurrences
}
return float64(ocurrencesEvent) / float64(numTrials)
}
func tossEqualToSix() func() bool {
prng := rand.New(rand.NewSource(time.Now().UnixNano()))
return func() bool {
// Simulate the toss of a six-sided die
roll := prng.Intn(6) + 1
if roll != 6 {
return false
}
return true
}
}
const (
numsSimBenchmark       = 5_000_000
numGoroutinesBenchmark = 10
)
func BenchmarkSimulationSequential(b *testing.B) {
for i := 0; i &lt; b.N; i++ {
simulationSequential(tossEqualToSix, numsSimBenchmark)
}
}
func BenchmarkSimulationConcurrent(b *testing.B) {
for i := 0; i &lt; b.N; i++ {
simulationConcurrent(tossEqualToSix, numsSimBenchmark, numGoroutinesBenchmark)
}
}

Benchmark result are as follows:

goos: darwin
goarch: arm64
pkg: scratchpad
BenchmarkSimulationSequential-8   	      20	  55142896 ns/op
BenchmarkSimulationConcurrent-8   	      82	  12944360 ns/op

huangapple
  • 本文由 发表于 2021年7月27日 14:28:51
  • 转载请务必保留本文链接:https://go.coder-hub.com/68539951.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定