2021年7月27日 14:28:51go评论114阅读模式

英文:

Concurrent code slower than sequential code on parallel problem?

问题

我写了一些代码来执行蒙特卡洛模拟。我首先写的是这个顺序版本：

func simulationSequential(experiment func() bool, numTrials int) float64 {
    ocurrencesEvent := 0
    for trial := 0; trial < numTrials; trial++ {
        eventHappend := experiment()
        if eventHappend {
            ocurrencesEvent++
        }
    }
    return float64(ocurrencesEvent) / float64(numTrials)
}

然后，我想到可以并发地运行一些实验，并利用我的笔记本电脑的多个核心来更快地得到结果。所以，我写了以下版本：

func simulationConcurrent(experiment func() bool, numTrials, nGoroutines int) float64 {
    ch := make(chan int)
    var wg sync.WaitGroup
    // 启动多个 goroutine 进行工作
    for i := 0; i < nGoroutines; i++ {
        wg.Add(1)
        go func() {
            localOcurrences := 0
            for j := 0; j < numTrials/nGoroutines; j++ {
                eventHappend := experiment()
                if eventHappend {
                    localOcurrences++
                }
            }
            ch <- localOcurrences
            wg.Done()
        }()
    }
    // 当所有 goroutine 完成时关闭通道
    go func() {
        wg.Wait()
        close(ch)
    }()
    // 累积每个实验的结果
    ocurrencesEvent := 0
    for localOcurrences := range ch {
        ocurrencesEvent += localOcurrences
    }
    return float64(ocurrencesEvent) / float64(numTrials)
}

令我惊讶的是，当我对这两个版本进行基准测试时，我发现顺序版本比并发版本更快，而且随着我减少 goroutine 的数量，并发版本的性能反而变得更好。为什么会这样呢？我原以为并发版本会更快，因为这是一个高度可并行化的问题。

这是我的基准测试代码：

func tossEqualToSix() bool {
    // 模拟掷一个六面骰子
    roll := rand.Intn(6) + 1
    if roll != 6 {
        return false
    }
    return true
}
const (
    numsSimBenchmark       = 1_000_000
    numGoroutinesBenckmark = 10
)
func BenchmarkSimulationSequential(b *testing.B) {
    for i := 0; i < b.N; i++ {
        simulationSequential(tossEqualToSix, numsSimBenchmark)
    }
}
func BenchmarkSimulationConcurrent(b *testing.B) {
    for i := 0; i < b.N; i++ {
        simulationConcurrent(tossEqualToSix, numsSimBenchmark, numGoroutinesBenckmark)
    }
}

以下是结果：

goos: linux
goarch: amd64
pkg: github.com/jpuriol/montecarlo
cpu: Intel(R) Core(TM) i7-10510U CPU @ 1.80GHz
BenchmarkSimulationSequential-8               36          30453588 ns/op
BenchmarkSimulationConcurrent-8                9         117462720 ns/op
PASS
ok      github.com/jpuriol/montecarlo   2.478s

你可以从Github上下载我的代码。

英文:

I wrote some code to execute Monte Carlo simulations. The first thing I wrote was this sequential version:

 func simulationSequential(experiment func() bool, numTrials int) float64 {
	ocurrencesEvent := 0
	for trial := 0; trial &lt; numTrials; trial++ {
		eventHappend := experiment()
		if eventHappend {
			ocurrencesEvent++
		}
	}
	return float64(ocurrencesEvent) / float64(numTrials)
}

Then, I figured I could run some of the experiments concurrently and get a result faster using my laptop's multiple cores. So, I wrote the following version:

func simulationConcurrent(experiment func() bool, numTrials, nGoroutines int) float64 {
	ch := make(chan int)
	var wg sync.WaitGroup
	// Launch work in multiple goroutines
	for i := 0; i &lt; nGoroutines; i++ {
		wg.Add(1)
		go func() {
			localOcurrences := 0
			for j := 0; j &lt; numTrials/nGoroutines; j++ {
				eventHappend := experiment()
				if eventHappend {
					localOcurrences++
				}
			}
			ch &lt;- localOcurrences
			wg.Done()
		}()
	}
	// Close the channel when all the goroutines are done
	go func() {
		wg.Wait()
		close(ch)
	}()
	// Acummulate the results of each experiment
	ocurrencesEvent := 0
	for localOcurrences := range ch {
		ocurrencesEvent += localOcurrences
	}
	return float64(ocurrencesEvent) / float64(numTrials)
}

To my surprise, when I run benchmarks on the two versions, I get that the sequential is faster than the concurrent one, with the concurrent version getting better as I decrease the number of goroutines. Why does this happen? I thought the concurrent version will be faster since this is a highly parallelizable problem.

Here is my benchmark's code:

func tossEqualToSix() bool {
	// Simulate the toss of a six-sided die
	roll := rand.Intn(6) + 1
	if roll != 6 {
		return false
	}
	return true
}
const (
	numsSimBenchmark       = 1_000_000
	numGoroutinesBenckmark = 10
)
func BenchmarkSimulationSequential(b *testing.B) {
	for i := 0; i &lt; b.N; i++ {
		simulationSequential(tossEqualToSix, numsSimBenchmark)
	}
}
func BenchmarkSimulationConcurrent(b *testing.B) {
	for i := 0; i &lt; b.N; i++ {
		simulationConcurrent(tossEqualToSix, numsSimBenchmark, numGoroutinesBenckmark)
	}
}

And the results:

goos: linux
goarch: amd64
pkg: github.com/jpuriol/montecarlo
cpu: Intel(R) Core(TM) i7-10510U CPU @ 1.80GHz
BenchmarkSimulationSequential-8               36          30453588 ns/op
BenchmarkSimulationConcurrent-8                9         117462720 ns/op
PASS
ok      github.com/jpuriol/montecarlo   2.478s

You can download my code from Github.

答案1

得分: 3

我想详细说明一下我的评论，并将其与代码和基准结果一起发布。

Examine函数使用rand包中的包级别rand函数。这些函数在底层使用rand.Rand的globalRand实例。例如func Intn(n int) int { return globalRand.Intn(n) }。由于随机数生成器不是线程安全的，因此globalRand以以下方式实例化：

/*
 * Top-level convenience functions
 */
var globalRand = New(&lockedSource{src: NewSource(1).(*rngSource)})
type lockedSource struct {
	lk  sync.Mutex
	src *rngSource
}
func (r *lockedSource) Int63() (n int64) {
	r.lk.Lock()
	n = r.src.Int63()
	r.lk.Unlock()
	return
}
...

这意味着所有对rand.Intn的调用都受到全局锁的保护。结果是examine函数"按顺序工作"，因为有锁的存在。更具体地说，每次调用rand.Intn都不会在前一次调用完成之前开始生成随机数。

这是重新设计的代码。每个examine函数都有自己的随机生成器。假设单个examine函数由一个goroutine使用，因此不需要锁保护。

package main
import (
	"math/rand"
	"sync"
	"testing"
	"time"
)
func simulationSequential(experimentFuncFactory func() func() bool, numTrials int) float64 {
	experiment := experimentFuncFactory()
	ocurrencesEvent := 0
	for trial := 0; trial < numTrials; trial++ {
		eventHappend := experiment()
		if eventHappend {
			ocurrencesEvent++
		}
	}
	return float64(ocurrencesEvent) / float64(numTrials)
}
func simulationConcurrent(experimentFuncFactory func() func() bool, numTrials, nGoroutines int) float64 {
	ch := make(chan int)
	var wg sync.WaitGroup
	// Launch work in multiple goroutines
	for i := 0; i < nGoroutines; i++ {
		wg.Add(1)
		go func() {
			experiment := experimentFuncFactory()
			localOcurrences := 0
			for j := 0; j < numTrials/nGoroutines; j++ {
				eventHappend := experiment()
				if eventHappend {
					localOcurrences++
				}
			}
			ch <- localOcurrences
			wg.Done()
		}()
	}
	// Close the channel when all the goroutines are done
	go func() {
		wg.Wait()
		close(ch)
	}()
	// Acummulate the results of each experiment
	ocurrencesEvent := 0
	for localOcurrences := range ch {
		ocurrencesEvent += localOcurrences
	}
	return float64(ocurrencesEvent) / float64(numTrials)
}
func tossEqualToSix() func() bool {
	prng := rand.New(rand.NewSource(time.Now().UnixNano()))
	return func() bool {
		// Simulate the toss of a six-sided die
		roll := prng.Intn(6) + 1
		if roll != 6 {
			return false
		}
		return true
	}
}
const (
	numsSimBenchmark       = 5_000_000
	numGoroutinesBenchmark = 10
)
func BenchmarkSimulationSequential(b *testing.B) {
	for i := 0; i < b.N; i++ {
		simulationSequential(tossEqualToSix, numsSimBenchmark)
	}
}
func BenchmarkSimulationConcurrent(b *testing.B) {
	for i := 0; i < b.N; i++ {
		simulationConcurrent(tossEqualToSix, numsSimBenchmark, numGoroutinesBenchmark)
	}
}

基准测试结果如下：

goos: darwin
goarch: arm64
pkg: scratchpad
BenchmarkSimulationSequential-8   	      20	  55142896 ns/op
BenchmarkSimulationConcurrent-8   	      82	  12944360 ns/op

英文:

I thought I would elaborate on my comment and post it with code and benchmark result.

Examine function uses package level rand functions from rand package. These functions under the hood uses globalRand instance of rand.Rand. For example func Intn(n int) int { return globalRand.Intn(n) }. As the random number generator is not thread safe, the globalRand is instantiated in following way:

/*
* Top-level convenience functions
*/
var globalRand = New(&amp;lockedSource{src: NewSource(1).(*rngSource)})
type lockedSource struct {
lk  sync.Mutex
src *rngSource
}
func (r *lockedSource) Int63() (n int64) {
r.lk.Lock()
n = r.src.Int63()
r.lk.Unlock()
return
}
...

This means that all invocations of rand.Intn are guarded by the global lock. The consequence is that examine function "works sequentially", because of the lock. More specifically, each call to rand.Intn will not start generating a random number before the previous call completes.

Here is redesigned code. Each examine function has its own random generator. The assumption is that single examine function is used by one goroutine, so it does not require lock protection.

package main
import (
&quot;math/rand&quot;
&quot;sync&quot;
&quot;testing&quot;
&quot;time&quot;
)
func simulationSequential(experimentFuncFactory func() func() bool, numTrials int) float64 {
experiment := experimentFuncFactory()
ocurrencesEvent := 0
for trial := 0; trial &lt; numTrials; trial++ {
eventHappend := experiment()
if eventHappend {
ocurrencesEvent++
}
}
return float64(ocurrencesEvent) / float64(numTrials)
}
func simulationConcurrent(experimentFuncFactory func() func() bool, numTrials, nGoroutines int) float64 {
ch := make(chan int)
var wg sync.WaitGroup
// Launch work in multiple goroutines
for i := 0; i &lt; nGoroutines; i++ {
wg.Add(1)
go func() {
experiment := experimentFuncFactory()
localOcurrences := 0
for j := 0; j &lt; numTrials/nGoroutines; j++ {
eventHappend := experiment()
if eventHappend {
localOcurrences++
}
}
ch &lt;- localOcurrences
wg.Done()
}()
}
// Close the channel when all the goroutines are done
go func() {
wg.Wait()
close(ch)
}()
// Acummulate the results of each experiment
ocurrencesEvent := 0
for localOcurrences := range ch {
ocurrencesEvent += localOcurrences
}
return float64(ocurrencesEvent) / float64(numTrials)
}
func tossEqualToSix() func() bool {
prng := rand.New(rand.NewSource(time.Now().UnixNano()))
return func() bool {
// Simulate the toss of a six-sided die
roll := prng.Intn(6) + 1
if roll != 6 {
return false
}
return true
}
}
const (
numsSimBenchmark       = 5_000_000
numGoroutinesBenchmark = 10
)
func BenchmarkSimulationSequential(b *testing.B) {
for i := 0; i &lt; b.N; i++ {
simulationSequential(tossEqualToSix, numsSimBenchmark)
}
}
func BenchmarkSimulationConcurrent(b *testing.B) {
for i := 0; i &lt; b.N; i++ {
simulationConcurrent(tossEqualToSix, numsSimBenchmark, numGoroutinesBenchmark)
}
}

Benchmark result are as follows:

goos: darwin
goarch: arm64
pkg: scratchpad
BenchmarkSimulationSequential-8   	      20	  55142896 ns/op
BenchmarkSimulationConcurrent-8   	      82	  12944360 ns/op

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

并发代码在并行问题上比顺序代码慢吗？

问题

答案1

泛型：处理具有相同数据成员类型的不同结构类型

Go和pgx：无法将[+11111111111]转换为Text

使用Golang将一千个节点插入到Neo4j中。

在Golang中仅导出结构体以进行测试。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。