2013年1月13日 06:19:03go评论119阅读模式

英文:

Why does adding concurrency slow down this golang code?

问题

我有一段Go代码，我一直在调试它，以回答我姐夫玩的一个视频游戏中的一个小问题。

基本上，下面的代码模拟了与游戏中的怪物的交互以及他在击败怪物后可以期望它们掉落物品的频率。我遇到的问题是，我希望像这样的代码可以很好地并行化，但是当我添加并发性时，执行所有模拟所需的时间往往会比没有并发性时慢4-6倍。

为了让您更好地了解代码的工作原理，我有三个主要函数：interaction函数是玩家与怪物之间的简单交互。如果怪物掉落物品，则返回1，否则返回0。simulation函数运行多个交互并返回表示结果的切片（即表示成功/失败交互的1和0）。最后，有一个test函数，它运行一组模拟并返回表示掉落物品的总交互次数的模拟结果的切片。我正试图并行运行最后一个函数。

现在，我可以理解为什么如果我为每个要运行的测试创建一个goroutine，代码会变慢。假设我运行100个测试，那么在我MacBook Air上的4个CPU之间的每个goroutine之间的上下文切换将降低性能，但是我只创建了与处理器数量相同的goroutine，并将测试数量分配给这些goroutine。我期望这实际上会加快代码的性能，因为我在并行运行每个测试，但是，当然，我得到了一个主要的减速。

我希望能够弄清楚为什么会发生这种情况，所以非常感谢任何帮助。

以下是没有goroutine的常规代码：

package main
import (
    "fmt"
    "math/rand"
    "time"
)
const (
    NUMBER_OF_SIMULATIONS = 1000
    NUMBER_OF_INTERACTIONS = 1000000
    DROP_RATE = 0.0003
)
/**
 * 模拟与怪物的单次交互
 *
 * 如果怪物掉落物品，则返回1，否则返回0
 */
func interaction() int {
    if rand.Float64() <= DROP_RATE {
        return 1
    }
    return 0
}
/**
 * 运行多个交互并返回表示结果的切片
 */
func simulation(n int) []int {
    interactions := make([]int, n)
    for i := range interactions {
        interactions[i] = interaction()
    }
    return interactions
}
/**
 * 运行多个模拟并返回结果
 */
func test(n int) []int {
    simulations := make([]int, n)
    for i := range simulations {
        successes := 0
        for _, v := range simulation(NUMBER_OF_INTERACTIONS) {
            successes += v
        }
        simulations[i] = successes
    }
    return simulations
}
func main() {
    rand.Seed(time.Now().UnixNano())
    fmt.Println("成功的交互次数：", test(NUMBER_OF_SIMULATIONS))
}

以下是带有goroutine的并发代码：

package main
import (
    "fmt"
    "math/rand"
    "time"
    "runtime"
)
const (
    NUMBER_OF_SIMULATIONS = 1000
    NUMBER_OF_INTERACTIONS = 1000000
    DROP_RATE = 0.0003
)
/**
 * 模拟与怪物的单次交互
 *
 * 如果怪物掉落物品，则返回1，否则返回0
 */
func interaction() int {
    if rand.Float64() <= DROP_RATE {
        return 1
    }
    return 0
}
/**
 * 运行多个交互并返回表示结果的切片
 */
func simulation(n int) []int {
    interactions := make([]int, n)
    for i := range interactions {
        interactions[i] = interaction()
    }
    return interactions
}
/**
 * 运行多个模拟并返回结果
 */
func test(n int, c chan []int) {
    simulations := make([]int, n)
    for i := range simulations {
        for _, v := range simulation(NUMBER_OF_INTERACTIONS) {
            simulations[i] += v
        }
    }
    c <- simulations
}
func main() {
    rand.Seed(time.Now().UnixNano())
    nCPU := runtime.NumCPU()
    runtime.GOMAXPROCS(nCPU)
    fmt.Println("CPU数量：", nCPU)
    tests := make([]chan []int, nCPU)
    for i := range tests {
        c := make(chan []int)
        go test(NUMBER_OF_SIMULATIONS/nCPU, c)
        tests[i] = c
    }
    // 合并测试结果
    results := make([]int, NUMBER_OF_SIMULATIONS)
    for i, c := range tests {
        start := (NUMBER_OF_SIMULATIONS/nCPU) * i
        stop := (NUMBER_OF_SIMULATIONS/nCPU) * (i+1)
        copy(results[start:stop], <-c)
    }
    fmt.Println("成功的交互次数：", results)
}

更新（01/12/13 18:05）

我在下面添加了并发代码的新版本，每个goroutine根据“系统”下面的建议为其创建了一个新的Rand实例。与串行版本的代码相比，我现在看到的速度稍微加快了一些（总体上减少了15-20%的时间）。我想知道为什么我没有看到接近75%的时间减少，因为我将工作负载分散在了我的MBA的4个核心上。有没有人有进一步的建议可以帮助我？

package main
import (
    "fmt"
    "math/rand"
    "time"
    "runtime"
)
const (
    NUMBER_OF_SIMULATIONS = 1000
    NUMBER_OF_INTERACTIONS = 1000000
    DROP_RATE = 0.0003
)
/**
 * 模拟与怪物的单次交互
 *
 * 如果怪物掉落物品，则返回1，否则返回0
 */
func interaction(generator *rand.Rand) int {
    if generator.Float64() <= DROP_RATE {
        return 1
    }
    return 0
}
/**
 * 运行多个交互并返回表示结果的切片
 */
func simulation(n int, generator *rand.Rand) []int {
    interactions := make([]int, n)
    for i := range interactions {
        interactions[i] = interaction(generator)
    }
    return interactions
}
/**
 * 运行多个模拟并返回结果
 */
func test(n int, c chan []int) {
    source := rand.NewSource(time.Now().UnixNano())
    generator := rand.New(source)
    simulations := make([]int, n)
    for i := range simulations {
        for _, v := range simulation(NUMBER_OF_INTERACTIONS, generator) {
            simulations[i] += v
        }
    }
    c <- simulations
}
func main() {
    rand.Seed(time.Now().UnixNano())
    nCPU := runtime.NumCPU()
    runtime.GOMAXPROCS(nCPU)
    fmt.Println("CPU数量：", nCPU)
    tests := make([]chan []int, nCPU)
    for i := range tests {
        c := make(chan []int)
        go test(NUMBER_OF_SIMULATIONS/nCPU, c)
        tests[i] = c
    }
    // 合并测试结果
    results := make([]int, NUMBER_OF_SIMULATIONS)
    for i, c := range tests {
        start := (NUMBER_OF_SIMULATIONS/nCPU) * i
        stop := (NUMBER_OF_SIMULATIONS/nCPU) * (i+1)
        copy(results[start:stop], <-c)
    }
    fmt.Println("成功的交互次数：", results)
}

更新（01/13/13 17:58）

感谢大家帮助我找到问题的答案。我最终得到了我想要的答案，所以我想在这里总结一下，以供有相同问题的人参考。

基本上，我有两个主要问题：首先，即使我的代码是尴尬并行的，但当我将其分割到可用处理器中时，它运行得更慢，并且其解决方案引发了另一个问题，即我的串行代码运行速度是并发代码在单个处理器上运行速度的两倍，这应该是大致相同的。在这两种情况下，问题都是随机数生成器函数rand.Float64。基本上，这是rand包提供的一个方便函数。在该包中，创建了一个全局Rand结构的实例，并由每个方便函数使用。这个全局Rand实例有一个与之关联的互斥锁。由于我使用了这个方便函数，我无法真正并行化我的代码，因为每个goroutine都必须排队等待访问全局Rand实例。解决方案（如下面的“系统”建议）是为每个goroutine创建一个单独的Rand结构实例。这解决了第一个问题，但引发了第二个问题。

第二个问题是，我的非并行并发代码（即仅在单个处理器上运行的并发代码）的运行速度是串行代码的两倍。原因是，即使我只使用了一个处理器和一个goroutine，该goroutine也有自己创建的Rand结构实例，并且我创建它时没有使用互斥锁。串行代码仍然使用了rand.Float64方便函数，该函数使用了全局互斥锁保护的Rand实例。获取该锁的成本导致串行代码运行速度减慢了两倍。

所以，故事的寓意是，每当性能很重要时，请确保为Rand结构创建一个实例，并从中调用所需的函数，而不是使用包提供的方便函数。

英文:

I've got a bit of Go code that I've been tinkering with to answer a little curiosity of mine related to a video game my brother-in-law plays.

Essentially, the code below simulates interactions with monsters in the game and how often he can expect them to drop items upon their defeat. The problem I'm having is that I would expect a piece of code like this to be perfect for parallelization, but when I add in concurrency the time it takes to do all of the simulations tends to slow down by 4-6 times the original without concurrency.

To give you a better understanding of how the code works, I have three main functions: The interaction function which is a simple interaction between the player and a monster. It returns 1 if the monster drops an item, and 0 otherwise. The simulation function runs several interactions and returns a slice of interaction results (i.e., 1's and 0's representing successful/unsuccessful interactions). Finally, there is the test function which runs a set of simulations and returns a slice of simulation results which are the total number of interactions that resulted in a dropped item. It's the last function which I am trying to run in parallel.

Now, I could understand why the code would slow down if I created a goroutine for each test that I want to run. Assuming I'm running 100 tests, the context switching between each of the goroutines across the 4 CPUs my MacBook Air has would kill the performance, but I'm only creating as many goroutines as I have processors and dividing the number of tests between the goroutines. I would expect this to actually speed up the code's performance since I am running each of my tests in parallel, but, of course, I'm getting a major slow down instead.

I'd love to figure out why this is happening, so any help would be greatly appreciated.

Below is the regular code without the go routines:

package main
import (
    &quot;fmt&quot;
    &quot;math/rand&quot;
    &quot;time&quot;
)
const (
    NUMBER_OF_SIMULATIONS = 1000
    NUMBER_OF_INTERACTIONS = 1000000
    DROP_RATE = 0.0003
)
/**
 * Simulates a single interaction with a monster
 *
 * Returns 1 if the monster dropped an item and 0 otherwise
 */
func interaction() int {
    if rand.Float64() &lt;= DROP_RATE {
        return 1
    }
    return 0
}
/**
 * Runs several interactions and retuns a slice representing the results
 */
func simulation(n int) []int {
    interactions := make([]int, n)
    for i := range interactions {
        interactions[i] = interaction()
    }
    return interactions
}
/**
 * Runs several simulations and returns the results
 */
func test(n int) []int {
    simulations := make([]int, n)
    for i := range simulations {
        successes := 0
        for _, v := range simulation(NUMBER_OF_INTERACTIONS) {
            successes += v
        }
        simulations[i] = successes
    }
    return simulations
}
func main() {
    rand.Seed(time.Now().UnixNano())
    fmt.Println(&quot;Successful interactions: &quot;, test(NUMBER_OF_SIMULATIONS))
}

And, here is the concurrent code with the goroutines:

package main
import (
    &quot;fmt&quot;
    &quot;math/rand&quot;
    &quot;time&quot;
    &quot;runtime&quot;
)
const (
    NUMBER_OF_SIMULATIONS = 1000
    NUMBER_OF_INTERACTIONS = 1000000
    DROP_RATE = 0.0003
)
/**
 * Simulates a single interaction with a monster
 *
 * Returns 1 if the monster dropped an item and 0 otherwise
 */
func interaction() int {
    if rand.Float64() &lt;= DROP_RATE {
        return 1
    }
    return 0
}
/**
 * Runs several interactions and retuns a slice representing the results
 */
func simulation(n int) []int {
    interactions := make([]int, n)
    for i := range interactions {
        interactions[i] = interaction()
    }
    return interactions
}
/**
 * Runs several simulations and returns the results
 */
func test(n int, c chan []int) {
    simulations := make([]int, n)
    for i := range simulations {
        for _, v := range simulation(NUMBER_OF_INTERACTIONS) {
            simulations[i] += v
        }
    }
    c &lt;- simulations
}
func main() {
    rand.Seed(time.Now().UnixNano())
    nCPU := runtime.NumCPU()
    runtime.GOMAXPROCS(nCPU)
    fmt.Println(&quot;Number of CPUs: &quot;, nCPU)
    tests := make([]chan []int, nCPU)
    for i := range tests {
        c := make(chan []int)
        go test(NUMBER_OF_SIMULATIONS/nCPU, c)
        tests[i] = c
    }
    // Concatentate the test results
    results := make([]int, NUMBER_OF_SIMULATIONS)
    for i, c := range tests {
        start := (NUMBER_OF_SIMULATIONS/nCPU) * i
        stop := (NUMBER_OF_SIMULATIONS/nCPU) * (i+1)
        copy(results[start:stop], &lt;-c)
    }
    fmt.Println(&quot;Successful interactions: &quot;, results)
}

UPDATE (01/12/13 18:05)

I've added a new version of the concurrent code below that creates a new Rand instance for each goroutine per "the system"'s suggestion below. I'm now seeing a very slight speed up compared to the serial version of the code (around a 15-20% reduction in overall time taken). I'd love to know why I don't see something closer to a 75% reduction in time since I'm spreading the workload over my MBA's 4 cores. Does anyone have any further suggestions that could help out?

package main
import (
    &quot;fmt&quot;
    &quot;math/rand&quot;
    &quot;time&quot;
    &quot;runtime&quot;
)
const (
    NUMBER_OF_SIMULATIONS = 1000
    NUMBER_OF_INTERACTIONS = 1000000
    DROP_RATE = 0.0003
)
/**
 * Simulates a single interaction with a monster
 *
 * Returns 1 if the monster dropped an item and 0 otherwise
 */
func interaction(generator *rand.Rand) int {
    if generator.Float64() &lt;= DROP_RATE {
        return 1
    }
    return 0
}
/**
 * Runs several interactions and retuns a slice representing the results
 */
func simulation(n int, generator *rand.Rand) []int {
    interactions := make([]int, n)
    for i := range interactions {
        interactions[i] = interaction(generator)
    }
    return interactions
}
/**
 * Runs several simulations and returns the results
 */
func test(n int, c chan []int) {
    source := rand.NewSource(time.Now().UnixNano())
    generator := rand.New(source)
    simulations := make([]int, n)
    for i := range simulations {
        for _, v := range simulation(NUMBER_OF_INTERACTIONS, generator) {
            simulations[i] += v
        }
    }
    c &lt;- simulations
}
func main() {
    rand.Seed(time.Now().UnixNano())
    nCPU := runtime.NumCPU()
    runtime.GOMAXPROCS(nCPU)
    fmt.Println(&quot;Number of CPUs: &quot;, nCPU)
    tests := make([]chan []int, nCPU)
    for i := range tests {
        c := make(chan []int)
        go test(NUMBER_OF_SIMULATIONS/nCPU, c)
        tests[i] = c
    }
    // Concatentate the test results
    results := make([]int, NUMBER_OF_SIMULATIONS)
    for i, c := range tests {
        start := (NUMBER_OF_SIMULATIONS/nCPU) * i
        stop := (NUMBER_OF_SIMULATIONS/nCPU) * (i+1)
        copy(results[start:stop], &lt;-c)
    }
    fmt.Println(&quot;Successful interactions: &quot;, results)
}

UPDATE (01/13/13 17:58)

Thanks everyone for the help in figuring out my problem. I did finally get the answer I was looking for and so I thought I would just summarize here for anyone who has the same problem.

Essentially I had two main issues: first, even though my code was embarrassingly parallel, it was running slower when I split it up amongst the available processors, and second, the solution opened up another issue, which was my serial code was running twice as slow as the concurrent code running on single processor, which you would expect to be roughly the same . In both cases the issue was the random number generator function rand.Float64. Basically, this is a convenience function provided by the rand package. In that package, a global instance of the Rand struct is created and used by each of the convenience functions. This global Rand instance has a mutex lock associated with it. Since I was using this convenience function, I wasn't truly able to parallelize my code since each of the goroutines would have to line up for access to the global Rand instance. The solution (as "the system" suggests below) is to create a separate instance of the Rand struct for each goroutine. This solved the first problem but created the second one.

The second problem was that my non-parallel concurrent code (i.e., my concurrent code running with only a single processor) was running twice as fast as the sequential code. The reason for this was that, even though I was only running with a single processor and a single goroutine, that goroutine had its own instance of the Rand struct that I had created, and I had created it without the mutex lock. The sequential code was still using the rand.Float64 convenience function which made use of the global mutex protected Rand instance. The cost of acquiring that lock was causing the sequential code to run twice as slow.

So, the moral of the story is, whenever performance matters, make sure you create an instance of the Rand struct and call the function you need off of it rather than using the convenience functions provided by the package.

答案1

得分: 44

问题似乎出在你使用了rand.Float64()，它使用了一个带有互斥锁的共享全局对象。

相反，如果为每个CPU创建一个单独的rand.New()，将其传递给interactions()并使用它来创建Float64()，会有巨大的改进。

更新以显示问题中新示例代码的更改，现在使用了rand.New()

test()函数被修改为要么使用给定的通道，要么返回结果。

func test(n int, c chan []int) []int {
    source := rand.NewSource(time.Now().UnixNano())
    generator := rand.New(source)
    simulations := make([]int, n)
    for i := range simulations {
        for _, v := range simulation(NUMBER_OF_INTERACTIONS, generator) {
            simulations[i] += v
        }   
    }   
    if c == nil {
        return simulations
    }   
    c <- simulations
    return nil 
}

main()函数被更新为同时运行两个测试，并输出计时结果。

func main() {
    rand.Seed(time.Now().UnixNano())
    nCPU := runtime.NumCPU()
    runtime.GOMAXPROCS(nCPU)
    fmt.Println("Number of CPUs: ", nCPU)
    start := time.Now()
    fmt.Println("Successful interactions: ", len(test(NUMBER_OF_SIMULATIONS, nil)))
    fmt.Println(time.Since(start))
    start = time.Now()
    tests := make([]chan []int, nCPU)
    for i := range tests {
        c := make(chan []int)
        go test(NUMBER_OF_SIMULATIONS/nCPU, c)
        tests[i] = c
    }
    // Concatentate the test results
    results := make([]int, NUMBER_OF_SIMULATIONS)
    for i, c := range tests {
        start := (NUMBER_OF_SIMULATIONS/nCPU) * i
        stop := (NUMBER_OF_SIMULATIONS/nCPU) * (i+1)
        copy(results[start:stop], <-c)
    }
    fmt.Println("Successful interactions: ", len(results))
    fmt.Println(time.Since(start))
}

我收到的输出是：

> Number of CPUs:  2 
>
> Successful interactions:  1000 
> 1m20.39959s
>
> Successful interactions:  1000
> 41.392299s

英文:

The issue seems to come from your use of rand.Float64(), which uses a shared global object with a Mutex lock on it.

Instead, if for each CPU you create a separate rand.New(), pass it through to the interactions(), and use it to create the Float64(), there's a massive improvement.

Update to show the changes to the new example code in the question that now uses rand.New()

The test() function was modified to either use a given channel, or return the result.

func test(n int, c chan []int) []int {
    source := rand.NewSource(time.Now().UnixNano())
    generator := rand.New(source)
    simulations := make([]int, n)
    for i := range simulations {
        for _, v := range simulation(NUMBER_OF_INTERACTIONS, generator) {
            simulations[i] += v
        }   
    }   
    if c == nil {
        return simulations
    }   
    c &lt;- simulations
    return nil 
}

The main() function was updated to run both tests, and output the timed result.

func main() {
    rand.Seed(time.Now().UnixNano())
    nCPU := runtime.NumCPU()
    runtime.GOMAXPROCS(nCPU)
    fmt.Println(&quot;Number of CPUs: &quot;, nCPU)
    start := time.Now()
    fmt.Println(&quot;Successful interactions: &quot;, len(test(NUMBER_OF_SIMULATIONS, nil)))
    fmt.Println(time.Since(start))
    start = time.Now()
    tests := make([]chan []int, nCPU)
    for i := range tests {
        c := make(chan []int)
        go test(NUMBER_OF_SIMULATIONS/nCPU, c)
        tests[i] = c
    }
    // Concatentate the test results
    results := make([]int, NUMBER_OF_SIMULATIONS)
    for i, c := range tests {
        start := (NUMBER_OF_SIMULATIONS/nCPU) * i
        stop := (NUMBER_OF_SIMULATIONS/nCPU) * (i+1)
        copy(results[start:stop], &lt;-c)
    }
    fmt.Println(&quot;Successful interactions: &quot;, len(results))
    fmt.Println(time.Since(start))
}

The output is I received:

<pre>
> Number of CPUs: 2
>
> Successful interactions: 1000
> 1m20.39959s
>
> Successful interactions: 1000
> 41.392299s
</pre>

答案2

得分: 7

在我的Linux四核i7笔记本上测试你的代码，我得到了这个结果。

这是一个Google电子表格。

为什么在这个golang代码中添加并发会减慢速度？

这显示在Linux下，至少每个核心的缩放几乎是线性的。

我认为有两个原因你没有看到这一点。

第一个原因是你的MacBook Air只有2个真实的核心。它有4个超线程，这就是为什么它报告最大CPU数为4。超线程通常只比单个核心多出15%的性能，而不是你可能期望的100%。所以在MacBook Air上只进行1或2个CPU的基准测试！

另一个原因可能是OS X线程性能与Linux的比较。它们使用不同的线程模型，可能会影响性能。

英文:

Testing your code on my Linux quad core i7 laptop I get this

Here is a Google Spreadsheet

为什么在这个golang代码中添加并发会减慢速度？

This shows that under Linux at least the scaling is very nearly linear per core.

I think there may be two reasons why you aren't seeing this.

The first is that your macbook air only has 2 real cores. It has 4 hyperthreads though which is why it reports 4 as max cpus. A hyperthread typically only gives an extra 15% performance over a single core rather than the 100% you might expect. So stick to benchmarking 1 or 2 CPUs only on the macbook air!

The other reason might be OS X thread performance compared to Linux. They use different threading models which may be affecting performance.

答案3

得分: 3

你的代码正在对二项式随机变量B(N, p)进行抽样，其中N是试验次数（这里是100万次），p是每次试验成功的概率（这里是0.0003）。

一种方法是构建一个累积概率表T，其中T[i]包含总试验次数小于或等于i的概率。然后，为了产生一个样本，你可以选择一个均匀分布的随机变量（通过rand.Float64）并找到表中第一个概率大于或等于它的索引。

这里稍微复杂一些，因为你有一个非常大的N和一个相对较小的p，所以如果你尝试构建表，会遇到非常小的数字和算术精度的问题。但是你可以构建一个较小的表（比如1000个元素），并对其进行1000次抽样，以获得100万次试验。

下面是一些完成所有这些工作的代码。它不太优雅（1000是硬编码的），但在我的旧笔记本电脑上可以在不到一秒钟内生成1000个模拟。可以进一步优化，例如将BinomialSampler的构建提出循环，或者使用二分搜索而不是线性扫描来找到表索引。

package main
import (
	"fmt"
	"math"
	"math/rand"
)
type BinomialSampler []float64
func (bs BinomialSampler) Sample() int {
	r := rand.Float64()
	for i := 0; i < len(bs); i++ {
		if bs[i] >= r {
			return i
		}
	}
	return len(bs)
}
func NewBinomialSampler(N int, p float64) BinomialSampler {
	r := BinomialSampler(make([]float64, N+1))
	T := 0.0
	choice := 1.0
	for i := 0; i <= N; i++ {
		T += choice * math.Pow(p, float64(i)) * math.Pow(1-p, float64(N-i))
		r[i] = T
		choice *= float64(N-i) / float64(i+1)
	}
	return r
}
func WowSample(N int, p float64) int {
	if N%1000 != 0 {
		panic("N must be a multiple of 1000")
	}
	bs := NewBinomialSampler(1000, p)
	r := 0
	for i := 0; i < N; i += 1000 {
		r += bs.Sample()
	}
	return r
}
func main() {
	for i := 0; i < 1000; i++ {
		fmt.Println(WowSample(1000000, 0.0003))
	}
}

英文:

Your code is sampling a binomial random variable, B(N, p) where N is the number of trials (here 1M), and p is the probability of a successful individual trial (here 0.0003).

One way to do this is to build a table T of cumulative probabilities, where T[i] contains the probability that the total number of trials is less than or equal to i. To then produce a sample, you can pick a uniform random variable (via rand.Float64) and find the first index in the table that contains a probability greater than or equal to it.

It's a little more complicated here because you've got a really big N and a fairly small p, so if you try to build the table you get trouble with really small numbers and arithmetic accuracy. But you can build a smaller table (say 1000 large) and sample it 1000 times to get your 1 million trials.

Here's some code that does all this. It's not too elegant (1000 is hard-coded in), but it generates 1000 simulations in less than a second on my old laptop. It's easy to optimise further, by for example lifting the construction of the BinomialSampler out of the loop, or by using binary search rather than a linear scan to find the table index.

package main
import (
	&quot;fmt&quot;
	&quot;math&quot;
	&quot;math/rand&quot;
)
type BinomialSampler []float64
func (bs BinomialSampler) Sample() int {
	r := rand.Float64()
	for i := 0; i &lt; len(bs); i++ {
		if bs[i] &gt;= r {
			return i
		}
	}
	return len(bs)
}
func NewBinomialSampler(N int, p float64) BinomialSampler {
	r := BinomialSampler(make([]float64, N+1))
	T := 0.0
	choice := 1.0
	for i := 0; i &lt;= N; i++ {
		T += choice * math.Pow(p, float64(i)) * math.Pow(1-p, float64(N-i))
		r[i] = T
		choice *= float64(N-i) / float64(i+1)
	}
	return r
}
func WowSample(N int, p float64) int {
	if N%1000 != 0 {
		panic(&quot;N must be a multiple of 1000&quot;)
	}
	bs := NewBinomialSampler(1000, p)
	r := 0
	for i := 0; i &lt; N; i += 1000 {
		r += bs.Sample()
	}
	return r
}
func main() {
	for i := 0; i &lt; 1000; i++ {
		fmt.Println(WowSample(1000000, 0.0003))
	}
}

答案4

得分: 1

我的结果显示，使用4个CPU与使用1个CPU相比，存在显著的并发性：

Intel Core 2 Quad CPU Q8300 @ 2.50GHz x 4

源代码：更新（01/12/13 18:05）

$ go version
go version devel +adf4e96e9aa4 Thu Jan 10 09:57:01 2013 +1100 linux/amd64
$ time  go run temp.go
CPU数量：1
真实时间	0m30.305s
用户时间	0m30.210s
系统时间	0m0.044s
$ time  go run temp.go
CPU数量：4
真实时间	0m9.980s
用户时间	0m35.146s
系统时间	0m0.204s

英文:

My results, which show substantial concurrency for 4 CPUs versus 1 CPU:

Intel Core 2 Quad CPU Q8300 @ 2.50GHz x 4

Source code: UPDATE (01/12/13 18:05)

$ go version
go version devel +adf4e96e9aa4 Thu Jan 10 09:57:01 2013 +1100 linux/amd64
$ time  go run temp.go
Number of CPUs:  1
real	0m30.305s
user	0m30.210s
sys		0m0.044s
$ time  go run temp.go
Number of CPUs:  4
real	0m9.980s
user	0m35.146s
sys		0m0.204s

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

为什么在这个golang代码中添加并发会减慢速度？

问题

答案1

答案2

答案3

答案4

在Go中合并地图

Google Go的goroutine中断模式（速度问题）

Golang []map[string]struct 赋值错误。

Golang – 解析额外的 XML 属性

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

发表评论