Probability distribution of results from one, two and more draws

Question

I am learning Python and found something that is not intuitive to me. I was trying to plot a Gaussian curve based on the output of a lottery. In my program I can set the draw range, the number of draws in one game and the number of games. I sum the results of the draws in each game, record how many times each sum occurs, and plot a graph from that data.

When I set one draw per game, every value has the same probability. This is the red curve on the attached graph, and it is what I expected.

When I set three or more draws, the values in the middle are much more probable. For example, with 3 draws from the range 0 to 100, the sum must lie between 0 and 300, and the most probable value is 150. When I plot this, I get a Gaussian curve. It is blue on the graph.
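
As an illustration (not part of the original program), the exact probabilities can be computed without simulation by convolving the single-draw distribution with itself once per draw. The function name `exact_sum_pmf` is hypothetical, and the sketch assumes each draw is uniform over the integers `lo..hi` inclusive, as with `random.randrange(lo, hi + 1)` in the program's code.

```python
from fractions import Fraction


def exact_sum_pmf(number_of_draws, lo=0, hi=100):
    """Exact distribution of the sum of `number_of_draws` independent draws,
    each uniform over the integers lo..hi (inclusive).

    Returns a dict mapping each possible sum to its probability.
    """
    single = {v: Fraction(1, hi - lo + 1) for v in range(lo, hi + 1)}
    pmf = {0: Fraction(1)}  # the sum of zero draws is always 0
    for _ in range(number_of_draws):
        # Convolution: P(S + X = s) = sum over x of P(S = s - x) * P(X = x)
        nxt = {}
        for s, p in pmf.items():
            for x, q in single.items():
                nxt[s + x] = nxt.get(s + x, 0) + p * q
        pmf = nxt
    return pmf


pmf3 = exact_sum_pmf(3)
print(min(pmf3), max(pmf3))     # 0 300
print(max(pmf3, key=pmf3.get))  # 150
print(sum(pmf3.values()))       # 1
```

Exact fractions keep the check free of rounding; for plotting, the probabilities could be multiplied by `number_of_games` and overlaid on the simulated counts.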

The unintuitive case is two draws. I expected the curve to look like the previous case, but the output is triangular instead. It is the green curve.
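
One way to see why two draws give a triangle (a sketch added for illustration; it enumerates every outcome instead of sampling): each of the 101 × 101 ordered pairs of draws is equally likely, and the number of pairs with sum s rises by one per step up to s = 100 and then falls by one per step. With three draws each count is itself a sum of these triangle values, so the pieces become quadratic and the peak is rounded, which is why that curve already looks bell-shaped.

```python
from collections import Counter

# Every ordered pair (a, b) of draws from 0..100 is equally likely (101 * 101 = 10201 pairs).
# Count how many pairs produce each sum.
ways = Counter(a + b for a in range(101) for b in range(101))

# Sum s can be formed in s + 1 ways for s <= 100 and in 201 - s ways for s > 100:
# two straight lines meeting at s = 100, i.e. a triangle.
assert all(ways[s] == min(s, 200 - s) + 1 for s in range(201))

for s in (0, 50, 100, 150, 200):
    print(s, ways[s], ways[s] / 10201)  # sum, number of pairs, probability
```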

[Graph: frequency of each sum for 1 draw (red), 2 draws (green) and 3 to 5 draws (blue)]

The questions are:

  1. What is the fundamental difference between two draws and three or more draws, and why are the output curves different?

  2. Why don't I get a Gaussian curve when I set two draws?

Python code:


import collections
import random

import matplotlib.pyplot as plt


class GaussGame:
    def __init__(self, draw_range={"min": 0, "max": 100}, number_of_draws=5, number_of_games=100000) -> None:
        # draw_range holds the inclusive bounds of a single draw under the string keys "min" and "max"
        self.draw_range = draw_range
        self.number_of_draws = number_of_draws
        self.number_of_games = number_of_games

    def start(self):
        # Create win dictionary: each possible win amount is a key, the number of wins for that amount is the value.
        low = self.draw_range["min"] * self.number_of_draws
        high = self.draw_range["max"] * self.number_of_draws
        win_dict = collections.OrderedDict((amount, 0) for amount in range(low, high + 1))
        # Loop for all games
        for _ in range(self.number_of_games):
            # Loop for one game
            d_sum = 0  # Sum of the drawn values
            for _ in range(self.number_of_draws):
                d_sum += random.randrange(self.draw_range["min"], self.draw_range["max"] + 1)
            win_dict[d_sum] += 1
        return win_dict


def main():
    # When I run the game several times with different number_of_draws values and plot the results on one graph, I get an interesting picture :-D
    # (number_of_draws, plot style): red for 1 draw, green for 2, blue for 3 to 5
    for number_of_draws, style in [(1, 'r.'), (2, 'g.'), (3, 'b.'), (4, 'b.'), (5, 'b.')]:
        win_dict = GaussGame({"min": 0, "max": 100}, number_of_draws, 10000000).start()
        plt.plot(list(win_dict.keys()), list(win_dict.values()), style)
    plt.show()


if __name__ == "__main__":
    main()

Answer 1

Score: 1

That looks about right. What you are seeing is, I believe, the Irwin-Hall distribution, or rather a discrete variant of it.
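
The Irwin-Hall distribution is the sum of n independent continuous uniform variables on [0, 1]; since the program draws integers, the matching object here is its discrete counterpart. As a sketch (the name `ways_to_sum` is hypothetical), the number of ordered n-tuples from 0..m with sum s can be written in closed form by inclusion-exclusion and checked against brute-force enumeration. Like the Irwin-Hall density, it is piecewise polynomial of degree n - 1 in s: linear pieces (a triangle) for n = 2, quadratic pieces for n = 3.

```python
import itertools
from math import comb


def ways_to_sum(s, n, m):
    """Number of ordered n-tuples of integers in 0..m whose sum is s.

    Inclusion-exclusion over how many draws would exceed m:
        sum_{k=0}^{floor(s/(m+1))} (-1)^k * C(n, k) * C(s - k*(m+1) + n - 1, n - 1)
    """
    if s < 0 or s > n * m:
        return 0
    return sum(
        (-1) ** k * comb(n, k) * comb(s - k * (m + 1) + n - 1, n - 1)
        for k in range(min(n, s // (m + 1)) + 1)
    )


# Sanity check against brute-force enumeration on a small case.
n, m = 3, 6
brute = {}
for draws in itertools.product(range(m + 1), repeat=n):
    brute[sum(draws)] = brute.get(sum(draws), 0) + 1
assert all(ways_to_sum(s, n, m) == brute[s] for s in range(n * m + 1))

# Two draws from 0..100: the triangle s + 1, then 201 - s.
print([ways_to_sum(s, 2, 100) for s in (0, 100, 101, 200)])  # [1, 101, 100, 1]
```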

When you sum a small number of samples, the result is not Gaussian, but it converges to a Gaussian as the number of summed samples grows; see the central limit theorem (CLT).
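
The convergence mentioned above can be checked numerically. The sketch below uses only the standard library (`statistics.NormalDist` requires Python 3.8+; the function names are hypothetical): it computes each exact distribution by convolution and measures an approximate L1 distance to a normal curve with the same mean n(lo + hi)/2 and variance n((hi - lo + 1)² - 1)/12. The distance should shrink as the number of draws grows; the printed values are not reproduced here.

```python
import math
from statistics import NormalDist


def sum_pmf(n, lo=0, hi=100):
    """Exact probabilities of the sum of n uniform integer draws from lo..hi,
    as a list where index i holds P(sum == n*lo + i)."""
    k = hi - lo + 1
    pmf = [1.0]
    for _ in range(n):
        # Convolve with the single-draw distribution (each value has probability 1/k).
        nxt = [0.0] * (len(pmf) + k - 1)
        for i, p in enumerate(pmf):
            for j in range(k):
                nxt[i + j] += p / k
        pmf = nxt
    return pmf


def distance_to_normal(n, lo=0, hi=100):
    """Approximate L1 distance between the exact distribution of the sum and a
    normal curve with the same mean and variance, evaluated at the integer sums."""
    k = hi - lo + 1
    normal = NormalDist(mu=n * (lo + hi) / 2, sigma=math.sqrt(n * (k * k - 1) / 12))
    return sum(abs(p - normal.pdf(n * lo + i)) for i, p in enumerate(sum_pmf(n, lo, hi)))


for n in range(1, 6):
    print(n, round(distance_to_normal(n), 4))
```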

huangapple
  • This article was published on 26 February 2023 at 22:12:11
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75572563.html