Probability distribution of results from one, two and more draws
Question
I am learning Python and found something that is not intuitive to me. I was trying to plot a Gaussian curve based on the output of a lottery simulation. In the program I can set the draw range, the number of draws per game, and the number of games. I sum the draw results in each game, record how many times each sum occurred, and plot a graph from that data.
When I set one draw per game, every value has the same probability. This is shown in red on the attached graph, and it is what I expected.
When I set three or more draws, the middle values are the most probable. For example, with 3 draws in the range 0 to 100, the sum lies between 0 and 300 and the most probable value is 150. When I plot it, I get a Gaussian curve. It is shown in blue on the graph.
The non-intuitive case is two draws. I expected the curve to look the same as in the previous case, but the output looks triangular. It is the green curve.
The questions are:
- What is the fundamental difference between two draws and three or more draws, and why are the output curves different?
- Why do I not get a Gaussian curve when I set two draws?
Python code:
import random

import matplotlib.pyplot as plt


class GaussGame:
    def __init__(self, draw_range=None, number_of_draws=5, number_of_games=100000) -> None:
        # Inclusive bounds of a single draw. String keys avoid using the
        # built-in functions min and max as dictionary keys.
        self.draw_range = draw_range or {"min": 0, "max": 100}
        self.number_of_draws = number_of_draws
        self.number_of_games = number_of_games

    def start(self):
        low = self.draw_range["min"]
        high = self.draw_range["max"]
        # Win dictionary: each possible sum is a key, and the number of games
        # that produced that sum is the value.
        win_dict = {total: 0 for total in range(low * self.number_of_draws, high * self.number_of_draws + 1)}
        # Loop over all games.
        for _ in range(self.number_of_games):
            # One game: sum of the drawn values.
            d_sum = 0
            for _ in range(self.number_of_draws):
                d_sum += random.randrange(low, high + 1)
            win_dict[d_sum] += 1
        return win_dict


def main():
    # Run the game with different number_of_draws values and plot all results on one graph.
    # Red: one draw, green: two draws, blue: three or more draws.
    for draws, style in [(1, "r."), (2, "g."), (3, "b."), (4, "b."), (5, "b.")]:
        result = GaussGame({"min": 0, "max": 100}, draws, 10000000).start()
        # Convert the dict views to lists so matplotlib receives plain sequences.
        plt.plot(list(result.keys()), list(result.values()), style)
    plt.show()


if __name__ == "__main__":
    main()
Answer 1
Score: 1
That looks about right. What you see, I believe, is the Irwin-Hall distribution, or a variation of it.
When you sum a small number of samples, the result is not Gaussian, but it converges to a Gaussian as the number of samples grows; see the central limit theorem (CLT).
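To make the two-draw case concrete, here is a minimal sketch that is not part of the original answer. It counts exactly, without random sampling, how many ways each sum can occur. The helper name `sum_distribution` is hypothetical. Strictly speaking, the Irwin-Hall distribution describes sums of continuous uniform variables, so the integer draws here are assumed to be its discrete analogue. With two draws from 0 to 100, a sum s can be formed in min(s, 200 - s) + 1 ways, which is a straight line up and a straight line down: a triangle. Adding a third draw makes each side curved, and further draws move the shape toward the bell curve.

```python
from collections import Counter


def sum_distribution(low, high, draws):
    """Exact number of ways to reach each sum of `draws` integer draws in [low, high]."""
    counts = Counter({0: 1})  # zero draws: the sum 0 occurs in exactly one way
    for _ in range(draws):
        nxt = Counter()
        # Extend every partial sum by every possible value of the next draw.
        for partial, ways in counts.items():
            for value in range(low, high + 1):
                nxt[partial + value] += ways
        counts = nxt
    return counts


two = sum_distribution(0, 100, 2)
three = sum_distribution(0, 100, 3)

# Two draws: the count grows by exactly 1 per step (a straight line).
print([two[s] for s in range(5)])    # [1, 2, 3, 4, 5]
print(two[100], two[200])            # 101 1

# Three draws: the step size itself grows (a curve, not a line).
print([three[s] for s in range(5)])  # [1, 3, 6, 10, 15]
```

Dividing each count by `(high - low + 1) ** draws` gives the exact probability of that sum, which can be plotted next to the simulated results for comparison.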