Turning bars into a normal distribution

Question

I'm new to Python.

I have two arrays and a nice bar chart:

import matplotlib.pyplot as plt

# Buyers in %
h = [1, 1, 3, 5, 9, 13, 16, 16, 14, 10, 5, 4, 2, 1, 0]

# Clothes size
x = [34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48]

# P(X=40) = 16 % // The probability that a buyer gets a size 40 garment is 16 %
# P(37 <= X <= 40) = 5+9+13+16 = 43 % // The probability that a buyer gets a size between 37 and 40 is 43 %

plt.ylabel('Buyers %')
plt.xlabel('Clothes Size')
plt.bar(x, height=h)
plt.grid(True)
plt.show()

  1. How can I turn this into a density curve and a normal distribution using seaborn or scipy.stats.norm, and draw it over the bars?
  2. After that, how can I calculate P(X<40) using the normal distribution?

Thank you.

Answer 1

Score: 1

Using seaborn:

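import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import norm

# Buyers in %
h = [1, 1, 3, 5, 9, 13, 16, 16, 14, 10, 5, 4, 2, 1, 0]

# Clothes size
x = [34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48]

# Expand the percentages into a sample: each size is repeated h[i] times
data = []
for i in range(len(x)):
    data += [x[i]] * h[i]

sns.set()
plt.figure(figsize=(10, 5), dpi=300)
# Histogram of the sample with a fitted normal curve drawn over it
# (distplot has been deprecated since seaborn 0.11)
sns.distplot(data, fit=norm, kde=False)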
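
Because `distplot` is deprecated in recent seaborn releases, the same overlay can also be sketched with plain matplotlib and `scipy.stats.norm`. This is only an illustration, not the method used in the answer: the helper names `weighted_mean_std` and `overlay_normal` are made up here, and the sketch assumes each bar is one size unit wide so that the density times the total percentage lands on the same scale as the bars.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

h = np.array([1, 1, 3, 5, 9, 13, 16, 16, 14, 10, 5, 4, 2, 1, 0])  # buyers in %
x = np.arange(34, 49)  # clothes sizes 34..48


def weighted_mean_std(sizes, weights):
    """Mean and population standard deviation of sizes, weighted by weights."""
    mean = np.average(sizes, weights=weights)
    std = np.sqrt(np.average((sizes - mean) ** 2, weights=weights))
    return mean, std


def overlay_normal(sizes, weights):
    """Draw the bars and the fitted normal curve on the same percent scale."""
    mean, std = weighted_mean_std(sizes, weights)
    grid = np.linspace(sizes.min() - 1, sizes.max() + 1, 300)
    # Bars are one unit wide, so density * total percentage matches the bar heights
    curve = norm.pdf(grid, loc=mean, scale=std) * weights.sum()
    fig, ax = plt.subplots(figsize=(10, 5))
    ax.bar(sizes, weights, alpha=0.6, label="Buyers %")
    ax.plot(grid, curve, color="k", label="Fitted normal")
    ax.set_xlabel("Clothes Size")
    ax.set_ylabel("Buyers %")
    ax.grid(True)
    ax.legend()
    return ax


mu, sigma = weighted_mean_std(x, h)

if __name__ == "__main__":
    overlay_normal(x, h)
    plt.show()
```

Working from the weights directly avoids building the expanded `data` list, and it also works when the percentages are not whole numbers.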

To get the probability:

from scipy.stats import norm
import numpy as np

# Fit a normal distribution to the sample mean and standard deviation
sample = data
sample_mean = np.array(data).mean()
sample_std = np.array(data).std()
min_value = int(sample_mean - 4 * sample_std)
max_value = int(sample_mean + 4 * sample_std)
dist = norm(sample_mean, sample_std)

# Evaluate the density at every integer size within 4 standard deviations of the mean
values = [value for value in range(min_value, max_value)]
probabilities = [dist.pdf(value) for value in values]

# plt.plot(values, probabilities)

def prob(min_lim, max_lim):
    # 0/1 mask selecting the integer sizes strictly between the two limits
    p = (np.array(values) > min_lim).astype(int) * (np.array(values) < max_lim).astype(int)
    # Sum the density over the selected sizes
    prob = (np.array(probabilities) * p).sum()
    return prob

prob(0, 40)
Out[2]: 0.3230891372830226

NOTE: this differs from the value computed directly from the percentages because it uses a continuous normal distribution estimated from the mean and standard deviation of your data.
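
As a complementary sketch (not part of the original answer), the fitted distribution's `cdf` method gives the area to the left of a point directly, without summing densities. The continuity correction at 39.5 is an assumption added here: it treats "size below 40" as "size up to 39" on the continuous scale.

```python
import numpy as np
from scipy.stats import norm

h = [1, 1, 3, 5, 9, 13, 16, 16, 14, 10, 5, 4, 2, 1, 0]  # buyers in %
x = list(range(34, 49))  # clothes sizes 34..48

# Repeat each size h[i] times to get a sample of 100 buyers
data = np.repeat(x, h)

# Normal distribution with the sample mean and (population) standard deviation
dist = norm(data.mean(), data.std())

p_continuous = dist.cdf(40)   # area strictly to the left of 40
p_corrected = dist.cdf(39.5)  # continuity correction: sizes up to 39

print(p_continuous, p_corrected)
```

On this data the first value comes out near 0.40 and the second near 0.32. The second is the one comparable to summing the density over whole sizes, because both treat the sizes as discrete.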

If you don't want to use the continuous estimate, the code is just:

len(np.array(data)[np.array(data) < 40]) / len(data)
Out[2]: 0.32
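
The same empirical probabilities can also be read straight from the two arrays, without expanding them into `data`. A minimal sketch, where `empirical_prob` is a hypothetical helper name introduced only for illustration:

```python
import numpy as np

h = np.array([1, 1, 3, 5, 9, 13, 16, 16, 14, 10, 5, 4, 2, 1, 0])  # buyers in %
x = np.arange(34, 49)  # clothes sizes 34..48


def empirical_prob(mask, weights):
    """Share of buyers whose size is selected by the boolean mask."""
    return weights[mask].sum() / weights.sum()


p_lt_40 = empirical_prob(x < 40, h)                    # P(X < 40)
p_37_to_40 = empirical_prob((x >= 37) & (x <= 40), h)  # P(37 <= X <= 40)
p_eq_40 = empirical_prob(x == 40, h)                   # P(X = 40)

print(p_lt_40, p_37_to_40, p_eq_40)
```

These reproduce the 16 % and 43 % figures from the question and the 0.32 shown above.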

huangapple
  • Posted on January 4, 2020 at 01:14:10
  • When reposting, please keep the link to this article: https://go.coder-hub.com/59582648.html