Turning bars to a normal distribution
Question
I'm new to Python.
I have two arrays and a nice bar chart:
import matplotlib.pyplot as plt
# Buyers in %
h = [1, 1, 3, 5, 9, 13, 16, 16, 14, 10, 5, 4, 2, 1, 0]
# Clothes size
x = [34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48]
# P(X = 40) = 16%: the probability that a buyer gets a size 40 is 16%
# P(37 <= X <= 40) = 5 + 9 + 13 + 16 = 43%: the probability that a buyer gets a size between 37 and 40 is 43%
plt.ylabel('Buyers %')
plt.xlabel('Clothes size')
plt.bar(x, height=h)
plt.grid(True)
plt.show()
- How can I turn this into a density line and a normal distribution using seaborn or scipy.stats.norm, and draw it over the bars?
- After that, how can I calculate P(X < 40) using the normal distribution?
Thank you.
Answer 1
Score: 1
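Using seaborn:
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import norm
# Buyers in %
h = [1, 1, 3, 5, 9, 13, 16, 16, 14, 10, 5, 4, 2, 1, 0]
# Clothes size
x = [34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48]
# Expand the percentages into a sample: each size x[i] is repeated h[i] times
data = []
for i in range(len(x)):
    data += [x[i]] * h[i]
sns.set()
plt.figure(figsize=(10, 5), dpi=300)
# fit=norm overlays a normal curve fitted to the sample
# (note: distplot is deprecated since seaborn 0.11)
sns.distplot(data, fit=norm, kde=False)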
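Since `distplot` is deprecated in recent seaborn releases, the same overlay can be sketched with plain matplotlib and `scipy.stats.norm`. This is only an illustration, not the answerer's method: the helper names `weighted_normal_fit` and `plot_bars_with_normal` are made up for this example, and it assumes the bars are one size unit apart, so that the fitted density multiplied by 100 sits on the same percent scale as the bars.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

# Buyers in % and clothes sizes, as in the question
h = [1, 1, 3, 5, 9, 13, 16, 16, 14, 10, 5, 4, 2, 1, 0]
x = [34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48]


def weighted_normal_fit(sizes, percents):
    """Return (mean, std) of sizes weighted by percents (population std, like np.std)."""
    sizes = np.asarray(sizes, dtype=float)
    weights = np.asarray(percents, dtype=float)
    mean = np.average(sizes, weights=weights)
    std = np.sqrt(np.average((sizes - mean) ** 2, weights=weights))
    return mean, std


def plot_bars_with_normal(sizes, percents):
    """Draw the percent bars and overlay the fitted normal curve on the same axes."""
    mean, std = weighted_normal_fit(sizes, percents)
    grid = np.linspace(min(sizes) - 1, max(sizes) + 1, 200)
    fig, ax = plt.subplots(figsize=(10, 5))
    ax.bar(sizes, percents, label="Buyers %")
    # Sizes are 1 unit apart, so density * 100 is comparable to the percent bars
    ax.plot(grid, 100 * norm.pdf(grid, mean, std), color="red", label="Fitted normal")
    ax.set_xlabel("Clothes size")
    ax.set_ylabel("Buyers %")
    ax.grid(True)
    ax.legend()
    return ax


mean, std = weighted_normal_fit(x, h)
ax = plot_bars_with_normal(x, h)
```

Call `plt.show()` afterwards to display the figure. Weighting the sizes directly by their percentages gives the same mean and standard deviation as expanding them into a repeated sample first.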
To get the probability:
from scipy.stats import norm
import numpy as np
sample_mean = np.array(data).mean()
sample_std = np.array(data).std()
min_value = int(sample_mean - 4 * sample_std)
max_value = int(sample_mean + 4 * sample_std)
# Normal distribution fitted to the sample mean and standard deviation
dist = norm(sample_mean, sample_std)
values = [value for value in range(min_value, max_value)]
probabilities = [dist.pdf(value) for value in values]
# plt.plot(values, probabilities)
def prob(min_lim, max_lim):
    # 1 for integer values strictly between the limits, 0 elsewhere
    p = (np.array(values) > min_lim).astype(int) * (np.array(values) < max_lim).astype(int)
    # Sum the fitted density over the selected values only
    return (np.array(probabilities) * p).sum()
prob(0, 40)
Out[2]: 0.3230891372830226
NOTE: this differs from the value counted directly from the data because it uses a continuous normal distribution estimated from the mean and standard deviation of your data.
If you don't want to use the continuous estimate, the code is just:
len(np.array(data)[np.array(data) < 40]) / len(data)
Out[2]: 0.32
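As an alternative to summing density values, the fitted distribution's CDF can be queried directly. The sketch below is an illustration rather than part of the original answer. It assumes that sizes only take integer values, so a continuity correction (evaluating the CDF at 39.5 instead of 40) is a reasonable way to compare against the discrete counts. The variable names are chosen for this example.

```python
import numpy as np
from scipy.stats import norm

# Buyers in % and clothes sizes, as in the question
h = [1, 1, 3, 5, 9, 13, 16, 16, 14, 10, 5, 4, 2, 1, 0]
x = [34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48]

# Expand the percentages into a sample: size x[i] appears h[i] times
data = np.repeat(x, h)

# Normal distribution fitted to the sample mean and (population) standard deviation
dist = norm(data.mean(), data.std())

# P(X < 40) for the continuous normal variable
p_cdf = dist.cdf(40)

# Continuity correction: "size below 40" means sizes up to 39, i.e. up to 39.5
p_corrected = dist.cdf(39.5)

# P(37 <= X <= 40) with the same correction on both ends
p_interval = dist.cdf(40.5) - dist.cdf(36.5)

# Fraction counted directly from the data, for comparison
p_empirical = np.mean(data < 40)

print(p_cdf, p_corrected, p_interval, p_empirical)
```

The plain CDF at 40 comes out near 0.40, noticeably above the 0.32 counted from the data, while the continuity-corrected value lands close to it. Which of the two is appropriate depends on whether size is treated as a continuous or a discrete quantity.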