How can I get weights considered in DataFrame.describe?
Question
I have a sample of student scores together with the population at each score:
import numpy as np
import pandas as pd

# Create the DataFrame
sample = pd.DataFrame(
    {'score': [595, 594, 593, 592, 591, 590, 589, 588, 587, 586, 585, 584, 583, 582, 581, 580, 579, 578, 577, 576],
     'population': [705, 745, 716, 742, 722, 746, 796, 750, 816, 809, 815, 821, 820, 865, 876, 886, 947, 949, 1018, 967]})
Then I calculate the weighted average of the scores:
np.average(sample['score'], weights=sample['population'])
# 584.9062443219672
However, when I run sample.describe(), the weights are not taken into account:
sample.describe()
           score   population
count   20.00000    20.000000
mean   585.50000   825.550000
std      5.91608    91.465539
min    576.00000   705.000000
25%    580.75000   745.750000
50%    585.50000   815.500000
75%    590.25000   878.500000
max    595.00000  1018.000000
How can I get the weights included in sample.describe()?
Answer 1
Score: 1
You need a custom function. Because the output of np.average is a scalar, the new row gets the same value in every column:
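def describe(df, stats):
    # Start from the regular, unweighted summary
    d = df.describe()
    # Add one row labelled `stats`; the scalar is broadcast to all columns
    d.loc[stats] = np.average(df['score'], weights=df['population'])
    return d

out = describe(sample, 'wa')
print(out)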
            score   population
count   20.000000    20.000000
mean   585.500000   825.550000
std      5.916080    91.465539
min    576.000000   705.000000
25%    580.750000   745.750000
50%    585.500000   815.500000
75%    590.250000   878.500000
max    595.000000  1018.000000
wa     584.906244   584.906244
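If every statistic should reflect the weights, not only the mean, one possible alternative is to expand the data before calling describe(). This is only a sketch under an assumption that the question does not state outright: that 'population' is a non-negative integer frequency, i.e. the number of students who got that score. The helper name weighted_describe is made up for illustration and is not part of pandas.

```python
import numpy as np
import pandas as pd

sample = pd.DataFrame(
    {'score': [595, 594, 593, 592, 591, 590, 589, 588, 587, 586,
               585, 584, 583, 582, 581, 580, 579, 578, 577, 576],
     'population': [705, 745, 716, 742, 722, 746, 796, 750, 816, 809,
                    815, 821, 820, 865, 876, 886, 947, 949, 1018, 967]})


def weighted_describe(df, value_col, weight_col):
    """Hypothetical helper: describe `value_col` with each value
    repeated `weight_col` times.

    Assumption: the weights are non-negative integer frequencies.
    """
    # Repeat each score once per student, then summarise as usual
    expanded = df[value_col].repeat(df[weight_col].to_numpy())
    return expanded.describe()


out = weighted_describe(sample, 'score', 'population')
print(out)
```

With this approach the 'mean' row agrees with np.average(sample['score'], weights=sample['population']), and 'count' becomes the total population instead of the number of rows. The expansion costs memory proportional to the sum of the weights, so it suits moderate counts like these; it does not apply to fractional weights.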