Question

I have a sample containing student scores and the number of students (population) at each score:

import numpy as np
import pandas as pd

# Create the DataFrame
sample = pd.DataFrame({
    'score': [595, 594, 593, 592, 591, 590, 589, 588, 587, 586, 585, 584, 583, 582, 581, 580, 579, 578, 577, 576],
    'population': [705, 745, 716, 742, 722, 746, 796, 750, 816, 809, 815, 821, 820, 865, 876, 886, 947, 949, 1018, 967],
})

Then I calculated the weighted average of the scores:

np.average(sample['score'], weights=sample['population'])
# 584.9062443219672
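
As a quick sanity check, the weighted average is the sum of score × population divided by the total population. The following is a minimal sketch that rebuilds the same sample so it runs on its own; the names total_weight and manual_wa are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

# Same data as in the question: scores 595 down to 576, with their populations
sample = pd.DataFrame({
    'score': list(range(595, 575, -1)),
    'population': [705, 745, 716, 742, 722, 746, 796, 750, 816, 809,
                   815, 821, 820, 865, 876, 886, 947, 949, 1018, 967],
})

# Weighted mean by hand: sum(score * weight) / sum(weight)
total_weight = sample['population'].sum()
manual_wa = (sample['score'] * sample['population']).sum() / total_weight

# Compare against NumPy's built-in weighted average
numpy_wa = np.average(sample['score'], weights=sample['population'])
print(manual_wa, np.isclose(manual_wa, numpy_wa))
```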

However, when I run sample.describe(), it does not take the weights into account:

sample.describe()

           score   population
count   20.00000    20.000000
mean   585.50000   825.550000
std      5.91608    91.465539
min    576.00000   705.000000
25%    580.75000   745.750000
50%    585.50000   815.500000
75%    590.25000   878.500000
max    595.00000  1018.000000

How can I get sample.describe() to take the weights into account?

Answer 1

Score: 1

You need a custom function. Because the weighted average is a single scalar, the new row holds the same value in every column:

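def describe(df, stats):
    # Start from the regular, unweighted summary
    d = df.describe()
    # Add a row labelled `stats`; the scalar is broadcast to every column
    d.loc[stats] = np.average(df['score'], weights=df['population'])
    return d

# 'wa' is the label of the new weighted-average row
out = describe(sample, 'wa')
print(out)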

               score  population
count     20.000000   20.000000
mean     585.500000  825.550000
std        5.916080   91.465539
min      576.000000  705.000000
25%      580.750000  745.750000
50%      585.500000  815.500000
75%      590.250000  878.500000
max      595.000000 1018.000000
wa       584.906244  584.906244
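
The function above only adds a weighted mean. If the population column can be read as integer frequency counts (an assumption, not something the question states outright), one possible alternative is to repeat each score once per student and let describe() summarise the expanded data. This is a sketch of that idea rather than a built-in weighting option; the names expanded and weighted_desc are hypothetical.

```python
import numpy as np
import pandas as pd

# Same data as in the question
sample = pd.DataFrame({
    'score': list(range(595, 575, -1)),
    'population': [705, 745, 716, 742, 722, 746, 796, 750, 816, 809,
                   815, 821, 820, 865, 876, 886, 947, 949, 1018, 967],
})

# Repeat each score `population` times, so every student becomes one row.
# Assumes the weights are non-negative integers (frequency counts).
expanded = sample['score'].repeat(sample['population'].to_numpy())

# describe() now reports count, mean, std and quartiles over all students
weighted_desc = expanded.describe()
print(weighted_desc)
```

The mean of the expanded series matches np.average with weights, and the count equals the total population. This approach does not apply to fractional weights, and it uses memory proportional to the sum of the weights, so it suits small totals like this one.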
