Pandas多列条件下的平均值

huangapple go评论61阅读模式
英文:

Pandas Average If Across Multiple Columns

问题

# 在 pandas 中,我想要计算每种运动中参与者的平均年龄和体重。我知道我可以使用循环,但想知道最高效的方法是什么。

df = pd.DataFrame([
    [0, 1, 0, 30, 150],
    [1, 1, 1, 25, 200],
    [1, 0, 0, 20, 175]
], columns=[
    "Plays Basketball",
    "Plays Soccer",
    "Plays Football",
    "Age",
    "Weight"
])

# 我尝试过使用 `groupby`,但它会为每种可能的运动组合创建一个分组。我只需要每种运动的平均年龄和体重。

结果应为

|                  | 年龄  | 体重 |
| ---------------- | ---- | ------ |
| 打篮球 | 22.5 | 187.5  |
| 踢足球     | 27.5 | 175.0  |
| 踢足球   | 25.0 | 200.0  |
英文:

In pandas, I'd like to calculate the average age and weight for people playing each sport. I know I can loop, but was wondering what the most efficient way is.

df = pd.DataFrame([
    [0, 1, 0, 30, 150],
    [1, 1, 1, 25, 200],
    [1, 0, 0, 20, 175]
], columns=[
    "Plays Basketball",
    "Plays Soccer",
    "Plays Football",
    "Age",
    "Weight"
])
Plays Basketball Plays Soccer Plays Football Age Weight
0 1 0 30 150
1 1 1 25 200
1 0 0 20 175

I tried groupby but it creates a group for every possible combination of sports played. I just need an average age and weight for each sport.

Result should be:

Age Weight
Plays Basketball 22.5 187.5
Plays Soccer 27.5 175.0
Plays Football 25.0 200.0

答案1

得分: 5

使用 dot 乘积并通过计数进行归一化以获取均值:

df2 = df.filter(like='Plays')

out = df2.T.dot(df[['Age', 'Weight']]).div(df2.sum(), axis=0)

输出:

                   Age  Weight
Plays Basketball  22.5   187.5
Plays Soccer      27.5   175.0
Plays Football    25.0   200.0
英文:

Use a dot product and normalize by the count to get the mean:

df2 = df.filter(like='Plays')

out = df2.T.dot(df[['Age', 'Weight']]).div(df2.sum(), axis=0)

Output:

                   Age  Weight
Plays Basketball  22.5   187.5
Plays Soccer      27.5   175.0
Plays Football    25.0   200.0

答案2

得分: 1

你可以为要总结的每一列使用一个groupby:

import pandas as pd

indicators = ["Plays Basketball", "Plays Soccer", "Plays Football"]
rows = []
keep_cols = ["Age", "Weight"]
for indicator in indicators:
    average = df.groupby(indicator).mean()
    rows.append(average.loc[1][keep_cols].rename(indicator))
output = pd.DataFrame(rows)
英文:

You could use one groupby for each column you want to summarize:

import pandas as pd

indicators = ["Plays Basketball", "Plays Soccer", "Plays Football"]
rows = []
keep_cols = ["Age", "Weight"]
for indicator in indicators:
    average = df.groupby(indicator).mean()
    rows.append(average.loc[1][keep_cols].rename(indicator))
output = pd.DataFrame(rows)

huangapple
  • 本文由 发表于 2023年3月4日 00:52:29
  • 转载请务必保留本文链接:https://go.coder-hub.com/75629831.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定