Pandas Average If Across Multiple Columns
Question
In pandas, I'd like to calculate the average age and weight for people playing each sport. I know I can loop, but was wondering what the most efficient way is.
import pandas as pd

# One row per person: three 0/1 sport indicators, then age and weight.
df = pd.DataFrame([
    [0, 1, 0, 30, 150],
    [1, 1, 1, 25, 200],
    [1, 0, 0, 20, 175]
], columns=[
    "Plays Basketball",
    "Plays Soccer",
    "Plays Football",
    "Age",
    "Weight"
])
| Plays Basketball | Plays Soccer | Plays Football | Age | Weight |
| --- | --- | --- | --- | --- |
| 0 | 1 | 0 | 30 | 150 |
| 1 | 1 | 1 | 25 | 200 |
| 1 | 0 | 0 | 20 | 175 |
I tried groupby, but it creates a group for every possible combination of sports played. I just need an average age and weight for each sport.
Result should be:
| | Age | Weight |
| --- | --- | --- |
| Plays Basketball | 22.5 | 187.5 |
| Plays Soccer | 27.5 | 175.0 |
| Plays Football | 25.0 | 200.0 |
Answer 1
Score: 5
Use a dot product of the indicator columns with the value columns, then normalize by the number of players per sport to get the mean:
df2 = df.filter(like='Plays')
out = df2.T.dot(df[['Age', 'Weight']]).div(df2.sum(), axis=0)
Output:
                   Age  Weight
Plays Basketball  22.5   187.5
Plays Soccer      27.5   175.0
Plays Football    25.0   200.0
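To see why this works, it may help to split the one-liner into its two pieces. The sketch below assumes the `df` from the question; the intermediate names `flags`, `values`, `sums`, and `counts` are illustrative and not part of the answer. Because the indicators are 0/1, the dot product adds up Age and Weight only over the rows where a sport is played, and the column sums of the indicators give the number of players.

```python
import pandas as pd

df = pd.DataFrame(
    [[0, 1, 0, 30, 150], [1, 1, 1, 25, 200], [1, 0, 0, 20, 175]],
    columns=["Plays Basketball", "Plays Soccer", "Plays Football", "Age", "Weight"],
)

flags = df.filter(like="Plays")   # the 0/1 indicator columns
values = df[["Age", "Weight"]]    # the columns to average

# Rows are sports, columns are Age/Weight: totals over the players of each sport.
sums = flags.T.dot(values)
# Number of players per sport (sum of each 0/1 column).
counts = flags.sum()
# Total divided by count is the mean; axis=0 aligns counts with the sport rows.
means = sums.div(counts, axis=0)
print(means)
```

For the sample data, `sums` holds 45 and 375 for basketball (two players), which divides out to the 22.5 and 187.5 shown above.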
Answer 2
Score: 1
You could use one groupby for each column you want to summarize:
import pandas as pd

indicators = ["Plays Basketball", "Plays Soccer", "Plays Football"]
keep_cols = ["Age", "Weight"]
rows = []
for indicator in indicators:
    # Group on one 0/1 column at a time; the group keyed by 1 holds the players.
    average = df.groupby(indicator)[keep_cols].mean()
    rows.append(average.loc[1].rename(indicator))
# One row per sport, with Age and Weight as columns.
output = pd.DataFrame(rows)
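A variation on the same per-column idea, offered only as a sketch and not taken from the answer: filter each sport's players with a boolean mask instead of grouping. It assumes the `df` from the question and reuses the names `indicators` and `keep_cols`. One possible advantage is that a sport nobody plays should produce a row of NaN rather than a `KeyError` from `.loc[1]`.

```python
import pandas as pd

df = pd.DataFrame(
    [[0, 1, 0, 30, 150], [1, 1, 1, 25, 200], [1, 0, 0, 20, 175]],
    columns=["Plays Basketball", "Plays Soccer", "Plays Football", "Age", "Weight"],
)

indicators = ["Plays Basketball", "Plays Soccer", "Plays Football"]
keep_cols = ["Age", "Weight"]

# For each sport, keep only the rows flagged 1 and average the value columns.
# The dict of Series becomes a frame with sports as columns, so transpose it.
output = pd.DataFrame(
    {col: df.loc[df[col] == 1, keep_cols].mean() for col in indicators}
).T
print(output)
```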