March 4, 2023 00:52:29

Pandas Average If Across Multiple Columns

Question

In pandas, I'd like to calculate the average age and weight for people playing each sport. I know I can loop, but was wondering what the most efficient way is.

import pandas as pd

df = pd.DataFrame([
    [0, 1, 0, 30, 150],
    [1, 1, 1, 25, 200],
    [1, 0, 0, 20, 175]
], columns=[
    "Plays Basketball",
    "Plays Soccer",
    "Plays Football",
    "Age",
    "Weight"
])

Plays Basketball  Plays Soccer  Plays Football  Age  Weight
               0             1               0   30     150
               1             1               1   25     200
               1             0               0   20     175

I tried groupby but it creates a group for every possible combination of sports played. I just need an average age and weight for each sport.
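For illustration, here is a minimal sketch of the kind of groupby call described above. The exact call is an assumption, since the question does not show it. Grouping on all three indicator columns at once yields one row per observed combination of sports, not one row per sport:

```python
import pandas as pd

df = pd.DataFrame(
    [[0, 1, 0, 30, 150], [1, 1, 1, 25, 200], [1, 0, 0, 20, 175]],
    columns=["Plays Basketball", "Plays Soccer", "Plays Football", "Age", "Weight"],
)

sports = ["Plays Basketball", "Plays Soccer", "Plays Football"]

# Grouping on all indicators keys each group by a (basketball, soccer, football)
# tuple, so a person who plays two sports lands in a single combined group.
combo_means = df.groupby(sports)[["Age", "Weight"]].mean()
print(combo_means)
```

Each of the three people here has a distinct combination, so the result has three groups keyed by tuples such as (1, 1, 1), none of which is a per-sport average.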

Result should be:

                   Age  Weight
Plays Basketball  22.5   187.5
Plays Soccer      27.5   175.0
Plays Football    25.0   200.0

Answer 1

Score: 5

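Use a dot product of the transposed 0/1 indicator columns with the Age and Weight columns to get per-sport sums, then divide by the number of players in each sport to get the mean: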

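# Select the 0/1 indicator columns whose names contain "Plays"
df2 = df.filter(like='Plays')

# Per-sport sums of Age and Weight, divided by the player count of each sport
out = df2.T.dot(df[['Age', 'Weight']]).div(df2.sum(), axis=0)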

Output:

                   Age  Weight
Plays Basketball  22.5   187.5
Plays Soccer      27.5   175.0
Plays Football    25.0   200.0
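To see why this works, the one-liner can be split into its two intermediate pieces. The sketch below is self-contained and uses the question's data; the names flags, values, sums, counts, and means are introduced here for illustration and are not part of the answer:

```python
import pandas as pd

df = pd.DataFrame(
    [[0, 1, 0, 30, 150], [1, 1, 1, 25, 200], [1, 0, 0, 20, 175]],
    columns=["Plays Basketball", "Plays Soccer", "Plays Football", "Age", "Weight"],
)

flags = df.filter(like="Plays")   # 0/1 indicator columns
values = df[["Age", "Weight"]]    # columns to average

# Each entry is sum(flag * value) over the rows, i.e. the total Age or
# Weight of the people who play that sport.
sums = flags.T.dot(values)

# Summing a 0/1 column counts the players of that sport.
counts = flags.sum()

# Divide each sport's row of totals by that sport's player count.
means = sums.div(counts, axis=0)

print(sums)
print(counts)
print(means)
```

For example, basketball is played by the 25-year-old and the 20-year-old, so the Age total is 45 and the count is 2, giving 22.5. This relies on the indicators being strictly 0 or 1; a sport with no players would have a count of 0 and produce NaN or inf rather than a mean.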

Answer 2

Score: 1

You could run one groupby per indicator column and keep the row for group 1 (the people who play that sport):

import pandas as pd

indicators = ["Plays Basketball", "Plays Soccer", "Plays Football"]
rows = []
keep_cols = ["Age", "Weight"]
for indicator in indicators:
    # Group by the 0/1 indicator; group 1 holds the people who play this sport
    average = df.groupby(indicator).mean()
    # Keep only Age and Weight, and label the row with the sport name
    rows.append(average.loc[1][keep_cols].rename(indicator))
output = pd.DataFrame(rows)
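A possible variant of the same per-column idea, sketched here with a boolean mask instead of groupby. This is not the answerer's code, only an illustration under the assumption that the indicators are 0/1 and that their column names start with "Plays":

```python
import pandas as pd

df = pd.DataFrame(
    [[0, 1, 0, 30, 150], [1, 1, 1, 25, 200], [1, 0, 0, 20, 175]],
    columns=["Plays Basketball", "Plays Soccer", "Plays Football", "Age", "Weight"],
)

keep_cols = ["Age", "Weight"]
indicators = [col for col in df.columns if col.startswith("Plays")]

# For each sport, select only the rows where the indicator is 1 and average
# the columns of interest. The dict maps sport -> Series of means, so the
# resulting frame is transposed to put sports on the rows.
output = pd.DataFrame(
    {ind: df.loc[df[ind] == 1, keep_cols].mean() for ind in indicators}
).T

print(output)
```

The mask skips computing the mean for group 0 and avoids the .loc[1] lookup, which raises a KeyError when an indicator column contains no 1s.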
