Displaying duplicates in pandas
Question
I would like to display the duplicate rows of a dataframe to understand them better, with the duplicated rows grouped together.
This example hopefully clarifies what I want to do. Assume we are given the dataframe below:
CC BF FA WC Strength
1 2 3 4 1
2 3 4 5 6
1 2 3 4 8
1 2 3 4 4
2 3 4 5 7
Here rows 1, 3, 4 and rows 2, 5 are duplicates once the Strength column is ignored. I would like to get a new dataframe that displays:
CC BF FA WC Strength_min Strength_max Count
1 2 3 4 1 8 3
2 3 4 5 6 7 2
Answer 1
Score: 4
You need a custom groupby.agg, with the output from Index.difference as the grouper:
(df.groupby(list(df.columns.difference(['Strength'], sort=False)))['Strength']
.agg(**{'Strength_min': 'min', 'Strength_max': 'max', 'Count': 'count'})
.reset_index()
)
Output:
CC BF FA WC Strength_min Strength_max Count
0 1 2 3 4 1 8 3
1 2 3 4 5 6 7 2
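For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same approach. The dataframe is rebuilt from the table in the question; the variable names `keys`, `summary` and `duplicates_only` are only illustrative, and the final filter on `Count > 1` is an optional extra that is not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "CC": [1, 2, 1, 1, 2],
    "BF": [2, 3, 2, 2, 3],
    "FA": [3, 4, 3, 3, 4],
    "WC": [4, 5, 4, 4, 5],
    "Strength": [1, 6, 8, 4, 7],
})

# Group on every column except Strength, keeping the original column order
keys = list(df.columns.difference(["Strength"], sort=False))

# Named aggregation: each keyword becomes an output column
summary = (
    df.groupby(keys)["Strength"]
      .agg(Strength_min="min", Strength_max="max", Count="count")
      .reset_index()
)

# Optional: keep only the groups that occur more than once
duplicates_only = summary[summary["Count"] > 1]

print(summary)
```

Note that "count" ignores missing values in Strength; if rows with a missing Strength should also be counted, "size" is probably the aggregation to use instead.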