英文:
Groupby Multi-Column Sort based on hierarchy
问题
我正在尝试按列的层次结构/顺序对groupby的结果进行排序。
数据框示例:
df = pd.DataFrame({'A':['A1','A1','A1','A2','A2','A2','A1','A1'],
'B':['B2','B1','B1','B1','B1','B2','B2','B2'],
'C':['C1','C3','C1','C3','C2','C2','C1','C3'],
'V':[100,240,110,120,50,60,40,40]})
接着进行groupby操作:
df.groupby(['A','B','C']).agg({'V':'sum'}).sort_values(by=['A','V','B','V'], ascending=[False,False,False,False])
输出结果:
V
A B C
A2 B1 C3 120
B2 C2 60
B1 C2 50
A1 B1 C3 240
B2 C1 140
B1 C1 110
B2 C3 40
然而,我想要的是首先根据A1的总值(与B和C的值无关)对A1进行排序,然后根据B的总值对B进行排序。
因此,输出应该类似于这样:
V
A B C
A1 B1 C3 240
B1 C1 110
B2 C1 140
B2 C3 40
A2 B1 C3 120
B1 C2 50
B2 C2 60
请让我知道是否我解释得清楚。
英文:
I am trying to sort outcome of a groupby by the hierarchy / order of the columns.
Example of dataframe:
df = pd.DataFrame({'A':['A1','A1','A1','A2','A2','A2','A1','A1'],
'B':['B2','B1','B1','B1','B1','B2','B2','B2'],
'C':['C1','C3','C1','C3','C2','C2','C1','C3'],
'V':[100,240,110,120,50,60,40,40]})
Followed by the groupby operation:
df.groupby(['A','B','C']).agg({'V':'sum'}).sort_values(by=['A','V','B','V'],ascending=[False,False,False,False])
Output:
V
A B C
A2 B1 C3 120
B2 C2 60
B1 C2 50
A1 B1 C3 240
B2 C1 140
B1 C1 110
B2 C3 40
What I want however is A1 should be sorted first based on it's own total value (irrespective of B & C values), followed by B based on it's own total.
So the output should look something like this:
V
A B C
A1 B1 C3 240
B1 C1 110
B2 C1 140
B2 C3 40
A2 B1 C3 120
B1 C2 50
B2 C2 60
Let me know if I could explain it well.
答案1
得分: 1
df = pd.DataFrame({'A':['A1','A1','A1','A2','A2','A2','A1','A1'],
'B':['B2','B1','B1','B1','B1','B2','B2','B2'],
'C':['C1','C3','C1','C3','C2','C2','C1','C3'],
'V':[100,240,110,120,50,60,40,40]})
aa = df.groupby(['A','B','C']).sum()
print(aa)
"""
V
A B C
A1 B1 C1 110
C3 240
B2 C1 140
C3 40
A2 B1 C2 50
C3 120
B2 C2 60
"""
t1= (
df.groupby(['A','B','C']).sum().sort_values(['V','C'],ascending=False)
.sort_index(level=[0,1],sort_remaining = False)
)
print(t1)
"""
V
A B C
A1 B1 C3 240
C1 110
B2 C1 140
C3 40
A2 B1 C3 120
C2 50
B2 C2 60
"""
英文:
df = pd.DataFrame({'A':['A1','A1','A1','A2','A2','A2','A1','A1'],
'B':['B2','B1','B1','B1','B1','B2','B2','B2'],
'C':['C1','C3','C1','C3','C2','C2','C1','C3'],
'V':[100,240,110,120,50,60,40,40]})
aa = df.groupby(['A','B','C']).sum()
print(aa)
"""
V
A B C
A1 B1 C1 110
C3 240
B2 C1 140
C3 40
A2 B1 C2 50
C3 120
B2 C2 60
"""
t1= (
df.groupby(['A','B','C']).sum().sort_values(['V','C'],ascending=False)
.sort_index(level=[0,1],sort_remaining = False)
)
print(t1)
"""
V
A B C
A1 B1 C3 240
C1 110
B2 C1 140
C3 40
A2 B1 C3 120
C2 50
B2 C2 60
"""
答案2
得分: 0
IIUC,您希望同时按A
、B
和V
列进行排序
out = (df.groupby(['A','B','C']).agg({'V':'sum'})
.sort_values(by=['A','B','V'], ascending=[True,True,False]))
print(out)
V
A B C
A1 B1 C3 240
C1 110
B2 C1 140
C3 40
A2 B1 C3 120
C2 50
B2 C2 60
英文:
IIUC, you want to sort by A
, B
and V
column in the same time
out = (df.groupby(['A','B','C']).agg({'V':'sum'})
.sort_values(by=['A','B','V'], ascending=[True,True,False]))
print(out)
V
A B C
A1 B1 C3 240
C1 110
B2 C1 140
C3 40
A2 B1 C3 120
C2 50
B2 C2 60
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论