Python: obtaining a value per group using a hierarchical structure in pandas
Question
I have the following data:
df =
id interval name
1 8 A
1 8 B
1 8 C
2 2 B
2 2 C
2 3 A
I want one name per id and interval combination. The names follow a priority order: A takes precedence over B and C, and B takes precedence over C. The expected result is:
df =
id interval name
1 8 A
2 2 B
2 3 A
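To reproduce the question, the sample frame can be built as below. This is a minimal sketch; the integer dtypes for id and interval are an assumption, since the question only shows the printed values.

```python
import pandas as pd

# Sample data from the question; integer dtypes for id/interval are assumed.
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 2],
    "interval": [8, 8, 8, 2, 2, 3],
    "name": ["A", "B", "C", "B", "C", "A"],
})
print(df)
```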
Answer 1
Score: 2
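Use an ordered Categorical with SeriesGroupBy.idxmin (df.groupby([...])['name'] is a SeriesGroupBy) to get, for each id/interval group, the index of the row whose name has the smallest, i.e. highest-priority, value: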
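import pandas as pd
# Order the names A < B < C, then keep the row with the smallest name per group
df['name'] = pd.Categorical(df['name'], categories=['A','B','C'], ordered=True)
df = df.loc[df.groupby(['id','interval'])['name'].idxmin()]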
Or:
df['name'] = pd.Categorical(df['name'], categories=['A','B','C'], ordered=True)
# Sort by priority, keep the first row per group, then restore row order and renumber
df = df.sort_values('name').drop_duplicates(['id','interval']).sort_index(ignore_index=True)
print(df)
id interval name
0 1 8 A
1 2 2 B
2 2 3 A
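Both variants rely on the same property: an ordered Categorical ranks values by their position in categories rather than alphabetically, so idxmin and sort_values pick the earliest category. A small sketch of that ordering (the variable names are illustrative only):

```python
import pandas as pd

# An ordered Categorical compares values by their position in `categories`,
# so the first category listed is treated as the smallest.
names = pd.Categorical(["C", "A", "B"], categories=["A", "B", "C"], ordered=True)
print(names.codes.tolist())  # [2, 0, 1]: integer rank of each value
print(names.min())           # A: the highest-priority name present

# Reversing the category order reverses the priority.
reversed_names = pd.Categorical(["C", "A", "B"], categories=["C", "B", "A"], ordered=True)
print(reversed_names.min())  # C
```

Changing the categories list is therefore enough to express a different hierarchy; the grouping code stays the same.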
Another idea: map each name to its rank with a dictionary generated by enumerate, then take the per-group idxmin of those ranks:
d = {v: k for k, v in enumerate(['A','B','C'])}  # {'A': 0, 'B': 1, 'C': 2}
df = df.loc[df['name'].map(d).groupby([df['id'],df['interval']]).idxmin()]
print(df)
id interval name
0 1 8 A
3 2 2 B
5 2 3 A
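The dictionary approach generalizes to any priority list. The helper below, pick_by_priority, is a hypothetical name introduced here and not part of pandas; it is a sketch that additionally assumes names missing from the priority list should rank last rather than be dropped.

```python
import pandas as pd

def pick_by_priority(df, keys, col, priority):
    """Hypothetical helper: keep one row per group in `keys`, namely the row
    whose `col` value comes earliest in `priority`.
    Assumption: values not listed in `priority` rank after all listed ones."""
    rank = {value: i for i, value in enumerate(priority)}
    # Map values to integer ranks; unknown values get rank len(priority).
    ranks = df[col].map(rank).fillna(len(priority))
    # idxmin returns, per group, the index label of the lowest rank.
    winners = ranks.groupby([df[k] for k in keys]).idxmin()
    return df.loc[winners]

df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 2],
    "interval": [8, 8, 8, 2, 2, 3],
    "name": ["A", "B", "C", "B", "C", "A"],
})
result = pick_by_priority(df, ["id", "interval"], "name", ["A", "B", "C"])
print(result)
```

With the question's data this returns the rows labelled 0, 3 and 5, the same rows as the output above. Ties between equal ranks resolve to the first row in the group, because idxmin returns the first occurrence of the minimum.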