Python: obtaining a value per group using a hierarchical structure in pandas

Question

I have the following data:

df =
id interval name
1  8        A
1  8        B
1  8        C
2  2        B
2  2        C
2  3        A

I want a single name per id and interval pair. In particular, A should take priority over B and C, and B over C. The expected result is:

df =
id interval name
1  8        A
2  2        B
2  3        A
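
For reproducibility, the two tables above can be built as DataFrames along these lines. This is a minimal sketch; the variable name `expected` is only for illustration and does not appear in the original post.

```python
import pandas as pd

# Sample data from the question: several names per (id, interval) pair
df = pd.DataFrame({
    "id":       [1, 1, 1, 2, 2, 2],
    "interval": [8, 8, 8, 2, 2, 3],
    "name":     ["A", "B", "C", "B", "C", "A"],
})

# Desired result: one name per (id, interval), preferring A, then B, then C
expected = pd.DataFrame({
    "id":       [1, 2, 2],
    "interval": [8, 2, 3],
    "name":     ["A", "B", "A"],
})
```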

Answer 1

Score: 2

Use an ordered Categorical with DataFrameGroupBy.idxmin to get the index of the minimal (highest-priority) name in each group:

import pandas as pd

# A < B < C, so the minimum per group is the preferred name
df['name'] = pd.Categorical(df['name'], categories=['A','B','C'], ordered=True)

df = df.loc[df.groupby(['id','interval'])['name'].idxmin()]
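
A variant of the same idea, sketched here as an assumption rather than as part of the original answer, runs idxmin on the categorical's integer codes instead of on the categorical column itself. The names `order`, `codes`, `best`, and `result` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "id":       [1, 1, 1, 2, 2, 2],
    "interval": [8, 8, 8, 2, 2, 3],
    "name":     ["A", "B", "C", "B", "C", "A"],
})

order = ["A", "B", "C"]  # highest priority first
cat = pd.Categorical(df["name"], categories=order, ordered=True)

# Integer position of each name in `order` (A -> 0, B -> 1, C -> 2).
# Caveat: a name missing from `order` gets code -1 and would win idxmin,
# so this sketch assumes every name is listed.
codes = pd.Series(cat.codes, index=df.index)

# Row label of the smallest code within each (id, interval) group
best = codes.groupby([df["id"], df["interval"]]).idxmin()
result = df.loc[best]
print(result)
```

Because `df.loc` selects by the original row labels, the result keeps the original index (0, 3, 5 for the sample data).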

Or:

df['name'] = pd.Categorical(df['name'], categories=['A','B','C'], ordered=True)

df = df.sort_values('name').drop_duplicates(['id','interval']).sort_index(ignore_index=True)
print(df)
   id  interval name
0   1         8    A
1   2         2    B
2   2         3    A
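
The sort-and-deduplicate approach can also be wrapped in a small helper. The function below is a hypothetical sketch: its name `pick_by_priority` and its parameters are not from the original answer. It uses a stable sort so that rows with equal priority keep their original relative order.

```python
import pandas as pd

def pick_by_priority(df, order, keys=("id", "interval"), col="name"):
    """Keep, for each group in `keys`, the row whose `col` comes first in `order`."""
    # Work on a copy with `col` as an ordered categorical; the input is not modified
    ranked = df.assign(
        **{col: pd.Categorical(df[col], categories=order, ordered=True)}
    )
    return (
        ranked.sort_values(col, kind="stable")   # highest priority first
              .drop_duplicates(list(keys))       # keep the first row per group
              .sort_index()                      # restore original row order
              .reset_index(drop=True)
    )

df = pd.DataFrame({
    "id":       [1, 1, 1, 2, 2, 2],
    "interval": [8, 8, 8, 2, 2, 3],
    "name":     ["A", "B", "C", "B", "C", "A"],
})

out = pick_by_priority(df, ["A", "B", "C"])
print(out)
```

Passing a different `order`, for example `["C", "B", "A"]`, reverses the preference without touching the rest of the code.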

Another idea is to map each name to its rank with a dictionary generated by enumerate, then take idxmin of the ranks per group:

d = {v: k for k, v in enumerate(['A','B','C'])}

df = df.loc[df['name'].map(d).groupby([df['id'],df['interval']]).idxmin()]
print(df)
   id  interval name
0   1         8    A
3   2         2    B
5   2         3    A
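
One thing to watch with the mapping approach is that a name missing from the dictionary maps to NaN. A minimal sketch with an explicit check, assuming it is preferable to fail early on unknown names (the names `priority`, `rank`, `unknown`, and `result` are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "id":       [1, 1, 1, 2, 2, 2],
    "interval": [8, 8, 8, 2, 2, 3],
    "name":     ["A", "B", "C", "B", "C", "A"],
})

# Lower rank means higher priority: A -> 0, B -> 1, C -> 2
priority = {name: rank for rank, name in enumerate(["A", "B", "C"])}
rank = df["name"].map(priority)

# Names without an entry in `priority` become NaN; report them instead of
# letting them be skipped silently
unknown = df.loc[rank.isna(), "name"].unique()
if len(unknown):
    raise ValueError(f"names without a priority: {list(unknown)}")

# Row label of the best-ranked name in each (id, interval) group
best = rank.groupby([df["id"], df["interval"]]).idxmin()
result = df.loc[best].reset_index(drop=True)
print(result)
```

Unlike the Categorical variants, this leaves the `name` column with its original dtype.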

huangapple
  • Posted on March 9, 2023 at 19:53:17
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75684249.html