reindex df on axis with duplicates

Question

I have the following df:

df = pd.DataFrame({"col1": [0,0,0,1,1,1], "col2": ["a", "b", "c", "D", "E", "F"] }) 

and want to transform it into

df_new = pd.DataFrame({"col1": [0,1,0,1,0,1], "col2": ["a", "D", "b", "E", "c", "F"] })

How can I achieve this?

So far I have tried

df.reset_index()

resulting in

index  col1 col2
0      0     0    a
1      1     0    b
2      2     0    c
3      3     1    D
4      4     1    E
5      5     1    F

and

df_test.reindex()

which did not change the df at all.

Answer 1

Score: 2

Use a custom sort: pass groupby.cumcount as the key to sort_values and use a stable sorting method:

df_new = df.sort_values(by='col1', key=lambda x: df.groupby(x).cumcount(),
                        kind='stable', ignore_index=True)

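Or with numpy.argsort, again using a stable sort and resetting the index so that it matches the output below:

df_new = df.iloc[np.argsort(df.groupby('col1').cumcount().to_numpy(), kind='stable')]
df_new = df_new.reset_index(drop=True)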

Output:

   col1 col2
0     0    a
1     1    D
2     0    b
3     1    E
4     0    c
5     1    F

This works by creating an intermediate sorter key that will be used to reindex your rows:

df.groupby('col1').cumcount()

0    0
1    1
2    2
3    0
4    1
5    2
dtype: int64
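
To make the mechanism concrete, here is a minimal self-contained sketch of the argsort variant, checked against the df_new from the question. The variable names key, order and expected are only illustrative and do not appear in the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": [0, 0, 0, 1, 1, 1],
                   "col2": ["a", "b", "c", "D", "E", "F"]})
expected = pd.DataFrame({"col1": [0, 1, 0, 1, 0, 1],
                         "col2": ["a", "D", "b", "E", "c", "F"]})

# Position of each row within its col1 group: 0, 1, 2, 0, 1, 2
key = df.groupby("col1").cumcount()

# A stable argsort keeps the original row order among equal keys,
# so the first row of every group comes first, then the second, and so on.
order = np.argsort(key.to_numpy(), kind="stable")

# Pick the rows in that order and renumber the index from 0.
df_new = df.iloc[order].reset_index(drop=True)
print(df_new)
```

The stable sort matters here: rows that share the same key (for example the first row of each group) must keep their original relative order, otherwise "D" could end up before "a".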

Answer 2

Score: 2

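Use DataFrame.sort_values with a counter from GroupBy.cumcount as the sort key (kind='stable' keeps rows with equal keys in their original order):

df = df.sort_values('col1', key=lambda x: df.groupby('col1').cumcount(), kind='stable', ignore_index=True)
print(df)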
   col1 col2
0     0    a
1     1    D
2     0    b
3     1    E
4     0    c
5     1    F

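How it works: the sort key is the position of each row within its col1 group, computed on the original df (before the reassignment above):

print(df.groupby('col1').cumcount())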
0    0
1    1
2    2
3    0
4    1
5    2
dtype: int64
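
The same idea should carry over to groups of unequal size. The sketch below wraps it in a helper, interleave_groups, which is a hypothetical name introduced here for illustration and not part of pandas. It assumes pandas 1.1 or newer, where sort_values accepts key.

```python
import pandas as pd


def interleave_groups(frame: pd.DataFrame, by: str) -> pd.DataFrame:
    """Round-robin the rows of `frame` across the groups of column `by`.

    Hypothetical helper for illustration; not a pandas function.
    """
    # Position of each row within its group (0 for the first row, 1 for the second, ...)
    position = frame.groupby(by).cumcount()
    # Sort by that position instead of the column values; the stable sort
    # keeps rows with the same position in their original order.
    return frame.sort_values(by, key=lambda _: position,
                             kind="stable", ignore_index=True)


# Group 0 has three rows, group 1 only two.
uneven = pd.DataFrame({"col1": [0, 0, 0, 1, 1],
                       "col2": ["a", "b", "c", "D", "E"]})
result = interleave_groups(uneven, "col1")
print(result)
```

Once the shorter group runs out, the remaining rows of the longer group simply follow at the end.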

huangapple
  • Posted on March 15, 2023, 18:04:32
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75743181.html