reindex df on axis with duplicates
Question
I have the following df:
df = pd.DataFrame({"col1": [0,0,0,1,1,1], "col2": ["a", "b", "c", "D", "E", "F"] })
and want to transform it into
df_new = pd.DataFrame({"col1": [0,1,0,1,0,1], "col2": ["a", "D", "b", "E", "c", "F"] })
How can I achieve this?
So far I have tried
df.reset_index()
resulting in
   index  col1 col2
0      0     0    a
1      1     0    b
2      2     0    c
3      3     1    D
4      4     1    E
5      5     1    F
and
df_test.reindex()
which did not change the df at all.
Answer 1
Score: 2
Use a custom sort, with groupby.cumcount as the key passed to sort_values, and a stable sorting method:
df_new = df.sort_values(by='col1', key=lambda x: df.groupby(x).cumcount(),
                        kind='stable', ignore_index=True)
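Or with numpy.argsort (after import numpy as np). Ask for a stable sort explicitly, since the default sort kind does not guarantee that ties keep their original order, and reset the index to match the output below:
df_new = df.iloc[np.argsort(df.groupby('col1').cumcount().to_numpy(), kind='stable')].reset_index(drop=True)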
Output:
   col1 col2
0     0    a
1     1    D
2     0    b
3     1    E
4     0    c
5     1    F
This works by creating an intermediate sorter key that will be used to reindex your rows:
df.groupby('col1').cumcount()
0    0
1    1
2    2
3    0
4    1
5    2
dtype: int64
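The same idea can be spelled out step by step. This is only a minimal sketch of the approach above, not a different method; the names rank and order are illustrative and do not come from the answer.

```python
import pandas as pd

df = pd.DataFrame({"col1": [0, 0, 0, 1, 1, 1],
                   "col2": ["a", "b", "c", "D", "E", "F"]})

# Position of each row inside its col1 group: 0, 1, 2, 0, 1, 2
rank = df.groupby("col1").cumcount()

# Stable sort of the positions; ties keep their original row order,
# so the resulting index labels give the interleaved row order
order = rank.sort_values(kind="stable").index

# Select rows in that order and renumber the index from 0
df_new = df.loc[order].reset_index(drop=True)
print(df_new)
```

The stable sort matters here because every position value appears once per group, so the result depends entirely on how ties are ordered.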
Answer 2
Score: 2
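Use DataFrame.sort_values with a counter from GroupBy.cumcount as the key (kind='stable' keeps rows with equal counter values in their original order):
df = df.sort_values('col1', key=lambda x: df.groupby('col1').cumcount(), kind='stable', ignore_index=True)
print(df)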
   col1 col2
0     0    a
1     1    D
2     0    b
3     1    E
4     0    c
5     1    F
How it works (the counter, computed on the original df before sorting):
print(df.groupby('col1').cumcount())
0    0
1    1
2    2
3    0
4    1
5    2
dtype: int64
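A variant of the same counter idea, sketched here as an assumption rather than something either answer proposes: store the counter in a temporary column and sort on the pair (counter, col1). The helper column name _pos and the uneven example frame df_uneven are hypothetical. Each (counter, col1) pair is unique, so the result does not depend on sort stability, and groups of different sizes are handled as well.

```python
import pandas as pd

# Hypothetical input where the groups have different sizes (3 rows vs 2 rows)
df_uneven = pd.DataFrame({"col1": [0, 0, 0, 1, 1],
                          "col2": ["a", "b", "c", "D", "E"]})

interleaved = (
    # Temporary helper column: position of each row within its col1 group
    df_uneven.assign(_pos=df_uneven.groupby("col1").cumcount())
    # Sort by position first, then by group label; renumber the index
    .sort_values(["_pos", "col1"], ignore_index=True)
    # Remove the helper column again
    .drop(columns="_pos")
)
print(interleaved)
```

Rows left over from the larger group simply end up at the tail of the result.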