March 15, 2023 18:04:32

reindex df on axis with duplicates

Question

I have the following DataFrame:

df = pd.DataFrame({"col1": [0, 0, 0, 1, 1, 1], "col2": ["a", "b", "c", "D", "E", "F"]})

and want to transform it into:

df_new = pd.DataFrame({"col1": [0, 1, 0, 1, 0, 1], "col2": ["a", "D", "b", "E", "c", "F"]})

How can I achieve this?

So far I have tried

df.reset_index()

resulting in

   index  col1 col2
0      0     0    a
1      1     0    b
2      2     0    c
3      3     1    D
4      4     1    E
5      5     1    F

and

df_test.reindex()

which did not change the df at all.

Answer 1

Score: 2

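Use a custom sort: pass groupby.cumcount as the key to sort_values, together with a stable sorting method:

df_new = df.sort_values(by='col1', key=lambda x: df.groupby(x).cumcount(),
                        kind='stable', ignore_index=True)

Or with numpy.argsort (needs import numpy as np; kind='stable' keeps tied rows in their original order, and reset_index gives the 0..5 index shown below):

order = np.argsort(df.groupby('col1').cumcount().to_numpy(), kind='stable')
df_new = df.iloc[order].reset_index(drop=True)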

Output:

   col1 col2
0     0    a
1     1    D
2     0    b
3     1    E
4     0    c
5     1    F

This works by creating an intermediate sorter key, the position of each row within its col1 group, which is then used to reorder the rows:

df.groupby('col1').cumcount()

0    0
1    1
2    2
3    0
4    1
5    2
dtype: int64
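
For reference, the same idea can be written out step by step. This is only a minimal sketch using the data from the question; the variable name order is chosen here for illustration and is not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({"col1": [0, 0, 0, 1, 1, 1],
                   "col2": ["a", "b", "c", "D", "E", "F"]})

# Position of each row within its col1 group: 0, 1, 2, 0, 1, 2
order = df.groupby("col1").cumcount()

# A stable sort keeps tied rows in their original relative order,
# so at each position the col1 == 0 row stays ahead of the col1 == 1 row.
sorted_labels = order.sort_values(kind="stable").index

# Reorder the rows by those labels and rebuild a clean 0..n-1 index.
df_new = df.loc[sorted_labels].reset_index(drop=True)

print(df_new)
```

Splitting the key computation from the reordering makes it easy to inspect order on its own before applying it.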

Answer 2

Score: 2

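Use DataFrame.sort_values with a per-group counter from GroupBy.cumcount as the key (kind='stable' ensures rows with the same counter keep their original order):

df = df.sort_values('col1', key=lambda x: df.groupby('col1').cumcount(),
                    kind='stable', ignore_index=True)
print(df)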
   col1 col2
0     0    a
1     1    D
2     0    b
3     1    E
4     0    c
5     1    F

How it works (the counter below is computed on the original df, before the reassignment above):

print(df.groupby('col1').cumcount())
0    0
1    1
2    2
3    0
4    1
5    2
dtype: int64
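
The question only covers groups of equal size. As a hedged illustration of how the counter behaves otherwise, the sketch below uses a made-up frame (df_uneven, not from the question) where one group is longer than the other; leftover rows of the longer group simply end up at the end.

```python
import pandas as pd

# Hypothetical data: group 0 has three rows, group 1 has only one.
df_uneven = pd.DataFrame({"col1": [0, 0, 0, 1],
                          "col2": ["a", "b", "c", "D"]})

# Per-group position: 0, 1, 2 for group 0 and 0 for group 1.
counter = df_uneven.groupby("col1").cumcount()

# Sort by the counter instead of by col1 itself; the stable sort keeps
# rows with equal counters in their original order.
result = df_uneven.sort_values("col1", key=lambda s: counter,
                               kind="stable", ignore_index=True)

print(result)
```

Here the key function ignores its argument and returns the precomputed counter, which is the same trick used in the answer above.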
