January 3, 2020 18:10:11

Pandas: change the index of the duplicates

Question

I have two DataFrames, df0 and df1, with df1.shape[0] > df0.shape[0].

df0 and df1 have the exact same columns.
Most of the rows of df0 are in df1.

The indices of df0 and df1 are

df0.index = range(df0.shape[0])
df1.index = range(df1.shape[0])

I then created dft

dft = pd.concat([df0, df1], axis=0, sort=False)

and removed duplicated rows with

dft.drop_duplicates(subset='this_col_is_not_index', keep='first', inplace=True)

I have some duplicated labels in the index of dft. For example:

dft.loc[3].shape

returns

(2, 38)
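A minimal reproduction of how this happens, using small hypothetical frames in place of the real df0 and df1 (one column instead of 38; the values are made up). pd.concat keeps each frame's original labels, so a row that survives drop_duplicates from df1 can share a label with a row from df0. Index.duplicated is one way to find the rows that would need a new label.

```python
import pandas as pd

# Hypothetical stand-ins for df0 and df1: same columns, each indexed 0..n-1.
df0 = pd.DataFrame({"this_col_is_not_index": ["a", "b", "c", "d"]})
df1 = pd.DataFrame({"this_col_is_not_index": ["a", "b", "c", "x", "d", "y"]})

dft = pd.concat([df0, df1], axis=0, sort=False)
dft.drop_duplicates(subset="this_col_is_not_index", keep="first", inplace=True)

print(dft.index.tolist())   # [0, 1, 2, 3, 3, 5]
print(dft.loc[3].shape)     # (2, 1): "d" from df0 and "x" from df1 both carry label 3

# True for the 2nd, 3rd, ... occurrence of each repeated label.
dup_mask = dft.index.duplicated(keep="first")
print(dft[dup_mask])        # the "x" row, which would need a new label
```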

My aim is to change the index of the second returned row so that label 3 becomes unique.
This second row should get the index dft.index.sort_values()[-1]+1, i.e. one more than the current largest label.

I would like to apply this operation to all duplicates.
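The relabelling described above (keep the first occurrence of each label, move every later occurrence to max+1, max+2, ...) can be sketched as follows. This is an illustration, not code from the question or answers: the helper name relabel_duplicate_index is hypothetical, and it assumes a non-empty integer index, where dft.index.max() equals dft.index.sort_values()[-1].

```python
import numpy as np
import pandas as pd

def relabel_duplicate_index(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: give every repeated index label (all but its first
    occurrence) a fresh label counting up from the current maximum.
    Assumes a non-empty integer index."""
    dup = df.index.duplicated(keep="first")  # True for 2nd, 3rd, ... occurrences
    start = df.index.max() + 1               # same as df.index.sort_values()[-1] + 1
    new_index = df.index.to_numpy().copy()   # writable copy of the labels
    new_index[dup] = np.arange(start, start + dup.sum())
    out = df.copy()
    out.index = new_index
    return out

# Hypothetical dft with label 3 repeated, as left by concat + drop_duplicates.
dft = pd.DataFrame({"this_col_is_not_index": ["a", "b", "c", "d", "x", "y"]},
                   index=[0, 1, 2, 3, 3, 5])
fixed = relabel_duplicate_index(dft)
print(fixed.index.tolist())  # [0, 1, 2, 3, 6, 5]
```

Unlike renumbering the whole frame, this leaves every label that was already unique untouched.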

Answer 1

Score: 2

Add the parameter ignore_index=True to concat to avoid duplicated index values:

dft = pd.concat([df0, df1], axis=0, sort=False, ignore_index=True)
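A quick check of this on small hypothetical frames (values made up for illustration): with ignore_index=True, concat labels the combined rows 0..n-1, so the labels that survive drop_duplicates are unique, although not contiguous.

```python
import pandas as pd

# Hypothetical stand-ins for df0 and df1.
df0 = pd.DataFrame({"this_col_is_not_index": ["a", "b", "c", "d"]})
df1 = pd.DataFrame({"this_col_is_not_index": ["a", "b", "c", "x", "d", "y"]})

# The combined frame is labelled 0..9 instead of reusing each frame's labels.
dft = pd.concat([df0, df1], axis=0, sort=False, ignore_index=True)
dft.drop_duplicates(subset="this_col_is_not_index", keep="first", inplace=True)

print(dft.index.tolist())  # [0, 1, 2, 3, 7, 9]: unique, with gaps where rows were dropped
```

If contiguous labels are also wanted, pandas 1.0 and later accept ignore_index=True on drop_duplicates as well.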

Answer 2

Score: 1

Use reset_index(drop=True). It returns a new DataFrame, so assign the result back:

dft = dft.reset_index(drop=True)
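A small illustration on a hypothetical dft with a repeated label. reset_index returns a new frame rather than modifying dft, and drop=True discards the old labels instead of inserting them as a column named "index".

```python
import pandas as pd

# Hypothetical dft with label 3 repeated, as left by concat + drop_duplicates.
dft = pd.DataFrame({"this_col_is_not_index": ["a", "b", "c", "d", "x", "y"]},
                   index=[0, 1, 2, 3, 3, 5])

dft.reset_index(drop=True)         # result is discarded; dft is unchanged
print(dft.index.tolist())          # [0, 1, 2, 3, 3, 5]

dft = dft.reset_index(drop=True)   # assign back (or pass inplace=True)
print(dft.index.tolist())          # [0, 1, 2, 3, 4, 5]
```

Note that this renumbers every row, rather than moving only the duplicates to max+1 as the question describes.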
