Question

I have the following Dataframe:

index	errorId	start	end	timestamp	uniqueId
0	1404	2022-04-25 02:10:41	2022-04-25 02:10:46	2022-04-25	1404_2022-04-25
1	1302	2022-04-25 02:10:41	2022-04-25 02:10:46	2022-04-25	1302_2022-04-25
2	1404	2022-04-27 12:54:46	2022-04-27 12:54:51	2022-04-25	1404_2022-04-25
3	1302	2022-04-27 13:34:43	2022-04-27 13:34:50	2022-04-25	1302_2022-04-25
4	1404	2022-04-29 04:30:22	2022-04-29 04:30:29	2022-04-25	1404_2022-04-25
5	1302	2022-04-29 08:26:25	2022-04-29 08:26:32	2022-04-25	1302_2022-04-25

The uniqueId is a combination of the columns errorId and timestamp.
It should be checked whether the column 'uniqueId' contains duplicate values. If so, the row where the value appears for the first time should be kept. In the example for errorId 1404, that is the row at index 0. Afterwards, the value in the column 'end' should be overwritten with the 'end' value from the row where the uniqueId appears for the last time. In the example here, that is the row at index 4.
The same applies to errorId 1302.

In the end, it should look like this:

index	errorId	start	end	timestamp	uniqueId
0	1404	2022-04-25 02:10:41	2022-04-29 04:30:29	2022-04-25	1404_2022-04-25
1	1302	2022-04-25 02:10:41	2022-04-29 08:26:32	2022-04-25	1302_2022-04-25
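The steps described in the question (keep the first row per uniqueId, then overwrite its end with the end from the last row) can also be sketched literally with drop_duplicates and a lookup Series. This is a minimal sketch, not the asker's code: it assumes the sample data is rebuilt by hand as df with start and end parsed to datetimes, and the names last_end and result are illustrative only.

```python
import pandas as pd

# Sample data from the question (hand-built assumption; 'index' is the default RangeIndex)
df = pd.DataFrame({
    "errorId": [1404, 1302, 1404, 1302, 1404, 1302],
    "start": pd.to_datetime([
        "2022-04-25 02:10:41", "2022-04-25 02:10:41", "2022-04-27 12:54:46",
        "2022-04-27 13:34:43", "2022-04-29 04:30:22", "2022-04-29 08:26:25",
    ]),
    "end": pd.to_datetime([
        "2022-04-25 02:10:46", "2022-04-25 02:10:46", "2022-04-27 12:54:51",
        "2022-04-27 13:34:50", "2022-04-29 04:30:29", "2022-04-29 08:26:32",
    ]),
    "timestamp": ["2022-04-25"] * 6,
})
# uniqueId is errorId and timestamp joined by an underscore
df["uniqueId"] = df["errorId"].astype(str) + "_" + df["timestamp"]

# 'end' value taken from the last occurrence of each uniqueId
last_end = df.drop_duplicates("uniqueId", keep="last").set_index("uniqueId")["end"]

# keep the first occurrence of each uniqueId, then overwrite its 'end'
result = df.drop_duplicates("uniqueId", keep="first").copy()
result["end"] = result["uniqueId"].map(last_end)

print(result)
```

Unlike a groupby aggregation, this keeps the original index labels (0 and 1) of the first occurrences, which matches the expected output shown in the question.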

Answer 1

Score: 2

I think you need to aggregate with min and max per the 3 columns using named aggregation, then add DataFrame.reindex to restore the original column order:

# earliest start and latest end per group, then restore the original column order
df1 = (df.groupby(['errorId','timestamp','uniqueId'], as_index=False, sort=False)
         .agg(start=('start','min'), end=('end','max'))
         .reindex(df.columns, axis=1))
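A self-contained run of the min/max version on the question's sample data might look like the following. It assumes start and end arrive as strings (for example from a CSV) and converts them with pd.to_datetime first, so min and max compare actual points in time; with this zero-padded format the string order happens to agree, but the conversion makes that explicit.

```python
import pandas as pd

# Question's sample data, with start/end as strings (hand-built assumption)
df = pd.DataFrame({
    "errorId": [1404, 1302, 1404, 1302, 1404, 1302],
    "start": ["2022-04-25 02:10:41", "2022-04-25 02:10:41", "2022-04-27 12:54:46",
              "2022-04-27 13:34:43", "2022-04-29 04:30:22", "2022-04-29 08:26:25"],
    "end": ["2022-04-25 02:10:46", "2022-04-25 02:10:46", "2022-04-27 12:54:51",
            "2022-04-27 13:34:50", "2022-04-29 04:30:29", "2022-04-29 08:26:32"],
    "timestamp": ["2022-04-25"] * 6,
    "uniqueId": ["1404_2022-04-25", "1302_2022-04-25"] * 3,
})

# parse both columns to datetimes so min/max are chronological
df[["start", "end"]] = df[["start", "end"]].apply(pd.to_datetime)

# earliest start and latest end per group; sort=False keeps first-appearance order
df1 = (df.groupby(["errorId", "timestamp", "uniqueId"], as_index=False, sort=False)
         .agg(start=("start", "min"), end=("end", "max"))
         .reindex(df.columns, axis=1))
print(df1)
```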

Or aggregate with first and last; if the datetimes are sorted within each group, this gives the same output:

# first start and last end per group (relies on rows being in time order)
df2 = (df.groupby(['errorId','timestamp','uniqueId'], as_index=False, sort=False)
         .agg(start=('start','first'), end=('end','last'))
         .reindex(df.columns, axis=1))
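Because first and last depend on row order, one way to make the "sorted per group" assumption explicit is to sort by start before grouping. This is a minimal sketch assuming the same hand-built sample data with datetime columns; the name ordered is illustrative. A stable sort (kind="mergesort") keeps tied rows, such as the two starts at 02:10:41, in their original order, so the groups still come out as 1404 then 1302.

```python
import pandas as pd

# Question's sample data with start/end parsed as datetimes (hand-built assumption)
df = pd.DataFrame({
    "errorId": [1404, 1302, 1404, 1302, 1404, 1302],
    "start": pd.to_datetime(["2022-04-25 02:10:41", "2022-04-25 02:10:41",
                             "2022-04-27 12:54:46", "2022-04-27 13:34:43",
                             "2022-04-29 04:30:22", "2022-04-29 08:26:25"]),
    "end": pd.to_datetime(["2022-04-25 02:10:46", "2022-04-25 02:10:46",
                           "2022-04-27 12:54:51", "2022-04-27 13:34:50",
                           "2022-04-29 04:30:29", "2022-04-29 08:26:32"]),
    "timestamp": ["2022-04-25"] * 6,
    "uniqueId": ["1404_2022-04-25", "1302_2022-04-25"] * 3,
})

# stable sort so 'first'/'last' mean earliest/latest within each group
ordered = df.sort_values("start", kind="mergesort")

df2 = (ordered.groupby(["errorId", "timestamp", "uniqueId"], as_index=False, sort=False)
              .agg(start=("start", "first"), end=("end", "last"))
              .reindex(df.columns, axis=1))
print(df2)
```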
