July 11, 2023 13:50:33

Get specific ids after a group-by condition on timestamp difference

Question

I am trying to compute a time difference within each group (id) and check, for each group, whether ref1's timestamp is later than ref2's timestamp. If it is, I want to store that id in a list.

In the example DataFrame below, ref1 > ref2 for id a2, so the output list should contain ['a2'].

Please help me achieve this result.

DataFrame:

    id  reference  timestamp
    a1  ref1       2022-11-12 08:58:21
    a1  ref2       2022-11-12 08:58:26
    a1  ref3       2022-11-12 08:58:45
    a2  ref2       2022-11-12 08:58:21
    a2  ref1       2022-11-12 08:58:45
    a3  ref2       2022-11-12 08:58:21
    a2  ref3       2022-11-12 08:58:45

DataFrame code:

import pandas as pd

# initialize list of lists
data = [['a1', 'ref1', '2022-11-12 08:58:21'],
        ['a1', 'ref2', '2022-11-12 08:58:26'],
        ['a1', 'ref3', '2022-11-12 08:58:45'],
        ['a2', 'ref2', '2022-11-12 08:58:21'],
        ['a2', 'ref2', '2022-11-12 08:58:40'],
        ['a2', 'ref1', '2022-11-12 08:58:45'],
        ['a3', 'ref2', '2022-11-12 08:58:21'],
        ['a2', 'ref3', '2022-11-12 08:58:45']]

# Create the pandas DataFrame
df = pd.DataFrame(data, columns=['id', 'reference', 'timestamp'])
df['timestamp'] = pd.to_datetime(df['timestamp'])
df

Answer 1

Score: 0

You can use pandas indexing:

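# Index by (reference, id) so each reference can be selected with .loc
tmp = pd.to_datetime(df.set_index(['reference', 'id'])['timestamp'])

# The subtraction aligns on id; an id missing either reference yields NaT
out = tmp.loc['ref1'] - tmp.loc['ref2']

# Keep the ids where ref1 is strictly later than ref2
out = set(out.index[out.gt(pd.Timedelta(0))])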

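NB. If a reference appears more than once for the same id (as ref2 does for a2 in the question's DataFrame code), the subtraction computes every ref1/ref2 combination for that id. The set then acts like an `any`: an id is kept if at least one combination has ref1 later than ref2.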

Output:

{'a2'}
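
To make the NB concrete, here is a minimal sketch that rebuilds the question's sample data and prints the intermediate differences before they are collapsed. The names `diff`, `any_later` and `ids` are only illustrative, and the explicit groupby is one possible way to spell out the "any" step that the set performs implicitly.

```python
import pandas as pd

# Sample data copied from the question's DataFrame code
data = [['a1', 'ref1', '2022-11-12 08:58:21'],
        ['a1', 'ref2', '2022-11-12 08:58:26'],
        ['a1', 'ref3', '2022-11-12 08:58:45'],
        ['a2', 'ref2', '2022-11-12 08:58:21'],
        ['a2', 'ref2', '2022-11-12 08:58:40'],
        ['a2', 'ref1', '2022-11-12 08:58:45'],
        ['a3', 'ref2', '2022-11-12 08:58:21'],
        ['a2', 'ref3', '2022-11-12 08:58:45']]
df = pd.DataFrame(data, columns=['id', 'reference', 'timestamp'])
df['timestamp'] = pd.to_datetime(df['timestamp'])

tmp = df.set_index(['reference', 'id'])['timestamp']

# One row per ref1/ref2 pairing within each id.
# a2 has two ref2 rows, so it appears twice; a3 has no ref1, so it is NaT.
diff = tmp.loc['ref1'] - tmp.loc['ref2']
print(diff)

# Collapse the pairings per id: True if any pairing has ref1 later than ref2
any_later = diff.gt(pd.Timedelta(0)).groupby(level=0).any()
ids = any_later[any_later].index.tolist()
print(ids)  # ['a2']
```

Replacing `.any()` with `.all()` would instead require every ref1/ref2 pairing of an id to satisfy the condition.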

Or:

out = (tmp.loc['ref1'] - tmp.loc['ref2']
      ).loc[lambda x: x.gt(pd.Timedelta(0))].index.unique().tolist()

Output: ['a2']
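
If computing every combination is not wanted, one possible variant (not part of the answer above) is to reduce each reference to a single timestamp per id first. The sketch below assumes the "any" reading of the condition: some ref1 is later than some ref2 exactly when the latest ref1 is later than the earliest ref2. The names `latest_ref1`, `earliest_ref2`, `both` and `result` are illustrative.

```python
import pandas as pd

# Sample data copied from the question's DataFrame code
data = [['a1', 'ref1', '2022-11-12 08:58:21'],
        ['a1', 'ref2', '2022-11-12 08:58:26'],
        ['a1', 'ref3', '2022-11-12 08:58:45'],
        ['a2', 'ref2', '2022-11-12 08:58:21'],
        ['a2', 'ref2', '2022-11-12 08:58:40'],
        ['a2', 'ref1', '2022-11-12 08:58:45'],
        ['a3', 'ref2', '2022-11-12 08:58:21'],
        ['a2', 'ref3', '2022-11-12 08:58:45']]
df = pd.DataFrame(data, columns=['id', 'reference', 'timestamp'])
df['timestamp'] = pd.to_datetime(df['timestamp'])

# Per id: the latest ref1 timestamp and the earliest ref2 timestamp
latest_ref1 = df[df['reference'] == 'ref1'].groupby('id')['timestamp'].max()
earliest_ref2 = df[df['reference'] == 'ref2'].groupby('id')['timestamp'].min()

# Line the two Series up on id; the inner join drops ids lacking ref1 or ref2
both = pd.concat([latest_ref1, earliest_ref2], axis=1,
                 keys=['ref1', 'ref2'], join='inner')

# Keep the ids where ref1 is strictly later than ref2
result = both.index[both['ref1'] > both['ref2']].tolist()
print(result)  # ['a2']
```

Swapping `max` and `min` (earliest ref1 against latest ref2) would give the stricter reading in which every ref1 must be later than every ref2.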
