Get specific ids after a group-by condition on timestamp difference

Question

Within each group (id), I am trying to compare timestamps and check whether ref1's timestamp is later than ref2's timestamp. If it is, I want to store that id in a list.

In the example data frame below, ref1 > ref2 for id a2, so the output list should be ['a2'].

Please help me achieve this result.

Dataframe:

    id  reference  timestamp
    a1  ref1       2022-11-12 08:58:21
    a1  ref2       2022-11-12 08:58:26
    a1  ref3       2022-11-12 08:58:45
    a2  ref2       2022-11-12 08:58:21
    a2  ref1       2022-11-12 08:58:45
    a3  ref2       2022-11-12 08:58:21
    a2  ref3       2022-11-12 08:58:45

Dataframe code:

import pandas as pd

# initialize list of lists
data = [
    ['a1', 'ref1', '2022-11-12 08:58:21'],
    ['a1', 'ref2', '2022-11-12 08:58:26'],
    ['a1', 'ref3', '2022-11-12 08:58:45'],
    ['a2', 'ref2', '2022-11-12 08:58:21'],
    ['a2', 'ref2', '2022-11-12 08:58:40'],
    ['a2', 'ref1', '2022-11-12 08:58:45'],
    ['a3', 'ref2', '2022-11-12 08:58:21'],
    ['a2', 'ref3', '2022-11-12 08:58:45'],
]

# create the pandas DataFrame
df = pd.DataFrame(data, columns=['id', 'reference', 'timestamp'])
df['timestamp'] = pd.to_datetime(df['timestamp'])
df

Answer 1

Score: 0

You can use pandas indexing:

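# index by (reference, id) so each reference can be selected with .loc
tmp = pd.to_datetime(df.set_index(['reference', 'id'])['timestamp'])

# the subtraction aligns the two selections on id
out = tmp.loc['ref1'] - tmp.loc['ref2']

# keep the ids with a strictly positive difference
out = set(out.index[out.gt(pd.Timedelta(0))])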

NB. If an id has several rows for one of the references, the subtraction computes all combinations for that id. Here the set acts like an `any`: an id is kept as soon as one ref1/ref2 pair gives a positive difference.

Output:

{'a2'}

Or:

out = (tmp.loc['ref1'] - tmp.loc['ref2']
      ).loc[lambda x: x.gt(pd.Timedelta(0))].index.unique().tolist()

Output: ['a2']
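
If you would rather compare exactly one timestamp per reference instead of every combination, a possible variant is to aggregate first and then compare columns. This is only a sketch, not part of the answer above: it assumes that the latest timestamp per (id, reference) is the one that matters, and the names `wide` and `ids` are made up for illustration.

```python
import pandas as pd

data = [
    ["a1", "ref1", "2022-11-12 08:58:21"],
    ["a1", "ref2", "2022-11-12 08:58:26"],
    ["a1", "ref3", "2022-11-12 08:58:45"],
    ["a2", "ref2", "2022-11-12 08:58:21"],
    ["a2", "ref2", "2022-11-12 08:58:40"],
    ["a2", "ref1", "2022-11-12 08:58:45"],
    ["a3", "ref2", "2022-11-12 08:58:21"],
    ["a2", "ref3", "2022-11-12 08:58:45"],
]
df = pd.DataFrame(data, columns=["id", "reference", "timestamp"])
df["timestamp"] = pd.to_datetime(df["timestamp"])

# Assumption: when an id has several rows for the same reference,
# only the latest timestamp is compared.
wide = df.groupby(["id", "reference"])["timestamp"].max().unstack("reference")

# A missing ref1 or ref2 becomes NaT, and comparisons with NaT are False,
# so ids lacking either reference are never selected.
ids = wide.index[wide["ref1"] > wide["ref2"]].tolist()
print(ids)
```

Replacing `max()` with `min()` would compare the earliest timestamps instead, so the choice of aggregation depends on what "ref1 after ref2" should mean when a reference repeats.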

Posted by huangapple on July 11, 2023, 13:50:33
When reposting, please keep the link to this article: https://go.coder-hub.com/76659002.html