Get specific ids after a group-by, based on a timestamp-difference condition
Question
I am trying to compute a time difference within each group (id) and check, for each group, whether ref1's timestamp is greater than ref2's timestamp; if it is, store that id in a list.
In the DataFrame example below, for id a2, ref1's timestamp > ref2's timestamp, hence the output list should contain ['a2'].
Please help me achieve this result.
DataFrame:
id  reference  timestamp
a1  ref1       2022-11-12 08:58:21
a1  ref2       2022-11-12 08:58:26
a1  ref3       2022-11-12 08:58:45
a2  ref2       2022-11-12 08:58:21
a2  ref1       2022-11-12 08:58:45
a3  ref2       2022-11-12 08:58:21
a2  ref3       2022-11-12 08:58:45
Dataframe code:
import pandas as pd
# initialize list of lists
data = [['a1', 'ref1', '2022-11-12 08:58:21'],
        ['a1', 'ref2', '2022-11-12 08:58:26'],
        ['a1', 'ref3', '2022-11-12 08:58:45'],
        ['a2', 'ref2', '2022-11-12 08:58:21'],
        ['a2', 'ref2', '2022-11-12 08:58:40'],
        ['a2', 'ref1', '2022-11-12 08:58:45'],
        ['a3', 'ref2', '2022-11-12 08:58:21'],
        ['a2', 'ref3', '2022-11-12 08:58:45']]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns=['id', 'reference','timestamp'])
df['timestamp'] = pd.to_datetime(df.timestamp)
df
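The DataFrame code builds 8 rows, one more than the printed table: it contains a second a2/ref2 row at 08:58:40. That repeat matters for the answer below. A quick sketch, using only standard pandas groupby/size on the question's data, to list (id, reference) pairs that occur more than once; counts and repeats are illustrative names, not from the original post:

```python
import pandas as pd

# The question's data, including the extra a2/ref2 row at 08:58:40.
data = [['a1', 'ref1', '2022-11-12 08:58:21'],
        ['a1', 'ref2', '2022-11-12 08:58:26'],
        ['a1', 'ref3', '2022-11-12 08:58:45'],
        ['a2', 'ref2', '2022-11-12 08:58:21'],
        ['a2', 'ref2', '2022-11-12 08:58:40'],
        ['a2', 'ref1', '2022-11-12 08:58:45'],
        ['a3', 'ref2', '2022-11-12 08:58:21'],
        ['a2', 'ref3', '2022-11-12 08:58:45']]
df = pd.DataFrame(data, columns=['id', 'reference', 'timestamp'])
df['timestamp'] = pd.to_datetime(df['timestamp'])

# Number of rows for each (id, reference) pair.
counts = df.groupby(['id', 'reference']).size()

# Keep only the pairs that appear more than once.
repeats = counts[counts > 1]
print(repeats)
```

With this data, only (a2, ref2) should show up, with a count of 2.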
Answer 1
Score: 0
You can use pandas indexing:
tmp = pd.to_datetime(df.set_index(['reference', 'id'])['timestamp'])
out = tmp.loc['ref1'] - tmp.loc['ref2']
out = set(out.index[out.gt(pd.Timedelta(0))])
NB. If an id has several rows for one of the references (here a2 has two ref2 rows), the subtraction computes all combinations. The set then acts like "any": an id is kept if at least one of its combinations is positive.
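To see the combinations the NB refers to, it can help to print the intermediate difference before filtering. A minimal, self-contained sketch on the question's data; diff and ids are illustrative names, and pd.Timedelta(0) is used as an explicit zero instead of the string '0'. With this data, a2 should appear twice in diff (once per ref2 row), and a3 should be NaT because it has no ref1 row.

```python
import pandas as pd

# The question's data (a2 has two ref2 rows).
data = [['a1', 'ref1', '2022-11-12 08:58:21'],
        ['a1', 'ref2', '2022-11-12 08:58:26'],
        ['a1', 'ref3', '2022-11-12 08:58:45'],
        ['a2', 'ref2', '2022-11-12 08:58:21'],
        ['a2', 'ref2', '2022-11-12 08:58:40'],
        ['a2', 'ref1', '2022-11-12 08:58:45'],
        ['a3', 'ref2', '2022-11-12 08:58:21'],
        ['a2', 'ref3', '2022-11-12 08:58:45']]
df = pd.DataFrame(data, columns=['id', 'reference', 'timestamp'])
df['timestamp'] = pd.to_datetime(df['timestamp'])

# Series indexed by (reference, id), as in the answer.
tmp = df.set_index(['reference', 'id'])['timestamp']

# Selecting one reference drops that level, leaving a Series indexed by id.
# The subtraction aligns on id: a duplicated id pairs up every combination,
# and an id present on only one side (a3 has no ref1) yields NaT.
diff = tmp.loc['ref1'] - tmp.loc['ref2']
print(diff)

# Keep an id if ANY of its combinations is positive (NaT compares as False).
ids = set(diff.index[diff.gt(pd.Timedelta(0))])
print(ids)  # → {'a2'}
```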
Output:
{'a2'}
Or:
out = (tmp.loc['ref1'] - tmp.loc['ref2']
       ).loc[lambda x: x.gt(pd.Timedelta(0))].index.unique().tolist()
Output: ['a2']
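The same "any combination" logic can also be written as an explicit merge, which makes the pairing of ref1 and ref2 rows visible. This is an alternative sketch, not part of the original answer; it assumes the question's data, and the names ref1, ref2 and pairs are illustrative.

```python
import pandas as pd

# The question's data.
data = [['a1', 'ref1', '2022-11-12 08:58:21'],
        ['a1', 'ref2', '2022-11-12 08:58:26'],
        ['a1', 'ref3', '2022-11-12 08:58:45'],
        ['a2', 'ref2', '2022-11-12 08:58:21'],
        ['a2', 'ref2', '2022-11-12 08:58:40'],
        ['a2', 'ref1', '2022-11-12 08:58:45'],
        ['a3', 'ref2', '2022-11-12 08:58:21'],
        ['a2', 'ref3', '2022-11-12 08:58:45']]
df = pd.DataFrame(data, columns=['id', 'reference', 'timestamp'])
df['timestamp'] = pd.to_datetime(df['timestamp'])

# Split out the two references of interest, keyed by id.
ref1 = df.loc[df['reference'] == 'ref1', ['id', 'timestamp']]
ref2 = df.loc[df['reference'] == 'ref2', ['id', 'timestamp']]

# Inner merge on id: every ref1 row is paired with every ref2 row of the
# same id; ids lacking either reference (a3 here) are dropped.
pairs = ref1.merge(ref2, on='id', suffixes=('_ref1', '_ref2'))

# Time difference per pair; keep ids where ref1 is later in at least one pair.
pairs['diff'] = pairs['timestamp_ref1'] - pairs['timestamp_ref2']
out = pairs.loc[pairs['diff'] > pd.Timedelta(0), 'id'].unique().tolist()
print(out)  # → ['a2']
```

If ref1 should instead be later than every ref2 row of the id, a per-id check such as pairs.groupby('id')['diff'].min() > pd.Timedelta(0) expresses that stricter condition.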