How to compare two data frames and keep rows of the left one based on common values and a time diff in pandas?
Question
I have two data sets that look like this:
import datetime
saved_info = [
    (datetime.datetime(2023, 7, 4, 23, 18, 22, 113476), 't55643', 'ab$ff$55'),
    (datetime.datetime(2023, 7, 4, 23, 26, 22, 113476), 't55643', '5b$ff$15'),
    (datetime.datetime(2023, 7, 4, 23, 27, 22, 133470), 't55643', 'ab$ff$55'),
]
new_info = [
    ('t55643', 'ab$ff$55', 44),
    ('t55643', 'be$qf$34', 33),
]
I load them into pandas as follows:
import pandas as pd
df1 = pd.DataFrame(new_info)
df1.columns = ["tid", "cid", "val"]
df2 = pd.DataFrame(saved_info)
df2.columns = ["modified_at", "tid", "cid"]
So the data frames look like this:
df1
   tid     cid       val
0  t55643  ab$ff$55  44
1  t55643  be$qf$34  33
df2
   modified_at                                        tid     cid
0  datetime.datetime(2023, 7, 4, 23, 18, 22, 113476)  t55643  ab$ff$55
1  datetime.datetime(2023, 7, 4, 23, 26, 22, 113476)  t55643  5b$ff$15
2  datetime.datetime(2023, 7, 4, 23, 27, 22, 133470)  t55643  ab$ff$55
Now I want to get the rows of df1 that share a cid value with df2, where the modified_at value of the matching df2 row is more than 15 minutes in the past.
Let's say the current datetime is 2023-07-04 23:36:38.
The final result for df1 should then be:
df1 (final)
   tid     cid       val
0  t55643  ab$ff$55  44
As you can see, the cid value of the first row of df1 matches that of the first row of df2, and the difference between the modified_at value of that df2 row and the current time is greater than 15 minutes.
I can already get the rows of df1 that share a cid value with df2 by doing something like this:
common = df1.merge(df2, on=['cid'])
df1_final = df1[(df1.cid.isin(common.cid))]
For comparing the time difference between rows of two data frames, I found a Stack Overflow answer: https://stackoverflow.com/a/46966942/5550284
But in my case I need to check a column value against the current UTC time, and I also don't know how to chain these two conditions together.
Can someone help me?
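On the current UTC time mentioned above, one option is sketched below. It assumes (the question does not say so) that modified_at holds timezone-naive timestamps that represent UTC, so the current time is also made naive before subtracting; pandas refuses to subtract tz-aware and tz-naive values from each other. The small df2 here is rebuilt only to keep the sketch self-contained.

```python
import pandas as pd

# Assumption: modified_at values are naive timestamps expressed in UTC.
df2 = pd.DataFrame({
    "modified_at": pd.to_datetime([
        "2023-07-04 23:18:22.113476",
        "2023-07-04 23:26:22.113476",
        "2023-07-04 23:27:22.133470",
    ]),
    "cid": ["ab$ff$55", "5b$ff$15", "ab$ff$55"],
})

# Current UTC time as a tz-aware Timestamp, then stripped of its
# timezone so that it matches the naive values in df2.
now_utc = pd.Timestamp.now(tz="UTC").tz_localize(None)

# Age of every df2 row, and a boolean mask for rows older than 15 minutes.
age = now_utc - df2["modified_at"]
older_than_15_min = age > pd.Timedelta(minutes=15)
print(older_than_15_min)
```

If modified_at were stored as tz-aware UTC values instead, the tz_localize(None) step would be dropped so that both sides stay aware.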
Answer 1
Score: 2
You don't need merge here; just keep the rows where the time difference is greater than 15 minutes:
current_time = datetime.datetime(2023, 7, 4, 23, 36, 38)
cond = current_time - df2['modified_at'] > '15m'
out = df1[df1['cid'].isin(df2.loc[cond, 'cid'])]
Output:
>>> out
      tid       cid  val
0  t55643  ab$ff$55   44
>>> cond
0     True
1    False
2    False
Name: modified_at, dtype: bool
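The same idea can be wrapped in a small helper. This is a minimal sketch rather than part of the original answer: the function name rows_with_stale_cid and its parameters are hypothetical, and the threshold is spelled as an explicit pd.Timedelta instead of the string '15m'.

```python
import datetime

import pandas as pd


def rows_with_stale_cid(df1, df2, now, min_age=pd.Timedelta(minutes=15)):
    """Return rows of df1 whose cid appears in a df2 row older than min_age.

    Hypothetical helper for illustration; column names follow the question.
    """
    # Age of every df2 row relative to `now`, as a timedelta Series.
    age = now - df2["modified_at"]
    # cid values that have at least one sufficiently old row in df2.
    stale_cids = df2.loc[age > min_age, "cid"]
    # Keep only the df1 rows whose cid is among those values.
    return df1[df1["cid"].isin(stale_cids)]


saved_info = [
    (datetime.datetime(2023, 7, 4, 23, 18, 22, 113476), "t55643", "ab$ff$55"),
    (datetime.datetime(2023, 7, 4, 23, 26, 22, 113476), "t55643", "5b$ff$15"),
    (datetime.datetime(2023, 7, 4, 23, 27, 22, 133470), "t55643", "ab$ff$55"),
]
new_info = [
    ("t55643", "ab$ff$55", 44),
    ("t55643", "be$qf$34", 33),
]
df1 = pd.DataFrame(new_info, columns=["tid", "cid", "val"])
df2 = pd.DataFrame(saved_info, columns=["modified_at", "tid", "cid"])

current_time = datetime.datetime(2023, 7, 4, 23, 36, 38)
out = rows_with_stale_cid(df1, df2, current_time)
print(out)
```

Note that a cid qualifies as soon as any of its df2 rows is old enough. Here ab$ff$55 also has a row from 23:27:22, roughly 9 minutes before current_time, but the row from 23:18:22 is enough to keep it, which matches the expected result in the question. If only the most recent modified_at per cid should count, the condition would have to be applied to a per-cid maximum instead.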