How to compare two data frames and keep rows of the left one based on common values and a time diff in pandas?


Question


I have two datasets that look like this:

import datetime
import pandas as pd

saved_info = [
    (datetime.datetime(2023, 7, 4, 23, 18, 22, 113476), 't55643', 'ab$ff$55'),
    (datetime.datetime(2023, 7, 4, 23, 26, 22, 113476), 't55643', '5b$ff$15'),
    (datetime.datetime(2023, 7, 4, 23, 27, 22, 133470), 't55643', 'ab$ff$55'),
]

new_info = [
    ('t55643', 'ab$ff$55', 44),
    ('t55643', 'be$qf$34', 33),
]

I load them into pandas as follows:

df1 = pd.DataFrame(new_info)
df1.columns = ["tid", "cid", "val"]

df2 = pd.DataFrame(saved_info)
df2.columns = ["modified_at", "tid", "cid"]

The data frames then look like this:

df1

      tid       cid  val
0  t55643  ab$ff$55   44
1  t55643  be$qf$34   33

df2

                                         modified_at     tid       cid
0  datetime.datetime(2023, 7, 4, 23, 18, 22, 113476)  t55643  ab$ff$55
1  datetime.datetime(2023, 7, 4, 23, 26, 22, 113476)  t55643  5b$ff$15
2  datetime.datetime(2023, 7, 4, 23, 27, 22, 133470)  t55643  ab$ff$55

Now I want to get the rows from df1 whose cid value also appears in df2, where the matching df2 row's modified_at is more than 15 minutes before the current time.

Let's say the current datetime is 2023-07-04 23:36:38. The final result for df1 should then be:

df1 (final)

      tid       cid  val
0  t55643  ab$ff$55   44

As you can see, the cid value of the first row of df1 matches the first row of df2, and the difference between that row's modified_at value and the current time is greater than 15 minutes.
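To make the arithmetic concrete, here is a minimal sketch that computes the age in minutes of each modified_at value relative to the assumed current time. The names modified_at and age_minutes are only illustrative; the timestamps are the ones from saved_info.

```python
import datetime

import pandas as pd

# Assumed "now", taken from the question.
current_time = datetime.datetime(2023, 7, 4, 23, 36, 38)

# The modified_at values from saved_info.
modified_at = pd.Series(
    [
        datetime.datetime(2023, 7, 4, 23, 18, 22, 113476),
        datetime.datetime(2023, 7, 4, 23, 26, 22, 113476),
        datetime.datetime(2023, 7, 4, 23, 27, 22, 133470),
    ]
)

# Age of each row in minutes (timedelta -> seconds -> minutes).
age_minutes = (current_time - modified_at).dt.total_seconds() / 60
print(age_minutes.round(2).tolist())
```

The ages come out at roughly 18.3, 10.3, and 9.3 minutes, so only the first df2 row is older than 15 minutes.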

I can get the rows of df1 that share a cid value with df2 by doing something like this:

common = df1.merge(df2, on=['cid'])
df1_final = df1[(df1.cid.isin(common.cid))]

For comparing time differences between rows of two data frames, I found a Stack Overflow answer: https://stackoverflow.com/a/46966942/5550284

But in my case I need to check a column value against the current UTC time, and I also don't know how to chain these two conditions together.

Can someone help me?

Answer 1

Score: 2


You don't need a merge here; just select the df2 rows whose time difference is greater than 15 minutes and keep the df1 rows that share a cid with them:

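# Reference "now" from the question; modified_at holds naive datetimes
current_time = datetime.datetime(2023, 7, 4, 23, 36, 38)
# True where the df2 row is more than 15 minutes old (pandas parses '15m' as 15 minutes)
cond = current_time - df2['modified_at'] > '15m'

# Keep the df1 rows whose cid appears among those older df2 rows
out = df1[df1['cid'].isin(df2.loc[cond, 'cid'])]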

Output:

>>> out
      tid       cid  val
0  t55643  ab$ff$55   44

>>> cond
0     True
1    False
2    False
Name: modified_at, dtype: bool
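Since the question mentions comparing against the current UTC time, the same idea can be wrapped in a small helper. This is only a sketch under a few assumptions: the function name rows_with_stale_match and its parameters are hypothetical, modified_at is assumed to hold naive timestamps expressed in UTC, and the threshold is written as an explicit pd.Timedelta instead of the '15m' string.

```python
import datetime

import pandas as pd


def rows_with_stale_match(df1, df2, now=None, min_age=pd.Timedelta(minutes=15)):
    """Return rows of df1 whose cid has at least one df2 row older than min_age."""
    if now is None:
        # Naive UTC "now"; assumes modified_at stores naive UTC timestamps.
        now = datetime.datetime.now(datetime.timezone.utc).replace(tzinfo=None)
    # Condition 1: the df2 row is older than the threshold.
    is_stale = (now - df2["modified_at"]) > min_age
    # Condition 2: the df1 row shares a cid with one of those rows.
    return df1[df1["cid"].isin(df2.loc[is_stale, "cid"])]


saved_info = [
    (datetime.datetime(2023, 7, 4, 23, 18, 22, 113476), "t55643", "ab$ff$55"),
    (datetime.datetime(2023, 7, 4, 23, 26, 22, 113476), "t55643", "5b$ff$15"),
    (datetime.datetime(2023, 7, 4, 23, 27, 22, 133470), "t55643", "ab$ff$55"),
]
new_info = [("t55643", "ab$ff$55", 44), ("t55643", "be$qf$34", 33)]

df1 = pd.DataFrame(new_info, columns=["tid", "cid", "val"])
df2 = pd.DataFrame(saved_info, columns=["modified_at", "tid", "cid"])

# Fixed "now" from the question, so the result is reproducible.
out = rows_with_stale_match(df1, df2, now=datetime.datetime(2023, 7, 4, 23, 36, 38))
print(out)
```

Note that this keeps a df1 row if any df2 row with the same cid is old enough. Here 'ab$ff$55' also has a row that is only about 9 minutes old; if the most recent modification should decide, one option would be to reduce df2 first with df2.groupby('cid')['modified_at'].max() and compare that instead.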

huangapple
  • Posted on July 7, 2023, 02:55:32
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76631783.html