英文:
match or nearest match of two dataframe datetime columns and read only one column value from second dataframe
问题
I have two csv files -
CSV A -
Timestamp Date/Time Error_1 Error_2
170:02:46:928 2023/01/01 16:45:11 APP_ERR Loss_of_Data
170:28:37:328 2023/01/01 17:11:04 APP_ERR SB_Data_Expiration
CSV B -
Timestamp Date/Time Error_1 Error_2 Error_3
170:02:46:928 2023/01/01 16:45:07 APP_ERR Loss_of_Data NO_ERROR
170:14:40:928 2023/01/01 16:45:10 APP_ERR Loss_of_Data Loss_of_Data
170:28:16:928 2023/01/01 17:10:44 APP_ERR Cnx_Attempt NO_ERROR
170:28:37:328 2023/01/01 17:11:04 APP_ERR SB_Data_Expiration ERROR
170:29:57:728 2023/01/01 17:11:04 APP_ERR PEB_due_to_overlimit Due_to_Over_SBI
170:32:59:978 2023/01/01 17:11:04 APP_ERR Cnx_Attempt NO_ERROR
170:33:40:778 2023/01/01 17:16:08 APP_ERR Cnx_Attempt NO_ERROR
我想要比较CSV-A中的Date/Time与CSV-B,并在Date/Time匹配或最接近匹配时读取Error_3列的值。最终输出我希望是 -
CSV A -
Timestamp Date/Time Error_1 Error_2 Error_3
170:02:46:928 2023/01/01 16:45:11 APP_ERR Loss_of_Data Loss_of_Data
170:28:37:328 2023/01/01 17:11:04 APP_ERR SB_Data_Expiration ERROR
CSV-A的第一行Date/Time(2023/01/01 16:45:11)在CSV-B中没有匹配,这就是为什么我们检查最接近的匹配,即'2023/01/01 16:45:10'(通常如果没有完全匹配,它将有1或2秒的差异)。
第二行有三个精确匹配,但我们需要读取CSV-B中Error_2='SB_Data_Expiration'的值。
非常感谢您的帮助。
英文:
I have two csv file -
CSV A-
Timestamp Date/Time Error_1 Error_2
170:02:46:928 2023/01/01 16:45:11 APP_ERR Loss_of_Data
170:28:37:328 2023/01/01 17:11:04 APP_ERR SB_Data_Expiration
CSV B-
Timestamp Date/Time Error_1 Error_2 Error_3
170:02:46:928 2023/01/01 16:45:07 APP_ERR Loss_of_Data NO_ERROR
170:14:40:928 2023/01/01 16:45:10 APP_ERR Loss_of_Data Loss_of_Data
170:28:16:928 2023/01/01 17:10:44 APP_ERR Cnx_Attempt NO_ERROR
170:28:37:328 2023/01/01 17:11:04 APP_ERR SB_Data_Expiration ERROR
170:29:57:728 2023/01/01 17:11:04 APP_ERR PEB_due_to_overlimit Due_to_Over_SBI
170:32:59:978 2023/01/01 17:11:04 APP_ERR Cnx_Attempt NO_ERROR
170:33:40:778 2023/01/01 17:16:08 APP_ERR Cnx_Attempt NO_ERROR
I want to compare CSV-A both Date/Time with CSV-B and read the Error_3 column value if Date/Time match or nearest match. Final output I want something like -
CSV A-
Timestamp Date/Time Error_1 Error_2 Error_3
170:02:46:928 2023/01/01 16:45:11 APP_ERR Loss_of_Data Loss_of_Data
170:28:37:328 2023/01/01 17:11:04 APP_ERR SB_Data_Expiration ERROR
First row from CSV-A Date/Time(2023/01/01 16:45:11) is not matched in CSV-B that's why we checked the nearest match i.e. '2023/01/01 16:45:10' (usually it will be 1 or 2 seconds difference if not exact match).
Second row has three exact match but we need to read the value where CSV-B Error_2='SB_Data_Expiration'.
Thank you for the help in advance.
答案1
得分: 1
使用read_csv
进行日期时间解析,然后将其传递给merge_asof
,并使用参数direction='nearest'
和tolorance
:
df1 = pd.read_csv(file1, parse_dates=[1])
df2 = pd.read_csv(file2, parse_dates=[1])
df = pd.merge_asof(df1,
df2[['Date/Time','Error_3']],
on='Date/Time',
direction='nearest',
tolerance=pd.Timedelta('3 sec'))
print (df)
Timestamp Date/Time Error_1 Error_2 \
0 170:02:46:928 2023-01-01 16:45:11 APP_ERR Loss_of_Data
1 170:28:37:328 2023-01-01 17:11:04 APP_ERR SB_Data_Expiration
Error_3
0 Loss_of_Data
1 ERROR
英文:
Use read_csv
with parse_dates
for datetimes and then pass to merge_asof
with parameter direction='nearest'
and tolerance
:
df1 = pd.read_csv(file1, parse_dates=[1])
df2 = pd.read_csv(file2, parse_dates=[1])
df = pd.merge_asof(df1,
df2[['Date/Time','Error_3']],
on='Date/Time',
direction='nearest',
tolerance=pd.Timedelta('3 sec'))
print (df)
Timestamp Date/Time Error_1 Error_2 \
0 170:02:46:928 2023-01-01 16:45:11 APP_ERR Loss_of_Data
1 170:28:37:328 2023-01-01 17:11:04 APP_ERR SB_Data_Expiration
Error_3
0 Loss_of_Data
1 ERROR
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论