match or nearest match of two dataframe datetime columns and read only one column value from second dataframe

huangapple go评论68阅读模式
英文:

match or nearest match of two dataframe datetime columns and read only one column value from second dataframe

问题

I have two csv files -

CSV A -

Timestamp          Date/Time             Error_1         Error_2
170:02:46:928    2023/01/01 16:45:11     APP_ERR       Loss_of_Data
170:28:37:328    2023/01/01 17:11:04     APP_ERR       SB_Data_Expiration

CSV B -

Timestamp          Date/Time             Error_1         Error_2               Error_3
170:02:46:928    2023/01/01 16:45:07     APP_ERR       Loss_of_Data         NO_ERROR
170:14:40:928    2023/01/01 16:45:10     APP_ERR       Loss_of_Data         Loss_of_Data
170:28:16:928    2023/01/01 17:10:44     APP_ERR       Cnx_Attempt          NO_ERROR
170:28:37:328    2023/01/01 17:11:04     APP_ERR       SB_Data_Expiration   ERROR
170:29:57:728    2023/01/01 17:11:04     APP_ERR       PEB_due_to_overlimit Due_to_Over_SBI
170:32:59:978    2023/01/01 17:11:04     APP_ERR       Cnx_Attempt          NO_ERROR
170:33:40:778    2023/01/01 17:16:08     APP_ERR       Cnx_Attempt          NO_ERROR

我想要比较CSV-A中的Date/Time与CSV-B,并在Date/Time匹配或最接近匹配时读取Error_3列的值。最终输出我希望是 -

CSV A -

Timestamp          Date/Time             Error_1         Error_2              Error_3
170:02:46:928    2023/01/01 16:45:11     APP_ERR       Loss_of_Data        Loss_of_Data
170:28:37:328    2023/01/01 17:11:04     APP_ERR       SB_Data_Expiration   ERROR

CSV-A的第一行Date/Time(2023/01/01 16:45:11)在CSV-B中没有匹配,这就是为什么我们检查最接近的匹配,即'2023/01/01 16:45:10'(通常如果没有完全匹配,它将有1或2秒的差异)。
第二行有三个精确匹配,但我们需要读取CSV-B中Error_2='SB_Data_Expiration'的值。

非常感谢您的帮助。

英文:

I have two csv file -

CSV A-

 Timestamp  	  	Date/Time  	 		 Error_1 	        Error_2
170:02:46:928	 2023/01/01 16:45:11 	 APP_ERR 	  	 Loss_of_Data
170:28:37:328	 2023/01/01 17:11:04 	 APP_ERR 	  	 SB_Data_Expiration

CSV B-

 Timestamp  	  	Date/Time  	 		 Error_1 	        Error_2	 				Error_3
170:02:46:928	 2023/01/01 16:45:07 	 APP_ERR 	  	 Loss_of_Data 	 			NO_ERROR
170:14:40:928	 2023/01/01 16:45:10 	 APP_ERR 	  	 Loss_of_Data 	 			Loss_of_Data 
170:28:16:928	 2023/01/01 17:10:44 	 APP_ERR 	  	 Cnx_Attempt 	 			NO_ERROR 
170:28:37:328	 2023/01/01 17:11:04 	 APP_ERR 	  	 SB_Data_Expiration 	 	ERROR 
170:29:57:728	 2023/01/01 17:11:04 	 APP_ERR 	  	 PEB_due_to_overlimit 	 	Due_to_Over_SBI 
170:32:59:978	 2023/01/01 17:11:04 	 APP_ERR 	  	 Cnx_Attempt 	 			NO_ERROR 
170:33:40:778	 2023/01/01 17:16:08 	 APP_ERR 	  	 Cnx_Attempt 	 			NO_ERROR

I want to compare CSV-A both Date/Time with CSV-B and read the Error_3 column value if Date/Time match or nearest match. Final output I want something like -

CSV A-

 Timestamp  	  	Date/Time  	 		 Error_1 	        Error_2				Error_3
170:02:46:928	 2023/01/01 16:45:11 	 APP_ERR 	  	 Loss_of_Data			Loss_of_Data
170:28:37:328	 2023/01/01 17:11:04 	 APP_ERR 	  	 SB_Data_Expiration		ERROR

First row from CSV-A Date/Time(2023/01/01 16:45:11) is not matched in CSV-B that's why we checked the nearest match i.e. '2023/01/01 16:45:10' (usually it will be 1 or 2 seconds difference if not exact match).
Second row has three exact match but we need to read the value where CSV-B Error_2='SB_Data_Expiration'.

Thank you for the help in advance.

答案1

得分: 1

使用read_csv进行日期时间解析,然后将其传递给merge_asof,并使用参数direction='nearest'tolorance

df1 = pd.read_csv(file1, parse_dates=[1])
df2 = pd.read_csv(file2, parse_dates=[1])

df = pd.merge_asof(df1, 
                   df2[['Date/Time','Error_3']], 
                   on='Date/Time', 
                   direction='nearest', 
                   tolerance=pd.Timedelta('3 sec'))
print (df)
       Timestamp           Date/Time  Error_1             Error_2  \
0  170:02:46:928 2023-01-01 16:45:11  APP_ERR        Loss_of_Data   
1  170:28:37:328 2023-01-01 17:11:04  APP_ERR  SB_Data_Expiration   
        Error_3  
0  Loss_of_Data  
1         ERROR
英文:

Use read_csv with parse_dates for datetimes and then pass to merge_asof with parameter direction='nearest' and tolerance:

df1 = pd.read_csv(file1, parse_dates=[1])
df2 = pd.read_csv(file2, parse_dates=[1])

df = pd.merge_asof(df1, 
                   df2[['Date/Time','Error_3']], 
                   on='Date/Time', 
                   direction='nearest', 
                   tolerance=pd.Timedelta('3 sec'))
print (df)
       Timestamp           Date/Time  Error_1             Error_2  \
0  170:02:46:928 2023-01-01 16:45:11  APP_ERR        Loss_of_Data   
1  170:28:37:328 2023-01-01 17:11:04  APP_ERR  SB_Data_Expiration   

        Error_3  
0  Loss_of_Data  
1         ERROR  

huangapple
  • 本文由 发表于 2023年4月4日 12:56:35
  • 转载请务必保留本文链接:https://go.coder-hub.com/75925643.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定