Dataframe merge on multiple conditions in date range
Question
I have two dataframes:
df = pd.DataFrame({'orderID': [10, 11, 12, 13, 14], 'Sales': [100, 110, 120, 140, 150], 'Name': ['John', "Maria", "Maria", "John", "Cesar"],
'Date':['2022-01-08', '2022-02-10', '2022-02-15', '2022-02-05', '2022-05-07']})
df2 = pd.DataFrame({'Negotiation': [100, 110, 121, 134, 141], 'Sales': [100, 110, 120, 140, 150], 'Name': ['John', "Maria", "Maria", "John", "Ricardo"],
'Date':['2022-01-01', '2022-01-20', '2022-01-30', '2022-02-01', '2022-09-01']})
I need to merge them on 'Name' and date, but the dates are not identical, so the match has to be on a date range, yielding a dataframe as follows:
df_m = pd.DataFrame({'orderID': [10, 11, 12, 13, 14], 'Sales': [100, 110, 120, 140, 150], 'Name_x': ['John', "Maria", "Maria", "John", "Cesar"],
'Date_X':['2022-01-08', '2022-02-10', '2022-02-15', '2022-02-05', '2022-05-07'], 'Negotiation': [100, 110, 121, 134, 141], 'Sales': [100, 110, 120, 140, 150], 'Name_y': ['John', "Maria", "Maria", 'John', "Null"], 'Date_y':['2022-01-01', '2022-01-20', '2022-01-30', '2022-02-01', 'Null']})
I need to avoid merging with the wrong dates, as in the following:
df_m_wrong_date = pd.DataFrame({'orderID': [10, 11, 12, 13, 14], 'Sales': [100, 110, 120, 140, 150], 'Name_x': ['John', "Maria", "Maria", "John", "Cesar"],
'Date_X':['2022-01-08', '2022-02-10', '2022-02-15', '2022-02-05', '2022-05-07'], 'Negotiation': [100, 110, 121, 134, 141], 'Sales': [100, 110, 120, 140, 150], 'Name_y': ['John', "Maria", "Maria", 'John', "Null"], 'Date_y':['2022-02-01', '2022-01-30', '2022-01-20', '2022-01-01', 'Null']})
Answer 1
Score: 2
You can use merge_asof:
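import pandas as pd

# merge_asof needs real datetimes, not strings
df['Date'] = pd.to_datetime(df['Date'])
df2['Date'] = pd.to_datetime(df2['Date'])

# Both sides must be sorted on the date key; within each (Sales, Name) group,
# 'backward' picks the latest negotiation dated on or before the order date
out = (pd.merge_asof(df.sort_values('Date'),
                     df2.sort_values('Date').rename(columns={'Date': 'NegDate'}),
                     by=['Sales', 'Name'],
                     left_on='Date', right_on='NegDate', direction='backward')
         .sort_values('orderID'))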
Output:
>>> out
   orderID  Sales   Name       Date  Negotiation    NegDate
0       10    100   John 2022-01-08        100.0 2022-01-01
2       11    110  Maria 2022-02-10        110.0 2022-01-20
3       12    120  Maria 2022-02-15        121.0 2022-01-30
1       13    140   John 2022-02-05        134.0 2022-02-01
4       14    150  Cesar 2022-05-07          NaN        NaT
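Since the question asks for a date range, it may help to bound how far back a match is allowed to reach. The sketch below repeats the merge above with merge_asof's tolerance parameter. The helper name merge_within and the 30-day and 14-day windows are assumptions chosen for illustration, not something the question specifies.

```python
import pandas as pd

df = pd.DataFrame({
    'orderID': [10, 11, 12, 13, 14],
    'Sales': [100, 110, 120, 140, 150],
    'Name': ['John', 'Maria', 'Maria', 'John', 'Cesar'],
    'Date': pd.to_datetime(['2022-01-08', '2022-02-10', '2022-02-15',
                            '2022-02-05', '2022-05-07']),
})
df2 = pd.DataFrame({
    'Negotiation': [100, 110, 121, 134, 141],
    'Sales': [100, 110, 120, 140, 150],
    'Name': ['John', 'Maria', 'Maria', 'John', 'Ricardo'],
    'NegDate': pd.to_datetime(['2022-01-01', '2022-01-20', '2022-01-30',
                               '2022-02-01', '2022-09-01']),
})


def merge_within(days):
    """Hypothetical helper: match each order to the latest negotiation
    dated at most `days` days before it, per (Sales, Name) group."""
    return (pd.merge_asof(df.sort_values('Date'),
                          df2.sort_values('NegDate'),
                          by=['Sales', 'Name'],
                          left_on='Date', right_on='NegDate',
                          direction='backward',
                          tolerance=pd.Timedelta(days=days))
              .sort_values('orderID'))


# Assumed 30-day window: every gap in the sample data (4 to 21 days) fits
print(merge_within(30))

# Assumed 14-day window: orders 11 and 12 (21 and 16 days after their
# negotiations) are left unmatched, like the Cesar row
print(merge_within(14))
```

Without tolerance, a negotiation from any earlier date can match, so a window like this is worth considering when stale negotiations should not be attached to an order.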
Answer 2
Score: 0
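You can concatenate the two dataframes like this. Rows that share the same name and date are dropped as duplicates (the first occurrence is kept). I don't think merging them is logical. If you want to drop the wrong dates, you can do that after converting the column with pd.to_datetime.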
# Stack both dataframes, then drop rows that repeat a (Name, Date) pair
data = [df, df2]
df = pd.concat(data)
print(df.drop_duplicates(subset=['Name', 'Date']))
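For reference, here is a small self-contained sketch of what this approach yields on the sample data from the question. The names combined and deduped are introduced here for illustration only, and ignore_index=True is an added assumption to get a clean row index.

```python
import pandas as pd

df = pd.DataFrame({
    'orderID': [10, 11, 12, 13, 14],
    'Sales': [100, 110, 120, 140, 150],
    'Name': ['John', 'Maria', 'Maria', 'John', 'Cesar'],
    'Date': ['2022-01-08', '2022-02-10', '2022-02-15', '2022-02-05', '2022-05-07'],
})
df2 = pd.DataFrame({
    'Negotiation': [100, 110, 121, 134, 141],
    'Sales': [100, 110, 120, 140, 150],
    'Name': ['John', 'Maria', 'Maria', 'John', 'Ricardo'],
    'Date': ['2022-01-01', '2022-01-20', '2022-01-30', '2022-02-01', '2022-09-01'],
})

# Stack the rows of both frames; columns missing on one side become NaN
combined = pd.concat([df, df2], ignore_index=True)

# Keep the first row of every (Name, Date) pair
deduped = combined.drop_duplicates(subset=['Name', 'Date'])

print(deduped.shape)
print(deduped)
```

On this data no (Name, Date) pair repeats, so all ten rows survive: the orders keep NaN in Negotiation and the negotiations keep NaN in orderID. The result is a stacked table rather than one row per order paired with its negotiation.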