Merge not giving the desired result when replacing values in a dataframe based on a match in another dataframe
Question
I'm trying to replace the missing (NaT) values in one dataframe with the values for the same old_index in a different dataframe. Here is a small data sample:
df1
   old_index       Date
0          3 2019-10-23
1         19 2019-12-05
2         29        NaT
3         32        NaT
4         34 2019-12-05
5        116        NaT
6        118 2020-01-29

df2
    old_index       Date
4           3 2019-10-24
12         19 2019-12-10
15         29 2019-11-12
22         32 2021-11-19
29         34 2019-12-10
40        116 2020-03-24
45        118 2020-01-30
The resulting df should be:
   old_index       Date
0          3 2019-10-23
1         19 2019-12-05
2         29 2019-11-12
3         32 2021-11-19
4         34 2019-12-05
5        116 2020-03-24
6        118 2020-01-29
I tried to use merge as shown below:
mrg = pd.merge(df1, df2, how="right", on="old_index")
But that resulted in the wrong df: the dates end up in two separate columns (Date_x and Date_y) instead of one filled-in Date column:
   old_index     Date_x     Date_y
0          3 2019-10-23 2019-10-24
1         19 2019-12-05 2019-12-10
2         29        NaT 2019-11-12
3         32        NaT 2021-11-19
4         34 2019-12-05 2019-12-10
5        116        NaT 2020-03-24
6        118 2020-01-29 2020-01-30
Any help is appreciated.
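For anyone who wants to reproduce this, here is a sketch that rebuilds the sample frames from the tables above (the NaT entries are created by passing `None` to `pd.to_datetime`). It also shows why the merge produces `Date_x` and `Date_y`: both frames have a non-key column named `Date`, so `merge` keeps both and disambiguates them with its default suffixes `('_x', '_y')`.

```python
import pandas as pd

# Rebuild the sample data; None becomes NaT after pd.to_datetime
df1 = pd.DataFrame({
    "old_index": [3, 19, 29, 32, 34, 116, 118],
    "Date": pd.to_datetime(["2019-10-23", "2019-12-05", None, None,
                            "2019-12-05", None, "2020-01-29"]),
})
df2 = pd.DataFrame(
    {
        "old_index": [3, 19, 29, 32, 34, 116, 118],
        "Date": pd.to_datetime(["2019-10-24", "2019-12-10", "2019-11-12",
                                "2021-11-19", "2019-12-10", "2020-03-24",
                                "2020-01-30"]),
    },
    index=[4, 12, 15, 22, 29, 40, 45],  # df2 keeps its original row labels
)

# "Date" exists in both frames and is not a join key,
# so merge keeps both copies and appends suffixes to tell them apart
mrg = pd.merge(df1, df2, how="right", on="old_index")
print(mrg.columns.tolist())  # ['old_index', 'Date_x', 'Date_y']
```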
Answer 1
Score: 1
You can build a mapping Series from df2, then use it to fill only the missing values in df1:
df1['Date'] = df1['Date'].fillna(df1['old_index'].map(df2.set_index('old_index')['Date']))
print(df1)
# Output
   old_index       Date
0          3 2019-10-23
1         19 2019-12-05
2         29 2019-11-12
3         32 2021-11-19
4         34 2019-12-05
5        116 2020-03-24
6        118 2020-01-29
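A self-contained version of this approach, assuming `old_index` is unique in `df2` (with duplicate keys the mapping Series would have repeated labels and the lookup would not be well defined). For brevity the sketch gives `df2` a default row index, since `map` only uses the `old_index` values. Because `fillna` only touches missing entries, dates that already exist in `df1` (for example `2019-10-23` for `old_index` 3) are kept rather than overwritten by `df2`.

```python
import pandas as pd

df1 = pd.DataFrame({
    "old_index": [3, 19, 29, 32, 34, 116, 118],
    "Date": pd.to_datetime(["2019-10-23", "2019-12-05", None, None,
                            "2019-12-05", None, "2020-01-29"]),
})
df2 = pd.DataFrame({
    "old_index": [3, 19, 29, 32, 34, 116, 118],
    "Date": pd.to_datetime(["2019-10-24", "2019-12-10", "2019-11-12",
                            "2021-11-19", "2019-12-10", "2020-03-24",
                            "2020-01-30"]),
})

# Series indexed by old_index: a lookup table old_index -> Date
mapping = df2.set_index("old_index")["Date"]

# Look up a candidate date for every row of df1, then use it only where
# df1's own Date is NaT; fillna aligns both Series on df1's row index
df1["Date"] = df1["Date"].fillna(df1["old_index"].map(mapping))
print(df1)
```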
A mapping Series acts like a dictionary from old_index to Date:
>>> df2.set_index('old_index')['Date']
old_index
3 2019-10-24
19 2019-12-10
29 2019-11-12
32 2021-11-19
34 2019-12-10
116 2020-03-24
118 2020-01-30
Name: Date, dtype: datetime64[ns]
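To make the dictionary analogy concrete, here is a small sketch: a label lookup on the mapping Series returns the same value as a lookup in the plain `dict` produced by `Series.to_dict()`, and `Series.map` gives the same values whether it is passed the Series or that dict.

```python
import pandas as pd

df1 = pd.DataFrame({"old_index": [3, 19, 29, 32, 34, 116, 118]})
df2 = pd.DataFrame({
    "old_index": [3, 19, 29, 32, 34, 116, 118],
    "Date": pd.to_datetime(["2019-10-24", "2019-12-10", "2019-11-12",
                            "2021-11-19", "2019-12-10", "2020-03-24",
                            "2020-01-30"]),
})

mapping = df2.set_index("old_index")["Date"]
as_dict = mapping.to_dict()  # plain dict: old_index -> Timestamp

# Label lookup on the Series behaves like a dict lookup
print(mapping[29] == as_dict[29])  # True

# map accepts either form and yields the same values
via_series = df1["old_index"].map(mapping)
via_dict = df1["old_index"].map(as_dict)
print(via_series.tolist() == via_dict.tolist())  # True
```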
Answer 2
Score: 0
Use:
mrg = pd.merge(df1, df2, how="right", on="old_index", suffixes=('','_'))
mrg['Date'] = mrg['Date'].fillna(mrg.pop('Date_'))
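A runnable sketch of the merge-based approach. `suffixes=('', '_')` keeps df1's column as `Date` and names df2's copy `Date_`; `DataFrame.pop` removes a column from the frame and returns it, so the helper column can be passed straight to `fillna` and disappears in the same step. With `how="right"` the row order follows `df2`; in this sample both frames list the keys in the same order.

```python
import pandas as pd

df1 = pd.DataFrame({
    "old_index": [3, 19, 29, 32, 34, 116, 118],
    "Date": pd.to_datetime(["2019-10-23", "2019-12-05", None, None,
                            "2019-12-05", None, "2020-01-29"]),
})
df2 = pd.DataFrame({
    "old_index": [3, 19, 29, 32, 34, 116, 118],
    "Date": pd.to_datetime(["2019-10-24", "2019-12-10", "2019-11-12",
                            "2021-11-19", "2019-12-10", "2020-03-24",
                            "2020-01-30"]),
})

# Keep df1's column name unchanged; suffix df2's overlapping column with "_"
mrg = pd.merge(df1, df2, how="right", on="old_index", suffixes=("", "_"))

# pop removes "Date_" from mrg and returns it as a Series,
# which fills only the NaT entries of "Date"
mrg["Date"] = mrg["Date"].fillna(mrg.pop("Date_"))
print(mrg)
```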
Or:
df11 = df1.set_index('old_index')
df22 = df2.set_index('old_index')
out = df11.combine_first(df22).reset_index()
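A runnable sketch of the `combine_first` variant. `combine_first` aligns both frames on their index (here `old_index`), keeps the calling frame's non-missing values, and fills its gaps from the other frame. One difference from the `map`/`fillna` approach worth keeping in mind: the result covers the union of both indexes, so an `old_index` that appears only in `df2` would add a new row, and `reset_index` replaces df1's original row labels with a fresh 0..n-1 index.

```python
import pandas as pd

df1 = pd.DataFrame({
    "old_index": [3, 19, 29, 32, 34, 116, 118],
    "Date": pd.to_datetime(["2019-10-23", "2019-12-05", None, None,
                            "2019-12-05", None, "2020-01-29"]),
})
df2 = pd.DataFrame({
    "old_index": [3, 19, 29, 32, 34, 116, 118],
    "Date": pd.to_datetime(["2019-10-24", "2019-12-10", "2019-11-12",
                            "2021-11-19", "2019-12-10", "2020-03-24",
                            "2020-01-30"]),
})

# Index both frames by the key so rows are matched by old_index, not position
df11 = df1.set_index("old_index")
df22 = df2.set_index("old_index")

# Values from df11 win; only its NaT cells are taken from df22
out = df11.combine_first(df22).reset_index()
print(out)
```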