Merge not giving the desired result when replacing values in a dataframe based on a match in another dataframe

Question

I'm trying to replace values in a dataframe based on values in a different dataframe. Below you will find a simple data sample:

df1

       old_index       Date
    0          3 2019-10-23
    1         19 2019-12-05
    2         29        NaT
    3         32        NaT
    4         34 2019-12-05
    5        116        NaT
    6        118 2020-01-29

df2

        old_index       Date
    4           3 2019-10-24
    12         19 2019-12-10
    15         29 2019-11-12
    22         32 2021-11-19
    29         34 2019-12-10
    40        116 2020-03-24
    45        118 2020-01-30

The resulting df should be:

       old_index       Date
    0          3 2019-10-23
    1         19 2019-12-05
    2         29 2019-11-12
    3         32 2021-11-19
    4         34 2019-12-05
    5        116 2020-03-24
    6        118 2020-01-29

I tried to use merge as shown below:

    mrg = pd.merge(df1, df2, how="right", on="old_index")

But that resulted in the wrong df, with two date columns instead of one:

       old_index     Date_x     Date_y
    0          3 2019-10-23 2019-10-24
    1         19 2019-12-05 2019-12-10
    2         29        NaT 2019-11-12
    3         32        NaT 2021-11-19
    4         34 2019-12-05 2019-12-10
    5        116        NaT 2020-03-24
    6        118 2020-01-29 2020-01-30

Any help is appreciated.

Answer 1

Score: 1

You can create a mapping Series from df2, then use it to fill the missing values in df1:

    df1['Date'] = df1['Date'].fillna(df1['old_index'].map(df2.set_index('old_index')['Date']))
    print(df1)

    # Output
       old_index       Date
    0          3 2019-10-23
    1         19 2019-12-05
    2         29 2019-11-12
    3         32 2021-11-19
    4         34 2019-12-05
    5        116 2020-03-24
    6        118 2020-01-29

A mapping Series acts as a dictionary:

    >>> df2.set_index('old_index')['Date']
    old_index
    3     2019-10-24
    19    2019-12-10
    29    2019-11-12
    32    2021-11-19
    34    2019-12-10
    116   2020-03-24
    118   2020-01-30
    Name: Date, dtype: datetime64[ns]
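
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample frames from the question and applies the same map-then-fillna idea in separate steps. The names `lookup` and `mapped` are only illustrative, and the sketch assumes the `old_index` values in df2 are unique, since the lookup needs exactly one date per key.

```python
import pandas as pd

# Sample data rebuilt from the question (df2 keeps its original row labels).
df1 = pd.DataFrame(
    {
        "old_index": [3, 19, 29, 32, 34, 116, 118],
        "Date": pd.to_datetime(
            ["2019-10-23", "2019-12-05", None, None, "2019-12-05", None, "2020-01-29"]
        ),
    }
)
df2 = pd.DataFrame(
    {
        "old_index": [3, 19, 29, 32, 34, 116, 118],
        "Date": pd.to_datetime(
            ["2019-10-24", "2019-12-10", "2019-11-12", "2021-11-19",
             "2019-12-10", "2020-03-24", "2020-01-30"]
        ),
    },
    index=[4, 12, 15, 22, 29, 40, 45],
)

# Lookup Series: index is old_index, values are df2's dates.
lookup = df2.set_index("old_index")["Date"]

# map() looks up every old_index of df1 in `lookup`;
# the result carries df1's row labels, so it lines up with df1["Date"].
mapped = df1["old_index"].map(lookup)

# fillna() only fills the NaT rows, so dates already present in df1 are kept.
df1["Date"] = df1["Date"].fillna(mapped)
print(df1)
```

Because `fillna` aligns on df1's own row labels, the differing row labels of df2 (4, 12, 15, ...) do not matter here. If df2 could contain repeated `old_index` values, it would need deduplicating first, for example with `drop_duplicates('old_index')`.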

Answer 2

Score: 0

Use:

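    # suffixes=('', '_') keeps df1's column named 'Date' and names df2's column 'Date_'
    mrg = pd.merge(df1, df2, how="right", on="old_index", suffixes=('', '_'))
    # pop() removes 'Date_' from mrg and returns it, so it can be passed to fillna()
    mrg['Date'] = mrg['Date'].fillna(mrg.pop('Date_'))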

Or:

    df11 = df1.set_index('old_index')
    df22 = df2.set_index('old_index')
    out = df11.combine_first(df22).reset_index()
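
Both options can be tried side by side on the sample data from the question. The sketch below is a self-contained version under one stated change: it uses `how="left"` rather than `how="right"`, on the assumption that every row of df1 should be kept in df1's order. On this sample the two give the same rows, because both frames hold the same `old_index` values.

```python
import pandas as pd

# Sample data rebuilt from the question (df2 keeps its original row labels).
df1 = pd.DataFrame(
    {
        "old_index": [3, 19, 29, 32, 34, 116, 118],
        "Date": pd.to_datetime(
            ["2019-10-23", "2019-12-05", None, None, "2019-12-05", None, "2020-01-29"]
        ),
    }
)
df2 = pd.DataFrame(
    {
        "old_index": [3, 19, 29, 32, 34, 116, 118],
        "Date": pd.to_datetime(
            ["2019-10-24", "2019-12-10", "2019-11-12", "2021-11-19",
             "2019-12-10", "2020-03-24", "2020-01-30"]
        ),
    },
    index=[4, 12, 15, 22, 29, 40, 45],
)

# Option 1: merge, then fill df1's dates from the suffixed df2 column.
# how="left" is an assumption of this sketch (the answer above uses "right").
mrg = pd.merge(df1, df2, how="left", on="old_index", suffixes=("", "_"))
mrg["Date"] = mrg["Date"].fillna(mrg.pop("Date_"))

# Option 2: align both frames on old_index; combine_first keeps df1's value
# wherever it is present and falls back to df2's value where it is missing.
out = (
    df1.set_index("old_index")
    .combine_first(df2.set_index("old_index"))
    .reset_index()
)

print(mrg)
print(out)
```

One difference to keep in mind: `combine_first` returns the union of both indexes, so any `old_index` that exists only in df2 would show up as an extra row, whereas the left merge only ever returns df1's rows.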

Posted by huangapple on 2023-07-03 14:33:31
Link: https://go.coder-hub.com/76602337.html