How to replace a data frame with another data frame in R

Question

I want to replace the values in df1 with the matching values from df2, where df2 holds the clean versions of the names that appear in df1. For example:

    df1 <- data.frame(
      name = c(
        "A. MAHJUM-61365",
        "A. MAHJUM-61365. MAHJUM-61365",
        "A. RIZAL. AD-11002795",
        "A. RIZAL. AD-11002795. RIZAL. AD-11002795",
        "ABD. KADIR-60447",
        "ABD. KADIR-60447ABD. KADIR-60447",
        "ABD. KAHAR-62551",
        "ABD. RASYID DS-11002082",
        "ABDREAS APUNG @SANY",
        "ABDUL AZIS @HYUNDAY",
        "ABDUL AZIZ @HYUNDAI",
        "ABDUL AZIZ@HYUNDAI"
      ))

and df2 is:

    df2 <- data.frame(
      name = c(
        "A. MAHJUM-61365",
        "A. RIZAL. AD-11002795",
        "ABD. KADIR-60447",
        "ABD. KAHAR-62551",
        "ABD. RASYID DS-11002082",
        "ABDREAS APUNG @SANY",
        "ABDUL AZIS @HYUNDAY"
      ))

Wherever a name in df1 resembles a name in df2, it should be replaced by the df2 name.

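Answer 1

Score: 3

Since this is a substring match, we can use fuzzyjoin:

    library(dplyr)
    library(fuzzyjoin)

    # Match each df1 name against the df2 names, which are used as regular
    # expressions; keep the df2 name on a match, the original name otherwise
    regex_left_join(df1, df2, by = 'name') %>%
      transmute(name = coalesce(name.y, name.x))

Or use a distance-based approach:

    # Match names whose string distance falls within the join's max_dist threshold
    stringdist_left_join(df1, df2, by = 'name') %>%
      transmute(name = coalesce(name.y, name.x))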
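
The substring idea can also be sketched in base R, without fuzzyjoin. This is only an illustration on a shortened copy of the data, not the answer's method: `replace_by_substring` is a hypothetical helper name, and it matches the df2 names literally (`fixed = TRUE`) rather than as regular expressions, so characters such as `.` are not treated as wildcards.

```r
# Shortened copies of the question's data (assumption: a subset is enough to illustrate)
df1 <- data.frame(name = c(
  "A. MAHJUM-61365",
  "A. MAHJUM-61365. MAHJUM-61365",
  "ABD. KADIR-60447ABD. KADIR-60447",
  "ABDUL AZIZ @HYUNDAI"
), stringsAsFactors = FALSE)

df2 <- data.frame(name = c(
  "A. MAHJUM-61365",
  "ABD. KADIR-60447",
  "ABDUL AZIS @HYUNDAY"
), stringsAsFactors = FALSE)

# Hypothetical helper: for each element of x, return the first candidate that
# occurs literally inside it, or the element itself when no candidate does.
replace_by_substring <- function(x, candidates) {
  vapply(x, function(s) {
    contained <- vapply(candidates, grepl, logical(1), x = s, fixed = TRUE)
    hit <- candidates[contained]
    if (length(hit) > 0) hit[1] else s
  }, character(1), USE.NAMES = FALSE)
}

cleaned <- replace_by_substring(df1$name, df2$name)
print(cleaned)
```

Like the regex join, this leaves "ABDUL AZIZ @HYUNDAI" unchanged, because no df2 name is contained in it; misspellings of that kind need a distance-based match.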

Answer 2

Score: 0

You can use `adist` to find the best match for each name and replace it.

    # For each name in df1, find the index of the df2 name with the smallest
    # edit distance (max.col on the negated matrix picks each row's minimum)
    i <- max.col(-adist(df1$name, df2$name, partial=TRUE))
    df1$name <- df2$name[i]
    df1
    #   name
    #1  A. MAHJUM-61365
    #2  A. MAHJUM-61365
    #3  A. RIZAL. AD-11002795
    #4  A. RIZAL. AD-11002795
    #5  ABD. KADIR-60447
    #6  ABD. KADIR-60447
    #7  ABD. KAHAR-62551
    #8  ABD. RASYID DS-11002082
    #9  ABDREAS APUNG @SANY
    #10 ABDUL AZIS @HYUNDAY
    #11 ABDUL AZIS @HYUNDAY
    #12 ABDUL AZIS @HYUNDAY
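
Two caveats about the best-match approach may be worth guarding against: it replaces every name, even one that has no reasonable counterpart in df2, and `max.col` breaks ties at random by default. The sketch below is one possible variant, not the answer's code: it uses the full edit distance rather than `partial=TRUE`, picks the first minimum with `which.min`, and only replaces a name when the distance is within an assumed cutoff (`max_dist`, chosen arbitrarily here). The row "ZULKIFLI-99999" is made up to show a name with no counterpart.

```r
# Small illustrative data; the last df1 entry is invented for this example
df1 <- data.frame(name = c(
  "ABDUL AZIZ @HYUNDAI",
  "ABDUL AZIZ@HYUNDAI",
  "ZULKIFLI-99999"
), stringsAsFactors = FALSE)

df2 <- data.frame(name = c(
  "ABDUL AZIS @HYUNDAY",
  "ABD. KAHAR-62551"
), stringsAsFactors = FALSE)

# Edit-distance matrix: rows are df1 names, columns are df2 names
d <- adist(df1$name, df2$name)

# Index of the closest df2 name per row; which.min returns the first minimum,
# so ties are resolved deterministically
best <- apply(d, 1, which.min)
best_dist <- d[cbind(seq_len(nrow(d)), best)]

# Assumed cutoff: only replace when the closest df2 name is this near
max_dist <- 4
df1$clean <- ifelse(best_dist <= max_dist, df2$name[best], df1$name)
print(df1)
```

The right cutoff depends on the data: too small and genuine misspellings are kept, too large and unrelated names get overwritten.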

huangapple
  • Published on April 4, 2023, 14:03:33
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75925960.html