How to replace a data frame with another data frame in R

Question

I want to replace the data in df1 with the data from df2 wherever a df1 entry resembles a df2 entry. For example, df1 is:

df1 <- data.frame(
  name = c(
    "A. MAHJUM-61365",
    "A. MAHJUM-61365. MAHJUM-61365",
    "A. RIZAL. AD-11002795",
    "A. RIZAL. AD-11002795. RIZAL. AD-11002795",
    "ABD. KADIR-60447",
    "ABD. KADIR-60447ABD. KADIR-60447",
    "ABD. KAHAR-62551",
    "ABD. RASYID DS-11002082",
    "ABDREAS APUNG @SANY",
    "ABDUL AZIS @HYUNDAY",
    "ABDUL AZIZ @HYUNDAI",
    "ABDUL AZIZ@HYUNDAI"
  ))

and df2 is:

df2 <- data.frame(
  name = c(
    "A. MAHJUM-61365",
    "A. RIZAL. AD-11002795",
    "ABD. KADIR-60447",
    "ABD. KAHAR-62551",
    "ABD. RASYID DS-11002082",
    "ABDREAS APUNG @SANY",
    "ABDUL AZIS @HYUNDAY"
  ))

If a value in df1 looks like a value in df2, it should be replaced by that df2 value.

Answer 1

Score: 3


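Since this is a substring match, we can use regex_left_join from fuzzyjoin. coalesce then takes the df2 name where a match was found and keeps the original df1 name otherwise: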

library(dplyr)
library(fuzzyjoin)
regex_left_join(df1, df2, by = 'name') %>% 
  transmute(name = coalesce(name.y, name.x))

Alternatively, use a distance-based approach, which matches names that differ by only a few characters:

stringdist_left_join(df1, df2, by = 'name') %>%
  transmute(name = coalesce(name.y, name.x))
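
The substring idea can also be sketched in base R, which may help when the names contain characters such as "." that a regular expression would treat specially. This is only a minimal sketch on a subset of the question's data; the helper `first_contained` is a hypothetical name introduced here, and it uses literal (fixed) matching rather than the regex matching of the answer above.

```r
# Subset of the data from the question
df1 <- data.frame(name = c(
  "A. MAHJUM-61365",
  "A. MAHJUM-61365. MAHJUM-61365",
  "ABD. KADIR-60447ABD. KADIR-60447",
  "ABDUL AZIZ @HYUNDAI"
), stringsAsFactors = FALSE)

df2 <- data.frame(name = c(
  "A. MAHJUM-61365",
  "ABD. KADIR-60447",
  "ABDUL AZIS @HYUNDAY"
), stringsAsFactors = FALSE)

# Hypothetical helper: return the first candidate that occurs literally
# inside x, or NA when no candidate is contained in x.
first_contained <- function(x, candidates) {
  hit <- candidates[vapply(candidates, grepl, logical(1), x = x, fixed = TRUE)]
  if (length(hit) > 0) hit[1] else NA_character_
}

matched <- vapply(df1$name, first_contained, character(1),
                  candidates = df2$name, USE.NAMES = FALSE)

# Keep the original df1 name when nothing in df2 is contained in it
df1$name <- ifelse(is.na(matched), df1$name, matched)
df1
```

In this sketch "ABDUL AZIZ @HYUNDAI" is left unchanged, because no df2 name is a literal substring of it; spelling variants like that one need a distance-based method instead.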




# Answer 2
**Score**: 0

You can use `adist` to find the best match in df2 for each name in df1 and replace it.
```R
# For each df1 name, pick the df2 name with the smallest edit distance
i <- max.col(-adist(df1$name, df2$name, partial=TRUE))
df1$name <- df2$name[i]

df1
#                      name
#1          A. MAHJUM-61365
#2          A. MAHJUM-61365
#3    A. RIZAL. AD-11002795
#4    A. RIZAL. AD-11002795
#5         ABD. KADIR-60447
#6         ABD. KADIR-60447
#7         ABD. KAHAR-62551
#8  ABD. RASYID DS-11002082
#9      ABDREAS APUNG @SANY
#10     ABDUL AZIS @HYUNDAY
#11     ABDUL AZIS @HYUNDAY
#12     ABDUL AZIS @HYUNDAY
```
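
Note that taking the nearest df2 name always replaces every df1 row, even one that has no real counterpart in df2. A possible guard, sketched here under assumptions, is to replace only when the smallest distance is below a cutoff. The cutoff `max_dist <- 3` and the row "ZULKIFLI-70001" are hypothetical values chosen for illustration, and the sketch uses plain `adist` without `partial=TRUE`, so it targets spelling variants rather than doubled names.

```r
df1 <- data.frame(name = c(
  "ABDUL AZIZ @HYUNDAI",
  "ABDUL AZIZ@HYUNDAI",
  "ZULKIFLI-70001"        # hypothetical row with no counterpart in df2
), stringsAsFactors = FALSE)

df2 <- data.frame(name = c(
  "A. MAHJUM-61365",
  "ABDUL AZIS @HYUNDAY"
), stringsAsFactors = FALSE)

d <- adist(df1$name, df2$name)            # rows: df1 names, columns: df2 names
i <- max.col(-d, ties.method = "first")   # column of the smallest distance per row
best <- d[cbind(seq_len(nrow(d)), i)]     # the smallest distance itself

max_dist <- 3  # assumed cutoff; tune it for your own data

# Replace only when the nearest df2 name is close enough
df1$name <- ifelse(best <= max_dist, df2$name[i], df1$name)
df1
```

`ties.method = "first"` is set explicitly because `max.col` breaks ties at random by default, which can make results differ between runs.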

huangapple
  • Published on April 4, 2023 at 14:03:33
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75925960.html