使用 match(而不是 merge)来从另一个更大的数据框填充列值。

huangapple go评论88阅读模式
英文:

Using match (not merge) to fill column values from another bigger data frame

问题

I have a dataframe to which I want to create a new column based on the values from another column but struggling to be able to match properly.

df1
name            code
Player 3        NA
Player 14       NA
Player 16       NA
Player 22       NA
Player 43       NA
Player 45       NA

现在我想从df2的code列中填充df1的code列,根据name进行匹配。

df2
name            id      nationality
Player 1        1       UK
Player 2        2       UK
Player 3        3       UK
Player 4        4       UK
Player 5        5       UK
Player 14       14      UK
Player 16       16      UK
Player 22       22      UK
Player 29       29      UK
Player 30       30      UK
Player 32       32      UK
Player 39       39      UK
Player 43       43      UK
Player 45       45      UK

我不想在这里使用合并(merge)操作,因为df2要比df1大得多,而且完全独立。可以尝试以下代码,但我无法正确运行:

df1$code = df2[match(df1$name, df2$name), 'id')
英文:

I have a dataframe to which I want to create a new column based on the values from another column but struggling to be able to match properly.

df1
name            code
Player 3        NA
Player 14       NA
Player 16       NA
Player 22       NA
Player 43       NA
Player 45       NA

Now I wish to fill the code column in df1 from the code column in df2 my matching on name

df2
name            id      nationality
Player 1        1       UK
Player 2        2       UK
Player 3        3       UK
Player 4        4       UK
Player 5        5       UK
Player 14       14      UK
Player 16       16      UK
Player 22       22      UK
Player 29       29      UK
Player 30       30      UK
Player 32       32      UK
Player 39       39      UK
Player 43       43      UK
Player 45       45      UK

I dont want to use merge here as df2 will be much bigger than df2 and completely separate, it would be something like; (but I cant get it correct)

df1$code = df2[match(df1$name, df2$name), 'id')

答案1

得分: 2

Here is the translated content:

"match works for this because you are only matching on one column. merge works for this too, but generalizes up to matching on multiple columns.

It doesn't matter that df2 is bigger than df1, merge() will still work just fine as long as you don't override the default and set all = TRUE - if you do that, then you will get all the rows from df2. The default is all = FALSE and you will only get rows that appear in both data frames. Here, I set all.x = TRUE to make sure you keep all rows in df1 even if they don't have matches in df2.

Because merge is more general (working for multiple columns, letting you specify whether you want to keep only rows that occur in df1 or only rows that occur in df2 or both or all), I think it is a better solution when working with data frames. match is a great function when one (or both) of your inputs are plain vectors, not data frames.

Unfortunately merge doesn't keep the row order, but you can easily re-order after."

英文:

match works for this because you are only matching on one column. merge works for this too, but generalizes up to matching on multiple columns.

It doesn't matter that df2 is bigger than df1, merge() will still work just fine as long as you don't override the default and set all = TRUE - if you do that, then you will get all the rows from df2. The default is all = FALSE and you will only get rows that appear in both data frames. Here, I set all.x = TRUE to make sure you keep all rows in df1 even if they don't have matches in df2.

Because merge is more general (working for multiple columns, letting you specify whether you want to keep only rows that occur in df1 or only rows that occur in df2 or both or all), I think it is a better solution when working with data frames. match is a great function when one (or both) of your inputs are plain vectors, not data frames.

## with dplyr
library(dplyr)
df1 |> select(-code) |>
  left_join(select(df2, -id), by = "name")
#        name nationality
# 1  Player 3          UK
# 2 Player 14          UK
# 3 Player 16          UK
# 4 Player 22          UK
# 5 Player 43          UK
# 6 Player 45          UK

## with base R
df1[["code"]] = NULL
merge(df1, df2[c("name", "nationality")], all.x = TRUE)
#        name nationality
# 1 Player 14          UK
# 2 Player 16          UK
# 3 Player 22          UK
# 4  Player 3          UK
# 5 Player 43          UK
# 6 Player 45          UK

Unfortunately merge doesn't keep the row order, but you can easily re-order after.


Using this sample data:

df1 = read.table(text = 'name|code
Player 3|NA
Player 14|NA
Player 16|NA
Player 22|NA
Player 43|NA
Player 45|NA', header = T, sep = "|")

df2 = read.table(text = 'name|id|nationality
Player 1|1|UK
Player 2|2|UK
Player 3|3|UK
Player 4|4|UK
Player 5|5|UK
Player 14|14|UK
Player 16|16|UK
Player 22|22|UK
Player 29|29|UK
Player 30|30|UK
Player 32|32|UK
Player 39|39|UK
Player 43|43|UK
Player 45|45|UK
', header = TRUE, sep = "|")

答案2

得分: 1

我认为这就是你需要的。match 返回第一个参数中的匹配项在第二个参数中的索引。

df1$code = df2$id[match(df1$name, df2$name)]
英文:

I think this is what you need. match returns the index in the second argument of matches from the first.

df1$code = df2$id[match(df1$name, df2$name)]

答案3

得分: -1

df1$code = df2[df2$name %in% df1$name,2]
我猜这里的 %in% 几乎与 match 相同。
你也可以尝试 which(df2$name == df1$name) 来返回你想要的行。

英文:

my answer:
df1$code = df2[df2$name %in% df1$name,2]

I guess %in% here is almost the same with match.
you can also try which(df2$name == df1$name) to return the rows you want

huangapple
  • 本文由 发表于 2023年7月6日 21:45:26
  • 转载请务必保留本文链接:https://go.coder-hub.com/76629520.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定