Using match (not merge) to fill column values from another bigger data frame
Question
I have a data frame to which I want to add a new column based on the values in a column of another data frame, but I am struggling to get the matching right.
df1
name       code
Player 3   NA
Player 14  NA
Player 16  NA
Player 22  NA
Player 43  NA
Player 45  NA
Now I wish to fill the code column in df1 from the id column in df2 by matching on name.
df2
name       id  nationality
Player 1   1   UK
Player 2   2   UK
Player 3   3   UK
Player 4   4   UK
Player 5   5   UK
Player 14  14  UK
Player 16  16  UK
Player 22  22  UK
Player 29  29  UK
Player 30  30  UK
Player 32  32  UK
Player 39  39  UK
Player 43  43  UK
Player 45  45  UK
I don't want to use merge here, as df2 will be much bigger than df1 and is completely separate. It would be something like this (but I can't get it right):
df1$code = df2[match(df1$name, df2$name), 'id')
Answer 1
Score: 2
match works for this because you are only matching on one column. merge works for this too, but generalizes to matching on multiple columns.

It doesn't matter that df2 is bigger than df1: merge() will still work just fine as long as you don't override the default and set all = TRUE. If you do that, you will get all the rows from df2. The default is all = FALSE, and you will only get rows that appear in both data frames. Here, I set all.x = TRUE to make sure you keep all rows in df1 even if they don't have matches in df2.

Because merge is more general (working for multiple columns, letting you specify whether you want to keep only rows that occur in df1, only rows that occur in df2, both, or all), I think it is a better solution when working with data frames. match is a great function when one (or both) of your inputs are plain vectors, not data frames.

## with dplyr
library(dplyr)
df1 |> select(-code) |>
left_join(select(df2, -id), by = "name")
# name nationality
# 1 Player 3 UK
# 2 Player 14 UK
# 3 Player 16 UK
# 4 Player 22 UK
# 5 Player 43 UK
# 6 Player 45 UK
## with base R
df1[["code"]] = NULL
merge(df1, df2[c("name", "nationality")], all.x = TRUE)
# name nationality
# 1 Player 14 UK
# 2 Player 16 UK
# 3 Player 22 UK
# 4 Player 3 UK
# 5 Player 43 UK
# 6 Player 45 UK
Unfortunately, merge doesn't keep the original row order (by default it sorts the result on the key column), but you can easily re-order afterwards.
Using this sample data:
df1 = read.table(text = 'name|code
Player 3|NA
Player 14|NA
Player 16|NA
Player 22|NA
Player 43|NA
Player 45|NA', header = T, sep = "|")
df2 = read.table(text = 'name|id|nationality
Player 1|1|UK
Player 2|2|UK
Player 3|3|UK
Player 4|4|UK
Player 5|5|UK
Player 14|14|UK
Player 16|16|UK
Player 22|22|UK
Player 29|29|UK
Player 30|30|UK
Player 32|32|UK
Player 39|39|UK
Player 43|43|UK
Player 45|45|UK
', header = TRUE, sep = "|")
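
The re-ordering mentioned above can be sketched as follows. This is only one possible approach, not necessarily what the answerer had in mind: it uses small stand-in data frames rather than the full sample data, the helper column row_order is a hypothetical name introduced for illustration, and it pulls in id (renamed to code) because that is the column the question asks for.

```r
# Small stand-ins for the data frames in the question (assumed values).
df1 <- data.frame(name = c("Player 3", "Player 14", "Player 22"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(name = c("Player 1", "Player 3", "Player 14", "Player 22"),
                  id = c(1, 3, 14, 22),
                  stringsAsFactors = FALSE)

# Record the original row order in a helper column (hypothetical name).
df1$row_order <- seq_len(nrow(df1))

# Left join: keep every row of df1 and pull in id where the name matches.
merged <- merge(df1, df2[c("name", "id")], by = "name", all.x = TRUE)

# merge() sorts on the key by default, so restore df1's original order.
merged <- merged[order(merged$row_order), ]
merged$row_order <- NULL
rownames(merged) <- NULL

# Rename id to code, the column name used in the question.
names(merged)[names(merged) == "id"] <- "code"
merged
```

Sorting on a recorded index works whether or not merge() is called with sort = FALSE, which is why the sketch does not rely on that argument.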
Answer 2
Score: 1
I think this is what you need. match returns, for each element of its first argument, the position of its first match in the second argument.
df1$code = df2$id[match(df1$name, df2$name)]
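
To make the behaviour of match concrete, here is a minimal sketch on reduced stand-in data. The name "Player 99" is an assumption added to show what happens when a name has no match; it does not appear in the question's data.

```r
# Small stand-ins for the data frames in the question (assumed values).
df1 <- data.frame(name = c("Player 3", "Player 14", "Player 99"),
                  code = NA, stringsAsFactors = FALSE)
df2 <- data.frame(name = c("Player 1", "Player 3", "Player 14"),
                  id = c(1, 3, 14), stringsAsFactors = FALSE)

# For each name in df1: the row position of its first match in df2,
# or NA when the name does not occur in df2.
idx <- match(df1$name, df2$name)

# Use those positions to pull id values, already in df1's row order.
df1$code <- df2$id[idx]

# Equivalent data-frame indexing, i.e. the question's attempt with the
# closing bracket corrected.
same <- df2[idx, "id"]

df1
```

Unmatched names simply end up with NA in code, so no rows of df1 are dropped or reordered.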
Answer 3
Score: -1
My answer:
df1$code = df2[df2$name %in% df1$name,2]
I guess %in% here is almost the same as match.
You can also try which(df2$name %in% df1$name) to return the rows you want.
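
One difference between the two approaches is worth a small sketch. %in% yields a logical vector over the rows of df2, so the values come back in df2's order, whereas match yields positions in df1's order. The data below are reduced stand-ins, with df1 deliberately ordered differently from df2 (an assumption made for illustration).

```r
# Small stand-ins; df1 lists the names in a different order than df2.
df1 <- data.frame(name = c("Player 14", "Player 3"), stringsAsFactors = FALSE)
df2 <- data.frame(name = c("Player 1", "Player 3", "Player 14"),
                  id = c(1, 3, 14), stringsAsFactors = FALSE)

# %in% flags rows of df2, so the ids are returned in df2's row order.
via_in <- df2[df2$name %in% df1$name, "id"]

# match() returns one position per row of df1, so the ids line up with df1.
via_match <- df2$id[match(df1$name, df2$name)]

via_in
via_match
```

The two results agree only when the matching rows of df2 happen to appear in the same order as df1 and every name in df1 is found, which holds for the question's sample data but is not guaranteed in general.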