Posted: July 6, 2023, 21:45:26

Using match (not merge) to fill column values from another bigger data frame

Question

I have a data frame in which I want to fill a column based on the values of a column in another data frame, but I am struggling to get the match right.

df1
name            code
Player 3        NA
Player 14       NA
Player 16       NA
Player 22       NA
Player 43       NA
Player 45       NA

Now I want to fill the code column in df1 from the id column in df2 by matching on name.

df2
name            id      nationality
Player 1        1       UK
Player 2        2       UK
Player 3        3       UK
Player 4        4       UK
Player 5        5       UK
Player 14       14      UK
Player 16       16      UK
Player 22       22      UK
Player 29       29      UK
Player 30       30      UK
Player 32       32      UK
Player 39       39      UK
Player 43       43      UK
Player 45       45      UK

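I don't want to use merge here because df2 will be much bigger than df1 and is completely separate. It would be something like the following, but I can't get it right: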

df1$code = df2[match(df1$name, df2$name), 'id')

Answer 1

Score: 2

match works for this because you are only matching on one column. merge works for this too, and it generalizes to matching on multiple columns.

It doesn't matter that df2 is bigger than df1. merge() will still work fine as long as you don't override the default by setting all = TRUE; if you do that, you will get all the rows from df2 as well. The default is all = FALSE, which keeps only rows that appear in both data frames. In the base R merge() call, I set all.x = TRUE to make sure you keep all rows in df1 even if they don't have matches in df2.

Because merge is more general (it works for multiple columns and lets you specify whether to keep only rows that occur in df1, only rows that occur in df2, only rows in both, or all rows), I think it is the better solution when working with data frames. match is a great function when one (or both) of your inputs are plain vectors, not data frames.


## with dplyr
library(dplyr)
df1 |> select(-code) |>
  left_join(select(df2, -id), by = "name")
#        name nationality
# 1  Player 3          UK
# 2 Player 14          UK
# 3 Player 16          UK
# 4 Player 22          UK
# 5 Player 43          UK
# 6 Player 45          UK

## with base R
df1[["code"]] = NULL
merge(df1, df2[c("name", "nationality")], all.x = TRUE)
#        name nationality
# 1 Player 14          UK
# 2 Player 16          UK
# 3 Player 22          UK
# 4  Player 3          UK
# 5 Player 43          UK
# 6 Player 45          UK

Unfortunately, merge() doesn't keep the row order, but you can easily reorder afterwards.
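
As a rough sketch of that reordering step, and of filling code from id rather than pulling in nationality: one option is to record each row's original position before merging and sort on it afterwards. The row_order column is a hypothetical helper name, not part of the original data, and the two data frames are rebuilt inline only so that the snippet runs on its own.

```r
# Rebuild the sample data inline (same values as in the question)
ids <- c(1L, 2L, 3L, 4L, 5L, 14L, 16L, 22L, 29L, 30L, 32L, 39L, 43L, 45L)
df2 <- data.frame(name = paste("Player", ids), id = ids,
                  nationality = "UK", stringsAsFactors = FALSE)
df1 <- data.frame(name = paste("Player", c(3, 14, 16, 22, 43, 45)),
                  code = NA, stringsAsFactors = FALSE)

# Hypothetical helper column: remember the original row positions of df1
df1$row_order <- seq_len(nrow(df1))

# Left join on name, bringing in only the id column from df2
merged <- merge(df1[c("name", "row_order")], df2[c("name", "id")],
                by = "name", all.x = TRUE)

# Restore df1's original order, then drop the helper and rename id to code
merged <- merged[order(merged$row_order), ]
merged$row_order <- NULL
names(merged)[names(merged) == "id"] <- "code"
rownames(merged) <- NULL
merged
```

Sorting on a saved position is only one way to do this; it has the advantage of not depending on how merge() happens to sort the key column.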

Using this sample data:

df1 = read.table(text = 'name|code
Player 3|NA
Player 14|NA
Player 16|NA
Player 22|NA
Player 43|NA
Player 45|NA', header = TRUE, sep = "|")

df2 = read.table(text = 'name|id|nationality
Player 1|1|UK
Player 2|2|UK
Player 3|3|UK
Player 4|4|UK
Player 5|5|UK
Player 14|14|UK
Player 16|16|UK
Player 22|22|UK
Player 29|29|UK
Player 30|30|UK
Player 32|32|UK
Player 39|39|UK
Player 43|43|UK
Player 45|45|UK
', header = TRUE, sep = "|")

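Answer 2

Score: 1

I think this is what you need. match() returns, for each element of its first argument, the position of its first match in the second argument (or NA if there is none).

df1$code = df2$id[match(df1$name, df2$name)]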
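
A minimal sketch of what match() produces here, assuming a shortened df1 that also contains a name missing from df2. "Player 99" is a made-up value added only to show the NA case, and the data frames are rebuilt inline so the snippet is self-contained.

```r
# Lookup table with the same values as df2 in the question
ids <- c(1L, 2L, 3L, 4L, 5L, 14L, 16L, 22L, 29L, 30L, 32L, 39L, 43L, 45L)
df2 <- data.frame(name = paste("Player", ids), id = ids,
                  nationality = "UK", stringsAsFactors = FALSE)

# Shortened df1; "Player 99" is hypothetical and has no row in df2
df1 <- data.frame(name = c("Player 3", "Player 14", "Player 99"),
                  stringsAsFactors = FALSE)

# Positions in df2$name of each df1$name (NA where there is no match)
idx <- match(df1$name, df2$name)
idx
# [1]  3  6 NA

# Indexing the id vector by those positions fills code in df1's row order
df1$code <- df2$id[idx]
df1$code
# [1]  3 14 NA

# The data-frame form attempted in the question gives the same result
# once the indexing is closed with ] instead of )
same <- df2[idx, "id"]
```

Unmatched names simply come through as NA, which is the same outcome a left join would give.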

Answer 3

Score: -1

df1$code = df2[df2$name %in% df1$name, 2]

I guess %in% here is almost the same as match.
You can also try which(df2$name == df1$name) to return the rows you want.
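
One caveat may be worth checking before relying on this. %in% returns a logical vector over the rows of df2, so the selected ids come back in df2's order, whereas match() returns positions aligned with df1. The sketch below uses a small made-up subset of the data with df1's names deliberately reversed; under that assumption the two approaches disagree.

```r
# Small made-up subset of the lookup table
ids <- c(1L, 2L, 3L, 14L)
df2 <- data.frame(name = paste("Player", ids), id = ids,
                  stringsAsFactors = FALSE)

# df1 lists the names in a different order than df2 (an assumption for this demo)
df1 <- data.frame(name = c("Player 14", "Player 3"),
                  stringsAsFactors = FALSE)

# %in% keeps df2's row order: the ids are NOT aligned with df1's rows
via_in <- df2[df2$name %in% df1$name, "id"]
via_in
# [1]  3 14

# match() returns one position per df1 row, so the ids line up with df1
via_match <- df2$id[match(df1$name, df2$name)]
via_match
# [1] 14  3
```

The two results coincide only when every name in df1 is present in df2 and appears in the same relative order, as happens to be the case in the question's sample data. Similarly, df2$name == df1$name compares the vectors element by element (recycling the shorter one), so it is not a lookup either.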
