April 4, 2023 10:05:24

R: How to subset a data frame using columns from a second data frame?

Question

How can I subset a data frame in R using the values found in the columns of a second data frame?

Here is the dummy code:

library(dplyr)
data <- data.frame(b = rep(LETTERS[1:4], 2), c = c("B", "A", "A", "E", "G", "H", "K", "L"))
#   b c
# 1 A B
# 2 B A
# 3 C A
# 4 D E
# 5 A G
# 6 B H
# 7 C K
# 8 D L
data2 <- data.frame(d = c("A", "B", ""), e = c("E", "", "C"))
#    d e
#1   A E
#2   B  
#3     C
subset <- subset(data, data$b %in% c(data2$d, data2$e))
#   b c
# 1 A B
# 2 B A
# 3 C A
# 5 A G
# 6 B H
# 7 C K

As you can see, I can use subset() to keep the rows of "data" whose b value appears in "data2". But what if "data2" has a large number of columns? Is there a way to simplify this code? A tidyverse approach is preferred if possible.

I tried the code below, but it does not work:

subset_try <- subset(data, data$b %in% data2[, c(1:2)])
#[1] b c
#<0 rows> (or 0-length row.names)

Thank you.

Answer 1

Score: 1

If data2 has many columns, unlist() it into a single vector and subset against that:

subset(data, b %in% unlist(data2))
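As a quick check of what unlist() produces here, the following is a minimal sketch built on the data from the question. The names lookup and result are introduced only for illustration, and dropping the empty strings is an optional extra step that the answer itself does not take.

```r
# Data from the question; stringsAsFactors is set explicitly so the
# sketch behaves the same on R versions before 4.0
data <- data.frame(
  b = rep(LETTERS[1:4], 2),
  c = c("B", "A", "A", "E", "G", "H", "K", "L"),
  stringsAsFactors = FALSE
)
data2 <- data.frame(d = c("A", "B", ""), e = c("E", "", "C"),
                    stringsAsFactors = FALSE)

# unlist() flattens every column of data2 into one character vector
lookup <- unlist(data2, use.names = FALSE)

# Optional: drop empty strings and duplicates so that blanks in data2
# can never match a blank in data$b by accident
lookup <- unique(lookup[nzchar(lookup)])

# Keep the rows of data whose b value appears anywhere in data2
result <- subset(data, b %in% lookup)
print(result)
```

Because the lookup vector is built from the whole data frame, the call stays the same no matter how many columns data2 has.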

To use only some of the columns, select them first and then unlist. Note that data2[, c(1, 2)] is still a data.frame with two columns, not a vector, which is why the attempt in the question matches nothing:

> data$b %in% data2
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

This follows from how %in% handles its table argument. According to ?"%in%":

x %in% table

table - vector or NULL: the values to be matched against. Long vectors are not supported.
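To make the mismatch concrete, one can inspect how the data frame looks to R before it reaches %in%. This is only an illustrative sketch using the question's data (as_chr is a name made up here): a data frame is a list with one element per column, so the comparison is effectively made against two column-level elements rather than six cell values.

```r
data <- data.frame(
  b = rep(LETTERS[1:4], 2),
  c = c("B", "A", "A", "E", "G", "H", "K", "L"),
  stringsAsFactors = FALSE
)
data2 <- data.frame(d = c("A", "B", ""), e = c("E", "", "C"),
                    stringsAsFactors = FALSE)

# A data frame is a list of columns, not a vector of cells
is.list(data2)   # TRUE
length(data2)    # 2 (columns), not 6 (cells)

# Coercing it to character gives one string per column
as_chr <- as.character(data2)
length(as_chr)

# so no single letter in data$b can match
any(data$b %in% data2)   # FALSE

# unlist() exposes the six individual values instead
length(unlist(data2))    # 6
```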

Therefore, convert the data frame to a vector first:

> data$b %in% unlist(data2)
[1]  TRUE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE FALSE
> subset(data, b %in% unlist(data2[1:2]))

With the tidyverse, simply replace subset() with dplyr::filter():

library(dplyr)
filter(data, b %in% unlist(data2[1:2]))
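When data2 has many columns, numeric positions such as 1:2 become awkward. One possible variation, sketched below, picks the lookup columns with a selection helper inside select(). It assumes dplyr 1.0.0 or later (for where()) and that the lookup columns are character columns; lookup and result are illustrative names, not part of the original answer.

```r
library(dplyr)

data <- data.frame(
  b = rep(LETTERS[1:4], 2),
  c = c("B", "A", "A", "E", "G", "H", "K", "L"),
  stringsAsFactors = FALSE
)
data2 <- data.frame(d = c("A", "B", ""), e = c("E", "", "C"),
                    stringsAsFactors = FALSE)

# Choose the lookup columns by type instead of by position, then flatten
lookup <- data2 %>%
  select(where(is.character)) %>%
  unlist(use.names = FALSE)

# filter() keeps the rows of data whose b value is in the lookup vector
result <- data %>%
  filter(b %in% lookup)
print(result)
```

Other helpers such as starts_with() or a d:e range can be swapped in for where(is.character) if only a named group of columns should be used.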
