R: How to subset dataframe over columns from 2 different dataframe?
Question
How can I subset a data frame in R using the values found in several columns of a second data frame?
Here is some dummy code:
library(dplyr)
data <- data.frame(b = rep(LETTERS[1:4],2), c = c("B", "A", "A", "E", "G", "H", "K", "L"))
# b c
# 1 A B
# 2 B A
# 3 C A
# 4 D E
# 5 A G
# 6 B H
# 7 C K
# 8 D L
data2 <- data.frame(d = c("A", "B", ""), e = c("E", "", "C"))
# d e
#1 A E
#2 B
#3 C
subset <- subset(data, data$b %in% c(data2$d, data2$e))
# b c
# 1 A B
# 2 B A
# 3 C A
# 5 A G
# 6 B H
# 7 C K
As you can see, I can use the subset() function to match "data" against "data2". But what if I have a large number of columns in "data2"? Is there a way to simplify this code? A tidyverse approach is preferred if possible.
I tried the code below, but it does not work.
subset_try <- subset(data, data$b %in% data2[,c(1:2)])
#[1] b c
#<0 rows> (or 0-length row.names)
Thank you.
Answer 1
Score: 1
If there are lots of columns, unlist them into a vector and subset:
subset(data, b %in% unlist(data2))
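As a minimal sketch of what that one-liner does, the snippet below builds the lookup vector step by step using the data from the question. The name lookup and the cleanup step (dropping blank strings and duplicates) are additions for illustration and are not part of the original answer; the cleanup does not change the result here because column b contains no blanks.

```r
# Example data copied from the question
data <- data.frame(b = rep(LETTERS[1:4], 2),
                   c = c("B", "A", "A", "E", "G", "H", "K", "L"))
data2 <- data.frame(d = c("A", "B", ""), e = c("E", "", "C"))

# unlist() flattens all columns of data2 into a single vector;
# as.character() guards against factor columns on R versions before 4.0
lookup <- as.character(unlist(data2, use.names = FALSE))

# Optional cleanup (an assumption, not in the answer): drop blanks and duplicates
lookup <- unique(lookup[nzchar(lookup)])
print(lookup)   # "A" "B" "E" "C"

# Same filter as in the answer, written against the cleaned lookup vector
result <- subset(data, b %in% lookup)
print(result)   # keeps rows 1, 2, 3, 5, 6, 7
```

The blank-string filter only matters if the column being filtered can itself contain empty strings that should not count as matches.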
If we want only a subset of the columns, select those columns first and then unlist them. Note that data2[, c(1, 2)] is still a data.frame with two columns, not a vector. That is why the attempt in the question returns FALSE for every row:
> data$b %in% data2
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
This comes down to the table argument of %in%. According to ?"%in%":
x %in% table
table - vector or NULL: the values to be matched against. Long vectors are not supported.
Therefore, we need to convert the data frame to a vector first:
> data$b %in% unlist(data2)
[1] TRUE TRUE TRUE FALSE TRUE TRUE TRUE FALSE
> subset(data, b %in% unlist(data2[1:2]))
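The difference between the two cases can be checked directly. The sketch below, which reuses the question's data, inspects what each indexing form returns; the variable names two_cols and one_col are only illustrative.

```r
# Example data copied from the question
data <- data.frame(b = rep(LETTERS[1:4], 2),
                   c = c("B", "A", "A", "E", "G", "H", "K", "L"))
data2 <- data.frame(d = c("A", "B", ""), e = c("E", "", "C"))

# Selecting two columns keeps the data.frame (a list of columns)
two_cols <- data2[, c(1, 2)]
print(is.data.frame(two_cols))   # TRUE
print(is.list(two_cols))         # TRUE

# %in% against the data.frame: all FALSE, as shown in the answer above
print(data$b %in% two_cols)

# Selecting a single column drops to a plain vector, so %in% works on it
one_col <- data2[, 1]
print(is.data.frame(one_col))    # FALSE
print(data$b %in% one_col)       # TRUE TRUE FALSE FALSE TRUE TRUE FALSE FALSE

# unlist() turns the two-column data.frame into one vector of cell values
flat <- unlist(two_cols)
print(is.list(flat))             # FALSE
```

This also explains why the same pattern appears to work when only one column is selected: data2[, 1] silently drops to a vector, while any selection of two or more columns stays a data.frame.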
For the tidyverse, just replace subset with filter:
library(dplyr)
filter(data, b %in% unlist(data2[1:2]))
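When data2 has many columns, picking them by name rather than by position may be easier to read. The following is one possible sketch, assuming dplyr 1.0.0 or later (for all_of()); the vector wanted and the name lookup are hypothetical and stand in for whichever columns are actually needed.

```r
library(dplyr)

# Example data copied from the question
data <- data.frame(b = rep(LETTERS[1:4], 2),
                   c = c("B", "A", "A", "E", "G", "H", "K", "L"))
data2 <- data.frame(d = c("A", "B", ""), e = c("E", "", "C"))

# Hypothetical list of lookup columns; extend it as data2 grows
wanted <- c("d", "e")

# Select the columns by name, then flatten them into one vector
lookup <- data2 %>%
  select(all_of(wanted)) %>%
  unlist(use.names = FALSE)

# Keep the rows of data whose b value appears anywhere in those columns
result <- data %>%
  filter(b %in% lookup)

print(result)
```

To use every column of data2, the select() step can be dropped entirely, which reduces this to filter(data, b %in% unlist(data2)).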