
R: How to subset a data frame based on columns from a second data frame?

Question


How can I subset a data frame in R based on the values in the columns of a second data frame?

Here is the dummy code:

library(dplyr)
data <- data.frame(b = rep(LETTERS[1:4], 2), c = c("B", "A", "A", "E", "G", "H", "K", "L"))

#   b c
# 1 A B
# 2 B A
# 3 C A
# 4 D E
# 5 A G
# 6 B H
# 7 C K
# 8 D L


data2 <- data.frame(d = c("A", "B", ""), e = c("E", "", "C"))
#    d e
#1   A E
#2   B  
#3     C


subset <- subset(data, data$b %in% c(data2$d, data2$e))

#   b c
# 1 A B
# 2 B A
# 3 C A
# 5 A G
# 6 B H
# 7 C K

As you can see, I can use the subset() function to keep only the rows of "data" whose b value appears in "data2". But what if "data2" has a large number of columns? Is there a way to simplify this code? A tidyverse approach is preferred, if possible.

I tried the code below, but it does not work (it returns zero rows):

subset_try <- subset(data, data$b %in% data2[, c(1:2)])
#[1] b c
#<0 rows> (or 0-length row.names)

Thank you.

Answer 1

Score: 1

If there are many columns, unlist them into a single vector and subset on that:

subset(data, b %in% unlist(data2))
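A self-contained sketch of this approach, rebuilding the example data from the question. The extra column f is hypothetical and only illustrates that the subset() call stays the same as more lookup columns are added.

```r
# Example data from the question
data <- data.frame(b = rep(LETTERS[1:4], 2),
                   c = c("B", "A", "A", "E", "G", "H", "K", "L"))
data2 <- data.frame(d = c("A", "B", ""), e = c("E", "", "C"))

# unlist() flattens all columns of data2 into one vector of lookup values
result <- subset(data, b %in% unlist(data2))
print(result)  # rows 1, 2, 3, 5, 6, 7

# Hypothetical extra column "f": the subset() call itself does not change
data2$f <- c("D", "", "")
result_more <- subset(data, b %in% unlist(data2))
print(result_more)  # now also keeps the "D" rows (4 and 8)
```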

If we want only some of the columns, select those columns first and then unlist them. Note that data2[, c(1, 2)] is still a data.frame with two columns, not a vector, so comparing against a data frame directly matches nothing:

> data$b %in% data2
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

This comes down to the table argument of %in%. According to ?"%in%":

x %in% table

table - vector or NULL: the values to be matched against. Long vectors are not supported.
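A small sketch of why the data frame version matches nothing, rebuilding the question's data and assuming R 4.0 or later (where data.frame() keeps strings as character rather than factors). It relies on the behaviour described in ?match, where list-like tables are converted to character before matching; as.character() is used here only to show what each column turns into.

```r
data <- data.frame(b = rep(LETTERS[1:4], 2),
                   c = c("B", "A", "A", "E", "G", "H", "K", "L"))
data2 <- data.frame(d = c("A", "B", ""), e = c("E", "", "C"))

# Each column collapses into ONE deparsed string, e.g. c("A", "B", "")
print(as.character(data2))

in_df  <- data$b %in% data2          # compared against 2 long strings: all FALSE
in_vec <- data$b %in% unlist(data2)  # compared against the 6 individual values
print(in_df)
print(in_vec)
```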

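A data frame is a list of columns, so each whole column is converted into a single string (such as c("A", "B", "")) instead of being matched value by value. Therefore, convert the columns to a plain vector first: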

> data$b %in% unlist(data2)
[1]  TRUE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE FALSE
> subset(data, b %in% unlist(data2[1:2]))
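When only some columns should serve as lookup values, they can be picked by position or by name before unlisting. A sketch on the question's data; the cols vector is a hypothetical name chosen for illustration. It also shows the default drop behaviour of [ , ], which returns a single selected column as a plain vector but keeps two or more columns as a data frame.

```r
data <- data.frame(b = rep(LETTERS[1:4], 2),
                   c = c("B", "A", "A", "E", "G", "H", "K", "L"))
data2 <- data.frame(d = c("A", "B", ""), e = c("E", "", "C"))

# Two or more columns: [ , ] keeps a data frame, so unlist() is needed
print(is.data.frame(data2[, c(1, 2)]))  # TRUE
# A single column is dropped to a plain vector by default
print(is.data.frame(data2[, 1]))        # FALSE

cols <- c("d", "e")  # hypothetical: names of the lookup columns
res_pos  <- subset(data, b %in% unlist(data2[1:2]))   # by position
res_name <- subset(data, b %in% unlist(data2[cols]))  # by name
print(identical(res_pos, res_name))  # TRUE
```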

For the tidyverse, just replace subset() with filter():

library(dplyr)
filter(data, b %in% unlist(data2[1:2]))
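With many columns, tidyselect helpers can pick the lookup columns inside a pipe. A sketch assuming dplyr is installed; d:e selects the range of columns from d to e, and other selection helpers such as starts_with() could be used the same way.

```r
library(dplyr)

data <- data.frame(b = rep(LETTERS[1:4], 2),
                   c = c("B", "A", "A", "E", "G", "H", "K", "L"))
data2 <- data.frame(d = c("A", "B", ""), e = c("E", "", "C"))

# Select the lookup columns, then flatten them into one vector
lookup <- data2 %>% select(d:e) %>% unlist()

# Keep rows of data whose b value appears anywhere in those columns
res <- data %>% filter(b %in% lookup)
print(res)
```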





huangapple
  • This article was published on April 4, 2023 at 10:05:24
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75924974.html