R: How to subset a data frame using values from multiple columns of another data frame?

Question

How can I subset a data frame by the values found in the columns of a second data frame in R?

Here is the dummy code:

    library(dplyr)
    data <- data.frame(b = rep(LETTERS[1:4], 2), c = c("B", "A", "A", "E", "G", "H", "K", "L"))
    #   b c
    # 1 A B
    # 2 B A
    # 3 C A
    # 4 D E
    # 5 A G
    # 6 B H
    # 7 C K
    # 8 D L
    data2 <- data.frame(d = c("A", "B", ""), e = c("E", "", "C"))
    #   d e
    # 1 A E
    # 2 B
    # 3   C
    subset <- subset(data, data$b %in% c(data2$d, data2$e))
    #   b c
    # 1 A B
    # 2 B A
    # 3 C A
    # 5 A G
    # 6 B H
    # 7 C K

As you can see, I can use subset() to keep the rows of "data" whose b value appears in "data2". But what if "data2" has a large number of columns? Is there a way to simplify this code without listing every column? A tidyverse approach is preferred if possible.

I tried the code below, but it does not work; it returns zero rows:

    subset_try <- subset(data, data$b %in% data2[, c(1:2)])
    # [1] b c
    # <0 rows> (or 0-length row.names)

Thank you.

Answer 1

Score: 1

If there are many columns, flatten data2 into a single vector with unlist() and subset against that:

    subset(data, b %in% unlist(data2))

To match against only some of the columns, select those columns first and then unlist them. Note that data2[, c(1, 2)] is still a data.frame with two columns, not a vector, so %in% finds no matches when we do

    > data$b %in% data2
    [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

This comes from how %in% treats its table argument. According to ?"%in%":

x %in% table

table - vector or NULL: the values to be matched against. Long vectors are not supported.
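The behaviour can be checked directly. The following is a minimal sketch, assuming R 4.0 or later (where data.frame() keeps strings as character); the names tbl, flat, as_list_match and as_vector_match are only illustrative and do not appear in the question.

```r
data  <- data.frame(b = rep(LETTERS[1:4], 2),
                    c = c("B", "A", "A", "E", "G", "H", "K", "L"))
data2 <- data.frame(d = c("A", "B", ""), e = c("E", "", "C"))

# Column subsetting with [, c(1, 2)] still returns a data frame,
# which is a list of columns rather than an atomic vector
tbl <- data2[, c(1, 2)]
is.list(tbl)
is.atomic(tbl)

# With a list as the table, each whole column is converted to a single
# string before matching, so no individual letter in data$b can match
as_list_match <- data$b %in% tbl

# Flattening the columns gives one character vector holding every value
flat <- unlist(tbl, use.names = FALSE)
as_vector_match <- data$b %in% flat
```

Here as_list_match is FALSE for every row, while as_vector_match flags the rows whose b value occurs anywhere in the two columns.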

Therefore, convert the selected columns to a vector first:

    > data$b %in% unlist(data2)
    [1]  TRUE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE FALSE
    > subset(data, b %in% unlist(data2[1:2]))
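One detail worth noting: data2 contains empty strings, and unlist() keeps them, so a blank or NA value in data$b would also count as a match. If that is not wanted, the lookup values can be cleaned first. This is a sketch under that assumption, not part of the original answer; keys and result are illustrative names (result is used instead of subset to avoid masking the function).

```r
data  <- data.frame(b = rep(LETTERS[1:4], 2),
                    c = c("B", "A", "A", "E", "G", "H", "K", "L"))
data2 <- data.frame(d = c("A", "B", ""), e = c("E", "", "C"))

# Flatten every column of data2 into one vector, dropping the names
keys <- unlist(data2, use.names = FALSE)

# Remove blanks and missing values, then keep each value only once
keys <- unique(keys[!is.na(keys) & keys != ""])

# Keep the rows of data whose b value is one of the cleaned keys
result <- subset(data, b %in% keys)
```

For the sample data the cleaned keys are A, B, E and C, and result keeps rows 1, 2, 3, 5, 6 and 7, the same rows as in the question's expected output.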

With the tidyverse, simply replace subset() with filter():

    library(dplyr)
    filter(data, b %in% unlist(data2[1:2]))
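When data2 has many columns, positions such as 1:2 become hard to maintain. One possible variation, sketched here assuming dplyr is installed, picks the lookup columns with select() helpers and then flattens them. The names keys and result are illustrative, and everything() could be swapped for a range like d:e or a helper like starts_with().

```r
library(dplyr)

data  <- data.frame(b = rep(LETTERS[1:4], 2),
                    c = c("B", "A", "A", "E", "G", "H", "K", "L"))
data2 <- data.frame(d = c("A", "B", ""), e = c("E", "", "C"))

# Choose the lookup columns by name or helper rather than by position,
# then flatten them into a single vector of values
keys <- data2 %>%
  select(everything()) %>%
  unlist(use.names = FALSE)

# Keep the rows of data whose b value appears among the lookup values
result <- data %>%
  filter(b %in% keys)
```

The selection step is the only part that changes when the set of lookup columns changes; the filter() call stays the same.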

huangapple
  • Posted on April 4, 2023, 10:05:24
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75924974.html