How can I use a value extracted from a dataframe to specify columns to subset in R?

Question

I have a dataframe that I want to subset inside a function so that only rows where both columns are either 1 or NA remain. For df:

    df <- data.frame(a = c(1, 1, 0, NA, 0, 1),
                     b = c(0, 1, 0, 1, 0, NA),
                     c = c(0, 0, 0, 0, 0, 0))

I want:

       a  b c
    2  1  1 0
    4 NA  1 0
    6  1 NA 0

The problem is that I have many columns whose names change. So this works well:

    subset(df, (is.na(a) | a == 1) & (is.na(b) | b == 1))

but it breaks when the column names 'a' and 'b' become 'd' and 'f' during the operation of the function. Specifying the columns by index is more robust:

    subset(df, (is.na(df[, 1]) | df[, 1] == 1) & (is.na(df[, 2]) | df[, 2] == 1))

But this is cumbersome, and if a previous processing step goes wrong and column 'c' ends up before 'a' or 'b', I end up subsetting by the wrong columns.

I also have another dataframe that specifies which column names to subset by:

    cro_df <- data.frame(pop = c('c1', 'c2'),
                         p1 = c('a', 'd'),
                         p2 = c('b', 'f'))
    cro_df
    #>   pop p1 p2
    #> 1  c1  a  b
    #> 2  c2  d  f

I would like to be able to extract the column names from that dataframe to use in my subset function, e.g.:

    col1 <- cro_df[cro_df[, 'pop'] == 'c1', 'p1']
    subset(df, is.na(col1) | col1 == 1)

This returns an empty dataframe. I have tried turning col1 into a symbol and a factor with no success:

    subset(df, as.symbol(col1) == 1)
    subset(df, sym(col1) == 1)
    subset(df, as.factor(col1) == 1)

And they all return:

    [1] a b c
    <0 rows> (or 0-length row.names)

Is there a way I can specify my columns to subset using the second dataframe cro_df?

Answer 1

Score: 1

You can use filter and if_all from the dplyr package.

Choose the names of the columns you want to filter on in whatever way suits your case best. Here I just created a variable cols that contains 'a' and 'b'.

Then all_of() selects the columns named in cols, and filter() keeps the rows for which the if_all() condition is TRUE for every one of those columns:

    library(dplyr) # packageVersion("dplyr") >= 1.1.0
    cols <- c('a', 'b')
    filter(df, if_all(all_of(cols), \(x) is.na(x) | x == 1))
    #>    a  b c
    #> 1  1  1 0
    #> 2 NA  1 0
    #> 3  1 NA 0

If you assign different column names to cols you can reuse the same code.
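
To connect this to the cro_df lookup table from the question, here is a minimal sketch, assuming the column names for a population always sit in columns p1 and p2 of cro_df. The names cols_c2 and missing_cols are only illustrative. It also shows that all_of() is strict: asking for columns that are not present is an error rather than a silent mismatch, which may help with the concern about filtering on the wrong columns.

```r
library(dplyr)

df <- data.frame(a = c(1, 1, 0, NA, 0, 1),
                 b = c(0, 1, 0, 1, 0, NA),
                 c = c(0, 0, 0, 0, 0, 0))
cro_df <- data.frame(pop = c("c1", "c2"),
                     p1 = c("a", "d"),
                     p2 = c("b", "f"))

# Look up the two column names for population "c1" as a plain character vector
cols <- unlist(cro_df[cro_df$pop == "c1", c("p1", "p2")], use.names = FALSE)

# Keep rows where every selected column is 1 or NA
res <- filter(df, if_all(all_of(cols), \(x) is.na(x) | x == 1))

# The names for "c2" ("d" and "f") do not exist in df, so all_of() raises an error
cols_c2 <- unlist(cro_df[cro_df$pop == "c2", c("p1", "p2")], use.names = FALSE)
missing_cols <- tryCatch(
  filter(df, if_all(all_of(cols_c2), \(x) is.na(x) | x == 1)),
  error = function(e) "error"
)
```

Here res holds the three rows the question asks for, and missing_cols ends up as the string "error" because df has no columns d and f.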

Answer 2

Score: 1

    library(dplyr)
    library(purrr)
    df <- data.frame(a = c(1, 1, 0, NA, 0, 1),
                     b = c(0, 1, 0, 1, 0, NA),
                     c = c(0, 0, 0, 0, 0, 0))

Let’s add a second data frame with different column names.

    df2 <- data.frame(d = c(1, 1, 0, NA, 0, 1),
                      f = c(0, 1, 0, 1, 0, NA),
                      c = c(0, 0, 0, 0, 0, 0))

We can use dplyr::if_all() in dplyr::filter() to apply the filter.

    df |>
      filter(if_all(c(a, b), \(x) is.na(x) | x == 1))
    #>    a  b c
    #> 1  1  1 0
    #> 2 NA  1 0
    #> 3  1 NA 0

Using that idea, we now write a custom function to accommodate changing
column names.

    custom_filter <-
      function(data, v1, v2) {
        filter(data,
               if_all(c({{v1}}, {{v2}}), \(x) is.na(x) | x == 1))
      }

Here is how that can work.

    custom_filter(df, a, b)
    #>    a  b c
    #> 1  1  1 0
    #> 2 NA  1 0
    #> 3  1 NA 0
    custom_filter(df2, d, f)
    #>    d  f c
    #> 1  1  1 0
    #> 2 NA  1 0
    #> 3  1 NA 0

Using your cro_df dataframe and by placing all dataframes in a list()
we can now programmatically (purrr::map2()) go through all of the
dataframes and apply the filter.

    cro_df <- data.frame(pop = c('c1', 'c2'),
                         p1 = c('a', 'd'),
                         p2 = c('b', 'f'))
    cro_l <-
      cro_df |>
      split(1:nrow(cro_df))
    data_l <- list(df, df2)
    map2(data_l,
         cro_l,
         \(x, y) custom_filter(
           x, y$p1, y$p2
         ))
    #> [[1]]
    #>    a  b c
    #> 1  1  1 0
    #> 2 NA  1 0
    #> 3  1 NA 0
    #>
    #> [[2]]
    #>    d  f c
    #> 1  1  1 0
    #> 2 NA  1 0
    #> 3  1 NA 0
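
If you would rather look the names up by population id than rely on the rows of cro_df lining up with the order of the list, a string-based variant might look like the sketch below. The helper filter_pop and the choice to name the list elements after the pop values are my own assumptions, not part of the approach above; it also assumes the names live in columns p1 and p2.

```r
library(dplyr)
library(purrr)

df <- data.frame(a = c(1, 1, 0, NA, 0, 1),
                 b = c(0, 1, 0, 1, 0, NA),
                 c = c(0, 0, 0, 0, 0, 0))
df2 <- data.frame(d = c(1, 1, 0, NA, 0, 1),
                  f = c(0, 1, 0, 1, 0, NA),
                  c = c(0, 0, 0, 0, 0, 0))
cro_df <- data.frame(pop = c("c1", "c2"),
                     p1 = c("a", "d"),
                     p2 = c("b", "f"))

# Hypothetical helper: keep rows where the columns listed for `pop_id`
# in the lookup table `cro` are all either 1 or NA
filter_pop <- function(data, pop_id, cro) {
  cols <- unlist(cro[cro$pop == pop_id, c("p1", "p2")], use.names = FALSE)
  filter(data, if_all(all_of(cols), \(x) is.na(x) | x == 1))
}

# Name each data frame after its population so the lookup is explicit
data_l <- list(c1 = df, c2 = df2)

# imap() passes each element together with its name
res_l <- imap(data_l, \(d, pop_id) filter_pop(d, pop_id, cro_df))
```

The result is a list named c1 and c2, so it stays clear which filtered data frame belongs to which population.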

Answer 3

Score: 0

Perhaps this is a good start?

    with(cro_df[cro_df$pop == "c1", ],
      df[ (is.na(df[[p1]]) | df[[p1]] == 1) & (is.na(df[[p2]]) | df[[p2]] == 1), ]
    )
    #    a  b c
    # 2  1  1 0
    # 4 NA  1 0
    # 6  1 NA 0
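
If the number of columns can vary, the same base R idea can be generalized. This is only a sketch, assuming the names still come from the p1 and p2 columns of cro_df; the variable names ok and keep are illustrative.

```r
df <- data.frame(a = c(1, 1, 0, NA, 0, 1),
                 b = c(0, 1, 0, 1, 0, NA),
                 c = c(0, 0, 0, 0, 0, 0))
cro_df <- data.frame(pop = c("c1", "c2"),
                     p1 = c("a", "d"),
                     p2 = c("b", "f"))

# Column names for population "c1" as a character vector
cols <- unlist(cro_df[cro_df$pop == "c1", c("p1", "p2")], use.names = FALSE)

# One logical vector per selected column: TRUE where the value is 1 or NA
ok <- lapply(df[cols], function(x) is.na(x) | x == 1)

# Combine the per-column vectors with element-wise AND
keep <- Reduce(`&`, ok)

res <- df[keep, ]
```

Because this uses plain `[` indexing, the original row names (2, 4 and 6) are preserved, as in the output above.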

FYI, subset() is intended for interactive use; its help page says:

    Warning:
    This is a convenience function intended for use interactively.
    For programming it is better to use the standard subsetting
    functions like [, and in particular the non-standard evaluation
    of argument subset can have unanticipated consequences.
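
As far as I can tell, this is also why the attempt in the question returns zero rows: col1 holds the string "a", not the column a, so the condition is evaluated once on that string. A small sketch to illustrate (the names empty and fixed are just for this example):

```r
df <- data.frame(a = c(1, 1, 0, NA, 0, 1),
                 b = c(0, 1, 0, 1, 0, NA),
                 c = c(0, 0, 0, 0, 0, 0))
col1 <- "a"

# col1 is not a column of df, so subset() finds the string "a" in the
# calling environment; the condition is a single FALSE, not one value per row
is.na(col1) | col1 == 1

# A single FALSE selects no rows
empty <- subset(df, is.na(col1) | col1 == 1)

# Indexing with [[ ]] uses the string to fetch the actual column
fixed <- df[is.na(df[[col1]]) | df[[col1]] == 1, ]
```

With [[ ]], fixed contains rows 1, 2, 4 and 6, the rows where a is 1 or NA.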

huangapple
  • Posted on May 17, 2023, 21:36:46
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76272728.html