May 17, 2023 21:36:46

How can I use a value extracted from a dataframe to specify columns to subset in R?

Question

I have a dataframe that I want to subset inside a function so that only rows where both columns are either 1 or NA remain. For df:

df <- data.frame(a = c(1,1,0,NA,0,1), 
                 b = c(0,1,0,1,0, NA),
                 c = c(0,0,0,0,0,0))

I want:

   a  b c
2  1  1 0
4 NA  1 0
6  1 NA 0

The problem I'm having is that I have many columns with names that change. So this works well:

subset(df, (is.na(a) | a == 1) & (is.na(b) | b == 1))

but when column names 'a' and 'b' become 'd' and 'f' during the operation of the function, it breaks. Specifying by column index works more robustly:

subset(df, (is.na(df[,1]) | df[,1] == 1) & (is.na(df[,2]) | df[,2] == 1))

But this is cumbersome, and if a previous processing step messes up and column 'c' ends up before 'a' or 'b', I end up subsetting by the wrong columns.

I also have another dataframe that specifies what the column names to subset by will be:

cro_df <- data.frame(pop = c('c1', 'c2'),
                     p1 = c('a', 'd'),
                     p2 = c('b', 'f'))
  pop p1 p2
1  c1  a  b
2  c2  d  f

I would like to be able to extract the column names from that dataframe to use in my subset function, e.g.:

col1 <- cro_df[cro_df[,'pop']=='c1', 'p1']
subset(df, is.na(col1) | col1 == 1)

This returns an empty dataframe. I have tried turning col1 into a symbol and a factor with no success:

subset(df, as.symbol(col1) == 1)
subset(df, sym(col1) == 1)
subset(df, as.factor(col1) == 1)

And they all return:

[1] a b c
<0 rows> (or 0-length row.names)

Is there a way I can specify my columns to subset using the second dataframe cro_df?
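
Editor's note: a minimal sketch of why the attempts above return zero rows, assuming R >= 4.0 (where data.frame() keeps strings as character). col1 holds the string 'a', not the column a, so the condition is evaluated once on that string and the single FALSE is recycled to every row. Looking the column up by name with df[[col1]] is one way to get the column itself. The names cond_string, cond_column and res are illustrative only.

```r
df <- data.frame(a = c(1, 1, 0, NA, 0, 1),
                 b = c(0, 1, 0, 1, 0, NA),
                 c = c(0, 0, 0, 0, 0, 0))
cro_df <- data.frame(pop = c("c1", "c2"),
                     p1 = c("a", "d"),
                     p2 = c("b", "f"))

col1 <- cro_df[cro_df[, "pop"] == "c1", "p1"]

# col1 is the string "a", so this is a single logical value, not one per row
cond_string <- is.na(col1) | col1 == 1

# df[[col1]] is the column named "a", so this has one value per row
cond_column <- is.na(df[[col1]]) | df[[col1]] == 1

# Standard subsetting with the per-row condition (first column only)
res <- df[cond_column, ]
```

The same lookup would have to be repeated for the second column (p2) and combined with & to reproduce the desired result.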

Answer 1

Score: 1

You can use filter and if_all from the dplyr package.

Choose the names of the columns you want to filter on in whatever way best suits your case. Here I just created a variable cols that contains 'a' and 'b'.

Then all_of(cols) selects those columns, and filter keeps the rows for which the if_all condition is TRUE in every selected column:

library(dplyr) # packageVersion("dplyr") >= 1.1.0
cols <- c('a', 'b')
filter(df, if_all(all_of(cols), \(x) is.na(x) | x == 1))
#>    a  b c
#> 1  1  1 0
#> 2 NA  1 0
#> 3  1 NA 0

If you assign different column names to cols you can reuse the same code.
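
As a hedged illustration of that reuse, cols could be read from the asker's cro_df instead of being typed by hand. This is a sketch, not part of the original answer: it assumes a second data frame df2 whose columns are named d and f, and that the p1 and p2 columns of cro_df hold the names to filter on.

```r
library(dplyr)

# Assumed example data: same values as df, but with columns renamed to d and f
df2 <- data.frame(d = c(1, 1, 0, NA, 0, 1),
                  f = c(0, 1, 0, 1, 0, NA),
                  c = c(0, 0, 0, 0, 0, 0))
cro_df <- data.frame(pop = c("c1", "c2"),
                     p1 = c("a", "d"),
                     p2 = c("b", "f"))

# Column names listed for population "c2", as a plain character vector
cols <- unlist(cro_df[cro_df$pop == "c2", c("p1", "p2")], use.names = FALSE)

# Same filter as above, now driven by the looked-up names
res <- filter(df2, if_all(all_of(cols), \(x) is.na(x) | x == 1))
```

Because all_of() takes a character vector, no conversion to symbols is needed.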

Answer 2

Score: 1

library(dplyr)
library(purrr)

df <- data.frame(a = c(1,1,0,NA,0,1), 
                 b = c(0,1,0,1,0, NA),
                 c = c(0,0,0,0,0,0))

Let’s add a second data frame with different column names.

df2 <- data.frame(d = c(1,1,0,NA,0,1), 
                  f = c(0,1,0,1,0, NA),
                  c = c(0,0,0,0,0,0))

We can use dplyr::if_all() in dplyr::filter() to apply the filter.

df |> 
  filter(if_all(c(a, b), \(x) is.na(x) | x == 1))
#>    a  b c
#> 1  1  1 0
#> 2 NA  1 0
#> 3  1 NA 0

Using that idea, we now write a custom function to accommodate changing column names.

custom_filter <- 
  function(data, v1, v2) {
    filter(data,
           if_all(c({{v1}}, {{v2}}), \(x) is.na(x) | x == 1))
  }

Here is how that can work.

custom_filter(df, a, b)
#>    a  b c
#> 1  1  1 0
#> 2 NA  1 0
#> 3  1 NA 0
custom_filter(df2, d, f)
#>    d  f c
#> 1  1  1 0
#> 2 NA  1 0
#> 3  1 NA 0

Using your cro_df dataframe and placing all dataframes in a list(), we can now go through all of the dataframes programmatically with purrr::map2() and apply the filter.

cro_df <- data.frame(pop = c('c1', 'c2'),
                     p1 = c('a', 'd'),
                     p2 = c('b', 'f'))
cro_l <- 
  cro_df |> 
  split(1:nrow(cro_df))
data_l <- list(df, df2)
map2(data_l,
     cro_l,
     \(x, y) custom_filter(
       x, y$p1, y$p2
     ))
#> [[1]]
#>    a  b c
#> 1  1  1 0
#> 2 NA  1 0
#> 3  1 NA 0
#> 
#> [[2]]
#>    d  f c
#> 1  1  1 0
#> 2 NA  1 0
#> 3  1 NA 0

Answer 3

Score: 0

Perhaps this is a good start?

with(cro_df[cro_df$pop == "c1",],
  df[ (is.na(df[[p1]]) | df[[p1]] == 1) & (is.na(df[[p2]]) | df[[p2]] == 1), ]
)
#    a  b c
# 2  1  1 0
# 4 NA  1 0
# 6  1 NA 0

FYI, subset is intended for interactive use. Its help page says:

Warning:
This is a convenience function intended for use interactively.
For programming it is better to use the standard subsetting
functions like [, and in particular the non-standard evaluation
of argument ‘subset’ can have unanticipated consequences.
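
Following that advice, the same idea could be wrapped in a small base R function. This is a sketch under stated assumptions rather than part of the original answer: filter_by_pop is a hypothetical helper name, and it assumes the lookup table has columns pop, p1 and p2 as in cro_df.

```r
df <- data.frame(a = c(1, 1, 0, NA, 0, 1),
                 b = c(0, 1, 0, 1, 0, NA),
                 c = c(0, 0, 0, 0, 0, 0))
cro_df <- data.frame(pop = c("c1", "c2"),
                     p1 = c("a", "d"),
                     p2 = c("b", "f"))

# Hypothetical helper: keep the rows of `data` where every column listed
# for `pop` in `lookup` (columns p1 and p2) is either 1 or NA.
filter_by_pop <- function(data, lookup, pop) {
  cols <- unlist(lookup[lookup$pop == pop, c("p1", "p2")], use.names = FALSE)
  # One logical vector per column, combined row-wise with &
  keep <- Reduce(`&`, lapply(data[cols], function(x) is.na(x) | x == 1))
  data[keep, , drop = FALSE]
}

res <- filter_by_pop(df, cro_df, "c1")
```

Using [ with column names taken from the lookup table avoids non-standard evaluation entirely, so the function behaves the same whether it is called interactively or from other code.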
