March 4, 2023 06:31:17


Filter tibble in R when column names (to be filtered) and values are in vectors?

Question

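This might be an esoteric question or use case, but is there a quick way to filter a tibble when the column names and the values are both held in vectors?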

Say I want to filter on mpg and hp in mtcars. I could do something like:

filter(mtcars, mpg >= 15 & hp >= 100)

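But instead, say I have several filtering cases, with the columns to be filtered in one vector and the values in another. (In practice, I might have four or five such cases in a larger data frame.)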

car_stat <- c('mpg', 'hp')
car_value <- c(15, 100)

Obviously this doesn't work:

filter(mtcars, car_stat >= car_value)

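But is there some succinct dplyr/tidyverse way to filter with vectors, or am I resigned to using a loop to break them up into separate vectors, each of length one?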

Answer 1

Score: 5

Using your variables and values, you can turn them into filtering expressions. Here we use the base R Map and bquote functions.

car_stat <- c('mpg', 'hp')
car_value <- c(15, 100)
criteria <- unname(Map(function(c, v) bquote(.(as.name(c)) >= .(v)), car_stat, car_value))
criteria
# [[1]]
# mpg >= 15
# 
# [[2]]
# hp >= 100

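This creates a list of the expressions you want for your filter. You can then inject them into filter with !!!, which splices each list element in as a separate condition (filter combines multiple conditions with a logical AND).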

dplyr::filter(mtcars, !!!criteria)
#                    mpg cyl  disp  hp drat    wt  qsec vs am gear carb
# Mazda RX4         21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
# Mazda RX4 Wag     21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
# Hornet 4 Drive    21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
# Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
# ...
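A related variant, sketched here as an assumption rather than as part of the answer above: instead of turning each string into a symbol with as.name, the condition can refer to the column through the .data pronoun, which looks a column up by its string name. The names conds and res are illustrative only.

```r
library(dplyr)
library(rlang)

car_stat  <- c("mpg", "hp")
car_value <- c(15, 100)

# Build one quoted condition per (column, value) pair.
# `.data[[col]]` resolves the column from its string name inside filter().
conds <- Map(function(col, val) expr(.data[[!!col]] >= !!val), car_stat, car_value)

# Map() names the list after car_stat; filter() rejects named conditions,
# so the names are dropped before splicing with !!!.
res <- filter(mtcars, !!!unname(conds))
```

The result should match filter(mtcars, mpg >= 15 & hp >= 100), since the spliced conditions are combined with AND.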

Answer 2

Score: 1

Here is another approach that leverages the `env` parameter of data.table 1.14.9.

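library(data.table)
# car_stat and car_value are the vectors defined in the question
cars = setDT(copy(mtcars))
cars[, id := .I]  # row id; this is the `id` column shown in the output
# filter once per (column, value) pair, then keep the rows common to both results
do.call(
  fintersect,
  lapply(1:2, \(i) cars[k >= z, env = list(k = car_stat[i], z = car_value[i])])
)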

Output:

     mpg cyl  disp  hp drat    wt  qsec vs am gear carb id
 1: 21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4  1
 2: 21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4  2
 3: 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1  4
 4: 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2  5
 5: 18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1  6
 6: 19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4 10
 7: 17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4 11
 8: 16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3 12
 9: 17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3 13
10: 15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3 14
11: 15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2 22
12: 15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2 23
13: 19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2 25
14: 30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2 28
15: 15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4 29
16: 19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6 30
17: 15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8 31
18: 21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2 32
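If the `env` argument is not available in the installed data.table version, or there are more than two conditions, a plain logical mask is one possible fallback. This is a minimal sketch that swaps in base R's Reduce and Map rather than the `env` interface used above; keep and res are illustrative names.

```r
library(data.table)

car_stat  <- c("mpg", "hp")
car_value <- c(15, 100)

cars <- as.data.table(mtcars)

# One logical vector per (column, value) pair, combined element-wise with `&`.
keep <- Reduce(`&`, Map(function(k, z) cars[[k]] >= z, car_stat, car_value))

# Subset by row position so the mask is not mistaken for a column name.
res <- cars[which(keep)]
```

Because the conditions are folded with Reduce, the same code handles any number of column/value pairs without intersecting intermediate tables.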
