May 24, 2023, 21:43:46

R function to summarise using dplyr group_by with flexible groups, including no grouping at all

# Question

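I want to write an R function that uses dplyr to summarise a data set and accepts a varying number of grouping variables for the `group_by` call, including no grouping at all. I have found answers to similar questions that use `group_by_`, but this has been deprecated (the dplyr version at the time of writing is 1.1.2).

I have tried different ways of passing vectors to `group_by` using tidy evaluation, but none worked as expected, and none returned an answer when no grouping was required.

Here is the basis for a reproducible example using the starwars dataset. The function should be able to return summary tables of the Body Mass Index (BMI) of the various creatures.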

```r
library(dplyr)

star_wars_BMI <- function(group_vec) {
  df_out <- starwars %>%
    mutate(BMI = height/mass^2) %>%
    group_by(group_vec) %>%
    summarise(height_mean = mean(height, na.rm = TRUE),
              mass_mean = mean(mass, na.rm = TRUE),
              BMI_mean = mean(BMI, na.rm = TRUE))
  return(df_out)
}

group_vector0 <- c()  # i.e. summarise for the whole galaxy
group_vector1 <- c("homeworld")  # summarise by homeworld planet
group_vector2 <- c("homeworld", "species")  # summarise by species on each homeworld

galaxy_BMI <- star_wars_BMI(group_vec = group_vector0)
homeworld_BMI <- star_wars_BMI(group_vec = group_vector1)
```

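I know it is relatively simple to write separate functions for the no-group and some-groups cases, but I would like to see whether this can be done with a single function.

An explanation of the tidy evaluation rationale would be very much appreciated, as would an example that goes on to plot the summaries.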




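# Answer 1
**Score**: 3

Here is another option that uses the ellipsis (`...`) to pass column names to `group_by`. Instead of a vector, we now pass the column names themselves:

`rlang::ensyms(...)` captures the column names as symbols, and `!!!` then unquotes and splices them into the `group_by` call: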

```R
library(dplyr)

star_wars_BMI <- function(...) {
  # Capture the column names (strings or bare names) as a list of symbols
  group_vec <- rlang::ensyms(...)

  df_out <- starwars %>%
    mutate(BMI = height/mass^2) %>%
    group_by(!!!group_vec) %>%  # splice the symbols in as grouping columns
    summarise(height_mean = mean(height, na.rm = TRUE),
              mass_mean = mean(mass, na.rm = TRUE),
              BMI_mean = mean(BMI, na.rm = TRUE))

  return(df_out)
}
```

Output of `star_wars_BMI()`:

```
  height_mean mass_mean BMI_mean
        <dbl>     <dbl>    <dbl>
1        174.      97.3   0.0481
```

Output of `star_wars_BMI("homeworld")`:

```
# A tibble: 49 × 4
   homeworld      height_mean mass_mean BMI_mean
   <chr>                <dbl>     <dbl>    <dbl>
 1 Alderaan              176.      64     0.0463
 2 Aleen Minor            79       15     0.351 
 3 Bespin                175       79     0.0280
 4 Bestine IV            180      110     0.0149
 5 Cato Neimoidia        191       90     0.0236
 6 Cerea                 198       82     0.0294
 7 Champala              196      NaN   NaN     
 8 Chandrila             150      NaN   NaN     
 9 Concord Dawn          183       79     0.0293
10 Corellia              175       78.5   0.0284
# ... with 39 more rows
# ℹ Use `print(n = ...)` to see more rows
```

Output of `star_wars_BMI("homeworld", "species")`:

```
`summarise()` has grouped output by 'homeworld'. You can override using the
`.groups` argument.
# A tibble: 58 × 5
# Groups:   homeworld [49]
   homeworld      species   height_mean mass_mean BMI_mean
   <chr>          <chr>           <dbl>     <dbl>    <dbl>
 1 Alderaan       Human            176.      64     0.0463
 2 Aleen Minor    Aleena            79       15     0.351 
 3 Bespin         Human            175       79     0.0280
 4 Bestine IV     Human            180      110     0.0149
 5 Cato Neimoidia Neimodian        191       90     0.0236
 6 Cerea          Cerean           198       82     0.0294
 7 Champala       Chagrian         196      NaN   NaN     
 8 Chandrila      Human            150      NaN   NaN     
 9 Concord Dawn   Human            183       79     0.0293
10 Corellia       Human            175       78.5   0.0284
# ... with 48 more rows
# ℹ Use `print(n = ...)` to see more rows
```
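If the grouping columns arrive as a character vector, as in the question, the same unquote-splice idea may still apply. The following is only a sketch under stated assumptions: `creatures` is a small made-up table standing in for `starwars`, and `summarise_by` is a hypothetical helper name, not part of dplyr. It uses `rlang::syms()` to turn the strings into symbols before splicing them with `!!!`.

```r
library(dplyr)
library(rlang)

# Hypothetical toy data standing in for starwars (values are illustrative only)
creatures <- tibble(
  homeworld = c("Tatooine", "Tatooine", "Naboo", "Naboo"),
  species   = c("Human", "Droid", "Human", "Gungan"),
  height    = c(172, 167, 165, 196),
  mass      = c(77, 75, 45, 66)
)

# group_by() uses data masking: group_by(group_vec) evaluates `group_vec` as an
# expression and treats the resulting strings as the *values* of a new grouping
# column, rather than looking up the columns that carry those names.
summarise_by <- function(data, group_vec = character(0)) {
  group_syms <- syms(group_vec)    # character vector -> list of symbols
  data %>%
    group_by(!!!group_syms) %>%    # spliced in as group_by(homeworld, species)
    summarise(
      height_mean = mean(height, na.rm = TRUE),
      mass_mean   = mean(mass, na.rm = TRUE),
      .groups     = "drop"         # return an ungrouped tibble in every case
    )
}

# Inspect the call that the splice builds, without running it
expr(group_by(creatures, !!!syms(c("homeworld", "species"))))

res_all   <- summarise_by(creatures)                             # no grouping
res_world <- summarise_by(creatures, "homeworld")                # one variable
res_both  <- summarise_by(creatures, c("homeworld", "species"))  # two variables
```

An empty character vector yields an empty list of symbols, so `group_by()` receives no grouping arguments and the summary covers the whole table.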


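# Answer 2
**Score**: 2

Hope you are doing well.

I believe you can use `across()`. Wrapping the character vector in `all_of()` tells tidyselect that it holds column names and avoids the deprecation warning raised when an external vector is used directly in a selection.

For example:

```R
library(dplyr)

star_wars_BMI <- function(group_vec) {
  df_out <- starwars %>%
    mutate(BMI = height/mass^2) %>%
    group_by(across(all_of(group_vec))) %>%  # select grouping columns by name
    summarise(height_mean = mean(height, na.rm = TRUE),
              mass_mean = mean(mass, na.rm = TRUE),
              BMI_mean = mean(BMI, na.rm = TRUE))
  return(df_out)
}

group_vector0 <- c()  # i.e. summarise for the whole galaxy
group_vector1 <- c("homeworld")  # summarise by homeworld planet
group_vector2 <- c("homeworld", "species")  # summarise by species on each homeworld

star_wars_BMI(group_vec = group_vector0)
star_wars_BMI(group_vec = group_vector1)
star_wars_BMI(group_vec = group_vector2)
```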
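The question also asks for an example that goes on to plot the summaries. A minimal sketch follows, assuming ggplot2 is installed. `creatures` is a made-up stand-in for `starwars`, and `summarise_by` and `plot_summary` are hypothetical helper names. The `.data[[...]]` pronoun lets `aes()` look up a column whose name is held in a string.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data standing in for starwars (values are illustrative only)
creatures <- tibble(
  homeworld = c("Tatooine", "Tatooine", "Naboo", "Naboo"),
  species   = c("Human", "Droid", "Human", "Gungan"),
  height    = c(172, 167, 165, 196)
)

# Summarise by zero or more columns named in a character vector
summarise_by <- function(data, group_vec = character(0)) {
  data %>%
    group_by(across(all_of(group_vec))) %>%
    summarise(height_mean = mean(height, na.rm = TRUE), .groups = "drop")
}

# Bar chart of the summary: first grouping variable on the x axis,
# optional second grouping variable as the fill colour
plot_summary <- function(data, group_vec) {
  stopifnot(length(group_vec) %in% 1:2)
  p <- summarise_by(data, group_vec) %>%
    ggplot(aes(x = .data[[group_vec[1]]], y = height_mean)) +
    labs(x = group_vec[1], y = "Mean height")
  if (length(group_vec) == 2) {
    p +
      geom_col(aes(fill = .data[[group_vec[2]]]), position = "dodge") +
      labs(fill = group_vec[2])
  } else {
    p + geom_col()
  }
}

p_world <- plot_summary(creatures, "homeworld")
p_both  <- plot_summary(creatures, c("homeworld", "species"))
```

Printing `p_world` or `p_both` draws the chart. The ungrouped case is left out of `plot_summary` because a single summary row has no category to put on the x axis.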
