Posted: 2023-02-16 04:08:05

How to detect if data.frame is grouped by dplyr from subfunction?

Question

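I have an R package in which some functions are designed to be called inside the dplyr functions `mutate` or `summarize`:

    newdata <- dplyr::mutate(group_by(olddata, col1), newcol = myfunc(col1))

However, users sometimes forget to group their data before passing it to `mutate` or `summarize`:

    newdata <- dplyr::mutate(olddata, newcol = myfunc(col1))

When the data frame is not grouped first, the package functions produce largely nonsensical results. No error or warning is raised, so users can be left unsure about the cause of the problem.

I'd like to add a `warning()` inside `myfunc` that fires when `myfunc` detects that its input does not come from a grouped `data.frame`. However, I can't figure out how `myfunc` could detect this. It appears that `mutate` passes only a vector to `myfunc`, so both `dplyr::is.grouped_df(x)` and `inherits(x, "grouped_df")` return `FALSE`.

What I would like:

    myfunc <- function(x) {
      if (comes.from.grouped.df) {
        print("grouped")
      } else {
        print("ungrouped")
      }
    }

    mutate(olddata, newcol = myfunc(col1))
    'ungrouped'

    mutate(group_by(olddata, col1), newcol = myfunc(col1))
    'grouped'
    'grouped'
    'grouped'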
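The observation that `myfunc` only ever receives a vector can be checked with a small probe. This is a minimal sketch: `probe` is a hypothetical helper and `olddata` here is a toy data frame invented for illustration, not the data from the question.

```r
library(dplyr)

# Hypothetical probe: reports the class and length of whatever it is given,
# and whether the argument itself carries the "grouped_df" class.
probe <- function(x) {
  message(class(x)[1], " of length ", length(x),
          "; grouped_df: ", inherits(x, "grouped_df"))
  x
}

# Toy data, assumed purely for illustration.
olddata <- data.frame(col1 = c("a", "a", "b"), stringsAsFactors = FALSE)

# Ungrouped: probe() is called once with the whole column.
invisible(mutate(olddata, newcol = probe(col1)))

# Grouped: probe() is called once per group, each time with a plain vector.
invisible(mutate(group_by(olddata, col1), newcol = probe(col1)))
```

In both cases the argument is an ordinary vector, so any grouping information has to be recovered from somewhere other than `x` itself.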


# Answer 1
**Score**: 5
If you want your function to be used only within a specific context, and to emit a warning if the data frame is not grouped, then you can do:

```r
library(tidyverse)

myfunc <- function(x) {
  # Heuristic: when called from inside mutate(), the calling frame is
  # expected to contain only the `~` binding.
  if (all(ls(envir = parent.frame()) == "~")) {
    # Walk the call stack and find the innermost call to mutate().
    ss <- sys.status()
    funcs <- sapply(ss$sys.calls, function(x) deparse(as.list(x)[[1]]))
    wf <- which(funcs == "mutate")
    if (length(wf) == 0) stop("`myfunc` must be called from inside `mutate`")
    wf <- max(wf)
    # Retrieve the data frame that was passed to that mutate() call.
    data <- eval(substitute(.data), ss$sys.frames[[wf]])
    if (!inherits(data, "grouped_df")) {
      warning("`myfunc` called on an ungrouped data frame / tibble.")
    }
    return(x^2)
  }
  stop("`myfunc` must be called from inside `mutate`")
}
```

Used outside `mutate`, we get an error:

```r
myfunc(1:10)
#> Error in myfunc(1:10): `myfunc` must be called from inside `mutate`
```

With an ungrouped data frame or tibble we get a warning:

```r
tibble(iris) %>% 
  mutate(x = myfunc(Sepal.Length))
#> Warning in myfunc(Sepal.Length): `myfunc` called on an ungrouped data frame /
#> tibble.
#> # A tibble: 150 x 6
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species     x
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>   <dbl>
#>  1          5.1         3.5          1.4         0.2 setosa   26.0
#>  2          4.9         3            1.4         0.2 setosa   24.0
#>  3          4.7         3.2          1.3         0.2 setosa   22.1
#>  4          4.6         3.1          1.5         0.2 setosa   21.2
#>  5          5           3.6          1.4         0.2 setosa   25  
#>  6          5.4         3.9          1.7         0.4 setosa   29.2
#>  7          4.6         3.4          1.4         0.3 setosa   21.2
#>  8          5           3.4          1.5         0.2 setosa   25  
#>  9          4.4         2.9          1.4         0.2 setosa   19.4
#> 10          4.9         3.1          1.5         0.1 setosa   24.0
#> # ... with 140 more rows
```

And it runs without complaint if the tibble is grouped:

```r
tibble(iris) %>% 
  group_by(Species) %>%
  mutate(x = myfunc(Sepal.Length))
#> # A tibble: 150 x 6
#> # Groups:   Species [3]
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species     x
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>   <dbl>
#>  1          5.1         3.5          1.4         0.2 setosa   26.0
#>  2          4.9         3            1.4         0.2 setosa   24.0
#>  3          4.7         3.2          1.3         0.2 setosa   22.1
#>  4          4.6         3.1          1.5         0.2 setosa   21.2
#>  5          5           3.6          1.4         0.2 setosa   25  
#>  6          5.4         3.9          1.7         0.4 setosa   29.2
#>  7          4.6         3.4          1.4         0.3 setosa   21.2
#>  8          5           3.4          1.5         0.2 setosa   25  
#>  9          4.4         2.9          1.4         0.2 setosa   19.4
#> 10          4.9         3.1          1.5         0.1 setosa   24.0
#> # ... with 140 more rows
```

<sup>Created on 2023-02-15 with reprex v2.0.2</sup>
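
As a possible alternative to inspecting the call stack, the same check might be sketched with dplyr's own context helper `cur_group()` (available since dplyr 1.0.0). This is not part of the answer above. The name `myfunc2` and its messages are hypothetical, and the sketch assumes that `cur_group()` returns a tibble with zero columns when the data is ungrouped and throws an error when called outside a dplyr verb.

```r
library(dplyr)

# Hypothetical variant of `myfunc`; the name and messages are illustrative.
myfunc2 <- function(x) {
  # cur_group() returns a one-row tibble with one column per grouping
  # variable. Outside a dplyr verb it throws an error, caught here as NULL.
  keys <- tryCatch(dplyr::cur_group(), error = function(e) NULL)
  if (is.null(keys)) {
    stop("`myfunc2` must be called from inside a dplyr verb", call. = FALSE)
  }
  # Zero key columns means there are no grouping variables.
  if (ncol(keys) == 0) {
    warning("`myfunc2` called on an ungrouped data frame / tibble.",
            call. = FALSE)
  }
  x^2
}

# Usage on a grouped tibble: no warning is expected.
res <- tibble(iris) %>%
  group_by(Species) %>%
  mutate(x = myfunc2(Sepal.Length))
```

Because the context helpers are not tied to a particular verb, this version should behave the same way inside `summarize`, which the question also mentions.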
