英文:
How to detect if data.frame is grouped by dplyr from subfunction?
问题
我有一个R包,其中一些函数通常设计为在dplyr函数mutate或summarize内部调用。
newdata <- dplyr::mutate(group_by(olddata, col1), newcol = myfunc(col1))
然而,有时用户可能会忘记在将数据放入mutate或summarize调用之前对其进行分组。
newdata <- dplyr::mutate(olddata, newcol = myfunc(col1))
当数据框没有首先分组时,包函数将产生大部分毫无意义的结果。但是,不会有明显的错误或警告,这可能会让用户对问题的原因感到不确定。
我想在myfunc代码内部添加一个Warning(),当myfunc检测到输入数据不来自分组的数据框时。然而,我无法弄清楚myfunc如何检测数据是否来自分组的数据框。似乎mutate只传递一个向量给myfunc,所以dplyr::is.grouped_df和inherits(x, "grouped_df")都返回false。
我想要的是:
myfunc <- function(x) {
if (comes.from.grouped.df) {
print("grouped")
} else {
print("ungrouped")
}
}
mutate(olddata, newcol = myfunc(col1))
'ungrouped'
mutate(group_by(olddata, col1), newcol = myfunc(col1))
'grouped'
'grouped'
'grouped'
<details>
<summary>英文:</summary>
I have an R package where some functions are designed to be typically called within dplyr functions mutate or summarize.
newdata <- dplyr::mutate(group_by(olddata, col1), newcol = myfunc(col1))
However, sometimes users might forget to group their data before putting it into the mutate or summarize call.
newdata <- dplyr::mutate(olddata, newcol = myfunc(col1))
When the data frame is not grouped first, the package functions will produce largely nonsensical results. However, there won't be any errors or warnings per se, which could leave users uncertain about the cause of the issue.
I'd like to add a `Warning()` within the `myfunc` code when `myfunc` detects that the input data isn't coming from a grouped `data.frame`. However, I can't figure out how `myfunc` could detect if the data is coming from a grouped `data.frame`. It appears that `mutate` only passes a vector to `myfunc`, so both `dplyr::is.grouped_df` and `inherits(x, "grouped_df")` return false.
What I would like:
myfunc <- function(x) {if(comes.from.grouped.df) {print("grouped")} else {print("ungrouped")}}
mutate(olddata, newcol = myfunc(col1))
'ungrouped'
mutate(group_by(olddata, col1), newcol = myfunc(col1))
'grouped'
'grouped'
'grouped'
</details>
# 答案1
**得分**: 5
``` r
如果你想要在特定上下文中使用你的函数,并且在数据框未分组时发出警告,那么你可以这样做:
在`mutate`之外使用,会出现错误:
```r
myfunc(1:10)
#> Error in myfunc(1:10): `myfunc`必须从`mutate`内部调用
在未分组的数据框或 tibble 上会得到一个警告:
tibble(iris) %>%
mutate(x = myfunc(Sepal.Length))
#> 警告信息:`myfunc`在未分组的数据框或 tibble 上被调用
#> # 一个 tibble: 150 x 6
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species x
#> <dbl> <dbl> <dbl> <dbl> <fct> <dbl>
#> 1 5.1 3.5 1.4 0.2 setosa 26.0
#> 2 4.9 3 1.4 0.2 setosa 24.0
#> 3 4.7 3.2 1.3 0.2 setosa 22.1
#> 4 4.6 3.1 1.5 0.2 setosa 21.2
#> 5 5 3.6 1.4 0.2 setosa 25
#> 6 5.4 3.9 1.7 0.4 setosa 29.2
#> 7 4.6 3.4 1.4 0.3 setosa 21.2
#> 8 5 3.4 1.5 0.2 setosa 25
#> 9 4.4 2.9 1.4 0.2 setosa 19.4
#> 10 4.9 3.1 1.5 0.1 setosa 24.0
#> # ... 还有 140 行
如果 tibble 被分组,它会毫无怨言地运行:
tibble(iris) %>%
group_by(Species) %>%
mutate(x = myfunc(Sepal.Length))
#> # 一个 tibble: 150 x 6
#> # 分组: Species [3]
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species x
#> <dbl> <dbl> <dbl> <dbl> <fct> <dbl>
#> 1 5.1 3.5 1.4 0.2 setosa 26.0
#> 2 4.9 3 1.4 0.2 setosa 24.0
#> 3 4.7 3.2 1.3 0.2 setosa 22.1
#> 4 4.6 3.1 1.5 0.2 setosa 21.2
#> 5 5 3.6 1.4 0.2 setosa 25
#> 6 5.4 3.9 1.7 0.4 setosa 29.2
#> 7 4.6 3.4 1.4 0.3 setosa 21.2
#> 8 5 3.4 1.5 0.2 setosa 25
#> 9 4.4 2.9 1.4 0.2 setosa 19.4
#> 10 4.9 3.1 1.5 0.1 setosa 24.0
#> # ... 还有 140 行
<sup>在 2023-02-15 使用 reprex v2.0.2 创建</sup>
英文:
If you want your function used within a specific context, and emit a warning if the data frame is not grouped, then you can do:
library(tidyverse)
myfunc <- function(x) {
if(all(ls(envir = parent.frame()) == "~")) {
ss <- sys.status()
funcs <- sapply(ss$sys.calls, function(x) deparse(as.list(x)[[1]]))
wf <- which(funcs == "mutate")
if(length(wf) == 0) stop("`myfunc` must be called from inside `mutate`")
wf <- max(wf)
data <- eval(substitute(.data), ss$sys.frames[[wf]])
if(!inherits(data, "grouped_df")) {
warning("`myfunc` called on an ungrouped data frame / tibble.")
}
return(x^2)
}
stop("`myfunc` must be called from inside `mutate`")
}
Used outside mutate
, we get an error:
myfunc(1:10)
#> Error in myfunc(1:10): `myfunc` must be called from inside `mutate`
With an ungrouped data frame or tibble we get a warning:
tibble(iris) %>%
mutate(x = myfunc(Sepal.Length))
#> Warning in myfunc(Sepal.Length): `myfunc` called on an ungrouped data frame /
#> tibble.
#> # A tibble: 150 x 6
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species x
#> <dbl> <dbl> <dbl> <dbl> <fct> <dbl>
#> 1 5.1 3.5 1.4 0.2 setosa 26.0
#> 2 4.9 3 1.4 0.2 setosa 24.0
#> 3 4.7 3.2 1.3 0.2 setosa 22.1
#> 4 4.6 3.1 1.5 0.2 setosa 21.2
#> 5 5 3.6 1.4 0.2 setosa 25
#> 6 5.4 3.9 1.7 0.4 setosa 29.2
#> 7 4.6 3.4 1.4 0.3 setosa 21.2
#> 8 5 3.4 1.5 0.2 setosa 25
#> 9 4.4 2.9 1.4 0.2 setosa 19.4
#> 10 4.9 3.1 1.5 0.1 setosa 24.0
#> # ... with 140 more rows
And it runs without complaint if the tibble is grouped:
tibble(iris) %>%
group_by(Species) %>%
mutate(x = myfunc(Sepal.Length))
#> # A tibble: 150 x 6
#> # Groups: Species [3]
#> Sepal.Length Sepal.Width Petal.Length Petal.Width Species x
#> <dbl> <dbl> <dbl> <dbl> <fct> <dbl>
#> 1 5.1 3.5 1.4 0.2 setosa 26.0
#> 2 4.9 3 1.4 0.2 setosa 24.0
#> 3 4.7 3.2 1.3 0.2 setosa 22.1
#> 4 4.6 3.1 1.5 0.2 setosa 21.2
#> 5 5 3.6 1.4 0.2 setosa 25
#> 6 5.4 3.9 1.7 0.4 setosa 29.2
#> 7 4.6 3.4 1.4 0.3 setosa 21.2
#> 8 5 3.4 1.5 0.2 setosa 25
#> 9 4.4 2.9 1.4 0.2 setosa 19.4
#> 10 4.9 3.1 1.5 0.1 setosa 24.0
#> # ... with 140 more rows
<sup>Created on 2023-02-15 with reprex v2.0.2</sup>
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论