How to detect if a data.frame is grouped by dplyr from a subfunction?

Question

I have an R package in which some functions are designed to be called inside the dplyr functions `mutate` or `summarize`:

```r
newdata <- dplyr::mutate(group_by(olddata, col1), newcol = myfunc(col1))
```

However, users sometimes forget to group their data before passing it to the `mutate` or `summarize` call:

```r
newdata <- dplyr::mutate(olddata, newcol = myfunc(col1))
```

When the data frame is not grouped first, the package functions produce largely nonsensical results, but without any error or warning, which can leave users uncertain about the cause of the problem.

I'd like to add a `warning()` inside `myfunc` that fires when `myfunc` detects that its input does not come from a grouped `data.frame`. However, I can't figure out how `myfunc` could detect this. It appears that `mutate` passes only a vector to `myfunc`, so both `dplyr::is.grouped_df(x)` and `inherits(x, "grouped_df")` return `FALSE`.

What I would like:

```r
myfunc <- function(x) {
  # `comes.from.grouped.df` is a placeholder for the check I am looking for
  if (comes.from.grouped.df) {
    print("grouped")
  } else {
    print("ungrouped")
  }
}

mutate(olddata, newcol = myfunc(col1))
#> 'ungrouped'

mutate(group_by(olddata, col1), newcol = myfunc(col1))
#> 'grouped'
#> 'grouped'
#> 'grouped'
```

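The observation that `mutate` hands `myfunc` only a vector can be checked with a small experiment. The sketch below is illustrative only: `inspect_arg`, the toy `olddata`, and the column `val` are hypothetical names that do not appear in the question.

```r
library(dplyr)

# Hypothetical helper (not part of the question): reports what it receives.
inspect_arg <- function(x) {
  message(
    "class: ", paste(class(x), collapse = "/"),
    " | length: ", length(x),
    " | grouped_df: ", inherits(x, "grouped_df")
  )
  x
}

# Toy data standing in for `olddata`.
olddata <- data.frame(col1 = c("a", "a", "b"), val = c(1, 2, 3))

# Ungrouped: a single call that sees the whole column.
res_ungrouped <- mutate(olddata, newcol = inspect_arg(val))

# Grouped: one call per group, each seeing only that group's slice.
res_grouped <- mutate(group_by(olddata, col1), newcol = inspect_arg(val))
```

In both cases the helper only ever sees an atomic vector, so the grouping information has to be recovered from the calling context rather than from `x` itself.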
# Answer 1

**Score**: 5

If you want your function to be used only within a specific context, and to emit a warning when the data frame is not grouped, you can inspect the call stack:

```r
library(tidyverse)

myfunc <- function(x) {
  # Heuristic: when evaluated by `mutate`, the calling frame is expected
  # to hold only a `~` binding
  if (all(ls(envir = parent.frame()) == "~")) {
    ss <- sys.status()
    # Name of the function behind every call on the stack
    funcs <- sapply(ss$sys.calls, function(x) deparse(as.list(x)[[1]]))
    # Also match the namespaced form `dplyr::mutate(...)`
    wf <- which(funcs %in% c("mutate", "dplyr::mutate"))
    if (length(wf) == 0) stop("`myfunc` must be called from inside `mutate`")
    # Innermost `mutate` call; read its `.data` argument
    wf <- max(wf)
    data <- eval(substitute(.data), ss$sys.frames[[wf]])
    if (!inherits(data, "grouped_df")) {
      warning("`myfunc` called on an ungrouped data frame / tibble.")
    }
    return(x^2)
  }
  stop("`myfunc` must be called from inside `mutate`")
}
```

Used outside `mutate`, we get an error:

```r
myfunc(1:10)
#> Error in myfunc(1:10): `myfunc` must be called from inside `mutate`
```

With an ungrouped data frame or tibble we get a warning:

```r
tibble(iris) %>%
  mutate(x = myfunc(Sepal.Length))
#> Warning in myfunc(Sepal.Length): `myfunc` called on an ungrouped data frame /
#> tibble.
#> # A tibble: 150 x 6
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species     x
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>   <dbl>
#>  1          5.1         3.5          1.4         0.2 setosa   26.0
#>  2          4.9         3            1.4         0.2 setosa   24.0
#>  3          4.7         3.2          1.3         0.2 setosa   22.1
#>  4          4.6         3.1          1.5         0.2 setosa   21.2
#>  5          5           3.6          1.4         0.2 setosa   25
#>  6          5.4         3.9          1.7         0.4 setosa   29.2
#>  7          4.6         3.4          1.4         0.3 setosa   21.2
#>  8          5           3.4          1.5         0.2 setosa   25
#>  9          4.4         2.9          1.4         0.2 setosa   19.4
#> 10          4.9         3.1          1.5         0.1 setosa   24.0
#> # ... with 140 more rows
```

And it runs without complaint if the tibble is grouped:

```r
tibble(iris) %>%
  group_by(Species) %>%
  mutate(x = myfunc(Sepal.Length))
#> # A tibble: 150 x 6
#> # Groups:   Species [3]
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species     x
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>   <dbl>
#>  1          5.1         3.5          1.4         0.2 setosa   26.0
#>  2          4.9         3            1.4         0.2 setosa   24.0
#>  3          4.7         3.2          1.3         0.2 setosa   22.1
#>  4          4.6         3.1          1.5         0.2 setosa   21.2
#>  5          5           3.6          1.4         0.2 setosa   25
#>  6          5.4         3.9          1.7         0.4 setosa   29.2
#>  7          4.6         3.4          1.4         0.3 setosa   21.2
#>  8          5           3.4          1.5         0.2 setosa   25
#>  9          4.4         2.9          1.4         0.2 setosa   19.4
#> 10          4.9         3.1          1.5         0.1 setosa   24.0
#> # ... with 140 more rows
```

<sup>Created on 2023-02-15 with reprex v2.0.2</sup>
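
As a possible alternative to walking the call stack, the same three behaviours might be obtained from dplyr's own context helpers. The following is only a sketch, assuming dplyr 1.0.0 or later, where `dplyr::cur_group()` returns the current group keys as a one-row tibble (with zero columns when there are no grouping variables) and signals an error outside a data-masking verb. `myfunc2` is a hypothetical name, not part of the answer above.

```r
library(dplyr)

# Hypothetical variant of `myfunc`; assumes dplyr >= 1.0.0.
myfunc2 <- function(x) {
  # cur_group() errors when called outside a data-masking verb,
  # so a failure here is treated as "not called from a dplyr verb".
  keys <- tryCatch(
    dplyr::cur_group(),
    error = function(e) {
      stop("`myfunc2` must be called from inside a dplyr verb", call. = FALSE)
    }
  )
  # Zero key columns means there are no grouping variables.
  if (ncol(keys) == 0) {
    warning("`myfunc2` called on an ungrouped data frame / tibble.",
            call. = FALSE)
  }
  x^2
}

# Grouped input: expected to run without a warning.
grouped_result <- mutate(group_by(iris, Species), x = myfunc2(Sepal.Length))
```

Because this check does not search the stack for a call named `mutate`, it should also apply inside `summarize`, which the question mentions. The way dplyr reports a warning raised inside a verb differs between versions, so the exact message text shown to the user may vary.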
