How to avoid errors when using pROC within a loop/function?

Question

How do I pass arguments to the `pROC::roc()` function when it is called inside a loop or function? I've tried a number of different approaches: `!!sym(i)`, `{{i}}`, `as.name(i)`, and using non-formula syntax, e.g. `roclist <- roc(response, i, df, quiet = TRUE)`. Based on [the source code](https://github.com/cran/pROC/blob/master/R/roc.R) I suspect the issue is the spaces in the variable names and the "\`\`" syntax. I've also looked at the source code for the `roc_()` function, but I couldn't get that to work either.

Example data:
``` r
df <- structure(list(response = c("Benefit", "Benefit", "Benefit",
"Benefit", "Benefit", "Benefit", "Benefit", "Benefit", "Benefit",
"Benefit", "Benefit", "Benefit", "Benefit", "Benefit", "Benefit",
"Benefit", "Benefit", "Benefit", "Benefit", "Benefit", "Benefit",
"Benefit", "Benefit", "Benefit", "Benefit", "Benefit", "Benefit",
"Benefit", "Benefit", "Benefit", "Benefit", "Benefit", "Benefit",
"Benefit", "Benefit", "Benefit", "Benefit", "Benefit", "Benefit",
"Benefit", "Benefit", "Benefit", "Benefit", "Benefit", "Benefit",
"Benefit", "Benefit", "Benefit", "Benefit", "Benefit", "No_Benefit",
"No_Benefit", "No_Benefit", "No_Benefit", "No_Benefit", "No_Benefit",
"No_Benefit", "No_Benefit", "No_Benefit", "No_Benefit", "No_Benefit",
"No_Benefit", "No_Benefit", "No_Benefit", "No_Benefit", "No_Benefit",
"No_Benefit", "No_Benefit", "No_Benefit", "No_Benefit", "No_Benefit",
"No_Benefit", "No_Benefit", "No_Benefit", "No_Benefit", "No_Benefit",
"No_Benefit", "No_Benefit", "No_Benefit", "No_Benefit", "No_Benefit",
"No_Benefit", "No_Benefit", "No_Benefit", "No_Benefit", "No_Benefit",
"No_Benefit", "No_Benefit", "No_Benefit", "No_Benefit", "No_Benefit",
"No_Benefit", "No_Benefit", "No_Benefit", "No_Benefit", "No_Benefit",
"No_Benefit", "No_Benefit", "No_Benefit", "No_Benefit"), `Col 1` = c(436,
304, 594, 360, 234, 1751, 52, 93, 600, 613, 442, 196, 2231, 274,
204, 703, 392, 189, 139, 282, 201, 256, 382, 777, 514, 648, 175,
484, 551, 135, 497, 731, 101, 420, 49, 378, 1015, 887, 283, 386,
2439, 1006, 294, 296, 66, 317, 73, 131, 1515, 573, 233, 122,
403, 538, 544, 61, 118, 39, 356, 87, 453, 337, 124, 112, 362,
315, 264, 450, 511, 132, 78, 36, 109, 78, 503, 280, 105, 567,
676, 132, 323, 356, 409, 277, 171, 114, 248, 36, 331, 510, 91,
116, 263, 152, 259, 137, 171, 278, 198, 247), Col_2 = c(417,
267, 561, 340, 218, 1681, 50, 90, 566, 566, 424, 184, 2044, 258,
195, 665, 374, 181, 126, 262, 193, 249, 351, 717, 499, 589, 168,
437, 505, 125, 451, 684, 97, 392, 44, 352, 915, 825, 259, 362,
2238, 940, 264, 267, 63, 293, 70, 116, 1383, 538, 209, 118, 386,
510, 514, 55, 110, 38, 338, 78, 434, 326, 112, 108, 340, 281,
252, 418, 485, 128, 77, 35, 100, 73, 465, 257, 102, 534, 628,
127, 297, 345, 391, 257, 149, 108, 229, 33, 307, 472, 84, 105,
246, 137, 241, 120, 156, 251, 179, 235), `3` = c(9.832, 15.356,
15.865, 18.529, 15.138, 13.623, 10, 11.111, 12.014, 16.784, 15.094,
16.304, 13.209, 8.915, 7.692, 13.534, 10.963, 6.63, 19.048, 11.069,
11.399, 4.819, 16.524, 17.992, 7.615, 12.054, 11.905, 13.959,
14.851, 6.4, 15.743, 23.246, 8.247, 11.48, 22.727, 16.477, 14.645,
8.242, 15.058, 15.47, 15.103, 11.064, 15.53, 19.85, 7.937, 14.676,
15.714, 17.241, 15.04, 16.171, 13.876, 13.559, 30.829, 11.373,
17.899, 14.545, 14.545, 15.789, 8.876, 10.256, 6.682, 14.11,
16.071, 22.222, 12.647, 18.505, 8.333, 15.789, 15.052, 18.75,
5.195, 17.143, 17, 15.068, 13.548, 15.953, 16.667, 22.659, 12.261,
16.535, 10.101, 30.725, 14.834, 8.56, 20.134, 12.963, 14.41,
18.182, 13.355, 15.254, 20.238, 13.333, 11.789, 14.599, 14.523,
27.5, 14.744, 19.522, 20.67, 15.319)), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -100L), groups = structure(list(
    response = c("Benefit", "No_Benefit"), .rows = structure(list(
        1:50, 51:100), ptype = integer(0), class = c("vctrs_list_of",
    "vctrs_vctr", "list"))), row.names = c(NA, -2L), .drop = TRUE, class = c("tbl_df",
"tbl", "data.frame")))
```

My code so far (runs as expected, but not with lapply):

``` r
library(tidyverse)
library(pROC)
#> Type 'citation("pROC")' for a citation.
#> 
#> Attaching package: 'pROC'
#> The following objects are masked from 'package:stats':
#> 
#>     cov, smooth, var

list_of_variables_to_plot <- colnames(df[-c(1)])

plot_roc_curves <- function(i) {
  roclist <- roc(response ~ i, df, quiet = TRUE)
  ggroc(roclist, legacy.axes = TRUE) +
    annotate("text", label = sprintf("AUC: %.2f (%.2f-%.2f)",
                                     roclist$auc, ci(roclist)[1],
                                     ci(roclist)[3]),
             x = 0.1, y = 0.75, hjust = 0) +
    annotate("segment", x = 0, xend = 1, y = 0, yend = 1,
             color = "red", linetype = "dashed") +
    theme_minimal(base_size = 18) +
    ggtitle(paste("Test",
                  i, sep = ": ")) +
    geom_rect(aes(xmin = 0, xmax = 1, ymin = 0, ymax = 1),
              fill = NA, color = "black", linewidth = 0.05) +
    theme(title = element_text(size = 11))
}

## Plot the data (one plot per page)
pdf(file = paste0("test_output_",
                  gsub("-", "", Sys.Date()),
                  ".pdf"), width = 7, height = 7)
lapply(list_of_variables_to_plot, plot_roc_curves)
#> Error in model.frame.default(formula = response ~ i, data = df, na.action = "na.pass"): variable lengths differ (found for 'i')
dev.off()
#> quartz_off_screen 
#>                 2
```

Answer 1

Score: 1

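You can't refer to a column in a formula through a variable that holds its name. In `response ~ i`, the formula stores the symbol `i` itself. When the model frame is built, `i` is not a column of `df`, so R finds the function argument instead: a length-one character string, hence "variable lengths differ".

This fails the same way with any function that takes a formula as input:

``` r
fit_glm <- function(i) {
  model <- glm(response ~ i, data = df)
}
lapply(list_of_variables_to_plot, fit_glm)
#> Error in model.frame.default(formula = response ~ i, data = df, drop.unused.levels = TRUE) : 
#>   variable lengths differ (found for 'i')
```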
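
The mechanism can be reproduced with base R alone. This is a minimal sketch on made-up data; `toy` and `fit_by_name()` are hypothetical names used only for illustration and are not part of the question's code:

``` r
# Toy data (hypothetical), 10 rows
toy <- data.frame(response = rep(c(0, 1), each = 5), x = 1:10)

# The formula records the symbol `i`, not whatever string `i` contains
vars_seen <- all.vars(response ~ i)

fit_by_name <- function(i) {
  # `i` is a length-one character vector such as "x"
  f <- response ~ i
  # model.frame() looks for `i` in `toy`, does not find it, and falls back
  # to this function's environment, where `i` is the string itself
  tryCatch(
    {
      model.frame(f, data = toy)
      "ok"
    },
    error = function(e) conditionMessage(e)
  )
}

msg <- fit_by_name("x")
print(vars_seen)
print(msg)
```

`vars_seen` lists `i` rather than `x`, and `msg` holds the error message instead of `"ok"`, because a one-element string cannot be lined up with ten rows of `response`.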

Formulas and tidyverse-style non-standard evaluation are great for the end user, but they are a nightmare to program with. I would avoid using them inside a function; they are totally unnecessary in that context.
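
To give a sense of what programming with formulas involves, here is a sketch that builds one from a string with base R's `reformulate()`. The data `toy` and the helper `build_formula()` are hypothetical. The sketch only exercises `model.frame()`; whether the formula method of `pROC::roc()` handles backtick-quoted names the same way is not verified here.

``` r
# Toy data (hypothetical) with a non-syntactic column name
toy <- data.frame(
  response = rep(c(0, 1), each = 5),
  `Col 1` = c(1, 3, 2, 5, 4, 6, 8, 7, 10, 9),
  check.names = FALSE
)

# Hypothetical helper: turn a column name held in a string into a formula
build_formula <- function(predictor, response = "response") {
  # Backticks make names with spaces or leading digits parse as one symbol
  reformulate(sprintf("`%s`", predictor), response = response)
}

f <- build_formula("Col 1")
mf <- model.frame(f, data = toy)

print(f)
print(dim(mf))
```

The extra quoting step is exactly the sort of bookkeeping that plain `[[` indexing avoids.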

Use good old standard R evaluation, preferably passing parameters explicitly.

``` r
plot_roc_curves <- function(predictor, response, df) {
  roc_curve <- roc(df[[response]], df[[predictor]], quiet = TRUE)
  ...
}
lapply(list_of_variables_to_plot, plot_roc_curves, df = df, response = "response")
```
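
The same calling pattern can be tried without pROC installed. This is a sketch under stated assumptions: `toy` and `auc_by_name()` are hypothetical, and a rank-based AUC (the Mann-Whitney U statistic divided by the number of positive/negative pairs) stands in for `roc()`. It is not what the answer's code computes; it only shows column names being passed as strings and looked up with `[[`.

``` r
# Toy data (hypothetical) with non-syntactic column names
toy <- data.frame(
  response = rep(c("Benefit", "No_Benefit"), each = 4),
  `Col 1` = c(5, 6, 7, 8, 1, 2, 3, 4),
  `3` = c(1, 5, 2, 6, 3, 7, 4, 8),
  check.names = FALSE
)

# Stand-in for roc(): rank-based AUC, with all inputs passed explicitly
auc_by_name <- function(predictor, response, data, positive = "Benefit") {
  x <- data[[predictor]]                  # works for any column name
  is_pos <- data[[response]] == positive
  r <- rank(x)
  n1 <- sum(is_pos)
  n0 <- sum(!is_pos)
  (sum(r[is_pos]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

predictors <- setdiff(names(toy), "response")
aucs <- vapply(predictors, auc_by_name, numeric(1),
               response = "response", data = toy)
print(aucs)
```

Because `vapply()` is given a character vector, the result is named after the columns, which keeps each value tied to its predictor.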

Answer 2

Score: 0

If you don't use the formula syntax in the `roc()` function, you can use `data[[i]]`:

``` r
library(tidyverse)
library(pROC)

plot_roc_curves <- function(i) {
  roclist <- roc(df$response, df[[i]], quiet = TRUE)
  ggroc(roclist, legacy.axes = TRUE) +
    annotate("text", label = sprintf("AUC: %.2f (%.2f-%.2f)",
                                     roclist$auc, ci(roclist)[1],
                                     ci(roclist)[3]),
             x = 0.1, y = 0.75, hjust = 0) +
    annotate("segment", x = 0, xend = 1, y = 0, yend = 1,
             color = "red", linetype = "dashed") +
    theme_minimal(base_size = 18) +
    ggtitle(paste("Test",
                  i, sep = ": ")) +
    geom_rect(aes(xmin = 0, xmax = 1, ymin = 0, ymax = 1),
              fill = NA, color = "black", linewidth = 0.05) +
    theme(title = element_text(size = 11))
}

list_of_variables_to_plot <- colnames(df[-c(1)])

## Plot the data (one plot per page)
pdf(file = paste0("test_output_",
                  gsub("-", "", Sys.Date()),
                  ".pdf"), width = 7, height = 7)
lapply(list_of_variables_to_plot, plot_roc_curves)
#> [[1]]
#> 
#> [[2]]
#> 
#> [[3]]
dev.off()
#> quartz_off_screen 
#>                 2
```

Created on 2023-07-10 with reprex v2.0.2

Not sure if this is the 'best' way, but it 'works'.

Posted by huangapple on 2023-07-10 12:02:03
When reposting, please keep the link to this article: https://go.coder-hub.com/76650615.html