英文:
How to avoid errors when using pROC within a loop/function?
问题
如何在循环/函数中将参数传递给pROC::roc()
函数?我尝试了多种不同的方法:!!sym(i)
,{{i}}
,as.name(i)
,以及使用非公式语法,例如roclist <- roc(response, i, df, quiet = TRUE)
。根据源代码来看,我怀疑问题可能出在变量名中的空格和“``”语法上。我还查看了roc_()
函数的源代码,但也无法使其正常工作。
示例数据:
df <- structure(list(response = c("Benefit", "Benefit", "Benefit",
"Benefit", "Benefit", "Benefit", "Benefit", "Benefit", "Benefit",
"Benefit", "Benefit", "Benefit", "Benefit", "Benefit", "Benefit",
"Benefit", "Benefit", "Benefit", "Benefit", "Benefit", "Benefit",
"Benefit", "Benefit", "Benefit", "Benefit", "Benefit", "Benefit",
"Benefit", "Benefit", "Benefit", "Benefit", "Benefit", "Benefit",
"Benefit", "Benefit", "Benefit", "Benefit", "Benefit", "Benefit",
"Benefit", "Benefit", "Benefit", "Benefit", "Benefit", "Benefit",
"Benefit", "Benefit", "Benefit", "Benefit", "Benefit", "No_Benefit",
"No_Benefit", "No_Benefit", "No_Benefit", "No_Benefit", "No_Benefit",
"No_Benefit", "No_Benefit", "No_Benefit", "No_Benefit", "No_Benefit",
"No_Benefit", "No_Benefit", "No_Benefit", "No_Benefit", "No_Benefit",
"No_Benefit", "No_Benefit", "No_Benefit", "No_Benefit", "No_Benefit",
"No_Benefit", "No_Benefit", "No_Benefit", "No_Benefit", "No_Benefit",
"No_Benefit", "No_Benefit", "No_Benefit", "No_Benefit", "No_Benefit",
"No_Benefit", "No_Benefit", "No_Benefit", "No_Benefit", "No_Benefit",
"No_Benefit", "No_Benefit", "No_Benefit", "No_Benefit", "No_Benefit",
"No_Benefit", "No_Benefit", "No_Benefit", "No_Benefit", "No_Benefit",
"No_Benefit", "No_Benefit", "No_Benefit", "No_Benefit", "No_Benefit",
"No_Benefit", "No_Benefit", "No_Benefit", "No_Benefit"), `Col 1` = c(436,
304, 594, 360, 234, 1751, 52, 93, 600, 613, 442, 196, 2231, 274,
204, 703, 392, 189, 139, 282, 201, 256, 382, 777, 514, 648, 175,
484, 551, 135, 497, 731, 101, 420, 49, 378, 1015, 887, 283, 386,
2439, 1006, 294, 296, 66, 317, 73, 131, 1515, 573, 233, 122,
403, 538, 544, 61, 118, 39, 356, 87, 453, 337, 124, 112, 362,
315, 264, 450, 511, 132, 78, 36, 109, 78, 503, 280, 105, 567,
676, 132, 323, 356, 409, 277, 171, 114, 248, 36, 331, 510, 91,
116, 263, 152, 259, 137, 171, 278, 198, 247), Col_2 = c(417,
267, 561, 340, 218, 1681, 50, 90, 566, 566, 424, 184, 2044, 258,
195, 665, 374, 181, 126, 262, 193, 249, 351, 717, 499, 589, 168,
437, 505, 125, 451, 684, 97, 392, 44, 352, 915, 825, 259, 362,
2238, 940, 264, 267, 63, 293, 70, 116, 1383, 538, 209, 118, 386,
510, 514, 55, 110, 38, 338, 78, 434, 326, 112, 108, 340, 281,
252, 418, 485, 128, 77, 35, 100, 73, 465, 257, 102, 534, 628,
127, 297, 345, 391, 257, 149, 108, 229, 33, 307, 472, 84, 105,
246, 137, 241, 120, 156, 251, 179, 235), `3` = c(9.832, 15.356,
15.865, 18.529, 15.138, 13.623, 10, 11.111, 12.014, 16.784, 15.094,
16.304, 13.209, 8.915, 7.692, 13.534, 10.963, 6.63, 19.048, 11.069,
11.399, 4.819, 16.524, 17.992, 7.615, 12.054, 11.905, 13.959,
14.851, 6.4, 15.743, 23.246, 8.247, 11.48, 22.727, 16.477, 14.645,
8.242, 15.058, 15.47, 15.103, 11.064, 15.53, 19.85, 7.937, 14.676,
15.714, 17.241, 15.04, 16.171, 13.876, 13
<details>
<summary>英文:</summary>
How do I pass arguments to the `pROC::roc()` function when used in a loop/function? I've tried a number of different approaches: `!!sym(i)`, `{{i}}`, `as.name(i)`, and using non-formula syntax e.g. `roclist <- roc(response, i, df, quiet = TRUE)`. Based on [the source code](https://github.com/cran/pROC/blob/master/R/roc.R) I suspect the issue is the spaces in the variable names and the "\`\`" syntax. I've also looked at the source code for the [`roc_()` function]() but i couldn't get that to work either.
Example data:
``` r
df <- structure(list(response = c("Benefit", "Benefit", "Benefit",
"Benefit", "Benefit", "Benefit", "Benefit", "Benefit", "Benefit",
"Benefit", "Benefit", "Benefit", "Benefit", "Benefit", "Benefit",
"Benefit", "Benefit", "Benefit", "Benefit", "Benefit", "Benefit",
"Benefit", "Benefit", "Benefit", "Benefit", "Benefit", "Benefit",
"Benefit", "Benefit", "Benefit", "Benefit", "Benefit", "Benefit",
"Benefit", "Benefit", "Benefit", "Benefit", "Benefit", "Benefit",
"Benefit", "Benefit", "Benefit", "Benefit", "Benefit", "Benefit",
"Benefit", "Benefit", "Benefit", "Benefit", "Benefit", "No_Benefit",
"No_Benefit", "No_Benefit", "No_Benefit", "No_Benefit", "No_Benefit",
"No_Benefit", "No_Benefit", "No_Benefit", "No_Benefit", "No_Benefit",
"No_Benefit", "No_Benefit", "No_Benefit", "No_Benefit", "No_Benefit",
"No_Benefit", "No_Benefit", "No_Benefit", "No_Benefit", "No_Benefit",
"No_Benefit", "No_Benefit", "No_Benefit", "No_Benefit", "No_Benefit",
"No_Benefit", "No_Benefit", "No_Benefit", "No_Benefit", "No_Benefit",
"No_Benefit", "No_Benefit", "No_Benefit", "No_Benefit", "No_Benefit",
"No_Benefit", "No_Benefit", "No_Benefit", "No_Benefit", "No_Benefit",
"No_Benefit", "No_Benefit", "No_Benefit", "No_Benefit", "No_Benefit",
"No_Benefit", "No_Benefit", "No_Benefit", "No_Benefit"), `Col 1` = c(436,
304, 594, 360, 234, 1751, 52, 93, 600, 613, 442, 196, 2231, 274,
204, 703, 392, 189, 139, 282, 201, 256, 382, 777, 514, 648, 175,
484, 551, 135, 497, 731, 101, 420, 49, 378, 1015, 887, 283, 386,
2439, 1006, 294, 296, 66, 317, 73, 131, 1515, 573, 233, 122,
403, 538, 544, 61, 118, 39, 356, 87, 453, 337, 124, 112, 362,
315, 264, 450, 511, 132, 78, 36, 109, 78, 503, 280, 105, 567,
676, 132, 323, 356, 409, 277, 171, 114, 248, 36, 331, 510, 91,
116, 263, 152, 259, 137, 171, 278, 198, 247), Col_2 = c(417,
267, 561, 340, 218, 1681, 50, 90, 566, 566, 424, 184, 2044, 258,
195, 665, 374, 181, 126, 262, 193, 249, 351, 717, 499, 589, 168,
437, 505, 125, 451, 684, 97, 392, 44, 352, 915, 825, 259, 362,
2238, 940, 264, 267, 63, 293, 70, 116, 1383, 538, 209, 118, 386,
510, 514, 55, 110, 38, 338, 78, 434, 326, 112, 108, 340, 281,
252, 418, 485, 128, 77, 35, 100, 73, 465, 257, 102, 534, 628,
127, 297, 345, 391, 257, 149, 108, 229, 33, 307, 472, 84, 105,
246, 137, 241, 120, 156, 251, 179, 235), `3` = c(9.832, 15.356,
15.865, 18.529, 15.138, 13.623, 10, 11.111, 12.014, 16.784, 15.094,
16.304, 13.209, 8.915, 7.692, 13.534, 10.963, 6.63, 19.048, 11.069,
11.399, 4.819, 16.524, 17.992, 7.615, 12.054, 11.905, 13.959,
14.851, 6.4, 15.743, 23.246, 8.247, 11.48, 22.727, 16.477, 14.645,
8.242, 15.058, 15.47, 15.103, 11.064, 15.53, 19.85, 7.937, 14.676,
15.714, 17.241, 15.04, 16.171, 13.876, 13.559, 30.829, 11.373,
17.899, 14.545, 14.545, 15.789, 8.876, 10.256, 6.682, 14.11,
16.071, 22.222, 12.647, 18.505, 8.333, 15.789, 15.052, 18.75,
5.195, 17.143, 17, 15.068, 13.548, 15.953, 16.667, 22.659, 12.261,
16.535, 10.101, 30.725, 14.834, 8.56, 20.134, 12.963, 14.41,
18.182, 13.355, 15.254, 20.238, 13.333, 11.789, 14.599, 14.523,
27.5, 14.744, 19.522, 20.67, 15.319)), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -100L), groups = structure(list(
response = c("Benefit", "No_Benefit"), .rows = structure(list(
1:50, 51:100), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), row.names = c(NA, -2L), .drop = TRUE, class = c("tbl_df",
"tbl", "data.frame")))
My code so far (runs as expected, but not with lapply):
library(tidyverse)
library(pROC)
#> Type 'citation("pROC")' for a citation.
#>
#> Attaching package: 'pROC'
#> The following objects are masked from 'package:stats':
#>
#> cov, smooth, var
list_of_variables_to_plot <- colnames(df[-c(1)])
plot_roc_curves <- function(i) {
roclist <- roc(response ~ i, df, quiet = TRUE)
ggroc(roclist, legacy.axes = TRUE) +
annotate("text", label = sprintf("AUC: %.2f (%.2f-%.2f)",
roclist$auc, ci(roclist)[1],
ci(roclist)[3]),
x = 0.1, y = 0.75, hjust = 0) +
annotate("segment", x = 0, xend = 1, y = 0, yend = 1,
color = "red", linetype = "dashed") +
theme_minimal(base_size = 18) +
ggtitle(paste("Test",
i, sep = ": ")) +
geom_rect(aes(xmin = 0, xmax = 1, ymin = 0, ymax = 1),
fill = NA, color = "black", linewidth = 0.05) +
theme(title = element_text(size = 11))
}
## Plot the data (one plot per page)
pdf(file = paste0("test_output_",
gsub("-", "", Sys.Date()),
".pdf"), width = 7, height = 7)
lapply(list_of_variables_to_plot, plot_roc_curves)
#> Error in model.frame.default(formula = response ~ i, data = df, na.action = "na.pass"): variable lengths differ (found for 'i')
dev.off()
#> quartz_off_screen
#> 2
答案1
得分: 1
无法在公式中通过名称引用变量。
这不适用于任何以公式作为输入的函数:
> fit_glm <- function(i) {
model <- glm(response ~ i, data = df)
}
lapply(list_of_variables_to_plot, fit_glm)
Error in model.frame.default(formula = response ~ i, data = df, drop.unused.levels = TRUE) :
variable lengths differ (found for 'i')
对于最终用户来说,公式和tidyverse风格的非标准评估非常好,但在编程中却是一场噩梦。我会避免在函数中使用它们,因为在这种情况下它们是完全不必要的。
请使用传统的R评估,最好显式传递参数。
plot_roc_curves <- function(predictor, response, df) {
roc_curve <- roc(df[[response]], df[[predictor]], quiet = TRUE)
...
}
lapply(list_of_variables_to_plot, plot_roc_curves, df=df, response="response")
英文:
You can't refer to a variable by name in a formula.
This won't work with any function taking a formula as input:
> fit_glm <- function(i) {
model <- glm(response ~ i, data = df)
}
lapply(list_of_variables_to_plot, fit_glm)
Error in model.frame.default(formula = response ~ i, data = df, drop.unused.levels = TRUE) :
variable lengths differ (found for 'i')
Formulas and tidyverse-style non standard evaluation are great for the end user, but they are a nightmare to program with. I would avoid using them with a function, they are totally unnecessary in that context.
Use good old standard R evaluation, preferably passing parameters explicitly.
plot_roc_curves <- function(predictor, response, df) {
roc_curve <- roc(df[[response]], df[[predictor]], quiet = TRUE)
...
}
lapply(list_of_variables_to_plot, plot_roc_curves, df=df, response="response")
答案2
得分: 0
如果您在roc()
函数中不使用公式语法,可以使用data[[i]]
:
library(tidyverse)
library(pROC)
plot_roc_curves <- function(i) {
roclist <- roc(df$response, df[[i]], quiet = TRUE)
ggroc(roclist, legacy.axes = TRUE) +
annotate("text", label = sprintf("AUC: %.2f (%.2f-%.2f)",
roclist$auc, ci(roclist)[1],
ci(roclist)[3]),
x = 0.1, y = 0.75, hjust = 0) +
annotate("segment", x = 0, xend = 1, y = 0, yend = 1,
color = "red", linetype = "dashed") +
theme_minimal(base_size = 18) +
ggtitle(paste("Test",
i, sep = ": ")) +
geom_rect(aes(xmin = 0, xmax = 1, ymin = 0, ymax = 1),
fill = NA, color = "black", linewidth = 0.05) +
theme(title = element_text(size = 11))
}
list_of_variables_to_plot <- colnames(df[-c(1)])
## Plot the data (one plot per page)
pdf(file = paste0("test_output_",
gsub("-", "", Sys.Date()),
".pdf"), width = 7, height = 7)
lapply(list_of_variables_to_plot, plot_roc_curves)
创建于2023-07-10,使用reprex v2.0.2
不确定这是否是“最佳”方式,但它“有效”。
英文:
If you don't use the formula syntax in the roc()
function you can use data[[i]]
:
library(tidyverse)
library(pROC)
plot_roc_curves <- function(i) {
roclist <- roc(df$response, df[[i]], quiet = TRUE)
ggroc(roclist, legacy.axes = TRUE) +
annotate("text", label = sprintf("AUC: %.2f (%.2f-%.2f)",
roclist$auc, ci(roclist)[1],
ci(roclist)[3]),
x = 0.1, y = 0.75, hjust = 0) +
annotate("segment", x = 0, xend = 1, y = 0, yend = 1,
color = "red", linetype = "dashed") +
theme_minimal(base_size = 18) +
ggtitle(paste("Test",
i, sep = ": ")) +
geom_rect(aes(xmin = 0, xmax = 1, ymin = 0, ymax = 1),
fill = NA, color = "black", linewidth = 0.05) +
theme(title = element_text(size = 11))
}
list_of_variables_to_plot <- colnames(df[-c(1)])
## Plot the data (one plot per page)
pdf(file = paste0("test_output_",
gsub("-", "", Sys.Date()),
".pdf"), width = 7, height = 7)
lapply(list_of_variables_to_plot, plot_roc_curves)
#> [[1]]
#>
#> [[2]]
#>
#> [[3]]
dev.off()
#> quartz_off_screen
#> 2
<sup>Created on 2023-07-10 with reprex v2.0.2</sup>
Not sure if this is the 'best' way, but it 'works'.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论