Posted: 2023-02-19 05:58:54

How to make beautiful ROC curves for two models in the same plot?

Question

I've trained two xgboost models, say model1 and model2. I have the AUC scores for each model and I want them to appear in the plot. I want to make beautiful ROC curves for both models in the same plot. Something like this:

[image not preserved]

How can I do that?

I usually use the library pROC, and I know I need to extract the scores, and the truth from each model, right?

So something like this, maybe:

roc1 = roc(model1$truth, model1$scores)
roc2 = roc(model2$truth, model2$scores)

I also need the fpr and tpr for each model:

D1 = data.frame(fpr = 1 - roc1$specificities, tpr = roc1$sensitivities)
D2 = data.frame(fpr = 1 - roc2$specificities, tpr = roc2$sensitivities)

Then I can maybe add arrows to point out which curve is which:

arrows = tibble(x1 = c(0.5, 0.13) , x2 = c(0.32, 0.2), y1 = c(0.52, 0.83), y2 = c(0.7, 0.7))

And finally ggplot: (this part is missing)

ggplot(data = D1, aes(x = fpr, y = tpr)) +
  geom_smooth(se = FALSE) +
  geom_smooth(data = D2, color = 'red', se = FALSE) +
  annotate("text", x = 0.5, y = 0.475, label = "score of model 1") +
  annotate("text", x = 0.13, y = 0.9, label = "scores of model 2")

So I need help with two things:

1. How do I get the right information out of the models to make ROC curves? How do I get the truth and the prediction scores? Is the truth just the labels of the target feature in the training set?
2. How do I continue the code, and is my code right so far?

Answer 1

Score: 3

You can get the sensitivity and specificity in a data frame using coords from pROC. Just rbind the results for the two models after first attaching a column labeling each set as model 1 or model 2. To get the smooth-looking ROC with automatic labels you can use geom_textsmooth from the geomtextpath package:

library(pROC)
library(geomtextpath)  # also attaches ggplot2

# One ROC object per model: roc(response, predictor)
roc1 <- roc(model1$truth, model1$scores)
roc2 <- roc(model2$truth, model2$scores)

# coords() returns threshold, specificity and sensitivity as a data frame;
# tag each set with its model name, then stack them
df <- rbind(cbind(model = "Model 1", coords(roc1)), 
            cbind(model = "Model 2", coords(roc2)))

# Smoothed curves with the model name drawn along each curve
ggplot(df, aes(1 - specificity, sensitivity, color = model)) +
  geom_textsmooth(aes(label = model), size = 7, se = FALSE, span = 0.2,
                  textcolour = "black", vjust = 1.5, linewidth = 1,
                  text_smoothing = 50) +
  geom_abline() +  # diagonal reference line (chance level)
  scale_color_brewer(palette = "Set1", guide = "none", direction = -1) +
  scale_x_continuous("False Positive Rate", labels = scales::percent) +
  scale_y_continuous("True Positive Rate", labels = scales::percent) +
  coord_equal(expand = FALSE) +
  theme_classic(base_size = 20) +
  theme(plot.margin = margin(10, 30, 10, 10))
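
The question also asks for the AUC of each model to appear in the plot. One way to do that is to put the AUC into the label text before stacking the data frames. The following is a minimal sketch, not part of the original answer: the helper `roc_df()` and the simulated `truth`/`scores` vectors are hypothetical, and only `roc()`, `auc()` and `coords()` come from pROC.

```r
library(pROC)

# Hypothetical helper: build the ROC coordinates for one model and
# attach a label of the form "<name> (AUC = 0.xxx)".
roc_df <- function(name, truth, scores) {
  r <- roc(truth, scores, quiet = TRUE)
  label <- sprintf("%s (AUC = %.3f)", name, as.numeric(auc(r)))
  cbind(model = label, coords(r))
}

# Simulated stand-in data (assumption: higher score means more likely positive)
set.seed(1)
truth   <- rbinom(500, 1, 0.5)
scores1 <- truth + rnorm(500)          # stronger signal
scores2 <- truth + rnorm(500, sd = 2)  # weaker signal

df <- rbind(roc_df("Model 1", truth, scores1),
            roc_df("Model 2", truth, scores2))

unique(df$model)  # the two labels, each with its AUC
```

Because the `model` column now carries the AUC, passing this `df` to the same `ggplot()` call with `aes(label = model)` should draw the AUC along each curve.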

Data used

set.seed(2023)
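# Both models share the same scores (1 to 100, repeated 50 times)
model1 <- model2 <- data.frame(scores = rep(1:100, 50))
# Model 1: noisy link between score and outcome probability
p1 <- model2$scores + rnorm(5000, 0, 20)
# Model 2: outcome probability proportional to the score
p2 <- model1$scores/100
model1$truth <- rbinom(5000, 1, (p1 - min(p1))/diff(range(p1)))
model2$truth <- rbinom(5000, 1, p2)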
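
The simulated data stands in for real model output. With an actual xgboost model, the scores are usually the predicted probabilities and the truth is the vector of observed labels for the same rows, ideally from a held-out set rather than the training set, since ROC curves computed on training data tend to look too optimistic. A rough sketch under those assumptions follows; the data, the train/test split and the parameter values are all made up for illustration.

```r
library(xgboost)
library(pROC)

# Hypothetical data: 5 numeric features and a binary outcome
set.seed(42)
n <- 1000
X <- matrix(rnorm(n * 5), ncol = 5)
y <- rbinom(n, 1, plogis(X[, 1] - 0.5 * X[, 2]))

# Hold out 300 rows for evaluation
test_idx <- sample(n, 300)
dtrain <- xgb.DMatrix(X[-test_idx, ], label = y[-test_idx])
dtest  <- xgb.DMatrix(X[test_idx, ])

# Binary classifier; nrounds chosen arbitrarily for this sketch
bst <- xgb.train(params = list(objective = "binary:logistic"),
                 data = dtrain, nrounds = 20)

scores <- predict(bst, dtest)  # predicted probabilities for the held-out rows
truth  <- y[test_idx]          # observed labels for the same rows

roc1 <- roc(truth, scores, quiet = TRUE)
auc(roc1)
```

Repeating this for the second model gives `roc2`, and both objects can then go through `coords()` as in the answer.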
