How to plot logistic probability for custom generalized additive model (GAM) plot?
Question
I realize there is already a question about this here. However, I don't want to use the `draw` function; I would prefer to build the plot up from scratch, as shown here. The plot in that link doesn't show how to do this on the probability scale, though. The example below gets close, but it only plots on the scale of the link function.
#### Libraries and Data ####
library(mgcv)
library(gamair)
library(gratia)     # smooth_estimates(), add_confint(), add_partial_residuals()
library(tidyverse)
data("wesdr")
#### Fit Model ####
fit <- gam(
  ret ~ s(dur),
  method = "REML",
  family = binomial,
  data = wesdr
)
#### Evaluate the Smooths ####
sm <- smooth_estimates(fit) %>%
  add_confint()
sm
#### Add Partial Residuals ####
wesdr <- wesdr %>%
  add_partial_residuals(fit)
#### Plot ####
p <- sm %>%
  filter(smooth == "s(dur)") %>%
  ggplot() +
  geom_rug(aes(x = dur),
           data = wesdr,
           sides = "b",
           length = grid::unit(0.02, "npc")) +
  geom_ribbon(aes(ymin = lower_ci,
                  ymax = upper_ci,
                  x = dur),
              alpha = 0.2) +
  geom_line(aes(x = dur,
                y = est),
            lwd = 1.2) +
  labs(y = "Partial effect",
       title = "s(dur)")
p
Specifically, I'm looking for something functionally equivalent to this:
plot(fit,
     trans = plogis,
     shift = coef(fit)[1])
Any advice on how to get the probability plot?
Edit
I realize I wasn't precise about what I needed. The answer given is good, but I'm interested in the more general case where a logistic GAM has multiple predictors, so I need a plot based on a model like this:
#### Fit Model ####
fit <- gam(
  ret ~ s(dur) + s(bmi),
  method = "REML",
  family = binomial,
  data = wesdr
)
To account for the average value of the other predictors, I need to include the intercept in the plot, which is why I originally used the `shift` argument of the base-graphics `plot.gam` function.
Answer 1
Score: 2
The link function returns log odds. Since odds are just `p / (1 - p)`, log odds are `log(p / (1 - p))`. The inverse is `exp(log_odds) / (1 + exp(log_odds))`. So we can just do:
```r
p$data <- mutate(p$data, across(c(est, lower_ci, upper_ci), ~exp(.x)/(1 + exp(.x))))
p + ylim(c(0, 1))
```
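Base R already provides this inverse as `plogis()` (with `qlogis()` for the logit itself), which is the function passed to `trans` in the question. A small sketch comparing it with the hand-written formula above; the values in `eta` and the helper name `inv_logit` are made up for illustration:

```r
# Hypothetical log-odds values, chosen only for illustration
eta <- c(-2, 0, 1.5)

# Hand-written inverse logit, as in the formula above (illustrative helper)
inv_logit <- function(x) exp(x) / (1 + exp(x))

# plogis() gives the same probabilities
all.equal(inv_logit(eta), plogis(eta))

# qlogis() is the logit, so it undoes plogis()
all.equal(qlogis(plogis(eta)), eta)

# For very large log odds exp() overflows and the manual formula returns NaN,
# whereas plogis() stays finite
inv_logit(800)
plogis(800)
```

Using `plogis()` inside the `across()` call would therefore be a drop-in alternative that avoids the overflow case.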
To get the partial effect by hand when you have multiple independent variables, you can predict the outcome at the mean of the other variables while only changing the variable of interest. Here, we'll dress up the ggplot to make it look more like the base R plot:
fit <- gam(
  ret ~ s(dur) + s(bmi),
  method = "REML",
  family = binomial,
  data = wesdr
)
newdata <- data.frame(dur = 0:55, bmi = mean(wesdr$bmi))
pred <- predict(fit, newdata, se.fit = TRUE)
newdata$ret <- exp(pred$fit) / (1 + exp(pred$fit))
newdata$upper <- exp(pred$fit + 1.96 * pred$se.fit) /
  (1 + exp(pred$fit + 1.96 * pred$se.fit))
newdata$lower <- exp(pred$fit - 1.96 * pred$se.fit) /
  (1 + exp(pred$fit - 1.96 * pred$se.fit))
ggplot(newdata, aes(x = dur)) +
  geom_rug(data = wesdr, sides = "b", length = grid::unit(0.02, "npc"),
           lwd = 0.2, col = "gray60") +
  geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0, linetype = 2,
              color = "black") +
  geom_line(aes(y = ret), lwd = 1.2) +
  labs(y = "Partial effect", title = "s(dur)") +
  scale_y_continuous(breaks = 0.2 * 1:5) +
  scale_x_continuous(breaks = 10 * 0:5) +
  theme_classic(base_size = 20)
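For comparison with the `plot(fit, trans = plogis, shift = coef(fit)[1])` call in the question, here is a rough sketch that takes the `s(dur)` term alone, shifts it by the intercept, and then applies `plogis()`. It relies on `predict(..., type = "terms")` from mgcv. The names `tm`, `b0` and `pe` are illustrative, and the band of 2 standard errors is my assumption about matching `plot.gam`'s default, so treat the result as an approximation of that plot, not an exact reproduction.

```r
library(mgcv)
library(ggplot2)
data("wesdr", package = "gamair")

fit <- gam(ret ~ s(dur) + s(bmi),
           method = "REML", family = binomial, data = wesdr)

# Per-term contributions on the link (log-odds) scale, with standard errors
tm <- predict(fit, type = "terms", se.fit = TRUE)

# The intercept, i.e. the value passed as `shift` in the question
b0 <- coef(fit)[["(Intercept)"]]

pe <- data.frame(
  dur = fit$model$dur,            # dur values actually used in the fit
  est = tm$fit[, "s(dur)"],       # centred smooth for dur
  se  = tm$se.fit[, "s(dur)"]
)

# Shift by the intercept first, then apply the inverse link
pe$prob  <- plogis(b0 + pe$est)
pe$lower <- plogis(b0 + pe$est - 2 * pe$se)
pe$upper <- plogis(b0 + pe$est + 2 * pe$se)

ggplot(pe, aes(x = dur)) +
  geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.2) +
  geom_line(aes(y = prob), linewidth = 1.2) +
  geom_rug(sides = "b", length = grid::unit(0.02, "npc")) +
  labs(y = "Probability", title = "s(dur)")
```

Unlike the `newdata` approach above, this leaves `s(bmi)` at zero on the link scale (its centred average) instead of evaluating it at the mean of `bmi`, so the two curves can differ slightly.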
Answer 2
Score: 2
You can simply use `predict.gam` with `type = "link"` and `se.fit = TRUE`:
mod <- gam(
  ret ~ s(dur),
  method = "REML",
  family = binomial,
  data = wesdr
)
pred <- predict.gam(mod, wesdr, type = "link", se.fit = TRUE)
Then use the inverse link to transform the fit and its interval back to the probability scale:
wesdr %>%
  mutate(fit = mod$family$linkinv(pred$fit),
         lci = mod$family$linkinv(pred$fit - 1.96 * pred$se.fit),
         uci = mod$family$linkinv(pred$fit + 1.96 * pred$se.fit)) %>%
  ggplot(aes(dur, fit)) +
  geom_line() +
  geom_ribbon(aes(ymin = lci, ymax = uci), alpha = 0.3) +
  geom_rug(aes(x = dur), sides = "b",
           length = grid::unit(0.02, "npc"))
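One reason to build the interval on the link scale and only then apply `linkinv` is that the back-transformed limits always stay inside [0, 1]. A small base-R sketch with made-up numbers (the `eta` and `se` values are hypothetical, not taken from the model):

```r
fam <- binomial()   # same family as the model above (default logit link)

eta <- 3            # hypothetical linear predictor (log odds)
se  <- 1            # hypothetical standard error on the link scale

# For the binomial family with the logit link, linkinv matches plogis()
all.equal(fam$linkinv(eta), plogis(eta))

# Interval built on the link scale, then back-transformed
link_ci <- fam$linkinv(eta + c(-1.96, 1.96) * se)

# Naive interval built directly on the probability scale,
# using a delta-method standard error: se * d(mu)/d(eta)
p       <- fam$linkinv(eta)
se_resp <- se * fam$mu.eta(eta)
resp_ci <- p + c(-1.96, 1.96) * se_resp

link_ci   # both limits lie between 0 and 1
resp_ci   # the upper limit exceeds 1
```

Going through `mod$family$linkinv` instead of hard-coding `plogis()` also means the same plotting code keeps working if the model is refitted with a different link.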
Answer 3
Score: 2
The general way to do this is to predict from the model at the values you want. Using your extended example,
library("gratia")
library("mgcv")
data("wesdr", package = "gamair")
fit <- gam(ret ~ s(dur) + s(bmi),
           data = wesdr, method = "REML", family = binomial)
Create a data slice at the values of the covariates you want. If you specify only `dur` in the data slice, the other covariate, `bmi`, is set to the value of the observation closest to the median of `bmi` in the training data:
ds1 <- data_slice(fit, dur = evenly(dur, n = 100))
But it is easy to specify other values; you mentioned setting the other covariates to their mean:
ds2 <- data_slice(fit, dur = evenly(dur, n = 100), bmi = mean(bmi))
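If you would rather not depend on `data_slice()`, a comparable grid can be put together in base R. This is only a sketch: the small `dat` data frame is a made-up stand-in for `wesdr`, the name `new_grid` is illustrative, and the block reproduces the idea behind `ds2` (evenly spaced `dur`, `bmi` fixed at its mean), not gratia's exact output.

```r
# Hypothetical stand-in for the training data (use wesdr in practice)
set.seed(1)
dat <- data.frame(dur = runif(50, 0, 50), bmi = rnorm(50, 23, 4))

# 100 evenly spaced dur values over the observed range, bmi held at its mean
new_grid <- data.frame(
  dur = seq(min(dat$dur), max(dat$dur), length.out = 100),
  bmi = mean(dat$bmi)
)

head(new_grid, 3)
```

A data frame like this can be passed as `newdata` to `predict()`, in the same way as the `newdata` object in the first answer.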
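Then predict at those values, here with `fitted_values()`:
fv2 <- fitted_values(fit, data = ds2, scale = "response")
# The plot below uses the columns fitted, lower and upper; other gratia
# releases may name them differently (e.g. with a leading dot), so check names(fv2)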
Then plot:
library("ggplot2")
fv2 |>
  ggplot(aes(x = dur, y = fitted)) +
  geom_ribbon(aes(x = dur, ymin = lower, ymax = upper),
              inherit.aes = FALSE, alpha = 0.2) +
  geom_line() +
  geom_rug(data = wesdr, aes(x = dur), sides = "b", inherit.aes = FALSE,
           length = grid::unit(0.01, "npc"), alpha = 0.5)
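which produces a plot of the fitted probability against `dur`, with its confidence band and a rug of the observed `dur` values.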