2023年5月21日 06:52:41go评论105阅读模式

英文:

Manually plot GLM predictions with unscaled values

问题

模型有两个变量，crop和pop，均具有缩放值。
我正在使用GLMMtmb对其运行模型，但不确定如何手动绘制该模型的预测，对缩放的变量值进行反变换。
这是一个可重现示例的数据块。
对于我的数据框列：
Animal是动物的存在或不存在，crop和pop是可能影响存在或不存在的变量。
所以我运行了模型
```R
model <- glmmTMB(animal ~ crop + pop, family = "poisson", data = dummy)

我收到了一些代码，用于手动绘制预测，但它没有起作用。
这是代码，例如，对于'pop'变量。

range <- range(dummy$pop)
nd <- expand.grid(pop = seq(range[1], range[2], by = 0.5))
dummy <- as.data.frame(predict(model, newdata = nd, se.fit = TRUE))
dummy$lower <- dummy$fit - 1.96 * dummy$se.fit
dummy$upper <- dummy$fit + 1.96 * dummy$se.fit
dummy$fit_trans <- plogis(dummy$fit)
dummy$low_trans <- plogis(dummy$lower)
dummy$up_trans <- plogis(dummy$upper)
dummy <- cbind(dummy, nd)

这应该给我一个数据框，我可以传递给ggplot，但在创建对象'nd'（新数据）时出错。需要在那里添加一些内容，但我不确定是什么。
此外，我在这段代码中看不到在创建新数据框之前对变量数据进行非缩放的地方？


<details>
<summary>英文:</summary>
I have a model with two variables, crop and pop,
both have scaled values.
I am using GLMMtmb to run a model on it,
but I&#39;m unsure how to manually plot predictions from this model, backtransforming the scaled variable values.
A chunk of data is at the end of the post for a reproducible example.
so for my data frame columns:
Animal is the presence or absence of the animal, crop and pop the variables that may affect presence or absence.
So I run the model

model <- glmmTMB(animal~crop+pop,family="poisson",data=dummy)

I received some code from someone to manually plot predictions but it&#39;s not working.
This is the code, for for example, the &#39;pop&#39; variable.

range <- range(dummy$pop)

nd <- expand.grid(pop = seq(range1, range2, by=0.5))

dummy <- as.data.frame(predict(model, newdata=nd, se.fit=TRUE))
dummy$lower <- dummy$fit - 1.96dummy$se.fit
dummy$upper <- dummy$fit + 1.96dummy$se.fit

dummy$fit_trans <- plogis(dummy$fit)
dummy$low_trans <- plogis(dummy$lower)
dummy$up_trans <- plogis(dummy$upper)
dummy <- cbind(dummy, nd)

this should then give me a dataframe I can feed to ggplot, but it goes wrong where I create the object &#39;nd&#39; (newdata). Something needs to be added there but I&#39;m unsure what.
Additionally, I don&#39;t see in this code where my variable data is unscaled before creating the new data frame?

structure(list(animal = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0), crop = structure(c(-0.911367192804466, -0.656468577561402,
0.937306298620515, -0.748068186743915, -0.950289715380232, -0.950289715380232,
-0.311140903098454, -0.950289715380232, -0.420299555068266, -0.204030814471523,
0.50447619733435, -0.950289715380232, -0.777918539996587, -0.950289715380232,
-0.950289715380232, -0.840253112662893, -0.839667809844811, -0.949704414301948,
-0.950289715380232, -0.332797072571516, -0.950289715380232, 0.28586616195498,
-0.950289715380232, -0.949704414301948, -0.950289715380232, -0.217785406339271,
-0.950289715380232, -0.949704414301948, -0.950289715380232, -0.950289715380232,
-0.950289715380232, -0.917220203587303, -0.950289715380232, -0.950289715380232,
-0.950289715380232, -0.950289715380232, -0.683099761835028, -0.950289715380232,
-0.950289715380232, -0.950289715380232, -0.950289715380232, -0.950289715380232,
-0.950289715380232, -0.420006934975588, -0.950289715380232, -0.86512840935985,
-0.950289715380232, -0.880053585986186, -0.950289715380232, -0.947070559449672,
0.886092437742609, 1.83281709105803, 1.11640850164685, -0.950289715380232,
-0.946485258371388, 1.37335566631441, -0.126771093015647, -0.949704414301948,
-0.896734664107575, -0.123551937954986, -0.826498527754337, -0.950289715380232,
-0.307921748037792, 1.68180931195255, 0.606903807743094, -0.190568898369987,
-0.669052549874602, 1.74970429966619, -0.941217548666834, 1.81701382450034,
0.217678665495733, -0.235637069219248, 0.166464804617827, -0.949411763762807,
-0.945022005675679, 0.0827666616935611, 0.362833263318178, 1.5957700969398,
0.778982349115678, 1.85652176820044, 1.54836074265497, -0.950289715380232,
0.153588184375181, 1.1480148120681, -0.950289715380232, -0.860153345844944,
-0.551699672370031, -0.916927552178262, -0.791673117945951, -0.949704414301948,
-0.580672067915984, 1.70053900213117, 0.212703567184868, 0.465553605166667,
-0.490535698380696, 0.512963070798555, -0.40449645553117, -0.949704414301948,
0.437459181245815, 1.78921202067216, -0.939168994892841), dim = c(101L,
1L)), pop = structure(c(-0.197547924591922, -0.194351971156629,
-0.188274258566538, -0.196448483438051, -0.197547924591922, -0.197534996338939,
-0.197544742856045, -0.19666972080707, -0.184138726259079, -0.187096628792688,
-0.181474291574343, -0.197126836606957, -0.191909922110918, -0.197541385064299,
-0.197547924591922, -0.180014490953617, -0.196345541198624, -0.194080008065368,
-0.197547924591922, -0.169718619816262, -0.1973883809182, -0.19069419228899,
-0.19750465288443, -0.197547924591922, -0.197547924591922, -0.196222722433591,
-0.197547924591922, -0.197007381120755, -0.197547924591922, -0.197547924591922,
-0.197547924591922, -0.196084866482173, -0.196199779840005, -0.197547924591922,
-0.197547924591922, -0.196775288017525, -0.195128528488411, -0.197547924591922,
-0.197547924591922, -0.196661275901739, -0.197547924591922, -0.197547924591922,
-0.197547924591922, -0.195309309220705, -0.197547924591922, -0.196083127378007,
-0.197547924591922, -0.197047712962236, -0.197547924591922, -0.19661973821873,
0.624878194166003, -0.194073305851747, -0.165976797458507, -0.197547924591922,
-0.189654680869701, -0.138488532034927, -0.182502054986232, -0.151862484978888,
-0.146757177620867, -0.180411925114601, -0.196752195942981, -0.197416680215279,
-0.183794718471439, -0.173372247384172, -0.196556998242149, -0.174186298469269,
-0.178206794332005, -0.197547924591922, -0.196558496041342, -0.182985603845477,
0.0359655292697121, 0.14216237743388, -0.170265068895674, -0.197547924591922,
-0.19328285943275, -0.148427554030811, -0.115746893472626, -0.189714944074946,
-0.185416997205207, -0.1959619567473, -0.0975791931695816, -0.197326675563388,
-0.17579755900438, -0.0809452064393561, -0.197246026550915, 0.888153565621317,
-0.189168753981969, -0.193326520514112, -0.191939980937669, -0.197547923770278,
-0.193648598847386, -0.140792688570448, -0.162820943186907, -0.0971587686665226,
-0.184731765563276, -0.166566876527242, -0.0244512966150544,
-0.19624995376397, -0.147791915216302, -0.167901037976233, 16.3965287744725
), dim = c(101L, 1L))), class = "data.frame", row.names = c(2L,
3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L,
17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L, 29L,
30L, 31L, 32L, 33L, 34L, 35L, 36L, 37L, 38L, 39L, 40L, 41L, 42L,
43L, 45L, 46L, 47L, 49L, 50L, 51L, 52L, 53L, 3786L, 3787L, 3788L,
3789L, 3790L, 3791L, 3792L, 3793L, 3794L, 3795L, 3796L, 3797L,
3798L, 3799L, 3800L, 3801L, 3802L, 3803L, 3804L, 3805L, 3806L,
3807L, 3808L, 3809L, 3810L, 3811L, 3812L, 3813L, 3814L, 3815L,
3816L, 3817L, 3818L, 3819L, 3820L, 3821L, 3822L, 3823L, 3824L,
3825L, 3826L, 3827L, 3828L, 3829L, 3830L, 3831L, 3832L, 3833L,
3834L, 3835L, 3836L))


</details>
# 答案1
**得分**: 0
1) 如果您只是测量动物的存在或缺失，那么您需要使用 logistic 回归，而不是泊松回归（即在您的模型中应该有 `family = "binomial"`）。
2) 不清楚您为什么使用 `glmmTMB`，因为您的模型只有固定效应，这意味着基本的 R 的 `glm` 也可以胜任。
3) 您的预测数据框的主要问题是，您的模型需要同时提供 `crop` 和 `pop` 的值，但您只提供了 `pop` 的值。
4) 在您的虚拟数据框中，`crop` 存在一个极端异常值，这意味着整个范围内等间隔的值并不能代表您的数据。
5) 如果有两个相关的因变量，要可视化带有上下置信区间的预测是困难的。
要获得关于不同 `crop` 和 `pop` 值的概率预测的二维网格，我们可以执行以下操作：
```r
library(glmmTMB)
model <- glmmTMB(animal ~ crop + pop, family = "binomial", data = dummy)
nd <- expand.grid(crop = seq(min(dummy$crop), max(dummy$crop), length = 10),
                  pop  = seq(-0.2, -0.15, length = 10))
preds <- predict(model, newdata = nd, se.fit = TRUE)
nd$pred  <- plogis(preds$fit)
nd$lower <- plogis(preds$fit - 1.96 * preds$se.fit)
nd$upper <- plogis(preds$fit + 1.96 * preds$se.fit)
head(nd)

关于绘图，我们可以将预测网格显示为 geom_tile，如下所示：

library(ggplot2)
ggplot(nd, aes(crop, pop, fill = pred)) +
  geom_tile() +
  scale_fill_viridis_c("Probability of\nAnimal presence") +
  coord_fixed(60) +
  theme_minimal(base_size = 16)

如果您真的想看到置信区间，您需要使用 facets：

ggplot(within(nd, pop <- factor(paste("pop =", round(pop, 3)))),
       aes(crop, pred)) +
  geom_errorbar(aes(ymin = lower, ymax = upper), alpha = 0.2) +
  geom_point() +
  facet_wrap(. ~ pop) +
  theme_bw()

英文:

There are a few issues here.

If you are only measuring presence / absence of animals, then you need a logistic regression rather than a Poisson regression (i.e. you should have family = "binomial" in your model).
It's not clear why you are using glmmTMB, since you have a fixed effect only model, meaning that base R's glm would do just as well.
The main issue with your prediction data frame is that your model needs values for both crop and pop, but you are only supplying values for pop
In your dummy data frame, there is a single extreme outlier for crop which means that equally-spaced values across the whole range are not representative of your data.
It is difficult to visualise predictions with upper and lower confidence intervals if you have two dependent variables.

To get a 2D grid of probability predictions over varying values of crop and pop, we can do

library(glmmTMB)
model &lt;- glmmTMB(animal ~ crop + pop, family = &quot;binomial&quot;, data = dummy)
nd &lt;- expand.grid(crop = seq(min(dummy$crop), max(dummy$crop), length = 10),
                  pop  = seq(-0.2, -0.15, length = 10))
preds &lt;- predict(model, newdata = nd, se.fit = TRUE)
nd$pred  &lt;- plogis(preds$fit)
nd$lower &lt;- plogis(preds$fit - 1.96 * preds$se.fit)
nd$upper &lt;- plogis(preds$fit + 1.96 * preds$se.fit)
head(nd)
#&gt;          crop  pop      pred     lower     upper
#&gt; 1 -0.95028972 -0.2 0.8231875 0.6957251 0.9045785
#&gt; 2 -0.63842177 -0.2 0.7764348 0.6327054 0.8750293
#&gt; 3 -0.32655383 -0.2 0.7215021 0.5445071 0.8488165
#&gt; 4 -0.01468589 -0.2 0.6589997 0.4379221 0.8273954
#&gt; 5  0.29718206 -0.2 0.5904329 0.3280019 0.8098054
#&gt; 6  0.60905000 -0.2 0.5181596 0.2299935 0.7947321

When it comes to plotting, we can show the prediction grid as a geom_tile as follows:

library(ggplot2)
ggplot(nd, aes(crop, pop, fill = pred)) +
  geom_tile() +
  scale_fill_viridis_c(&quot;Probability of\nAnimal presence&quot;) +
  coord_fixed(60) +
  theme_minimal(base_size = 16)

If you really want to see the confidence bands, you will need to use facets:

ggplot(within(nd, pop &lt;- factor(paste(&quot;pop =&quot;, round(pop, 3)))),
              aes(crop, pred)) +
  geom_errorbar(aes(ymin = lower, ymax = upper), alpha = 0.2) +
  geom_point() +
  facet_wrap(. ~ pop) +
  theme_bw()

答案2

得分: 0

最好的方法是在运行模型之前不要重新缩放变量，而是在模型公式中提供scale(variables)。这样，当你为预测创建一个新的数据框时，值会自动取消缩放。

英文:

I found out the best way is to not rescale variables before running a model, but supplying scale(variables) to the model formula. This way, when you create a new data frame for predictions, the values are automatically unscaled.

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

手动绘制未缩放数值的GLM预测结果。

问题

答案2

Conditional text colour based on background colour (High contrast text) for ggplot (geom_col, and geom_text)

Magrittr Pipes 的五种不同类型和使用案例是什么？ %>%，%<>%，%$%，%!>% 和 %T>%？

如何在HTML表格中实现展开/折叠行

如何使用两个或更多行的聚合来创建新行？

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。