May 18, 2023, 08:04:33

Group vs individual data: how to correct standard errors?

Question

I have a set of 5 variables: Y is the response, and X1, X2, X3, and X4 are the predictors. There are 3,000 observations, and I fit a regression model in R:

fit <- lm(Y ~ X1 + X2 + X3 + X4, data = mydata)

The problem is that my data relate to 3,000 small units (parishes) nested in 40 larger units (counties). Y, X1, and X2 are measured at the parish level, whilst X3 and X4 are only available at the county level. I am concerned that lm() will underestimate the standard errors of the coefficients for the last two variables. Is there a way to correct this?

Answer 1

Score: 3

Mixed models! Something like

library(lmerTest)  ## so you can get p-values on fixed effects
fit <- lmer(Y ~ X1 + X2 + X3 + X4 + (1 + X1 + X2 | county), data = mydata)

The random-effect term allows the intercept, as well as the parish-level effect of X1 and X2, to vary across counties.
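
Written out, the lmer() call above corresponds roughly to the following model, where i indexes parishes and j indexes counties. The subscripts and Greek symbols are notation assumed here for illustration; they do not appear in the original answer.

```latex
\begin{aligned}
Y_{ij} &= (\beta_0 + b_{0j}) + (\beta_1 + b_{1j})\,X_{1ij} + (\beta_2 + b_{2j})\,X_{2ij}
          + \beta_3 X_{3j} + \beta_4 X_{4j} + \varepsilon_{ij}, \\
(b_{0j}, b_{1j}, b_{2j})^\top &\sim \mathcal{N}(\mathbf{0}, \Sigma), \qquad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2).
\end{aligned}
```

X3 and X4 carry only the county index j, which is why they get fixed coefficients but no county-specific deviations.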

The logic here is that the fixed effects X1 + X2 + X3 + X4 specify that the model should allow these variables to affect the response at the population level (i.e., overall), while the random effects allow the intercept and the effects of X1 and X2 to vary across counties. The general rules are that (1) we can include in a random-effects term any variable that varies within the levels of the grouping variable (i.e., since X1 and X2 vary at the parish level, we can estimate the variance of their effects across counties; since X3 and X4 vary only between and not within counties, we cannot estimate the variation of their effects across counties) and (2) because the random effects are zero-centered, we should usually include a fixed effect corresponding to each random-effects term.
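
To see how much the standard errors can differ, here is a minimal simulation sketch. Everything in it is made up for illustration: the data frame `sim`, the effect sizes, the variance components, and the equal split of 75 parishes per county are assumptions, not properties of the asker's data. It fits both the plain lm() model and the mixed model, then puts their standard errors side by side.

```r
library(lmerTest)  # also loads lme4

set.seed(1)
n_county <- 40
n_parish <- 75                        # assumed: 40 * 75 = 3,000 parishes
county   <- rep(seq_len(n_county), each = n_parish)

# County-level predictors: one draw per county, repeated for its parishes
X3_county <- rnorm(n_county)
X4_county <- rnorm(n_county)

# County-specific deviations: random intercept and random slopes for X1, X2
b0 <- rnorm(n_county, sd = 1)
b1 <- rnorm(n_county, sd = 0.5)
b2 <- rnorm(n_county, sd = 0.5)

sim <- data.frame(
  county = factor(county),
  X1 = rnorm(n_county * n_parish),    # parish-level predictors
  X2 = rnorm(n_county * n_parish),
  X3 = X3_county[county],
  X4 = X4_county[county]
)
sim$Y <- 1 + b0[county] +
  (0.5 + b1[county]) * sim$X1 +
  (0.3 + b2[county]) * sim$X2 +
  0.4 * sim$X3 - 0.2 * sim$X4 +
  rnorm(nrow(sim))

# Plain regression: treats all 3,000 rows as independent
fit_lm  <- lm(Y ~ X1 + X2 + X3 + X4, data = sim)

# Mixed model from the answer: intercept and X1/X2 slopes vary by county
fit_mix <- lmer(Y ~ X1 + X2 + X3 + X4 + (1 + X1 + X2 | county), data = sim)

# Standard errors side by side
se_lm  <- coef(summary(fit_lm))[, "Std. Error"]
se_mix <- coef(summary(fit_mix))[, "Std. Error"]
print(round(cbind(lm = se_lm, lmer = se_mix), 3))
```

In a setup like this, the lm() standard errors for X3 and X4 should come out noticeably smaller than the mixed-model ones, because lm() counts 3,000 independent observations where there are effectively only 40 counties' worth of information about those two variables.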

For what it's worth, you can use the equatiomatic package to extract LaTeX-formatted model specifications; see the package vignette.
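
As a rough sketch of what that looks like, the example below uses the `sleepstudy` data shipped with lme4 as a stand-in for `mydata`, so it runs on its own. It assumes equatiomatic is installed and that its `extract_eq()` function handles lme4 models, as the package documentation describes; the exact LaTeX produced may differ between package versions.

```r
library(lme4)
library(equatiomatic)

# sleepstudy ships with lme4; it stands in for mydata here
fit_sleep <- lmer(Reaction ~ Days + (1 + Days | Subject), data = sleepstudy)

# extract_eq() returns the model specification as LaTeX
eq <- extract_eq(fit_sleep)
print(eq)
```

The same call on the `fit` object from the answer would be `extract_eq(fit)`.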
