Group vs individual data: how to correct standard errors?

Question

I have a set of 5 variables: Y is the response, and X1, X2, X3, and X4 are the predictors. There are 3,000 observations, and I fit a regression model in R:

fit <- lm(Y ~ X1 + X2 + X3 + X4, data = mydata)

The problem is that my data relate to 3,000 small units (parishes) nested in 40 larger units (counties). Y, X1, and X2 are measured at the parish level, whilst X3 and X4 are only available at the county level. I am concerned that lm will underestimate the standard errors of the coefficients for the last two variables. Is there a way to correct this?

Answer 1

Score: 3

Mixed models! Something like

library(lmerTest)  ## so you can get p-values on fixed effects
fit <- lmer(Y ~ X1 + X2 + X3 + X4 + (1 + X1 + X2 | county), data = mydata)

The random-effect term allows the intercept, as well as the parish-level effects of X1 and X2, to vary across counties.
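Written out as an equation, the model in the lmer call corresponds roughly to the following. The notation is an assumption introduced here for illustration: i indexes parishes, j indexes counties, and the u terms are the county-level deviations that the random-effect term estimates.

```latex
\begin{aligned}
Y_{ij} &= (\beta_0 + u_{0j}) + (\beta_1 + u_{1j})\, X_{1ij} + (\beta_2 + u_{2j})\, X_{2ij}
          + \beta_3 X_{3j} + \beta_4 X_{4j} + \varepsilon_{ij}, \\
(u_{0j},\, u_{1j},\, u_{2j})^{\top} &\sim \mathcal{N}(\mathbf{0}, \Sigma), \qquad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2).
\end{aligned}
```

X3 and X4 carry only the county index j because they are constant within a county, which is why they appear with fixed coefficients only.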

The logic here is that the fixed effects X1 + X2 + X3 + X4 specify that the model should allow these variables to affect the response at the population level (i.e., overall), while the random effects allow the intercept and the effects of X1 and X2 to vary across counties. The general rules are that (1) we can include in a random-effects term any variable that varies within the levels of the grouping variable (since X1 and X2 vary at the parish level, we can estimate the variance of their effects across counties; since X3 and X4 vary only between and not within counties, we cannot estimate the variation of their effects across counties), and (2) because the random effects are zero-centered, we should usually include a fixed effect corresponding to each random-effects term.
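Rule (1) can be checked directly from the data before choosing the random-effects term. The following is a minimal base-R sketch; the helper name varies_within and the toy data frame are hypothetical and only mimic the structure described in the question.

```r
# Hypothetical helper: TRUE if `x` takes more than one value
# inside at least one level of `group`
varies_within <- function(x, group) {
  any(tapply(x, group, function(v) length(unique(v)) > 1))
}

# Toy data with the same nesting as the question (all values are made up)
set.seed(42)
county_id <- rep(1:4, each = 5)            # 4 counties, 5 parishes each
toy <- data.frame(
  county = factor(county_id),
  X1 = rnorm(20),                          # parish-level predictor
  X3 = c(10, 20, 30, 40)[county_id]        # county-level predictor
)

# One logical per predictor: does it vary within counties?
within_check <- sapply(toy[c("X1", "X3")], varies_within, group = toy$county)
print(within_check)
```

A predictor flagged TRUE (here X1) is a candidate for a random slope; one flagged FALSE (here X3) can only enter as a fixed effect.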

For what it's worth, you can use the equatiomatic package to extract LaTeX-formatted model specifications; see the package vignette.
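To see the effect on the standard errors, one option is to simulate data with the same structure as in the question and compare lm with lmer. This is only a sketch under stated assumptions: the data are simulated, the effect sizes and standard deviations are arbitrary choices, and lme4 is loaded directly (lmerTest::lmer fits the same model and adds p-values).

```r
library(lme4)

set.seed(1)
n_county <- 40
n_parish <- 75                                   # 40 * 75 = 3,000 parishes
cid <- rep(seq_len(n_county), each = n_parish)   # county index per parish
N   <- length(cid)

# County-level predictors: one value per county, repeated for its parishes
X3_county <- rnorm(n_county)
X4_county <- rnorm(n_county)

# County-level random intercepts and slopes (assumed SDs, for illustration)
u0 <- rnorm(n_county, sd = 1)
u1 <- rnorm(n_county, sd = 0.5)
u2 <- rnorm(n_county, sd = 0.5)

# Parish-level predictors
X1 <- rnorm(N)
X2 <- rnorm(N)

# Response generated from the mixed model with arbitrary coefficients
Y <- 1 + (0.5 + u1[cid]) * X1 + (-0.3 + u2[cid]) * X2 +
  0.4 * X3_county[cid] + 0.2 * X4_county[cid] + u0[cid] + rnorm(N)

mydata <- data.frame(Y, X1, X2,
                     X3 = X3_county[cid], X4 = X4_county[cid],
                     county = factor(cid))

# Ordinary regression ignores the nesting; the mixed model accounts for it
fit_lm   <- lm(Y ~ X1 + X2 + X3 + X4, data = mydata)
fit_lmer <- lmer(Y ~ X1 + X2 + X3 + X4 + (1 + X1 + X2 | county), data = mydata)

# Standard errors of the fixed effects, side by side
se_lm   <- coef(summary(fit_lm))[, "Std. Error"]
se_lmer <- coef(summary(fit_lmer))[, "Std. Error"]
print(round(cbind(lm = se_lm, lmer = se_lmer), 3))
```

In a setup like this, the lm standard errors for X3 and X4 should come out noticeably smaller than the lmer ones, because lm treats all 3,000 parishes as independent information about predictors that only take 40 distinct values.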

huangapple
  • Posted on May 18, 2023, 08:04:33
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76276919.html