I have a set of 5 variables: Y is the response, and X1, X2, X3, and X4 are the predictors. There are 3,000 observations, and I fit a regression model in R:

fit <- lm(Y ~ X1 + X2 + X3 + X4, data = mydata)  # OLS: treats all 3,000 parishes as independent

The problem is that my data relate to 3,000 small units (parishes) nested in 40 larger units (counties). Y, X1, and X2 are measured at the parish level, whilst X3 and X4 are only available at the county level. I am concerned that lm() will underestimate the standard errors of the coefficients for the last two variables, since it treats the 3,000 parishes as independent even though X3 and X4 take only one value per county. Is there a way to correct this?
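A quick simulation can make this concern concrete. The sketch below is not the asker's data: it assumes 40 counties with 75 parishes each, a county-level random intercept (SD `tau = 1`) plus parish-level noise (SD `sigma = 2`), and, for brevity, only one parish-level predictor (X1) and one county-level predictor (X3). It refits lm() on many simulated responses and compares the actual spread of each estimate with the standard error lm() reports.

```r
## Assumed setup for illustration only: 40 counties x 75 parishes = 3,000 rows.
set.seed(1)
n_county <- 40
m <- 75                                   # parishes per county
county <- rep(seq_len(n_county), each = m)
X1 <- rnorm(n_county * m)                 # parish-level predictor (held fixed)
X3 <- rnorm(n_county)[county]             # county-level predictor (held fixed)
tau <- 1                                  # SD of the shared county effect
sigma <- 2                                # SD of parish-level noise

n_sim <- 200
est <- matrix(NA_real_, nrow = n_sim, ncol = 2,
              dimnames = list(NULL, c("X1", "X3")))
se <- est
for (s in seq_len(n_sim)) {
  u <- rnorm(n_county, sd = tau)[county]  # same shock for every parish in a county
  Y <- 1 + 0.5 * X1 + 0.5 * X3 + u + rnorm(n_county * m, sd = sigma)
  cf <- coef(summary(lm(Y ~ X1 + X3)))
  est[s, ] <- cf[c("X1", "X3"), "Estimate"]
  se[s, ]  <- cf[c("X1", "X3"), "Std. Error"]
}

## Actual sampling SD of each estimate divided by lm()'s average reported SE.
ratio <- apply(est, 2, sd) / colMeans(se)
print(round(ratio, 2))
```

Under these assumptions the ratio should be close to 1 for X1 but several times larger than 1 for X3 (theory gives roughly sqrt(1 + (m - 1) * ICC), about 4 here with ICC = 0.2), i.e. lm() is overconfident only about the county-level predictor.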


Mixed models! Something like

library(lmerTest)  ## so you can get p-values on fixed effects
fit <- lmer(Y ~ X1 + X2 + X3 + X4 + (1 + X1 + X2 | county), data = mydata)

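The random-effect term allows the intercept, as well as the parish-level effects of X1 and X2, to vary across counties. Because the random intercept absorbs the correlation among parishes in the same county, the standard errors for the county-level X3 and X4 are effectively based on the 40 counties rather than on 3,000 independent parishes.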
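Here is a sketch of that comparison on simulated data. All parameter values are assumptions chosen for illustration, and it uses `lme4::lmer` directly (loading lmerTest instead would add degrees of freedom and p-values to the summary). The data include random slopes for X1 and X2 so that the model with `(1 + X1 + X2 | county)` is not fitted to a degenerate structure.

```r
## Requires the lme4 package; simulated data, assumed parameters.
library(lme4)

set.seed(42)
n_county <- 40
m <- 75
county <- factor(rep(seq_len(n_county), each = m))
idx <- as.integer(county)

## County-level random intercepts and random slopes for X1 and X2
u0 <- rnorm(n_county, sd = 1)
u1 <- rnorm(n_county, sd = 0.5)
u2 <- rnorm(n_county, sd = 0.5)

mydata <- data.frame(
  county = county,
  X1 = rnorm(n_county * m),     # parish-level
  X2 = rnorm(n_county * m),     # parish-level
  X3 = rnorm(n_county)[idx],    # county-level: one value per county
  X4 = rnorm(n_county)[idx]     # county-level: one value per county
)
mydata$Y <- with(mydata,
  2 + (0.5 + u1[idx]) * X1 + (-0.3 + u2[idx]) * X2 + 0.4 * X3 + 0.2 * X4 +
    u0[idx] + rnorm(n_county * m, sd = 2))

fit_lm   <- lm(Y ~ X1 + X2 + X3 + X4, data = mydata)
fit_lmer <- lmer(Y ~ X1 + X2 + X3 + X4 + (1 + X1 + X2 | county), data = mydata)

## Side-by-side standard errors; rows are the fixed-effect coefficients.
se <- cbind(
  lm   = coef(summary(fit_lm))[, "Std. Error"],
  lmer = coef(summary(fit_lmer))[, "Std. Error"]
)
print(round(se, 3))
```

In this setup the mixed model's standard errors for X3 and X4 should come out several times larger than lm()'s. The X1 and X2 standard errors also grow somewhat, because their effects are allowed to vary across counties.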

The logic here is that the fixed effects X1 + X2 + X3 + X4 specify that the model should allow these variables to affect the response at the population level (i.e., overall), while the random effects allow the intercept and the effects of X1 and X2 to vary across counties. The general rules are that (1) a random-effects term can include any variable that varies within the levels of the grouping variable (since X1 and X2 vary at the parish level, we can estimate the variance of their effects across counties; since X3 and X4 vary only between counties and not within them, we cannot estimate how their effects vary across counties) and (2) in general, because the random effects are zero-centered, we should usually include a fixed effect corresponding to each random-effects term.
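Rule (1) can be checked mechanically. The helper `varies_within` below is hypothetical (not from any package), and the toy data are assumed for illustration: a predictor is a candidate for a random slope over `county` only if it takes more than one value inside at least one county.

```r
## Hypothetical helper: TRUE if x varies within at least one level of g.
varies_within <- function(x, g) {
  any(tapply(x, g, function(v) length(unique(v)) > 1))
}

## Toy data (assumed): 3 counties x 4 parishes.
county <- factor(rep(c("A", "B", "C"), each = 4))
toy <- data.frame(
  X1 = c(1.2, 0.4, 2.2, 1.9, 0.3, 0.8, 1.1, 2.5, 1.7, 0.9, 0.2, 1.4),  # parish-level
  X3 = rep(c(10, 20, 15), each = 4)                                      # county-level
)

slope_ok <- sapply(toy, varies_within, g = county)
print(slope_ok)  # X1 = TRUE, X3 = FALSE
```

Applied to `mydata`, the same check would flag X1 and X2 as eligible for `(... | county)` terms and X3 and X4 as not.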

For what it's worth, you can use the equatiomatic package to extract LaTeX-formatted model specifications; see the vignette.

