Diff-in-diff with a panel dataset in R
Question
I have a panel dataset on which I'd like to run a difference-in-differences (diff-in-diff) analysis. Right now, this is my regression:
# Refer to columns by name: data = df is already supplied
fit3 <- glm(empstat ~ factor(year) + factor(stateicp) + migrant_category +
              treated*post + treated*migrant_category + post*migrant_category +
              treated*post*migrant_category + race + educ + age + marst,
            data = df, weights = perwt, family = "gaussian")
But will this make R assume that the observations are independent of one another? If so, what should I do to make R treat this as panel data?
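A side note on the formula itself: R's formula operator `*` expands `a*b` into `a + b + a:b`, and `a*b*c` into all three main effects plus every two- and three-way interaction. The single term `treated*post*migrant_category` therefore already generates every lower-order term that the formula above lists separately, and R silently drops the duplicates. A quick base-R check with `terms()`, comparing only the interaction part of the formula (the fixed effects and other covariates enter additively and are left out of this sketch):

```r
# Interaction part of the question's formula, written out long-hand
long_rhs <- ~ migrant_category + treated*post + treated*migrant_category +
  post*migrant_category + treated*post*migrant_category
# The same interactions written compactly
short_rhs <- ~ treated*post*migrant_category

# Term labels list variables in order of first appearance
# (e.g. "migrant_category:treated" vs "treated:migrant_category"),
# so sort the components of each label before comparing.
normalize_terms <- function(f) {
  labels <- attr(terms(f), "term.labels")
  vapply(strsplit(labels, ":", fixed = TRUE),
         function(parts) paste(sort(parts), collapse = ":"),
         character(1))
}

long_terms <- normalize_terms(long_rhs)
short_terms <- normalize_terms(short_rhs)
print(sort(short_terms))
print(setequal(long_terms, short_terms))
```

Both versions produce the same set of seven terms, so the regression could be written more compactly without changing the fitted model (only the order of terms and the naming of the interaction coefficients may differ).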
Answer 1
Score: 0
If you are interested in fixed-effects models and difference-in-differences, use the `plm` package. Here is an example from Christopher Zorn:
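# Packages used below
library(readr)      # read_csv()
library(dplyr)      # %>%, group_by(), filter(), ungroup()
library(plm)        # pdata.frame(), plm()
library(stargazer)  # stargazer()
# Panel data
WDI <- read_csv("https://github.com/PrisonRodeo/GSERM-Ljubljana-APD-git/raw/main/Data/WDI3.csv")
# Add a "Cold War" variable:
WDI$ColdWar <- with(WDI, ifelse(Year < 1990, 1, 0))
# Keep a numeric year variable (for -panelAR-):
WDI$YearNumeric <- WDI$Year
# Create a better trend variable:
WDI$Time <- WDI$YearNumeric - 1950
# Keep *only* those countries that, at some point during the observed
# periods, instituted a paid parental leave policy. Filter before building
# the panel data frame, since dplyr verbs may drop the pdata.frame class
# and its index:
WDI <- WDI %>%
  group_by(ISO3) %>%
  filter(any(PaidParentalLeave == 1)) %>%
  ungroup()
# Make the data a panel data frame indexed by country and year:
WDI <- pdata.frame(as.data.frame(WDI), index = c("ISO3", "Year"))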
# FE models...
# effect = "individual": country fixed effects only
fe1 <- plm(ChildMortality ~ PaidParentalLeave + Time +
             PaidParentalLeave * Time, data = WDI,
           effect = "individual", model = "within")
fe2 <- plm(ChildMortality ~ PaidParentalLeave + Time +
             PaidParentalLeave * Time + log(GDPPerCapita) +
             log(NetAidReceived) + GovtExpenditures,
           data = WDI, effect = "individual", model = "within")
# effect = "twoway": country and year fixed effects
fe3 <- plm(ChildMortality ~ PaidParentalLeave + Time +
             PaidParentalLeave * Time, data = WDI,
           effect = "twoway", model = "within")
fe4 <- plm(ChildMortality ~ PaidParentalLeave + Time +
             PaidParentalLeave * Time + log(GDPPerCapita) +
             log(NetAidReceived) + GovtExpenditures,
           data = WDI, effect = "twoway", model = "within")
# Table time
stargazer(fe1, fe2, fe3, fe4,
          title = "DiD Models of log(Child Mortality)",
          column.separate = c(1, 1, 1), align = TRUE,
          dep.var.labels.include = FALSE,
          dep.var.caption = "",
          covariate.labels = c("Paid Parental Leave", "Time (1950=0)",
                               "Paid Parental Leave x Time",
                               "ln(GDP Per Capita)",
                               "ln(Net Aid Received)",
                               "Government Expenditures"),
          header = FALSE, model.names = FALSE,
          model.numbers = FALSE, multicolumn = FALSE,
          object.names = TRUE, notes.label = "",
          column.sep.width = "-15pt",
          omit.stat = c("f", "ser"), type = "text")
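DiD Models of log(Child Mortality)
=====================================================================
fe1 fe2 fe3 fe4
---------------------------------------------------------------------
Paid Parental Leave -15.500*** -26.200*** -12.500*** -17.300*
(2.420) (7.220) (2.960) (9.360)
Time (1950=0) -0.838*** -1.480***
(0.025) (0.094)
Paid Parental Leave x Time -7.110*** -4.910*
(2.290) (2.600)
ln(GDP Per Capita) -1.780*** -3.020***
(0.471) (0.552)
ln(Net Aid Received) 0.873*** 0.842***
(0.139) (0.146)
Government Expenditures 0.310*** 0.524*** 0.247*** 0.319*
(0.044) (0.128) (0.056) (0.169)
---------------------------------------------------------------------
Observations 2,360 622 2,360 622
R2 0.496 0.717 0.009 0.143
Adjusted R2 0.485 0.701 -0.035 0.014
=====================================================================
*p<0.1; **p<0.05; ***p<0.01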
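To illustrate what `model = "within"` with `effect = "twoway"` does: the country and year fixed effects are swept out before the slope is estimated. For a *balanced* panel this is the double-demeaning transformation v_it − mean_i(v) − mean_t(v) + overall mean, and OLS on the transformed data gives the same slope as a regression with explicit unit and period dummies (LSDV). The base-R sketch below uses simulated data; the names `d`, `unit`, `period` and `twoway_demean()` are hypothetical, and for unbalanced panels `plm` uses a different algorithm, since simple double demeaning is not exact there.

```r
# Simulated balanced panel (hypothetical data): 8 units x 6 periods
set.seed(2)
n_units <- 8
n_periods <- 6
d <- expand.grid(period = seq_len(n_periods), unit = seq_len(n_units))
d$x <- rnorm(nrow(d))
d$y <- 2 * d$x + rnorm(n_units)[d$unit] + rnorm(n_periods)[d$period] +
  rnorm(nrow(d), sd = 0.1)

# Two-way within transformation (exact for balanced panels only)
twoway_demean <- function(v, unit, period) {
  v - ave(v, unit) - ave(v, period) + mean(v)
}

y_dd <- twoway_demean(d$y, d$unit, d$period)
x_dd <- twoway_demean(d$x, d$unit, d$period)
b_within <- coef(lm(y_dd ~ x_dd - 1))[["x_dd"]]

# Least-squares dummy variables: explicit unit and period dummies
b_lsdv <- coef(lm(y ~ x + factor(unit) + factor(period), data = d))[["x"]]

# A regressor that depends only on the period (like Time = Year - 1950)
# is wiped out completely by the transformation
time_dd <- twoway_demean(d$period - 1, d$unit, d$period)

cat(sprintf("within: %.6f  LSDV: %.6f  max |demeaned time|: %g\n",
            b_within, b_lsdv, max(abs(time_dd))))
```

The last step shows why any variable that depends only on the year is absorbed by the year effects. This is consistent with the table above, where `Time (1950=0)` has estimates in only two columns.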
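On the first part of the question: yes. The default standard errors from `glm()`/`lm()` treat every row as an independent observation, and adding `factor(year)` and `factor(stateicp)` does not change that. A common remedy is to cluster the standard errors by the unit at which treatment is assigned (presumably `stateicp` here). The base-R sketch below computes the basic (CR0) cluster-robust covariance, (X'X)^-1 [Σ_g X_g' u_g u_g' X_g] (X'X)^-1, on a simulated state-year panel with serially correlated errors. All data and names in it (`panel`, `cluster_vcov()`, etc.) are hypothetical, and it handles unweighted OLS only; with weights such as `perwt`, the weights also enter the scores.

```r
# Simulated state-year panel (hypothetical): AR(1) errors within each state
set.seed(1)
n_states <- 30
n_years <- 12
panel <- expand.grid(year = seq_len(n_years), state = seq_len(n_states))
panel$treated <- as.integer(panel$state <= n_states / 2)  # half the states treated
panel$post <- as.integer(panel$year > n_years / 2)        # policy switches on mid-sample

# Rows are ordered state by state (year varies fastest), matching this loop
err <- unlist(lapply(seq_len(n_states), function(s)
  as.numeric(arima.sim(model = list(ar = 0.8), n = n_years))))
state_effect <- rnorm(n_states)
year_effect <- rnorm(n_years)
panel$y <- 0.5 * panel$treated * panel$post +
  state_effect[panel$state] + year_effect[panel$year] + err

# Two-way fixed-effects DiD; default lm() SEs assume independent errors
fit <- lm(y ~ treated:post + factor(state) + factor(year), data = panel)

# CR0 cluster-robust covariance (no small-sample adjustment)
cluster_vcov <- function(fit, cluster) {
  keep <- !is.na(coef(fit))                   # drop aliased (collinear) columns
  X <- model.matrix(fit)[, keep, drop = FALSE]
  u <- residuals(fit)
  bread <- solve(crossprod(X))                # (X'X)^-1
  scores <- rowsum(X * u, group = cluster)    # one row per cluster: sum of x_i * u_i
  bread %*% crossprod(scores) %*% bread
}

naive_se <- summary(fit)$coefficients["treated:post", "Std. Error"]
vc <- cluster_vcov(fit, panel$state)
cluster_se <- sqrt(diag(vc))[["treated:post"]]
cat(sprintf("naive SE: %.3f  clustered SE: %.3f\n", naive_se, cluster_se))
```

With serially correlated errors like these, the clustered standard error is typically larger than the naive one. In practice you would use a package that also applies small-sample corrections, e.g. `lmtest::coeftest(fit3, vcov. = sandwich::vcovCL(fit3, cluster = ~stateicp))` for the `glm` in the question, or `plm::vcovHC(fe3, cluster = "group")` for the `plm` models above.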