How to get the exact same standard errors obtained in Stata when reproducing survival analysis in R?

Question

I am reproducing in R some survival analysis results published in a journal. The original results were produced in Stata. Here are the original results:

[Image: table of the original published results (not reproduced here)]

Here is the code to produce these results in R:

# Load packages
library(dplyr)      # mutate(), lag(); the .by argument needs dplyr >= 1.1.0
library(haven)      # read_stata()
library(survival)   # Surv(), coxph()
library(foreign)
library(msm)
library(stargazer)

# Load Svolik's original data
data <- read_stata("leaders, institutions, covariates, updated tvc.dta")

# Set a start time t0 for each row (previous t within the same leader, 0 for the first row)
data <- mutate(data, t0 = lag(t, default = 0), .by = leadid)

# Coup survival object (original specification)
survobj_coup <- Surv(data[["t0"]], data[["_t"]], data$c_coup)

# Coup model (original specification)
coups_original <- coxph(survobj_coup ~ legislature + lgdp_1 + growth_1 + exportersoffuelsmainlyoil_EL2008 + ethfrac_FIXED + communist + mil + cw + age,
                        data = data, ties = "breslow")

# Revolt survival object (original specification)
survobj_revolt <- Surv(data[["t0"]], data[["_t"]], data$c_revolt)

# Revolt model (original specification)
revolt_original <- coxph(survobj_revolt ~ legislature + lgdp_1 + growth_1 + exportersoffuelsmainlyoil_EL2008 + ethfrac_FIXED + mil + cw + age,
                         data = data, ties = "breslow")

# Natural-causes survival object (original specification)
survobj_natural <- Surv(data[["t0"]], data[["_t"]], data$c_natural)

# Natural-causes model (original specification)
natural_original <- coxph(survobj_natural ~ legislature + lgdp_1 + growth_1 + exportersoffuelsmainlyoil_EL2008 + ethfrac_FIXED + communist + mil + cw + age,
                          data = data, ties = "breslow")

# Define a function to exponentiate coefficients
exp_coef <- function(x) exp(x)

# Create the table using stargazer (only the coefficients are exponentiated here)
stargazer(natural_original, coups_original, revolt_original, apply.coef = exp_coef, p.auto = FALSE)

While I am able to produce the exact same coefficients (save for slight differences in rounding) with the exact same significance levels, the standard errors do not match. For example, in Model 1 in the figure (first column in Natural Causes), I obtain a standard error of 0.414 rather than 0.198 for the coefficient on Legislature (0.456*). I have read that the difference may come from how the standard errors are transformed (perhaps something to do with the delta method). Does anyone have any advice? Thanks.

Answer 1

Score: 2

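The quick answer is that the two programs report standard errors of different quantities. In your example, Stata reports approximate standard errors of the hazard ratios (HRs), while R reports the standard error of the untransformed coefficient, se(coef). I used an example from the Stata manual: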

library(survival)
library(webuse)

webuse("drugtr")
m_1 <- coxph(Surv(studytime, died) ~ drug + age, data = drugtr)
summary(m_1)

The SE for 'drug' is 0.0477017 in Stata (the SE of the HR), but 0.46052 in R (the SE of the coefficient).
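To make the link between the two numbers explicit, here is the standard first-order delta-method approximation for an exponentiated coefficient. This is a sketch of the general formula, not something taken from the published article; the notation (a coefficient estimate and its standard error) is introduced here for illustration.

$$
\widehat{\mathrm{SE}}\!\left(e^{\hat\beta}\right) \;\approx\; \left|\frac{d\,e^{\beta}}{d\beta}\right|_{\beta=\hat\beta} \widehat{\mathrm{SE}}(\hat\beta) \;=\; e^{\hat\beta}\,\widehat{\mathrm{SE}}(\hat\beta)
$$

Applied to the numbers in the question (hazard ratio 0.456, coefficient SE 0.414), this gives roughly 0.456 × 0.414 ≈ 0.189. That is on the same scale as the published 0.198 but not identical, so the transformation appears to account for most of the gap, not necessarily all of it.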

To get the same (or similar) SE values as Stata, you can borrow a function from an R package. I used deltaMethod from the car package:

library(car)
deltaMethod(m_1, "exp(drug)")

This gives 0.044426, close to the Stata value. The small remaining difference is probably due to tie handling: Stata's stcox uses the Breslow method by default, whereas coxph defaults to Efron. There may be other alternatives.
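The same transformation can also be done by hand with only the survival package. The sketch below is illustrative: it uses the built-in `lung` data rather than the data from the question, the object names (`fit`, `hr`, `se_hr`) are made up for this example, and `ties = "breslow"` is set on the assumption that the goal is to mirror Stata's default tie handling.

```r
library(survival)

# Built-in example data from the survival package (not the question's data).
# ties = "breslow" mirrors Stata's stcox default; coxph() defaults to "efron".
fit <- coxph(Surv(time, status) ~ age + sex, data = lung, ties = "breslow")

b     <- coef(fit)               # log hazard ratios
se_b  <- sqrt(diag(vcov(fit)))   # SEs of the coefficients, as shown by summary(fit)
hr    <- exp(b)                  # hazard ratios
se_hr <- hr * se_b               # first-order delta-method SEs of the hazard ratios

# Side-by-side view of both kinds of standard error
round(cbind(HR = hr, SE_coef = se_b, SE_HR = se_hr), 4)
```

For a table, stargazer's `coef` and `se` arguments accept lists of vectors, so values like `hr` and `se_hr` could be supplied there (one list element per model) together with `p.auto = FALSE`, so that the significance stars still come from the original models. If the original Stata models used robust or clustered standard errors, the variance matrix would have to be adjusted in the same way before transforming; that is not covered by this sketch.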

Posted by huangapple on June 9, 2023, 09:08:21
Link to this post: https://go.coder-hub.com/76436580.html