Posted June 9, 2023, 09:08:21

How to get the exact same standard errors obtained in Stata when reproducing survival analysis in R?

Question

I am reproducing in R some survival analysis results published in a journal. The original results were produced in Stata. Here are the original results:

Here is the code to produce these results in R:

# load packages
library(dplyr)
library(haven)     # provides read_stata()
library(survival)  # provides Surv() and coxph()
library(msm)
library(stargazer)
# load Svolik's original data
data = read_stata("leaders, institutions, covariates, updated tvc.dta")
# set a t0 for each row
data = mutate(data, t0 = lag(t, default = 0), .by = leadid)
# coup survival object original
survobj_coup = Surv(data[["t0"]], data[["_t"]], data$c_coup)
# coup model original
coups_original <- coxph(survobj_coup ~ legislature + lgdp_1 + growth_1 + exportersoffuelsmainlyoil_EL2008 + ethfrac_FIXED + communist + mil + cw + age,
                        data = data, ties = "breslow")
# revolt survival object original
survobj_revolt = Surv(data[["t0"]], data[["_t"]], data$c_revolt)
# revolt model original
revolt_original <- coxph(survobj_revolt ~ legislature + lgdp_1 + growth_1 + exportersoffuelsmainlyoil_EL2008 + ethfrac_FIXED + mil + cw + age,
                         data = data, ties = "breslow")
# natural survival object original
survobj_natural = Surv(data[["t0"]], data[["_t"]], data$c_natural)
# natural model original
natural_original <- coxph(survobj_natural ~ legislature + lgdp_1 + growth_1 + exportersoffuelsmainlyoil_EL2008 + ethfrac_FIXED + communist + mil + cw + age,
                          data = data, ties = "breslow")
# Define a function to exponentiate coefficients
exp_coef <- function(x) { exp(x) }
# Create the table using stargazer (apply.coef transforms only the coefficients, not the standard errors)
stargazer(natural_original, coups_original, revolt_original, apply.coef = exp_coef, p.auto = FALSE)

While I am able to produce the exact same coefficients (save for slight differences in rounding) with the exact same significance levels, the standard errors do not match. For example, in Model 1 in the figure (first column in Natural Causes), I obtain a standard error of 0.414 rather than 0.198 for the coefficient on Legislature (0.456*). I was reading that the differences may be due to how the standard errors are transformed (something to do with the delta method perhaps). Does anyone have any advice? Thanks.

Answer 1

Score: 2

The short answer is that the two programs report different quantities. In your example, Stata reports approximate standard errors of the hazard ratios (HRs), while R reports the standard error of the coefficient, se(coef). Here is an example from the Stata manual:

library(survival)
library(webuse)
webuse("drugtr")
m_1 <- coxph(Surv(studytime, died) ~ drug + age, drugtr)
summary(m_1)

The SE for 'drug' is .0477017 in Stata (SE of the HR), but 0.46052 in R (SE of the coefficient).
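For reference, the relationship between the two quantities can be sketched with the first-order delta method. The notation below (\hat\beta for the Cox coefficient, \widehat{\mathrm{HR}} for the hazard ratio) is introduced here for illustration and is not from the original post:

```latex
\widehat{\mathrm{HR}} = e^{\hat\beta},
\qquad
\operatorname{Var}\!\bigl(e^{\hat\beta}\bigr) \approx \bigl(e^{\hat\beta}\bigr)^{2}\operatorname{Var}(\hat\beta)
\quad\Longrightarrow\quad
\operatorname{SE}\!\bigl(\widehat{\mathrm{HR}}\bigr) \approx \widehat{\mathrm{HR}}\cdot\operatorname{SE}(\hat\beta)
```

Applied to the figures in the question, 0.456 × 0.414 ≈ 0.189, which is much closer to the published 0.198 than the untransformed 0.414. The remaining gap would have to come from something else (for instance, a different variance estimator in the original Stata run), which the post does not settle.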

To get the same (or similar) SE values as Stata, you can borrow a function from an R package. I used deltaMethod from the car package:

library(car)
deltaMethod(m_1, "exp(drug)")

The SE is now 0.044426, close to the value reported by Stata. There may be other ways to do this as well.
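The same transformation can also be done by hand, without car. The following is a minimal sketch, not the original poster's code: it uses the `lung` data set that ships with the survival package as a stand-in, and the names `fit`, `b`, `se_b`, `hr`, and `se_hr` are illustrative. It also sets `ties = "breslow"`, because `coxph()` defaults to the Efron method while Stata's `stcox` defaults to Breslow; that difference in tie handling is a likely reason the drugtr numbers above are similar rather than identical.

```r
library(survival)

# Stand-in model on a built-in data set (assumption: any coxph fit works the same way).
# ties = "breslow" mirrors Stata's default; coxph() would otherwise use Efron.
fit <- coxph(Surv(time, status) ~ age + sex, data = lung, ties = "breslow")

b     <- coef(fit)               # coefficients (log hazard ratios)
se_b  <- sqrt(diag(vcov(fit)))   # standard errors of the coefficients
hr    <- exp(b)                  # hazard ratios
se_hr <- hr * se_b               # first-order delta-method SE of the hazard ratios

# z statistics and p-values are still computed on the coefficient scale
z <- b / se_b
p <- 2 * pnorm(-abs(z))

print(cbind(HR = hr, SE_HR = se_hr, z = z, p = p))
```

To put these in a stargazer table, one option is to pass the transformed values explicitly, for example `stargazer(fit, coef = list(hr), se = list(se_hr), p = list(p), t.auto = FALSE, p.auto = FALSE)`, so that stargazer does not recompute test statistics from the hazard ratio and its SE.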
