(Cumulative) baseline hazard in Cox models with time-dependent coefficients
Question
I would like to know whether there is an easy way to estimate the (cumulative) baseline hazard from a Cox model with time-varying coefficients, i.e. coefficients that take different values over different time intervals. After creating the time-split data with survSplit(), the predict.coxph() method with type='expected' gives expected values per row, which I suspect counts the same subject multiple times. Is there an easy way to obtain these estimates? Am I thinking about this correctly? Let's discuss it further through an example:
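library(survival)
library(riskRegression)  # provides the Melanoma data
data(Melanoma)
# Split follow-up at day 1095: tgroup = 1 for (0, 1095], tgroup = 2 afterwards
d <- survSplit(formula = Surv(time, status == 1) ~ age + sex + epicel,
               data = Melanoma,
               cut = 1095,
               episode = 'tgroup',
               id = 'id')
# age:strata(tgroup) gives a separate age coefficient in each time interval
fit <- coxph(Surv(tstart, time, event) ~ age:strata(tgroup) + sex + epicel,
             data = d, x = TRUE)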
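Since the question hinges on how the split data and the fit are laid out, here is a minimal sketch for inspecting them (it recreates d and fit so that it runs on its own). A subject followed beyond day 1095 should appear on two rows, with tgroup 1 and 2 and identical baseline covariates, and coef(fit) should show one age coefficient per interval alongside sex and epicel. The exact coefficient labels depend on how coxph() names the age:strata(tgroup) terms, so the sketch does not rely on them.

```r
library(survival)
library(riskRegression)  # provides the Melanoma data
data(Melanoma)

d <- survSplit(Surv(time, status == 1) ~ age + sex + epicel,
               data = Melanoma, cut = 1095, episode = "tgroup", id = "id")
fit <- coxph(Surv(tstart, time, event) ~ age:strata(tgroup) + sex + epicel,
             data = d, x = TRUE)

# Rows per subject: 1 if follow-up ended by day 1095, 2 otherwise
rows_per_id <- table(d$id)
table(rows_per_id)

# One subject followed beyond day 1095: two (tstart, time] rows,
# tgroup = 1 and tgroup = 2, with the same age, sex and epicel
two_row_ids <- names(rows_per_id)[rows_per_id == 2]
d[d$id == two_row_ids[1], ]

# Two age coefficients (one per interval) plus one each for sex and epicel
coef(fit)
```

The row structure is what makes the linear predictor change at day 1095: the same subject carries the interval-1 age coefficient on the first row and the interval-2 age coefficient on the second.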
Now, for example, take a patient who was censored at time=1700. If we want to estimate the baseline hazard at times 800 and 1500, the patient is in the risk set at both times but with different linear predictors (because the cut-point was set at time=1095). It looks like predict.coxph() doesn't take this into account. Am I thinking correctly? Is there an adjustment to predict.coxph()? Are there any other functions that do this automatically, or do I need to write the function myself? I want to use these values to obtain absolute risk estimates for each patient. Thanks in advance.
I used the example above as a demonstration.
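One possible cross-check on the concern about predict.coxph(), assuming (as I read the survival documentation) that with type = "expected" on (tstart, time] data each row's value is the hazard accumulated over that row's own interval, evaluated with that row's covariates. On that reading the rows do not double-count a subject: they cover disjoint intervals, so summing them by id should give each subject's cumulative hazard at the end of follow-up, and exp(-H) the corresponding survival probability. Note that this only yields values at each subject's own follow-up time, not at arbitrary times such as 800 or 1500.

```r
library(survival)
library(riskRegression)  # provides the Melanoma data
data(Melanoma)

d <- survSplit(Surv(time, status == 1) ~ age + sex + epicel,
               data = Melanoma, cut = 1095, episode = "tgroup", id = "id")
fit <- coxph(Surv(tstart, time, event) ~ age:strata(tgroup) + sex + epicel,
             data = d, x = TRUE)

# Expected number of events for each row of d: the cumulative hazard accrued
# over that row's (tstart, time] interval with that row's covariates
d$expected <- predict(fit, type = "expected")

# Summing each subject's disjoint intervals gives the subject-level cumulative
# hazard at the end of follow-up; exp(-H) is the matching survival probability
H_subject <- tapply(d$expected, d$id, sum)
S_subject <- exp(-H_subject)
head(cbind(H = H_subject, S = S_subject))
```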
Answer 1
Score: 0
I managed to figure out a way to deal with it myself. survfit works with time-split data for time-varying coefficients if the newdata argument is also given in the time-split format. Here we write a reference patient, with every covariate at its baseline value (age 0 and the reference levels of sex and epicel), in that format to get the values of the cumulative baseline hazard.
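library(tidyverse)
# Reference patient written in the same time-split format as d: one row per tgroup
nd <- data.frame(tstart = c(0, 1095),
                 time = c(1095, max(Melanoma$time)),
                 event = 0,
                 tgroup = 1:2,
                 id = 1,
                 age = 0,
                 sex = 'Female',
                 epicel = 'not present')
# id = id tells survfit that both rows describe the same subject
sfit <- survfit(fit, newdata = nd, id = id)
ggplot(data = tibble('time' = sfit$time, 'Cumulative baseline hazard' = sfit$cumhaz)) +
  geom_line(aes(x = time, y = `Cumulative baseline hazard`), linewidth = .75) +
  theme_light()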
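The same idea should extend from the reference patient to an individual patient: give newdata that patient's covariates on two rows, one per tgroup, and survfit() applies the interval-specific age coefficient on each side of day 1095. A sketch for a hypothetical 60-year-old male patient with epicel present (values chosen only for illustration), reading off the risk at days 800 and 1500 as in the question; 1 - surv here is the risk of the modelled event (status == 1) under this model.

```r
library(survival)
library(riskRegression)  # provides the Melanoma data
data(Melanoma)

d <- survSplit(Surv(time, status == 1) ~ age + sex + epicel,
               data = Melanoma, cut = 1095, episode = "tgroup", id = "id")
fit <- coxph(Surv(tstart, time, event) ~ age:strata(tgroup) + sex + epicel,
             data = d, x = TRUE)

# Hypothetical patient: covariates fixed at baseline, written in the same
# time-split format as the fitting data (one row per tgroup)
patient <- data.frame(tstart = c(0, 1095),
                      time   = c(1095, max(Melanoma$time)),
                      event  = 0,
                      tgroup = 1:2,
                      id     = 1,
                      age    = 60,
                      sex    = "Male",
                      epicel = "present")

# id tells survfit that both rows describe the same subject
pfit <- survfit(fit, newdata = patient, id = id)

# Risk (1 - survival) at the two times from the question
est <- summary(pfit, times = c(800, 1500))
risk <- data.frame(time = est$time, risk = 1 - est$surv)
risk
```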
These values, together with each patient's linear predictors, can then be used to construct absolute risk estimates. This vignette gives a comprehensive and helpful description of time-varying covariates and coefficients. The response here is also really helpful.
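To make that last step concrete: if H0(t) is the cumulative baseline hazard from the reference curve, and lp1, lp2 are a patient's linear predictors before and after day 1095, then under this model the patient's cumulative hazard should be H(t) = exp(lp1) * H0(t) for t <= 1095 and H(t) = exp(lp1) * H0(1095) + exp(lp2) * (H0(t) - H0(1095)) for t > 1095, with risk 1 - exp(-H(t)). The sketch below assumes that reading. The patient values are hypothetical, make_path() is a helper introduced only for this example, and the uncentered linear predictors are obtained as the difference between the patient's and the reference patient's predict() values row by row, which cancels whatever centering predict() applies. If the reasoning is right, the result should match survfit() on the patient's own time-split path up to rounding.

```r
library(survival)
library(riskRegression)  # provides the Melanoma data
data(Melanoma)

d <- survSplit(Surv(time, status == 1) ~ age + sex + epicel,
               data = Melanoma, cut = 1095, episode = "tgroup", id = "id")
fit <- coxph(Surv(tstart, time, event) ~ age:strata(tgroup) + sex + epicel,
             data = d, x = TRUE)

# Hypothetical helper: time-split template with one row per tgroup
make_path <- function(age, sex, epicel) {
  data.frame(tstart = c(0, 1095), time = c(1095, max(Melanoma$time)),
             event = 0, tgroup = 1:2, id = 1,
             age = age, sex = sex, epicel = epicel)
}
nd      <- make_path(age = 0,  sex = "Female", epicel = "not present")  # reference
patient <- make_path(age = 60, sex = "Male",   epicel = "present")      # hypothetical

# Cumulative baseline hazard H0(t) as a right-continuous step function
sfit <- survfit(fit, newdata = nd, id = id)
H0 <- function(t) c(0, sfit$cumhaz)[findInterval(t, sfit$time) + 1]

# Uncentered linear predictor for each interval (row 1: tgroup 1, row 2: tgroup 2)
lp <- unname(predict(fit, newdata = patient, type = "lp") -
             predict(fit, newdata = nd, type = "lp"))

# Patient's cumulative hazard: exp(lp1) up to day 1095, exp(lp2) afterwards
cutpoint <- 1095
H_patient <- function(t) {
  ifelse(t <= cutpoint,
         exp(lp[1]) * H0(t),
         exp(lp[1]) * H0(cutpoint) + exp(lp[2]) * (H0(t) - H0(cutpoint)))
}

times <- c(800, 1500)
manual <- data.frame(time = times, cumhaz = H_patient(times),
                     risk = 1 - exp(-H_patient(times)))
manual

# Cross-check: survfit() on the patient's own time-split path
pfit <- survfit(fit, newdata = patient, id = id)
check <- 1 - summary(pfit, times = times)$surv
check
```

Doing it by hand mainly helps when many patients are needed at once, since H0 is computed only once and each patient then needs just two linear predictors.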