Why does the survival probability of the survival package return 0% at the end of the time horizon when there are survivors in the dataset?


Question

I've just started using the survival and survminer packages in R and am trying to understand their output. In the code below I create a data frame with the first 12 rows of my actual dataset, as a representative example of the issue. In this representative data:

  • ID = unique identifier for each element
  • time = survival time for the element in months where value > 0 means death (the month that death occurs) and value = 0 means no death (right censored) during the study period
  • status = the element's censoring status where 1=censored and 2=dead
  • node = one of the variables associated with each element where I try to assess its association with the probability of death

Running length(which(testDF$status == 2))/nrow(testDF) shows a death rate of 66.67% with this data, but the survival probability curves shown in the image below end at 0%. Should they not be ending at 66.67%, at least for the average of all the data? What am I doing wrong here, or am I misinterpreting survival probability?

[Image: survival probability curves by node]

Code:

library(ggplot2)
library(survival)
library(survminer)

testDF <- data.frame(
  ID = 1:12,
  time = c(0,34,0,12,12,21,0,0,39,11,13,26),
  status = c(1,2,1,2,2,2,1,1,2,2,2,2),
  node = c("C","C","B","A","C","C","B","C","B","C","A","B")
)

fit <- survfit(Surv(time, status) ~ node, data = testDF)

ggsurvplot(fit,
           pval = TRUE,
           conf.int = TRUE,
           linetype = "strata",
           surv.median.line = "hv",
           ggtheme = theme_bw()
           )

# percentage of deaths
length(which(testDF$status == 2))/nrow(testDF)

Answer 1

Score: 0

Coding the censored observations (no death, the survivors) as 0 in the "time" column was incorrect, as Edward points out in his comments. I have now recoded those survivor observations with the length of the study, 40 months, and switched the status coding to 0 = censored, 1 = death. I also re-ran the plot without confidence intervals to make the solution clearer.
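To see why the zero coding drives the curves to 0%, here is a rough sketch using the original data. It is not part of the original post: the pooled fit (`~ 1`) and the names `fit_all` and `km` are assumptions made for illustration. A row censored at time 0 leaves the risk set before the first death, so only the subjects who eventually die are still at risk, and the Kaplan-Meier estimate falls to 0 when the last of them dies.

```r
library(survival)

# Original coding: censored rows have time = 0 (status 1 = censored, 2 = death)
testDF <- data.frame(
  time   = c(0,34,0,12,12,21,0,0,39,11,13,26),
  status = c(1,2,1,2,2,2,1,1,2,2,2,2)
)

# Pooled Kaplan-Meier fit (no grouping), used here only for illustration
fit_all <- survfit(Surv(time, status) ~ 1, data = testDF)

# Tabulate the risk set at each time point
km <- data.frame(
  time     = fit_all$time,
  n.risk   = fit_all$n.risk,
  n.event  = fit_all$n.event,
  n.censor = fit_all$n.censor,
  surv     = fit_all$surv
)
print(km)

# The four survivors are censored at time 0, so by the first death (month 11)
# only the 8 subjects who go on to die are still at risk.
```

In the printed table the number at risk is already down to 8 at month 11, which is why the estimate cannot level off above 0.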

testDF <- data.frame(
  ID = 1:12,
  time = c(40,34,40,12,12,21,40,40,39,11,13,26), # 40 month study window (0's for no death changed to 40)
  status = c(0,1,0,1,1,1,0,0,1,1,1,1), # 0 = censored, 1 = death
  node = c("C","C","B","A","C","C","B","C","B","C","A","B")
)

# survival rates: total = 33.3%, node A = 0%, node B = 50%, node C = 33.3%
length(which(testDF$status == 0))/nrow(testDF)
length(which(testDF$status == 0 & testDF$node == "A"))/length(which(testDF$node == "A"))
length(which(testDF$status == 0 & testDF$node == "B"))/length(which(testDF$node == "B"))
length(which(testDF$status == 0 & testDF$node == "C"))/length(which(testDF$node == "C"))
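The re-fit itself is not shown in the answer, so the following is a minimal sketch of what it might look like, assuming the revised `testDF` above and the same `survfit` call as in the question. The name `end_surv` is hypothetical, and the plot call simply mirrors the question's `ggsurvplot` call with `conf.int = FALSE`.

```r
library(survival)

# Revised coding: survivors are censored at the end of the 40-month study
testDF <- data.frame(
  ID = 1:12,
  time = c(40,34,40,12,12,21,40,40,39,11,13,26),
  status = c(0,1,0,1,1,1,0,0,1,1,1,1), # 0 = censored, 1 = death
  node = c("C","C","B","A","C","C","B","C","B","C","A","B")
)

fit <- survfit(Surv(time, status) ~ node, data = testDF)

# Kaplan-Meier estimate at month 40 for each node; extend = TRUE carries the
# last estimate forward for node A, whose final observed time is month 13
s <- summary(fit, times = 40, extend = TRUE)
end_surv <- setNames(s$surv, as.character(s$strata))
print(round(end_surv, 3))

# Plot only in an interactive session, and only if survminer is installed
if (interactive() && requireNamespace("survminer", quietly = TRUE)) {
  p <- survminer::ggsurvplot(fit, data = testDF,
                             pval = TRUE,
                             conf.int = FALSE,
                             linetype = "strata",
                             surv.median.line = "hv",
                             ggtheme = ggplot2::theme_bw())
  print(p)
}
```

Because every censored observation now sits at the end of the study window, the end-of-study estimates coincide with the raw survivor proportions computed above (0 for node A, 0.5 for node B, about 0.333 for node C).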

Plot from running the revised data frame above:

[Image: survival probability curves by node for the revised data]

huangapple
  • Published on February 27, 2023, 16:49:17
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75578365.html