February 27, 2023 16:49:17

Why does the survival probability of the survival package return 0% at the end of the time horizon when there are survivors in the dataset?

Question

I've just started using the survival and survminer packages in R and am trying to understand their output. In the code below I create a data frame with the first 12 rows of my actual dataset, as representative of the issue. In this representative data:

ID = unique identifier for each element
time = survival time for the element in months where value > 0 means death (the month that death occurs) and value = 0 means no death (right censored) during the study period
status = the element's censoring status where 1=censored and 2=dead
node = one of the variables associated with each element where I try to assess its association with the probability of death

Running length(which(testDF$status == 2))/nrow(testDF) shows a death rate of 66.67% with this data, but the survival probability curves shown in the image below end at 0%. Should they not be ending at 66.67% at least for the average of all the data? What am I doing wrong here or am I misinterpreting survival probability?

Code:

library(ggplot2)
library(survival)
library(survminer)
testDF <- data.frame(
  ID = 1:12,
  time = c(0,34,0,12,12,21,0,0,39,11,13,26),
  status = c(1,2,1,2,2,2,1,1,2,2,2,2),
  node = c("C","C","B","A","C","C","B","C","B","C","A","B")
)
fit <- survfit(Surv(time, status) ~ node, data = testDF)
ggsurvplot(fit,
           pval = TRUE,
           conf.int = TRUE,
           linetype = "strata",
           surv.median.line = "hv",
           ggtheme = theme_bw()
           )
# percentage of deaths
length(which(testDF$status == 2))/nrow(testDF)

Answer 1

Score: 0

My coding of the censored observations (no death, i.e. the survivors) with 0's in the "time" column was incorrect, as Edward points out in his comments. A censored subject's time should be the last time it was known to be alive, so I now recode those survivor observations with the length of the study, 40 months. I also switch status to the 0 = censored, 1 = death coding and re-run the plot without confidence intervals for clarity.

testDF <- data.frame(
  ID = 1:12,
  time = c(40,34,40,12,12,21,40,40,39,11,13,26), # 40 month study window (0's for no death changed to 40)
  status = c(0,1,0,1,1,1,0,0,1,1,1,1), # 0 = censored, 1 = death
  node = c("C","C","B","A","C","C","B","C","B","C","A","B")
)
# survival rates: total = 33.3%, node A = 0%, node B = 50%, node C = 33.3%
length(which(testDF$status == 0))/nrow(testDF)
length(which(testDF$status == 0 & testDF$node == "A"))/length(which(testDF$node == "A"))
length(which(testDF$status == 0 & testDF$node == "B"))/length(which(testDF$node == "B"))
length(which(testDF$status == 0 & testDF$node == "C"))/length(which(testDF$node == "C"))
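
To see why the original coding drove the curve to 0%, it may help to compute the Kaplan-Meier product by hand. The sketch below uses only base R; km_by_hand is a hypothetical helper written for this illustration and is not part of the survival package. It ignores the node grouping and looks at all 12 rows together. With the survivors censored at time 0 they drop out of the risk set before the first death, so the estimate is built only from subjects who died and must end at 0. With the survivors censored at 40 they stay in the risk set throughout, and the curve ends at the raw survival rate of 1/3.

```r
# Illustrative helper (an assumption for this sketch, not a survival-package function).
# time:  observed time per subject
# event: 1 = death, 0 = censored
km_by_hand <- function(time, event) {
  death_times <- sort(unique(time[event == 1]))
  surv <- 1
  out <- numeric(length(death_times))
  for (i in seq_along(death_times)) {
    t <- death_times[i]
    at_risk <- sum(time >= t)                 # still under observation just before t
    deaths  <- sum(time == t & event == 1)    # deaths occurring exactly at t
    surv <- surv * (1 - deaths / at_risk)     # Kaplan-Meier product step
    out[i] <- surv
  }
  data.frame(time = death_times, surv = out)
}

event <- c(0,1,0,1,1,1,0,0,1,1,1,1)           # 0 = censored, 1 = death

# Original coding: survivors censored at time 0
time_orig <- c(0,34,0,12,12,21,0,0,39,11,13,26)
km_orig <- km_by_hand(time_orig, event)

# Revised coding: survivors censored at the end of the 40 month window
time_fixed <- c(40,34,40,12,12,21,40,40,39,11,13,26)
km_fixed <- km_by_hand(time_fixed, event)

tail(km_orig$surv, 1)    # 0: only the 8 subjects who died are ever at risk
tail(km_fixed$surv, 1)   # 1/3 (about 0.333): matches 4 survivors out of 12
```

The final estimate equals the raw proportion of survivors here only because every censored subject is censored after the last death. With censoring scattered through the follow-up period the two numbers would generally differ.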

Plot from running the revised data frame above (image not preserved):
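
The answer does not show the refit itself, so the following is a minimal sketch of how the plot could be regenerated, assuming the same survfit and ggsurvplot calls as in the question with conf.int switched off. Passing data = testDF to ggsurvplot and using summary() with extend = TRUE to read off the estimates at month 40 are choices made for this sketch, not something stated in the original post.

```r
library(ggplot2)
library(survival)
library(survminer)

# Revised data: survivors censored at 40 months, status 0 = censored, 1 = death
testDF <- data.frame(
  ID = 1:12,
  time = c(40,34,40,12,12,21,40,40,39,11,13,26),
  status = c(0,1,0,1,1,1,0,0,1,1,1,1),
  node = c("C","C","B","A","C","C","B","C","B","C","A","B")
)

# Kaplan-Meier fit per node
fit <- survfit(Surv(time, status) ~ node, data = testDF)

# Same plot as in the question, without confidence intervals
ggsurvplot(fit,
           data = testDF,
           pval = TRUE,
           conf.int = FALSE,
           linetype = "strata",
           surv.median.line = "hv",
           ggtheme = theme_bw()
           )

# Estimated survival per node at the end of the study window;
# extend = TRUE carries each curve forward to month 40
end_of_study <- summary(fit, times = 40, extend = TRUE)
data.frame(node = end_of_study$strata, surv = end_of_study$surv)
```

The per-node values in the last table should agree with the raw survival rates computed earlier, since no subject is censored before month 40.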
