Posted: 2023-05-28 23:12:53

How to resolve an issue of large confidence intervals while running CoxPH analysis in R?

Question


I am running into an issue while performing CoxPH analysis using the following sample dataset:

Test_Dataset <- structure(list(Systemic.Tx...2.classification..Chemotherapy..PD1.monotherapy..PD.1.CTLA.4.combo..PD.1.chemo..targetted.Tx..targetted.chemo.combo..etc. =
 c("Targetted Tx", "Targetted Tx", "Targetted Tx", "Targetted Tx",
 "Targetted Tx", "Targetted Tx", "Targetted Tx", "Targetted Tx",
 "Targetted Tx", "Targetted Tx", "Targetted Tx", "Targetted Tx",
 "Targetted Tx", "Targetted Tx", "Targetted Tx", "Targetted Tx",
 "Targetted Tx", "Targetted Tx", "Targetted Tx", "Targetted Tx",
 "Targetted Tx", "Targetted Tx", "Targetted Tx", "Targetted Tx",
 "Targetted Tx", "Targetted Tx", "Targetted Tx", "Targetted Tx",
 "Targetted Tx", "Targetted Tx", "Targetted/chemo combo", "Targetted Tx",
 "Targetted Tx", "Targetted Tx"), Time.on.systemic.Tx =
 c("2.069815195", "2.332648871", "2.069815195", "1.215605749",
 "2.661190965", "0.689938398", "1.839835729", "2.858316222",
 "0.657084189", "2.529774127", "1.80698152", "3.482546201",
 "2.891170431", "3.515400411", "2.431211499", "3.515400411",
 "1.347022587", "5.519507187", "17.47843943", "26.90759754",
 "6.176591376", "5.979466119", "8.246406571", "15.40862423",
 "5.749486653", "6.242299795", "5.683778234", "6.636550308",
 "10.15195072", "10.0862423", "18.52977413", "5.749486653",
 "10.7761807", "6.965092402"), PFS2 = c(2.595482546, 2.37, 2.069815195,
 1.412731006, 1.938398357, 0.657084189, 2.529774127, 3.219712526,
 0.657084189, 2.529774127, 2.2, 3.482546201, 2.529774127, 3.712525667,
 2.234086242, 3.778234086, 1.347022587, 5.55, 17.3798768, 30.32443532,
 7.12936345, 7.09650924, 8.246406571, 15.24435318, 5.519507187,
 5.749486653, 5.420944559, 6.636550308, 9.264887064, 10.02053388,
 18.20123203, 6.110882957, 10.61190965, 6.866529774), PFS2_event = c(1,
 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1,
 1, 1, 1, 0, 1, 1, 0, 1, 1, 1), Binarised_Time.on.Tx.2 =
 c("≤ 3.52 months", "≤ 3.52 months", "≤ 3.52 months", "≤ 3.52 months",
 "≤ 3.52 months", "≤ 3.52 months", "≤ 3.52 months", "≤ 3.52 months",
 "≤ 3.52 months", "≤ 3.52 months", "≤ 3.52 months", "≤ 3.52 months",
 "≤ 3.52 months", "≤ 3.52 months", "≤ 3.52 months", "≤ 3.52 months",
 "≤ 3.52 months", "> 3.52 months", "> 3.52 months", "> 3.52 months",
 "> 3.52 months", "> 3.52 months", "> 3.52 months", "> 3.52 months",
 "> 3.52 months", "> 3.52 months", "> 3.52 months", "> 3.52 months",
 "> 3.52 months", "> 3.52 months", "> 3.52 months", "> 3.52 months",
 "> 3.52 months", "> 3.52 months")), row.names = c(NA, -34L), class =
 "data.frame")

And here is the code I am using for this analysis:

library(survival)

fit1 <- coxph(Surv(PFS2, PFS2_event) ~ Binarised_Time.on.Tx.2, data = Test_Dataset)
summary(fit1)

I receive the following warning after running this code:

Warning message:
In coxph.fit(X, Y, istrat, offset, init, control, weights = weights, :
  Loglik converged before variable 1 ; coefficient may be infinite.

More importantly, the results look wrong: the confidence interval runs from 0 to Inf, and the coefficient and p-value are both very large. I ran the same analysis for Overall Survival on the same dataset without any issues. Any suggestions as to what in my PFS2 values might be driving this?

Answer 1

Score: 1


This is a variant of the complete separation problem, which you can start to read about (e.g.) here.

These aren't really incorrect estimates; they are the software's attempt to represent an infinite estimate. The Wald standard errors fail in this case (this is called the Hauck-Donner effect).
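
To make the Wald failure concrete, here is a minimal sketch on a made-up ten-row dataset (the data frame `toy` and its columns `time`, `event` and `group` are hypothetical, not the question's data). Every "short" subject fails or is censored before the first event in the "long" group, so the log hazard ratio runs off toward minus infinity. Under that assumption the Wald test, which divides the coefficient by its standard error, should be uninformative, while the likelihood ratio test still picks up the difference.

```r
library(survival)

# Hypothetical toy data (not the question's data): all "short" subjects
# leave the risk set before the first "long" subject has an event.
toy <- data.frame(
  time  = 1:10,
  event = c(1, 1, 1, 0, 1, 1, 1, 0, 1, 1),
  group = factor(rep(c("short", "long"), each = 5), levels = c("short", "long"))
)

# coxph() warns about this fit; the warning is suppressed here only to
# keep the output short.
fit <- suppressWarnings(coxph(Surv(time, event) ~ group, data = toy))
s <- summary(fit)

print(s$coefficients)  # very large negative coef with an enormous standard error
print(s$conf.int)      # hazard ratio interval is effectively (0, Inf)

wald_p <- unname(s$waldtest["pvalue"])  # uses coef / se, so it breaks down
lrt_p  <- unname(s$logtest["pvalue"])   # uses the change in log-likelihood
cat("Wald p-value:", wald_p, "\n")
cat("Likelihood ratio p-value:", lrt_p, "\n")
```

The two p-values disagree sharply here, which is the practical signature of the problem: the model fit itself is fine, only the Wald summary is unusable.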

Some possible solutions:

- you can still use anova.coxph to compare the fit to the fit of a null model and get a valid p-value that way
- consider not dichotomizing your predictor ...
- fit a regularized model, e.g. using the glmnet package with a ridge penalty (alpha = 0) and a small penalty
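
The three options above can be sketched on a small made-up dataset (everything named `toy`, `x`, `group` and `long` is hypothetical, and the cut at 4 is arbitrary). For the regularized fit, this sketch swaps in `ridge()` from the survival package instead of glmnet, only because glmnet requires at least two predictor columns and this toy example has one; `theta = 1` is an arbitrary penalty chosen for illustration.

```r
library(survival)

# Hypothetical toy data: x is a continuous predictor, group is x cut at 4.
toy <- data.frame(
  time  = 1:10,
  event = c(1, 1, 1, 0, 1, 1, 1, 0, 1, 1),
  x     = c(2.1, 1.0, 3.0, 2.5, 1.5, 6.0, 9.0, 5.5, 8.0, 7.0)
)
toy$group <- factor(ifelse(toy$x > 4, "long", "short"), levels = c("short", "long"))
toy$long  <- as.numeric(toy$group == "long")  # 0/1 version for ridge()

# 1. Likelihood ratio test against the null model.
fit_bin <- suppressWarnings(coxph(Surv(time, event) ~ group, data = toy))
print(suppressWarnings(anova(fit_bin)))
# The same test by hand: fit_bin$loglik holds (null, fitted) log-likelihoods.
lrt_stat <- 2 * diff(fit_bin$loglik)
lrt_p    <- pchisq(lrt_stat, df = 1, lower.tail = FALSE)
cat("LRT statistic:", lrt_stat, " p-value:", lrt_p, "\n")

# 2. Keep the predictor continuous instead of dichotomizing it.
fit_cont <- coxph(Surv(time, event) ~ x, data = toy)
print(summary(fit_cont)$conf.int)  # finite interval in this toy example

# 3. Ridge-penalized fit: the penalty keeps the coefficient finite.
fit_ridge <- coxph(Surv(time, event) ~ ridge(long, theta = 1), data = toy)
print(coef(fit_ridge))
```

Which option is appropriate depends on the scientific question; the penalized estimate in particular depends on the chosen penalty, so it is best reported together with that choice.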

The problem is easiest to see by plotting the data (using a Kaplan-Meier estimate):

library(survival)
library(ggfortify)
fit2 <- survfit(Surv(PFS2, PFS2_event) ~ Binarised_Time.on.Tx.2, data = Test_Dataset)
autoplot(fit2)

All of the individuals in the "≤3.52" stratum die (fail) or are censored before the first individual in the other stratum dies ...
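
The same thing can be checked numerically without a plot. The sketch below re-enters the PFS2 and PFS2_event columns from the question; the group labels are rebuilt with `rep()` and an ASCII "<=" purely for convenience (the variable names `pfs2`, `event` and `group` are my own shorthand). It compares the last observed time in the "≤ 3.52" stratum with the first event time in the other stratum.

```r
# PFS2 and PFS2_event copied from the question's dataset (34 rows).
pfs2 <- c(2.595482546, 2.37, 2.069815195, 1.412731006, 1.938398357,
          0.657084189, 2.529774127, 3.219712526, 0.657084189, 2.529774127,
          2.2, 3.482546201, 2.529774127, 3.712525667, 2.234086242,
          3.778234086, 1.347022587, 5.55, 17.3798768, 30.32443532,
          7.12936345, 7.09650924, 8.246406571, 15.24435318, 5.519507187,
          5.749486653, 5.420944559, 6.636550308, 9.264887064, 10.02053388,
          18.20123203, 6.110882957, 10.61190965, 6.866529774)
event <- c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1,
           0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1)
# The first 17 rows are the "<= 3.52 months" stratum, the last 17 the other one.
group <- rep(c("<= 3.52 months", "> 3.52 months"), each = 17)

# Last time anyone in the short stratum is still under observation.
last_short <- max(pfs2[group == "<= 3.52 months"])
# First time anyone in the long stratum has an event.
first_long_event <- min(pfs2[group == "> 3.52 months" & event == 1])

separated <- last_short < first_long_event
cat("Last time in short stratum:  ", last_short, "\n")
cat("First event in long stratum: ", first_long_event, "\n")
cat("Strata separated in time:    ", separated, "\n")
```

With these values the short stratum is fully observed by about 3.78 months, while the first event in the long stratum occurs at about 5.42 months, so no finite hazard ratio fits the data best.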

We can plot the fitted Cox model (with autoplot(survfit(fit))) too, although it's less obvious what's going on ...
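
A hedged sketch of that second plot, using base graphics rather than ggfortify and a made-up dataset (`toy`, its columns, and the `nd` data frame are hypothetical): passing `newdata` to `survfit()` asks for one fitted curve per group instead of a single curve at the average covariate value.

```r
library(survival)

# Hypothetical toy data with the same kind of separation as in the question.
toy <- data.frame(
  time  = 1:10,
  event = c(1, 1, 1, 0, 1, 1, 1, 0, 1, 1),
  group = factor(rep(c("short", "long"), each = 5), levels = c("short", "long"))
)

# The convergence warning is expected here and suppressed for brevity.
fit <- suppressWarnings(coxph(Surv(time, event) ~ group, data = toy))

# One row per group, so survfit() returns one fitted curve per row.
nd <- data.frame(group = factor(c("short", "long"), levels = c("short", "long")))
sf <- survfit(fit, newdata = nd)

# Fitted survival by time: column 1 is "short", column 2 is "long".
print(round(cbind(time = sf$time, short = sf$surv[, 1], long = sf$surv[, 2]), 3))

plot(sf, col = c("red", "blue"), xlab = "Time", ylab = "Fitted survival")
legend("topright", legend = c("short", "long"), col = c("red", "blue"), lty = 1)
```

In this toy fit the "long" curve stays at essentially 1 until the first "long" event, and the "short" curve drops to essentially 0 at that point. That is what a diverging coefficient looks like in the fitted curves, and it is harder to read than the Kaplan-Meier plot.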
