Posted 2023-02-24 07:22:20


How do I remove the automatic breakpoints/coefficients from the segmented package?

Question

I have been running analysis on data that involves breakpoints and I'm using the segmented package. When I run my analysis, I get the following output:

fit <- lm(XBRrate~Gap, data = batstraining2)
summary(fit)
segmented.fit <- segmented(fit, seg.Z = ~Gap, fixed.psi = 100)
summary(segmented.fit)

> summary(segmented.fit)
	***Regression Model with Segmented Relationship(s)***
Call: 
segmented.lm(obj = fit, seg.Z = ~Gap, fixed.psi = 100)
Estimated Break-Point(s):
          Est. St.Err
psi1.Gap   74  3.761
Meaningful coefficients of the linear terms:
               Estimate Std. Error t value Pr(>|t|)    
(Intercept)   0.0157829  0.0118917   1.327   0.1847    
Gap           0.0036064  0.0001935  18.635   <2e-16 ***
U1.Gap       -0.0012212  0.0002588  -4.719       NA    
U1.fixed.Gap -0.0017419  0.0007509  -2.320   0.0205 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.03699 on 1046 degrees of freedom
Multiple R-Squared: 0.6456,  Adjusted R-squared: 0.6443 
Boot restarting based on 10 samples. Last fit:
Convergence attained in 1 iterations (rel. change 2.627e-07)

The problem is it includes the U1.Gap in its final model, which is easiest to see in the plot.

I need this auto breakpoint left out. In the community, we understand there are breakpoints at fixed intervals, which is why I want to use those fixed breakpoints. But since the model includes that auto breakpoint at 74, everything after 74 is wrong.

I have tried using npsi as well, and while the breakpoints won't have the exact same issue, sometimes it will put breakpoints at very odd places that make no sense to actually have breakpoints in context.

I also tried using the strucchange package but it didn't seem to give results that made sense. The examples I found online were formatted for date data, and this is not date related. I'm not 100% sure if this is the issue, but if that's a better option I would need someone to show me a different way to use it, most likely.

Answer 1

Score: 0

If your goal is to fit a segmented regression without estimating breakpoints (i.e., with breakpoints that are known a priori), you could just estimate a piecewise linear regression using lm(). Specifically, you would need to include the variable of interest and a piecewise linear basis function. Here's what the basis function would look like:

pwl <- function(x, k) ifelse(x >= k, x - k, 0)

where x is the variable and k is the knot location. You could use it in the regression model as follows:

data(mtcars)
library(car)
library(ggeffects)
mod <- lm(mpg ~ hp + pwl(hp, 180), data=mtcars)
summary(mod)
#> 
#> Call:
#> lm(formula = mpg ~ hp + pwl(hp, 180), data = mtcars)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -5.0587 -2.1073 -0.6383  1.5470  8.1293 
#> 
#> Coefficients:
#>              Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)  34.81289    1.94632  17.887  < 2e-16 ***
#> hp           -0.11099    0.01504  -7.382 3.92e-08 ***
#> pwl(hp, 180)  0.10415    0.02996   3.476  0.00162 ** 
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 3.301 on 29 degrees of freedom
#> Multiple R-squared:  0.7194, Adjusted R-squared:    0.7 
#> F-statistic: 37.17 on 2 and 29 DF,  p-value: 9.947e-09
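
To make the basis function defined above more concrete, here is a minimal sketch that evaluates it on a handful of values. The vector x is made up for illustration; only pwl() itself comes from the answer.

```r
# Same basis function as in the answer
pwl <- function(x, k) ifelse(x >= k, x - k, 0)

# Illustrative values on either side of a knot at 180 (not real data)
x <- c(100, 150, 180, 200, 250)

# Zero up to and including the knot, then the distance past the knot
basis <- pwl(x, 180)
basis
#> [1]  0  0  0 20 70
```

Because the term is zero below the knot, it only changes the slope for observations past the knot, which is why the fitted line stays continuous at 180.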

Note that the coefficient on hp gives the slope of the relationship before the knot and the coefficient on hp plus the coefficient on pwl(hp, 180) gives the slope after the knot. You could use the linearHypothesis() function from the car package to test whether the slope after the knot is different from zero.

linearHypothesis(mod, "hp + pwl(hp, 180) = 0")
#> Linear hypothesis test
#> 
#> Hypothesis:
#> hp  + pwl(hp, 180) = 0
#> 
#> Model 1: restricted model
#> Model 2: mpg ~ hp + pwl(hp, 180)
#> 
#>   Res.Df    RSS Df Sum of Sq      F Pr(>F)
#> 1     30 317.31                           
#> 2     29 315.99  1    1.3211 0.1212 0.7302
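
If you would rather see the post-knot slope itself and not just the test, the same quantity can be computed by hand from the coefficient vector and its covariance matrix. This is only a sketch using base R; the names w, slope_after, se_after, t_stat and p_val are introduced here for illustration.

```r
# Refit the model from the answer (base R only, mtcars ships with R)
pwl <- function(x, k) ifelse(x >= k, x - k, 0)
mod <- lm(mpg ~ hp + pwl(hp, 180), data = mtcars)

b <- coef(mod)   # (Intercept), hp, pwl(hp, 180)
V <- vcov(mod)   # covariance matrix of the estimates

# Weights that pick out hp + pwl(hp, 180), i.e. the slope after the knot
w <- c(0, 1, 1)

slope_after <- sum(w * b)
se_after    <- sqrt(drop(t(w) %*% V %*% w))

# t-test of "slope after the knot = 0" on the residual degrees of freedom
t_stat <- slope_after / se_after
p_val  <- 2 * pt(abs(t_stat), df.residual(mod), lower.tail = FALSE)

c(slope = slope_after, se = se_after, t = t_stat, p = p_val)
```

For a single linear restriction the squared t statistic equals the F statistic, so the p-value here should agree with the linearHypothesis() output above.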

You could even use the ggpredict() function from the ggeffects package to plot what the effect looks like:

p <- ggpredict(mod, terms="hp [all]")
plot(p)
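
The same idea extends to several known breakpoints by adding one pwl() term per knot. The sketch below is an assumption-heavy illustration: the data are simulated as a stand-in for batstraining2 (which is not available here), and the knots at 100 and 200 are hypothetical apart from the 100 used in the question.

```r
pwl <- function(x, k) ifelse(x >= k, x - k, 0)

set.seed(1)
# Simulated stand-in for the asker's data; the true slopes are
# 0.003 before 100, 0.001 between 100 and 200, and 0.0002 after 200
dat <- data.frame(Gap = runif(500, 0, 300))
dat$XBRrate <- 0.02 + 0.003 * dat$Gap -
  0.002 * pwl(dat$Gap, 100) -
  0.0008 * pwl(dat$Gap, 200) +
  rnorm(500, sd = 0.03)

# One basis term per fixed knot; nothing is estimated about knot locations
fit_fixed <- lm(XBRrate ~ Gap + pwl(Gap, 100) + pwl(Gap, 200), data = dat)
summary(fit_fixed)

# Each pwl coefficient is a change in slope, so the slope within each
# segment is the running sum of the slope coefficients
seg_slopes <- cumsum(coef(fit_fixed)[-1])
seg_slopes
```

With the real data, replacing dat with batstraining2 and the knot values with the community's known breakpoints should give a fit that contains only those fixed breakpoints.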


Created on 2023-02-23 with reprex v2.0.2
