How do I remove the automatic breakpoints/coefficients from the segmented package?
Question
I have been analyzing data that involves breakpoints, using the segmented package. When I run my analysis, I get the following output:
fit <- lm(XBRrate~Gap, data = batstraining2)
summary(fit)
segmented.fit <- segmented(fit, seg.Z = ~Gap, fixed.psi = 100)
summary(segmented.fit)
> summary(segmented.fit)
***Regression Model with Segmented Relationship(s)***
Call:
segmented.lm(obj = fit, seg.Z = ~Gap, fixed.psi = 100)
Estimated Break-Point(s):
Est. St.Err
psi1.Gap 74 3.761
Meaningful coefficients of the linear terms:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.0157829 0.0118917 1.327 0.1847
Gap 0.0036064 0.0001935 18.635 <2e-16 ***
U1.Gap -0.0012212 0.0002588 -4.719 NA
U1.fixed.Gap -0.0017419 0.0007509 -2.320 0.0205 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.03699 on 1046 degrees of freedom
Multiple R-Squared: 0.6456, Adjusted R-squared: 0.6443
Boot restarting based on 10 samples. Last fit:
Convergence attained in 1 iterations (rel. change 2.627e-07)
The problem is that the final model includes U1.Gap, which is easiest to see in the plot.
I need this automatic breakpoint left out. In the community, we understand that breakpoints occur at fixed intervals, which is why I want to use those fixed breakpoints. But because the model also includes the automatic breakpoint at 74, everything after 74 is wrong.
I have tried using npsi as well. The breakpoints don't have exactly the same issue, but it sometimes places breakpoints at odd locations that make no sense in context.
I also tried the strucchange package, but it didn't seem to give results that made sense. The examples I found online were formatted for date data, and my data is not date related. I'm not sure whether that is the issue, but if strucchange is the better option, I would most likely need someone to show me a different way to use it.
Answer 1
Score: 0
If your goal is to fit a segmented regression without estimating breakpoints (i.e., with breakpoints that are known a priori), you could just estimate a piecewise linear regression using lm(). Specifically, you would need to include the variable of interest and a piecewise linear basis function. Here's what the basis function would look like:
# Hinge basis: 0 below the knot k, x - k at or above it
pwl <- function(x, k) ifelse(x >= k, x - k, 0)
where x is the variable and k is the knot location. You could use it in the regression model as follows:
data(mtcars)
library(car)
library(ggeffects)
mod <- lm(mpg ~ hp + pwl(hp, 180), data=mtcars)
summary(mod)
#>
#> Call:
#> lm(formula = mpg ~ hp + pwl(hp, 180), data = mtcars)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -5.0587 -2.1073 -0.6383 1.5470 8.1293
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 34.81289 1.94632 17.887 < 2e-16 ***
#> hp -0.11099 0.01504 -7.382 3.92e-08 ***
#> pwl(hp, 180) 0.10415 0.02996 3.476 0.00162 **
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 3.301 on 29 degrees of freedom
#> Multiple R-squared: 0.7194, Adjusted R-squared: 0.7
#> F-statistic: 37.17 on 2 and 29 DF, p-value: 9.947e-09
Note that the coefficient on hp gives the slope of the relationship before the knot, and the coefficient on hp plus the coefficient on pwl(hp, 180) gives the slope after the knot. You could use the linearHypothesis() function from the car package to test whether the slope after the knot is different from zero:
linearHypothesis(mod, "hp + pwl(hp, 180) = 0")
#> Linear hypothesis test
#>
#> Hypothesis:
#> hp + pwl(hp, 180) = 0
#>
#> Model 1: restricted model
#> Model 2: mpg ~ hp + pwl(hp, 180)
#>
#> Res.Df RSS Df Sum of Sq F Pr(>F)
#> 1 30 317.31
#> 2 29 315.99 1 1.3211 0.1212 0.7302
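If the car package is not available, the same quantity can be sketched in base R. The block below refits the model from this answer and computes the slope after the knot, its standard error, and a two-sided t-test from coef() and vcov(). The names w, slope_after, se_after, t_after, and p_after are illustrative and not part of the original answer.

```r
# Refit the model from the answer so this block runs on its own.
pwl <- function(x, k) ifelse(x >= k, x - k, 0)
mod <- lm(mpg ~ hp + pwl(hp, 180), data = mtcars)

b <- coef(mod)   # (Intercept), hp, pwl(hp, 180)
V <- vcov(mod)   # covariance matrix of the coefficient estimates
w <- c(0, 1, 1)  # weights that pick out hp + pwl(hp, 180)

# Slope after the knot and its standard error, sqrt(w' V w).
slope_after <- sum(w * b)
se_after    <- sqrt(drop(t(w) %*% V %*% w))

# Two-sided t-test of H0: slope after the knot = 0.
t_after <- slope_after / se_after
p_after <- 2 * pt(abs(t_after), df = df.residual(mod), lower.tail = FALSE)

c(slope = slope_after, se = se_after, t = t_after, p = p_after)
```

Because a single linear restriction is being tested, the squared t statistic should equal the F statistic reported by linearHypothesis() above, and the p-values should agree.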
You could even use the ggpredict() function from the ggeffects package to plot what the effect looks like:
p <- ggpredict(mod, terms = "hp [all]")
plot(p)
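A roughly equivalent picture can be drawn with base graphics alone, which may help if ggeffects is not installed. This is a sketch that predicts over a grid of hp values and overlays the fitted line and a 95% confidence band; grid and pred are illustrative names.

```r
# Refit the model from the answer so this block runs on its own.
pwl <- function(x, k) ifelse(x >= k, x - k, 0)
mod <- lm(mpg ~ hp + pwl(hp, 180), data = mtcars)

# Predict over an evenly spaced grid covering the observed range of hp.
grid <- data.frame(hp = seq(min(mtcars$hp), max(mtcars$hp), length.out = 200))
pred <- predict(mod, newdata = grid, interval = "confidence")

# Raw data, fitted piecewise line, confidence band, and the known knot.
plot(mpg ~ hp, data = mtcars, pch = 16, col = "grey40")
lines(grid$hp, pred[, "fit"], lwd = 2)
lines(grid$hp, pred[, "lwr"], lty = 2)
lines(grid$hp, pred[, "upr"], lty = 2)
abline(v = 180, lty = 3)
```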
<sup>Created on 2023-02-23 with reprex v2.0.2</sup>
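The same idea should carry over to the setting in the question, with one pwl() term per known breakpoint. The sketch below uses simulated data because batstraining2 is not available here. The knots at 100 and 200, the simulated slopes, and the names sim, fit_fixed, and segment_slopes are assumptions for illustration only.

```r
# Hypothetical stand-in for the asker's data: Gap and XBRrate are simulated,
# with true slope changes placed at the assumed knots 100 and 200.
set.seed(1)
pwl <- function(x, k) ifelse(x >= k, x - k, 0)

n   <- 500
Gap <- runif(n, 0, 300)
XBRrate <- 0.02 + 0.004 * Gap - 0.002 * pwl(Gap, 100) - 0.001 * pwl(Gap, 200) +
  rnorm(n, sd = 0.03)
sim <- data.frame(Gap, XBRrate)

# One pwl() term per known knot; no breakpoint is estimated by the model.
fit_fixed <- lm(XBRrate ~ Gap + pwl(Gap, 100) + pwl(Gap, 200), data = sim)
coef(fit_fixed)

# Each segment's slope is the running sum of the slope coefficients.
segment_slopes <- cumsum(coef(fit_fixed)[-1])
segment_slopes
```

With the real data, sim would be replaced by batstraining2 and the knots by the breakpoints the community already agrees on. Since nothing is estimated for the knot locations, no extra term like U1.Gap appears in the fit.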