July 3, 2023 10:53:06


What's wrong with the piecewise fitting

Question

I am new to R and have a question.

Here is the data:

year <- c(2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022)
score <- c(85,  85,  88,  88,  94,  94,  94,  82,  82,  84,  84,  84,  84,  84, 84)

I want to set 2015 as the breakpoint for a piecewise linear fit (2008-2015 and 2015-2022). I tried the following code and got the results below. However, I think the result is not correct, especially for stage 2, which should show an increasing trend.

stage1 <- year - 2008                      # years since 2008
stage2 <- (year - 2015) * (year >= 2015)   # years since 2015, 0 before 2015
fm <- lm(score ~ stage1 + stage2)
summary(fm)
library(car)
# test whether the post-2015 slope (sum of the stage1 and stage2 coefficients) is zero
linearHypothesis(fm, "stage1 + stage2", verbose = TRUE)
plot(score ~ year)
lines(fitted(fm) ~ year, col = "red")
abline(v = 2015, lty = 2)

[Figure: the piecewise linear fitting result]

Answer 1

Score: 2

The main problem is using an inappropriate model. The continuous model described the data in the SO post it was taken from (https://stackoverflow.com/questions/76480532/how-can-i-set-the-breakpoints-myself-to-do-the-piecewise-linear-fitting-with-man/76482328#76482328), but not this data: here the score drops abruptly from 94 in 2014 to 82 in 2015, and a model whose segments must meet at 2015 cannot represent that jump. In this case a model consisting of discontinuous rather than continuous line segments seems more appropriate.
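To make the difference concrete, the two models can be written as follows. Here t is the year, and the symbols are notation introduced for this sketch; a_1, b_1, a_2, b_2 correspond to stage1.icept, stage1.slope, stage2.icept and stage2.slope in the code below.

```latex
% Question's continuous (hinge) model, with (u)_+ = \max(u, 0)
\text{score}_t = \beta_0 + \beta_1 (t - 2008) + \beta_2 (t - 2015)_+ + \varepsilon_t
% slope before 2015: \beta_1; slope from 2015 on: \beta_1 + \beta_2

% Discontinuous model: separate intercept and slope per stage
\text{score}_t =
\begin{cases}
  a_1 + b_1 (t - 2015) + \varepsilon_t, & t < 2015 \\
  a_2 + b_2 (t - 2015) + \varepsilon_t, & t \ge 2015
\end{cases}
```

Setting a_1 = a_2 recovers the hinge model (with a_1 = \beta_0 + 7\beta_1, b_1 = \beta_1, b_2 = \beta_1 + \beta_2), so the continuous model is a special case of the discontinuous one, and a_2 - a_1 is the size of the jump at 2015.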

# Each stage gets its own intercept and slope; year is centred at 2015
stage1.slope <- (year < 2015) * (year - 2015)   # year - 2015 before 2015, else 0
stage1.icept <- +(year < 2015)                  # 1 before 2015, else 0
stage2.slope <- (year >= 2015) * (year - 2015)  # year - 2015 from 2015 on, else 0
stage2.icept <- +(year >= 2015)                 # 1 from 2015 on, else 0
# "+ 0" drops the overall intercept; the two indicator columns replace it
fm <- lm(score ~ stage1.icept + stage1.slope + stage2.icept + stage2.slope + 0)
summary(fm)
## Call:
## lm(formula = score ~ stage1.icept + stage1.slope + stage2.icept + 
##     stage2.slope + 0)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.71429 -0.64286  0.07143  0.64286  2.46429 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## stage1.icept  97.0000     0.9904  97.936  < 2e-16 ***
## stage1.slope   1.8214     0.2215   8.224 5.02e-06 ***
## stage2.icept  82.5000     0.7565 109.060  < 2e-16 ***
## stage2.slope   0.2857     0.1808   1.580    0.142    
## ---
## Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
## 
## Residual standard error: 1.172 on 11 degrees of freedom
## Multiple R-squared:  0.9999,    Adjusted R-squared:  0.9998 
## F-statistic: 2.043e+04 on 4 and 11 DF,  p-value: < 2.2e-16
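A minimal sketch, assuming the year and score vectors from the question, that fits the question's continuous model next to the discontinuous one above. It shows that in the continuous parameterization the slope from 2015 on is the sum of the stage1 and stage2 coefficients (the quantity the question's linearHypothesis call tests), not the stage2 coefficient alone. Because the continuous model is nested in the discontinuous one, the two can also be compared with an F test via anova(). The names fm_cont, fm_disc and slope_after are illustrative.

```r
# Data from the question
year  <- 2008:2022
score <- c(85, 85, 88, 88, 94, 94, 94, 82, 82, 84, 84, 84, 84, 84, 84)

# The question's continuous (hinge) model: both stages must meet at 2015
stage1 <- year - 2008
stage2 <- (year - 2015) * (year >= 2015)
fm_cont <- lm(score ~ stage1 + stage2)

# Slope from 2015 on is the sum of both coefficients, not stage2 alone
slope_after <- sum(coef(fm_cont)[c("stage1", "stage2")])
slope_after

# The discontinuous model from this answer
stage1.slope <- (year < 2015) * (year - 2015)
stage1.icept <- +(year < 2015)
stage2.slope <- (year >= 2015) * (year - 2015)
stage2.icept <- +(year >= 2015)
fm_disc <- lm(score ~ stage1.icept + stage1.slope + stage2.icept + stage2.slope + 0)

# fm_cont's columns are linear combinations of fm_disc's, so the models are
# nested and anova() gives an F test of the extra (jump) parameter
anova(fm_cont, fm_disc)
```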

Or, given that stage2.slope is not significant (p = 0.142), we could consider dropping that term. Optionally, replace the fm2 <- line with the equivalent commented-out update() line.

# fm2 <- update(fm, . ~ . - stage2.slope)   # equivalent alternative
fm2 <- lm(score ~ stage1.icept + stage1.slope + stage2.icept + 0)
summary(fm2)
## Call:
## lm(formula = score ~ stage1.icept + stage1.slope + stage2.icept + 0)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -1.714 -1.125  0.500  0.500  2.464 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## stage1.icept  97.0000     1.0504  92.347  < 2e-16 ***
## stage1.slope   1.8214     0.2349   7.755 5.16e-06 ***
## stage2.icept  83.5000     0.4394 190.028  < 2e-16 ***
## ---
## Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
## 
## Residual standard error: 1.243 on 12 degrees of freedom
## Multiple R-squared:  0.9998,    Adjusted R-squared:  0.9998 
## F-statistic: 2.422e+04 on 3 and 12 DF,  p-value: < 2.2e-16
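As a cross-check on dropping stage2.slope, here is a sketch of the equivalent nested-model F test with anova(), assuming the data from the question. With a single dropped term, F equals the square of that term's t value in summary(fm), so both tests give the same p-value. The names a and t_val are illustrative.

```r
# Data from the question
year  <- 2008:2022
score <- c(85, 85, 88, 88, 94, 94, 94, 82, 82, 84, 84, 84, 84, 84, 84)

stage1.slope <- (year < 2015) * (year - 2015)
stage1.icept <- +(year < 2015)
stage2.slope <- (year >= 2015) * (year - 2015)
stage2.icept <- +(year >= 2015)
fm  <- lm(score ~ stage1.icept + stage1.slope + stage2.icept + stage2.slope + 0)
fm2 <- update(fm, . ~ . - stage2.slope)   # keeps "+ 0", drops stage2.slope

# F test for the dropped term
a <- anova(fm2, fm)
a

# With one dropped term, F equals the squared t value of stage2.slope in fm
t_val <- summary(fm)$coefficients["stage2.slope", "t value"]
c(F = a$F[2], t_squared = t_val^2)
```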

Plot the data and both fitted models, with a legend:

plot(score ~ year)
lines(fitted(fm) ~ year, col = "red")
lines(fitted(fm2) ~ year, col = "blue", lty = 2, lwd = 2)
legend("topright", c("fm", "fm2"), col = c("red", "blue"), lty = 1:2, lwd = 1:2)
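One way to see what the discontinuous model is doing, sketched under the assumption that the data are as in the question: because each stage has its own intercept and slope, its point estimates are the same as those of two separate simple regressions, one per stage (the standard errors differ, since fm pools a single residual variance over both stages). With year centred at 2015, the difference of the two intercepts is the estimated jump at 2015, extrapolating the first stage's line to that year. The names fit_before, fit_after, jump and se_jump are illustrative.

```r
# Data from the question
year  <- 2008:2022
score <- c(85, 85, 88, 88, 94, 94, 94, 82, 82, 84, 84, 84, 84, 84, 84)

# The discontinuous model, fitted jointly
stage1.slope <- (year < 2015) * (year - 2015)
stage1.icept <- +(year < 2015)
stage2.slope <- (year >= 2015) * (year - 2015)
stage2.icept <- +(year >= 2015)
fm <- lm(score ~ stage1.icept + stage1.slope + stage2.icept + stage2.slope + 0)

# Two separate simple regressions, year centred at 2015 in both
fit_before <- lm(score ~ I(year - 2015), subset = year < 2015)
fit_after  <- lm(score ~ I(year - 2015), subset = year >= 2015)

# Same point estimates, same order (icept, slope, icept, slope)
rbind(joint = coef(fm), separate = c(coef(fit_before), coef(fit_after)))

# Estimated jump at 2015: stage 2 intercept minus extrapolated stage 1 intercept
v <- vcov(fm)
jump <- coef(fm)[["stage2.icept"]] - coef(fm)[["stage1.icept"]]
se_jump <- sqrt(v["stage1.icept", "stage1.icept"] + v["stage2.icept", "stage2.icept"] -
                2 * v["stage1.icept", "stage2.icept"])
c(jump = jump, se = se_jump)
```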

Answer 2

Score: 0

Given your data as a data frame d:

d <- 
  data.frame(
    year = c(2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022),
    score = c(85,  85,  88,  88,  94,  94,  94,  82,  82,  84,  84,  84,  84,  84, 84)
  )

With base R, you could plot the whole data set, split the data at the desired breakpoint, and Map the resulting list of data frames to a list of appropriate objects (here, abline() calls drawing the coefficients of a separate model for each part), adding the lines as you go:

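plot(score ~ year, data = d) 
# split into pre-2015 and 2015-onwards chunks, fit lm() to each, draw its line
# (the |> pipe and \(x) lambda syntax need R >= 4.1)
d |>
  split(list(d$year >= 2015)) |>
  Map(f = \(chunk) abline(coef(lm(score ~ year, data = chunk))))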
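Note that abline() draws each fitted line across the whole plotting region. If you want each line to span only its own years, a minimal base-R variant (a sketch; fits is an illustrative name) draws the fitted values with lines() instead and keeps the models for inspection:

```r
d <- data.frame(
  year  = 2008:2022,
  score = c(85, 85, 88, 88, 94, 94, 94, 82, 82, 84, 84, 84, 84, 84, 84)
)

plot(score ~ year, data = d)
# chunks are named "FALSE" (before 2015) and "TRUE" (2015 onwards)
chunks <- split(d, d$year >= 2015)
fits <- lapply(chunks, function(chunk) {
  m <- lm(score ~ year, data = chunk)
  # draw the fitted line only over this chunk's own years
  lines(chunk$year, fitted(m), col = "red")
  m
})
abline(v = 2015, lty = 2)

# intercept and slope of each chunk's model
sapply(fits, coef)
```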

Alternatively, you could use the split criterion as the grouping variable for groupwise linear smoothing in ggplot2:

library(ggplot2)
d |>
  ggplot(aes(year, score, group = year >= 2015)) +
  geom_point() +
  geom_smooth(method = 'lm',
              se = FALSE ## hide confidence bands
  )
