从分组数据中使用分段回归提取多个变量的断点。

huangapple go评论99阅读模式
英文:

Extract breakpoints from multiple variables from grouped data using piecewise (segmented) regression

问题

我想使用分段回归法并从我的分组数据中提取多个变量的分段点。

我已经使用以下代码逐个处理每个变量和组:

  1. library(segmented)
  2. mod_lm <- lm(y ~ x, data = df) #进行线性回归
  3. mod_seg <- segmented(mod_lm, seg.Z = ~ x) #进行分段回归
  4. mod_seg$psi #提取分段点和估计的标准误差

我想在因变量上运行这个操作,而自变量保持不变。我的数据看起来像这样:

  1. x Group Var y
  2. 9 Group1 Var1 0.6901
  3. 6 Group1 Var1 0.6346
  4. 5 Group1 Var1 0.8089
  5. 5 Group1 Var1 0.1274
  6. 7 Group1 Var1 0.6426
  7. 1 Group1 Var2 0.1059
  8. 2 Group1 Var2 0.6989
  9. 4 Group1 Var2 0.1129
  10. 7 Group1 Var2 0.1458
  11. 7 Group1 Var2 0.8185
  12. 2 Group2 Var1 0.7950
  13. 0 Group2 Var1 0.0533
  14. 1 Group2 Var1 0.1866
  15. 3 Group2 Var1 0.3876
  16. 8 Group2 Var1 0.2788
  17. 2 Group2 Var2 0.1559
  18. 8 Group2 Var2 0.3382
  19. 1 Group2 Var2 0.6346
  20. 9 Group2 Var2 0.6038
  21. 8 Group2 Var2 0.2026

我该如何获取这些分段点并将它们存储在一个新的数据框中?

英文:

I would like to use piecewise regression and extract breakpoints across multiple variables from my grouped data.

I have used following code to do it one by one for each variable & group:

  1. library(segmented)
  2. mod_lm &lt;- lm(y ~ x, data = df) #Do LM
  3. mod_seg &lt;- segmented(mod_lm, seg.Z = ~ x) #Do segmented regression
  4. mod_seg$psi #Extract breakpoint &amp; standard error of the estimate

I would like to run this across dependent variables, while independent variable remains the same. I also have grouping variable in the data, which I would also like to have included.

My data looks like this:

  1. x Group Var y
  2. 9 Group1 Var1 0.6901
  3. 6 Group1 Var1 0.6346
  4. 5 Group1 Var1 0.8089
  5. 5 Group1 Var1 0.1274
  6. 7 Group1 Var1 0.6426
  7. 1 Group1 Var2 0.1059
  8. 2 Group1 Var2 0.6989
  9. 4 Group1 Var2 0.1129
  10. 7 Group1 Var2 0.1458
  11. 7 Group1 Var2 0.8185
  12. 2 Group2 Var1 0.7950
  13. 0 Group2 Var1 0.0533
  14. 1 Group2 Var1 0.1866
  15. 3 Group2 Var1 0.3876
  16. 8 Group2 Var1 0.2788
  17. 2 Group2 Var2 0.1559
  18. 8 Group2 Var2 0.3382
  19. 1 Group2 Var2 0.6346
  20. 9 Group2 Var2 0.6038
  21. 8 Group2 Var2 0.2026

How can I get the breakpoints and store them in a new dataframe?

答案1

得分: 1

This heavily relies on this answer: https://stackoverflow.com/questions/68460350/how-to-use-segmented-package-when-working-with-data-frames-with-dplyr-package-to

我不确定这是否符合你的需求,但我们可以循环遍历各个组并在最后提取psi。如果你愿意的话,也可以将列名改为initialEst.St.Err。在这个数据集中,我无法完全理解“清理”这个概念,因为它只返回了其中一个组的结果。

  1. library(tidyverse)
  2. library(segmented)
  3. suppressWarnings(
  4. df %>%
  5. nest_by(Group, Var) %>%
  6. mutate(mod_lm = list(lm(y ~ x, data = data)),
  7. mod_seg = list(tryCatch(segmented(mod_lm, seg.Z = ~x),
  8. error = function(e) list(NA))),
  9. psi = list(mod_seg[['psi']])) %>%
  10. unnest(cols = psi, keep_empty = TRUE)
  11. )
  12. #> 断点估计值:5.895779
  13. #> # A tibble: 4 x 6
  14. #> # Groups: Group, Var [4]
  15. #> Group Var data mod_lm mod_seg psi[,1] [,2] [,3]
  16. #> <chr> <chr> <list<tibble[,2]>> <list> <list> <dbl> <dbl> <dbl>
  17. #> 1 Group1 Var1 [5 x 2] <lm> <list [1]> NA NA NA
  18. #> 2 Group1 Var2 [5 x 2] <lm> <lm> NA NA NA
  19. #> 3 Group2 Var1 [5 x 2] <lm> <segmentd> 2 2.00 1.17
  20. #> 4 Group2 Var2 [5 x 2] <lm> <list [1]> NA NA NA

数据:

  1. read.table(text = "x Group Var y
  2. 9 Group1 Var1 0.6901
  3. 6 Group1 Var1 0.6346
  4. 5 Group1 Var1 0.8089
  5. 5 Group1 Var1 0.1274
  6. 7 Group1 Var1 0.6426
  7. 1 Group1 Var2 0.1059
  8. 2 Group1 Var2 0.6989
  9. 4 Group1 Var2 0.1129
  10. 7 Group1 Var2 0.1458
  11. 7 Group1 Var2 0.8185
  12. 2 Group2 Var1 0.7950
  13. 0 Group2 Var1 0.0533
  14. 1 Group2 Var1 0.1866
  15. 3 Group2 Var1 0.3876
  16. 8 Group2 Var1 0.2788
  17. 2 Group2 Var2 0.1559
  18. 8 Group2 Var2 0.3382
  19. 1 Group2 Var2 0.6346
  20. 9 Group2 Var2 0.6038
  21. 8 Group2 Var2 0.2026", header = T, stringsAsFactor = F) -> df
英文:

This heavily relies on this answer: https://stackoverflow.com/questions/68460350/how-to-use-segmented-package-when-working-with-data-frames-with-dplyr-package-to

I am not sure if this is what you are after, but we can loop over the groups and extract psi at the end. You can also rename the columns to be initial, Est., and St.Err if you like. With this dataset, I cannot quite wrap my head around "cleaning up" since it only returns results for one of the groups.

  1. library(tidyverse)
  2. library(segmented)
  3. suppressWarnings(
  4. df %&gt;%
  5. nest_by(Group, Var) %&gt;%
  6. mutate(mod_lm = list(lm(y ~ x, data = data)),
  7. mod_seg = list(tryCatch(segmented(mod_lm, seg.Z = ~x),
  8. error = function(e) list(NA))),
  9. psi = list(mod_seg[[&#39;psi&#39;]])) %&gt;%
  10. unnest(cols = psi, keep_empty = TRUE)
  11. )
  12. #&gt; breakpoint estimate(s): 5.895779
  13. #&gt; # A tibble: 4 x 6
  14. #&gt; # Groups: Group, Var [4]
  15. #&gt; Group Var data mod_lm mod_seg psi[,1] [,2] [,3]
  16. #&gt; &lt;chr&gt; &lt;chr&gt; &lt;list&lt;tibble[,2]&gt;&gt; &lt;list&gt; &lt;list&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt;
  17. #&gt; 1 Group1 Var1 [5 x 2] &lt;lm&gt; &lt;list [1]&gt; NA NA NA
  18. #&gt; 2 Group1 Var2 [5 x 2] &lt;lm&gt; &lt;lm&gt; NA NA NA
  19. #&gt; 3 Group2 Var1 [5 x 2] &lt;lm&gt; &lt;segmentd&gt; 2 2.00 1.17
  20. #&gt; 4 Group2 Var2 [5 x 2] &lt;lm&gt; &lt;list [1]&gt; NA NA NA

Data:

  1. read.table(text = &quot;x Group Var y
  2. 9 Group1 Var1 0.6901
  3. 6 Group1 Var1 0.6346
  4. 5 Group1 Var1 0.8089
  5. 5 Group1 Var1 0.1274
  6. 7 Group1 Var1 0.6426
  7. 1 Group1 Var2 0.1059
  8. 2 Group1 Var2 0.6989
  9. 4 Group1 Var2 0.1129
  10. 7 Group1 Var2 0.1458
  11. 7 Group1 Var2 0.8185
  12. 2 Group2 Var1 0.7950
  13. 0 Group2 Var1 0.0533
  14. 1 Group2 Var1 0.1866
  15. 3 Group2 Var1 0.3876
  16. 8 Group2 Var1 0.2788
  17. 2 Group2 Var2 0.1559
  18. 8 Group2 Var2 0.3382
  19. 1 Group2 Var2 0.6346
  20. 9 Group2 Var2 0.6038
  21. 8 Group2 Var2 0.2026&quot;, header = T, stringsAsFactor = F) -&gt; df

huangapple
  • 本文由 发表于 2023年4月17日 19:12:57
  • 转载请务必保留本文链接:https://go.coder-hub.com/76034537.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定