从分组数据中使用分段回归提取多个变量的断点。

huangapple go评论65阅读模式
英文:

Extract breakpoints from multiple variables from grouped data using piecewise (segmented) regression

问题

我想使用分段回归法并从我的分组数据中提取多个变量的分段点。

我已经使用以下代码逐个处理每个变量和组:

library(segmented)

mod_lm <- lm(y ~ x, data = df) #进行线性回归
mod_seg <- segmented(mod_lm, seg.Z = ~ x) #进行分段回归
mod_seg$psi #提取分段点和估计的标准误差

我想在因变量上运行这个操作,而自变量保持不变。我的数据看起来像这样:

x	Group	Var 	y
9	Group1	Var1	0.6901
6	Group1	Var1	0.6346
5	Group1	Var1	0.8089
5	Group1	Var1	0.1274
7	Group1	Var1	0.6426
1	Group1	Var2	0.1059
2	Group1	Var2	0.6989
4	Group1	Var2	0.1129
7	Group1	Var2	0.1458
7	Group1	Var2	0.8185
2	Group2	Var1	0.7950
0	Group2	Var1	0.0533
1	Group2	Var1	0.1866
3	Group2	Var1	0.3876
8	Group2	Var1	0.2788
2	Group2	Var2	0.1559
8	Group2	Var2	0.3382
1	Group2	Var2	0.6346
9	Group2	Var2	0.6038
8	Group2	Var2	0.2026

我该如何获取这些分段点并将它们存储在一个新的数据框中?

英文:

I would like to use piecewise regression and extract breakpoints across multiple variables from my grouped data.

I have used following code to do it one by one for each variable & group:

library(segmented)

mod_lm &lt;- lm(y ~ x, data = df) #Do LM
mod_seg &lt;- segmented(mod_lm, seg.Z = ~ x) #Do segmented regression
mod_seg$psi #Extract breakpoint &amp; standard error of the estimate

I would like to run this across dependent variables, while independent variable remains the same. I also have grouping variable in the data, which I would also like to have included.

My data looks like this:

x	Group	Var 	y
9	Group1	Var1	0.6901
6	Group1	Var1	0.6346
5	Group1	Var1	0.8089
5	Group1	Var1	0.1274
7	Group1	Var1	0.6426
1	Group1	Var2	0.1059
2	Group1	Var2	0.6989
4	Group1	Var2	0.1129
7	Group1	Var2	0.1458
7	Group1	Var2	0.8185
2	Group2	Var1	0.7950
0	Group2	Var1	0.0533
1	Group2	Var1	0.1866
3	Group2	Var1	0.3876
8	Group2	Var1	0.2788
2	Group2	Var2	0.1559
8	Group2	Var2	0.3382
1	Group2	Var2	0.6346
9	Group2	Var2	0.6038
8	Group2	Var2	0.2026

How can I get the breakpoints and store them in a new dataframe?

答案1

得分: 1

This heavily relies on this answer: https://stackoverflow.com/questions/68460350/how-to-use-segmented-package-when-working-with-data-frames-with-dplyr-package-to

我不确定这是否符合你的需求,但我们可以循环遍历各个组并在最后提取psi。如果你愿意的话,也可以将列名改为initialEst.St.Err。在这个数据集中,我无法完全理解“清理”这个概念,因为它只返回了其中一个组的结果。

library(tidyverse)
library(segmented)

suppressWarnings(
df %>%
  nest_by(Group, Var) %>%
  mutate(mod_lm = list(lm(y ~ x, data = data)),
         mod_seg = list(tryCatch(segmented(mod_lm, seg.Z = ~x),
                        error = function(e) list(NA))),
         psi = list(mod_seg[['psi']])) %>%
  unnest(cols = psi, keep_empty = TRUE)
)
#> 断点估计值:5.895779
#> # A tibble: 4 x 6
#> # Groups:   Group, Var [4]
#>   Group  Var                 data mod_lm mod_seg    psi[,1]  [,2]  [,3]
#>   <chr>  <chr> <list<tibble[,2]>> <list> <list>       <dbl> <dbl> <dbl>
#> 1 Group1 Var1             [5 x 2] <lm>   <list [1]>      NA NA    NA   
#> 2 Group1 Var2             [5 x 2] <lm>   <lm>            NA NA    NA   
#> 3 Group2 Var1             [5 x 2] <lm>   <segmentd>       2  2.00  1.17
#> 4 Group2 Var2             [5 x 2] <lm>   <list [1]>      NA NA    NA

数据:

read.table(text = "x  Group Var y
9 Group1  Var1  0.6901
6 Group1  Var1  0.6346
5 Group1  Var1  0.8089
5 Group1  Var1  0.1274
7 Group1  Var1  0.6426
1 Group1  Var2  0.1059
2 Group1  Var2  0.6989
4 Group1  Var2  0.1129
7 Group1  Var2  0.1458
7 Group1  Var2  0.8185
2 Group2  Var1  0.7950
0 Group2  Var1  0.0533
1 Group2  Var1  0.1866
3 Group2  Var1  0.3876
8 Group2  Var1  0.2788
2 Group2  Var2  0.1559
8 Group2  Var2  0.3382
1 Group2  Var2  0.6346
9 Group2  Var2  0.6038
8 Group2  Var2  0.2026", header = T, stringsAsFactor = F) -> df
英文:

This heavily relies on this answer: https://stackoverflow.com/questions/68460350/how-to-use-segmented-package-when-working-with-data-frames-with-dplyr-package-to

I am not sure if this is what you are after, but we can loop over the groups and extract psi at the end. You can also rename the columns to be initial, Est., and St.Err if you like. With this dataset, I cannot quite wrap my head around "cleaning up" since it only returns results for one of the groups.

library(tidyverse)
library(segmented)

suppressWarnings(
df %&gt;% 
  nest_by(Group, Var) %&gt;%
  mutate(mod_lm = list(lm(y ~ x, data = data)),
         mod_seg = list(tryCatch(segmented(mod_lm, seg.Z = ~x),
                        error = function(e) list(NA))),
         psi = list(mod_seg[[&#39;psi&#39;]])) %&gt;% 
  unnest(cols = psi, keep_empty = TRUE)
)
#&gt; breakpoint estimate(s): 5.895779
#&gt; # A tibble: 4 x 6
#&gt; # Groups:   Group, Var [4]
#&gt;   Group  Var                 data mod_lm mod_seg    psi[,1]  [,2]  [,3]
#&gt;   &lt;chr&gt;  &lt;chr&gt; &lt;list&lt;tibble[,2]&gt;&gt; &lt;list&gt; &lt;list&gt;       &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt;
#&gt; 1 Group1 Var1             [5 x 2] &lt;lm&gt;   &lt;list [1]&gt;      NA NA    NA   
#&gt; 2 Group1 Var2             [5 x 2] &lt;lm&gt;   &lt;lm&gt;            NA NA    NA   
#&gt; 3 Group2 Var1             [5 x 2] &lt;lm&gt;   &lt;segmentd&gt;       2  2.00  1.17
#&gt; 4 Group2 Var2             [5 x 2] &lt;lm&gt;   &lt;list [1]&gt;      NA NA    NA

Data:

read.table(text = &quot;x  Group Var y
9 Group1  Var1  0.6901
6 Group1  Var1  0.6346
5 Group1  Var1  0.8089
5 Group1  Var1  0.1274
7 Group1  Var1  0.6426
1 Group1  Var2  0.1059
2 Group1  Var2  0.6989
4 Group1  Var2  0.1129
7 Group1  Var2  0.1458
7 Group1  Var2  0.8185
2 Group2  Var1  0.7950
0 Group2  Var1  0.0533
1 Group2  Var1  0.1866
3 Group2  Var1  0.3876
8 Group2  Var1  0.2788
2 Group2  Var2  0.1559
8 Group2  Var2  0.3382
1 Group2  Var2  0.6346
9 Group2  Var2  0.6038
8 Group2  Var2  0.2026&quot;, header = T, stringsAsFactor = F) -&gt; df

huangapple
  • 本文由 发表于 2023年4月17日 19:12:57
  • 转载请务必保留本文链接:https://go.coder-hub.com/76034537.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定