英文:
Extract breakpoints from multiple variables from grouped data using piecewise (segmented) regression
问题
我想使用分段回归法并从我的分组数据中提取多个变量的分段点。
我已经使用以下代码逐个处理每个变量和组:
library(segmented)
mod_lm <- lm(y ~ x, data = df) #进行线性回归
mod_seg <- segmented(mod_lm, seg.Z = ~ x) #进行分段回归
mod_seg$psi #提取分段点和估计的标准误差
我想在因变量上运行这个操作,而自变量保持不变。我的数据看起来像这样:
x Group Var y
9 Group1 Var1 0.6901
6 Group1 Var1 0.6346
5 Group1 Var1 0.8089
5 Group1 Var1 0.1274
7 Group1 Var1 0.6426
1 Group1 Var2 0.1059
2 Group1 Var2 0.6989
4 Group1 Var2 0.1129
7 Group1 Var2 0.1458
7 Group1 Var2 0.8185
2 Group2 Var1 0.7950
0 Group2 Var1 0.0533
1 Group2 Var1 0.1866
3 Group2 Var1 0.3876
8 Group2 Var1 0.2788
2 Group2 Var2 0.1559
8 Group2 Var2 0.3382
1 Group2 Var2 0.6346
9 Group2 Var2 0.6038
8 Group2 Var2 0.2026
我该如何获取这些分段点并将它们存储在一个新的数据框中?
英文:
I would like to use piecewise regression and extract breakpoints across multiple variables from my grouped data.
I have used following code to do it one by one for each variable & group:
library(segmented)
mod_lm <- lm(y ~ x, data = df) #Do LM
mod_seg <- segmented(mod_lm, seg.Z = ~ x) #Do segmented regression
mod_seg$psi #Extract breakpoint & standard error of the estimate
I would like to run this across dependent variables, while independent variable remains the same. I also have grouping variable in the data, which I would also like to have included.
My data looks like this:
x Group Var y
9 Group1 Var1 0.6901
6 Group1 Var1 0.6346
5 Group1 Var1 0.8089
5 Group1 Var1 0.1274
7 Group1 Var1 0.6426
1 Group1 Var2 0.1059
2 Group1 Var2 0.6989
4 Group1 Var2 0.1129
7 Group1 Var2 0.1458
7 Group1 Var2 0.8185
2 Group2 Var1 0.7950
0 Group2 Var1 0.0533
1 Group2 Var1 0.1866
3 Group2 Var1 0.3876
8 Group2 Var1 0.2788
2 Group2 Var2 0.1559
8 Group2 Var2 0.3382
1 Group2 Var2 0.6346
9 Group2 Var2 0.6038
8 Group2 Var2 0.2026
How can I get the breakpoints and store them in a new dataframe?
答案1
得分: 1
This heavily relies on this answer: https://stackoverflow.com/questions/68460350/how-to-use-segmented-package-when-working-with-data-frames-with-dplyr-package-to
我不确定这是否符合你的需求,但我们可以循环遍历各个组并在最后提取psi
。如果你愿意的话,也可以将列名改为initial
、Est.
和St.Err
。在这个数据集中,我无法完全理解“清理”这个概念,因为它只返回了其中一个组的结果。
library(tidyverse)
library(segmented)
suppressWarnings(
df %>%
nest_by(Group, Var) %>%
mutate(mod_lm = list(lm(y ~ x, data = data)),
mod_seg = list(tryCatch(segmented(mod_lm, seg.Z = ~x),
error = function(e) list(NA))),
psi = list(mod_seg[['psi']])) %>%
unnest(cols = psi, keep_empty = TRUE)
)
#> 断点估计值:5.895779
#> # A tibble: 4 x 6
#> # Groups: Group, Var [4]
#> Group Var data mod_lm mod_seg psi[,1] [,2] [,3]
#> <chr> <chr> <list<tibble[,2]>> <list> <list> <dbl> <dbl> <dbl>
#> 1 Group1 Var1 [5 x 2] <lm> <list [1]> NA NA NA
#> 2 Group1 Var2 [5 x 2] <lm> <lm> NA NA NA
#> 3 Group2 Var1 [5 x 2] <lm> <segmentd> 2 2.00 1.17
#> 4 Group2 Var2 [5 x 2] <lm> <list [1]> NA NA NA
数据:
read.table(text = "x Group Var y
9 Group1 Var1 0.6901
6 Group1 Var1 0.6346
5 Group1 Var1 0.8089
5 Group1 Var1 0.1274
7 Group1 Var1 0.6426
1 Group1 Var2 0.1059
2 Group1 Var2 0.6989
4 Group1 Var2 0.1129
7 Group1 Var2 0.1458
7 Group1 Var2 0.8185
2 Group2 Var1 0.7950
0 Group2 Var1 0.0533
1 Group2 Var1 0.1866
3 Group2 Var1 0.3876
8 Group2 Var1 0.2788
2 Group2 Var2 0.1559
8 Group2 Var2 0.3382
1 Group2 Var2 0.6346
9 Group2 Var2 0.6038
8 Group2 Var2 0.2026", header = T, stringsAsFactor = F) -> df
英文:
This heavily relies on this answer: https://stackoverflow.com/questions/68460350/how-to-use-segmented-package-when-working-with-data-frames-with-dplyr-package-to
I am not sure if this is what you are after, but we can loop over the groups and extract psi
at the end. You can also rename the columns to be initial
, Est.
, and St.Err
if you like. With this dataset, I cannot quite wrap my head around "cleaning up" since it only returns results for one of the groups.
library(tidyverse)
library(segmented)
suppressWarnings(
df %>%
nest_by(Group, Var) %>%
mutate(mod_lm = list(lm(y ~ x, data = data)),
mod_seg = list(tryCatch(segmented(mod_lm, seg.Z = ~x),
error = function(e) list(NA))),
psi = list(mod_seg[['psi']])) %>%
unnest(cols = psi, keep_empty = TRUE)
)
#> breakpoint estimate(s): 5.895779
#> # A tibble: 4 x 6
#> # Groups: Group, Var [4]
#> Group Var data mod_lm mod_seg psi[,1] [,2] [,3]
#> <chr> <chr> <list<tibble[,2]>> <list> <list> <dbl> <dbl> <dbl>
#> 1 Group1 Var1 [5 x 2] <lm> <list [1]> NA NA NA
#> 2 Group1 Var2 [5 x 2] <lm> <lm> NA NA NA
#> 3 Group2 Var1 [5 x 2] <lm> <segmentd> 2 2.00 1.17
#> 4 Group2 Var2 [5 x 2] <lm> <list [1]> NA NA NA
Data:
read.table(text = "x Group Var y
9 Group1 Var1 0.6901
6 Group1 Var1 0.6346
5 Group1 Var1 0.8089
5 Group1 Var1 0.1274
7 Group1 Var1 0.6426
1 Group1 Var2 0.1059
2 Group1 Var2 0.6989
4 Group1 Var2 0.1129
7 Group1 Var2 0.1458
7 Group1 Var2 0.8185
2 Group2 Var1 0.7950
0 Group2 Var1 0.0533
1 Group2 Var1 0.1866
3 Group2 Var1 0.3876
8 Group2 Var1 0.2788
2 Group2 Var2 0.1559
8 Group2 Var2 0.3382
1 Group2 Var2 0.6346
9 Group2 Var2 0.6038
8 Group2 Var2 0.2026", header = T, stringsAsFactor = F) -> df
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论