2023-04-17 19:12:57

Extract breakpoints from multiple variables from grouped data using piecewise (segmented) regression

Question

I would like to use piecewise regression and extract breakpoints across multiple variables from my grouped data.

I have used the following code to do this one at a time for each variable and group:

library(segmented)
mod_lm <- lm(y ~ x, data = df) # Do LM
mod_seg <- segmented(mod_lm, seg.Z = ~ x) # Do segmented regression
mod_seg$psi # Extract breakpoint & standard error of the estimate

I would like to run this across several dependent variables while the independent variable stays the same. The data also contain a grouping variable, which I would like to include.

My data looks like this:

x	Group	Var 	y
9	Group1	Var1	0.6901
6	Group1	Var1	0.6346
5	Group1	Var1	0.8089
5	Group1	Var1	0.1274
7	Group1	Var1	0.6426
1	Group1	Var2	0.1059
2	Group1	Var2	0.6989
4	Group1	Var2	0.1129
7	Group1	Var2	0.1458
7	Group1	Var2	0.8185
2	Group2	Var1	0.7950
0	Group2	Var1	0.0533
1	Group2	Var1	0.1866
3	Group2	Var1	0.3876
8	Group2	Var1	0.2788
2	Group2	Var2	0.1559
8	Group2	Var2	0.3382
1	Group2	Var2	0.6346
9	Group2	Var2	0.6038
8	Group2	Var2	0.2026

How can I get the breakpoints and store them in a new dataframe?

Answer 1

Score: 1

This heavily relies on this answer: https://stackoverflow.com/questions/68460350/how-to-use-segmented-package-when-working-with-data-frames-with-dplyr-package-to

I am not sure if this is what you are after, but we can loop over the groups and extract psi at the end. You can also rename the columns to initial, Est., and St.Err if you like. With this dataset I cannot quite see how to "clean up" the result, since a breakpoint is returned for only one of the groups.

library(tidyverse)
library(segmented)
suppressWarnings(
df %>%
  nest_by(Group, Var) %>%
  mutate(mod_lm = list(lm(y ~ x, data = data)),
         mod_seg = list(tryCatch(segmented(mod_lm, seg.Z = ~x),
                        error = function(e) list(NA))),
         psi = list(mod_seg[['psi']])) %>%
  unnest(cols = psi, keep_empty = TRUE)
)
#> breakpoint estimate(s): 5.895779
#> # A tibble: 4 x 6
#> # Groups:   Group, Var [4]
#>   Group  Var                 data mod_lm mod_seg    psi[,1]  [,2]  [,3]
#>   <chr>  <chr> <list<tibble[,2]>> <list> <list>       <dbl> <dbl> <dbl>
#> 1 Group1 Var1             [5 x 2] <lm>   <list [1]>      NA NA    NA   
#> 2 Group1 Var2             [5 x 2] <lm>   <lm>            NA NA    NA   
#> 3 Group2 Var1             [5 x 2] <lm>   <segmentd>       2  2.00  1.17
#> 4 Group2 Var2             [5 x 2] <lm>   <list [1]>      NA NA    NA
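
For reference, the psi element of a successful fit is a small matrix with one row per breakpoint and the columns Initial, Est., and St.Err. The sketch below shows one way to turn it into a data frame with friendlier column names. The synthetic data frame demo (with a slope change placed at x = 4) and the names psi_df, initial, estimate, and std_error are assumptions made for illustration and do not come from the question.

```r
library(segmented)

# Synthetic data (an assumption, not the question's data):
# the slope of y changes at x = 4.
set.seed(42)
x <- seq(0, 10, length.out = 50)
y <- 2 + 0.3 * x + 1.2 * pmax(x - 4, 0) + rnorm(length(x), sd = 0.2)
demo <- data.frame(x = x, y = y)

mod_lm  <- lm(y ~ x, data = demo)          # ordinary linear fit
mod_seg <- segmented(mod_lm, seg.Z = ~x)   # segmented fit with one breakpoint

# psi is a matrix: one row per breakpoint, columns Initial / Est. / St.Err.
psi_df <- as.data.frame(mod_seg$psi)
names(psi_df) <- c("initial", "estimate", "std_error")
psi_df
```

With only five points per group, as in the question's data, segmented() will often fail or return a plain lm object, in which case psi is missing and the renaming step above would not apply.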

Data:

df <- read.table(text = "x  Group Var y
9 Group1  Var1  0.6901
6 Group1  Var1  0.6346
5 Group1  Var1  0.8089
5 Group1  Var1  0.1274
7 Group1  Var1  0.6426
1 Group1  Var2  0.1059
2 Group1  Var2  0.6989
4 Group1  Var2  0.1129
7 Group1  Var2  0.1458
7 Group1  Var2  0.8185
2 Group2  Var1  0.7950
0 Group2  Var1  0.0533
1 Group2  Var1  0.1866
3 Group2  Var1  0.3876
8 Group2  Var1  0.2788
2 Group2  Var2  0.1559
8 Group2  Var2  0.3382
1 Group2  Var2  0.6346
9 Group2  Var2  0.6038
8 Group2  Var2  0.2026", header = TRUE, stringsAsFactors = FALSE)
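
If a plain data frame with one row per Group/Var combination is the goal, the same loop can also be sketched in base R with split() and lapply() instead of the nest_by() pipeline. This is only an illustration under stated assumptions: make_series() and fit_one() are hypothetical helpers, df_demo is synthetic data with a known breakpoint in each subset, and the output column names are my own choice. Subsets where segmented() errors or finds no breakpoint get NA.

```r
library(segmented)

# Synthetic data (an assumption for illustration): each Group/Var subset
# has a change in slope at a known x value `bp`.
set.seed(1)
make_series <- function(group, var, bp) {
  x <- seq(0, 10, length.out = 40)
  y <- 1 + 0.5 * x + 1.5 * pmax(x - bp, 0) + rnorm(length(x), sd = 0.2)
  data.frame(x = x, Group = group, Var = var, y = y)
}
df_demo <- rbind(
  make_series("Group1", "Var1", 3),
  make_series("Group1", "Var2", 5),
  make_series("Group2", "Var1", 6),
  make_series("Group2", "Var2", 4)
)

# Hypothetical helper: fit one subset and return a one-row data frame.
fit_one <- function(d) {
  mod_lm  <- lm(y ~ x, data = d)
  mod_seg <- tryCatch(segmented(mod_lm, seg.Z = ~x),
                      error = function(e) NULL)
  # segmented() can error or hand back a plain lm when no breakpoint is found.
  has_psi <- inherits(mod_seg, "segmented") && !is.null(mod_seg$psi)
  psi <- if (has_psi) mod_seg$psi[1, ] else rep(NA_real_, 3)
  data.frame(Group = d$Group[1], Var = d$Var[1],
             initial = psi[[1]], estimate = psi[[2]], std_error = psi[[3]])
}

# Split by Group and Var, fit each piece, and stack the rows.
pieces <- split(df_demo, list(df_demo$Group, df_demo$Var), drop = TRUE)
breakpoints <- do.call(rbind, lapply(pieces, fit_one))
rownames(breakpoints) <- NULL
breakpoints
```

Swapping df_demo for the df defined above should run the same way, although most of its five-point subsets are likely to come back as NA, as the tibble output shows.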
