July 23, 2023 19:41:22


Use map() and map2() to run regressions and add fitted values to data frame

Question


I want to regress each of several columns of a data frame on another column of the same data frame, using only a subset of the observations, and then append the fitted values to the original data frame as new, named columns. As an example, I'll use the EuStockMarkets data that ships with R, converted to a data frame df:

df <- zoo::fortify.zoo(EuStockMarkets)

I would like to regress the columns DAX, SMI, CAC, and FTSE on Index

> colnames(df)
[1] "Index" "DAX"   "SMI"   "CAC"   "FTSE"

for values of Index up to Index == 1996.877 using {purrr} functions to avoid a loop. Then, add the fitted values as new columns to df with names DAX_fitted, SMI_fitted, CAC_fitted, and FTSE_fitted.

So far I have come up with two options:

library(dplyr)
library(purrr)
fitted <- df %>%
  select(-Index) %>%
  names() %>%
  paste(., ' ~ Index') %>%
  map(as.formula) %>%
  map2(., .y = rep(list(df %>% filter(Index < 1996.877)), length(.)), ~ predict(lm(.x, data = .y)))

which gives me a list of fitted values (one vector per index), or

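# caution: storing the result as `lm` shadows stats::lm; if this snippet is re-run,
# map(lm, ...) receives this list instead of the lm() function
lm <- df %>%
    select(-Index) %>%
    names() %>%
    paste(., ' ~ Index') %>%
    map(as.formula) %>%
    map(lm, data = df %>% filter(Index < 1996.877))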

which returns a list lm with regression results.

Any ideas on how to complete this code so that the named fitted values are added to the original data frame? Thanks!

Answer 1

Score: 3


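With dplyr, you can use mutate() + across() to achieve that. Because the data are filtered before mutate(), the result keeps only the 1,400 rows with Index < 1996.877: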

library(dplyr)
df %>%
  filter(Index < 1996.877) %>%
  mutate(across(-Index, ~ lm(.x ~ Index)$fitted.values,
                .names = "{.col}_fitted"))
# # A tibble: 1,400 × 9
#    Index   DAX   SMI   CAC  FTSE DAX_fitted SMI_fitted CAC_fitted FTSE_fitted
#    <dbl> <dbl> <dbl> <dbl> <dbl>      <dbl>      <dbl>      <dbl>       <dbl>
#  1 1991. 1629. 1678. 1773. 2444.      1465.      1535.      1865.       2363.
#  2 1992. 1614. 1688. 1750. 2460.      1465.      1536.      1865.       2364.
#  3 1992. 1607. 1679. 1718  2448.      1466.      1538.      1865.       2365.
#  4 1992. 1621. 1684. 1708. 2470.      1467.      1539.      1865.       2366.
#  5 1992. 1618. 1687. 1723. 2485.      1468.      1541.      1865.       2367.
#  6 1992. 1611. 1672. 1714. 2467.      1468.      1542.      1865.       2368.
#  7 1992. 1631. 1683. 1734. 2488.      1469.      1544.      1865.       2369.
#  8 1992. 1640. 1704. 1757. 2508.      1470.      1545.      1866.       2370.
#  9 1992. 1635. 1698. 1754  2510.      1471.      1547.      1866.       2371.
# 10 1992. 1646. 1716. 1754. 2497.      1471.      1548.      1866.       2372.
# # ℹ 1,390 more rows
# # ℹ Use `print(n = ...)` to see more rows
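The across() version above returns only the 1,400 rows used for fitting. If the goal is literally to append columns to the original 1,860-row df, one option is to fit on the subset and call predict() with newdata = df, which also extends each fitted line to the rows after the cutoff. The sketch below is a swapped-in base-R variant (lapply(), reformulate(), predict()) rather than dplyr/purrr, so it runs without extra packages; it also rebuilds df with data.frame() instead of zoo::fortify.zoo(), assuming the columns come out the same. The names cutoff, train, ycols, preds and df_out are illustrative only.

```r
# Rebuild the data (assumed equivalent to zoo::fortify.zoo(EuStockMarkets))
df <- data.frame(Index = as.numeric(time(EuStockMarkets)), EuStockMarkets)

cutoff <- 1996.877                      # fit only on rows with Index below this value
train  <- df[df$Index < cutoff, ]
ycols  <- setdiff(names(df), "Index")   # "DAX" "SMI" "CAC" "FTSE"

# One simple regression per column, fitted on the training rows,
# then predicted for every row of df so the lengths match nrow(df).
preds <- lapply(ycols, function(y) {
  fit <- lm(reformulate("Index", response = y), data = train)
  unname(predict(fit, newdata = df))
})
names(preds) <- paste0(ycols, "_fitted")

# Append the named prediction columns to the original data frame
df_out <- cbind(df, as.data.frame(preds))
```

For rows with Index >= cutoff the new columns hold out-of-sample predictions rather than in-sample fitted values, which may or may not be what you want.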

To minimally modify the code you have tried, you just need set_names() and map_dfc(), which return the named fitted columns as a separate tibble:

library(purrr)
df %>%
  select(-Index) %>%
  names() %>%
  set_names(paste0, "_fitted") %>%
  map_dfc(~ lm(as.formula(paste(.x, "~ Index")), filter(df, Index < 1996.877))$fitted.values)
# # A tibble: 1,400 × 4
#    DAX_fitted SMI_fitted CAC_fitted FTSE_fitted
#         <dbl>      <dbl>      <dbl>       <dbl>
#  1      1465.      1535.      1865.       2363.
#  2      1465.      1536.      1865.       2364.
#  3      1466.      1538.      1865.       2365.
#  4      1467.      1539.      1865.       2366.
#  5      1468.      1541.      1865.       2367.
#  6      1468.      1542.      1865.       2368.
#  7      1469.      1544.      1865.       2369.
#  8      1470.      1545.      1866.       2370.
#  9      1471.      1547.      1866.       2371.
# 10      1471.      1548.      1866.       2372.
# # ℹ 1,390 more rows
# # ℹ Use `print(n = ...)` to see more rows
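The map_dfc() result is a standalone 1,400-row tibble, so it still has to be bound to the rows it was fitted on (with dplyr, bind_cols() would do this). A minimal base-R sketch of the same naming-then-mapping idea, with setNames() in the role of set_names() and lapply() in the role of map(); the data frame is rebuilt with data.frame() on the assumption that it matches zoo::fortify.zoo(), and train, ycols, fitted_list and res are illustrative names:

```r
df    <- data.frame(Index = as.numeric(time(EuStockMarkets)), EuStockMarkets)
train <- df[df$Index < 1996.877, ]
ycols <- setdiff(names(df), "Index")

# Name the response columns first (the job set_names() does above) ...
ycols <- setNames(ycols, paste0(ycols, "_fitted"))

# ... so the list of fitted-value vectors returned by lapply() inherits those names.
fitted_list <- lapply(ycols, function(y) {
  unname(fitted(lm(reformulate("Index", response = y), data = train)))
})

# Bind the named vectors as new columns next to the rows used for fitting.
res <- cbind(train, fitted_list)
```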

Answer 2

Score: 3


Since only the response changes while the regressor (Index) stays the same, you can run a single regression with a matrix response instead of four separate ones:

# one lm() call with a four-column response; the third positional argument is lm()'s subset
values <- fitted(lm(as.matrix(subset(df,,-Index)) ~ Index, df, Index < 1996.877))
colnames(values) <- paste0(colnames(values), "_fitted")
cbind(subset(df, Index < 1996.877), values)

       Index     DAX    SMI    CAC   FTSE DAX_fitted SMI_fitted CAC_fitted
1   1991.496 1628.75 1678.1 1772.8 2443.6   1464.602   1535.031   1864.686
2   1991.500 1613.63 1688.5 1750.5 2460.2   1465.356   1536.489   1864.810
3   1991.504 1606.51 1678.6 1718.0 2448.2   1466.110   1537.948   1864.934
4   1991.508 1621.04 1684.1 1708.1 2470.4   1466.864   1539.407   1865.059
5   1991.512 1618.16 1686.6 1723.1 2484.7   1467.618   1540.866   1865.183
6   1991.515 1610.61 1671.6 1714.3 2466.8   1468.373   1542.325   1865.307
7   1991.519 1630.75 1682.9 1734.5 2487.9   1469.127   1543.784   1865.431
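The single-fit approach works because, with a matrix on the left-hand side, lm() fits every column against the same design matrix and returns an "mlm" object whose coefficient matrix has one column per index. The base-R sketch below checks that these coefficients match four separate fits and then uses predict() with newdata = df to get values for all rows. It assumes df built with data.frame() matches zoo::fortify.zoo(); train, ycols, fit_all, same and full are illustrative names.

```r
df    <- data.frame(Index = as.numeric(time(EuStockMarkets)), EuStockMarkets)
train <- df[df$Index < 1996.877, ]
ycols <- c("DAX", "SMI", "CAC", "FTSE")

# One multivariate fit: the matrix response makes this an "mlm" object
fit_all <- lm(as.matrix(train[, ycols]) ~ Index, data = train)

# coef() is a 2 x 4 matrix: rows (Intercept)/Index, one column per response
coef(fit_all)

# Compare each column with an ordinary single-response fit
same <- sapply(ycols, function(y) {
  one <- lm(reformulate("Index", response = y), data = train)
  isTRUE(all.equal(unname(coef(fit_all)[, y]), unname(coef(one))))
})
same   # TRUE for every column

# predict() on an mlm returns a matrix; newdata = df covers all rows
full <- predict(fit_all, newdata = df)
dim(full)   # 1860 4
```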
