Use map() and map2() to run regressions and add fitted values to data frame

Question

I want to regress each of a subset of columns of a data frame on another column of the same data frame (using only a subset of the observations), and then append the fitted values to the original data frame as new, named columns. As an example, I'll use the EuStockMarkets data that comes with R, converted to a data frame df.

df <- zoo::fortify.zoo(EuStockMarkets)

I would like to regress the columns DAX, SMI, CAC, and FTSE on Index

> colnames(df)
[1] "Index" "DAX"   "SMI"   "CAC"   "FTSE"

for values of Index up to Index == 1996.877 using {purrr} functions to avoid a loop. Then, add the fitted values as new columns to df with names DAX_fitted, SMI_fitted, CAC_fitted, and FTSE_fitted.

So far I have come up with two options:

fitted <- df %>%
  select(-Index) %>%
  names() %>%
  paste(., ' ~ Index') %>%
  map(as.formula) %>%
  map2(., .y = rep(list(df %>% filter(Index < 1996.877)), length(.)), ~ predict(lm(.x, data = .y)))

which gives me a list of the fitted values or

lm <- df %>%
    select(-Index) %>%
    names() %>%
    paste(., ' ~ Index') %>%
    map(as.formula) %>%
    map(lm, data = df %>% filter(Index < 1996.877))

which returns a list lm with regression results.

Any ideas on how to complete this code so that the named fitted values are added to the original data frame? Thanks!

Answer 1

Score: 3

With dplyr, you can use mutate() + across() to achieve that:

library(dplyr)

df %>%
  filter(Index < 1996.877) %>%
  mutate(across(-Index, ~ lm(.x ~ Index)$fitted.values,
                .names = "{.col}_fitted"))

# # A tibble: 1,400 × 9
#    Index   DAX   SMI   CAC  FTSE DAX_fitted SMI_fitted CAC_fitted FTSE_fitted
#    <dbl> <dbl> <dbl> <dbl> <dbl>      <dbl>      <dbl>      <dbl>       <dbl>
#  1 1991. 1629. 1678. 1773. 2444.      1465.      1535.      1865.       2363.
#  2 1992. 1614. 1688. 1750. 2460.      1465.      1536.      1865.       2364.
#  3 1992. 1607. 1679. 1718  2448.      1466.      1538.      1865.       2365.
#  4 1992. 1621. 1684. 1708. 2470.      1467.      1539.      1865.       2366.
#  5 1992. 1618. 1687. 1723. 2485.      1468.      1541.      1865.       2367.
#  6 1992. 1611. 1672. 1714. 2467.      1468.      1542.      1865.       2368.
#  7 1992. 1631. 1683. 1734. 2488.      1469.      1544.      1865.       2369.
#  8 1992. 1640. 1704. 1757. 2508.      1470.      1545.      1866.       2370.
#  9 1992. 1635. 1698. 1754  2510.      1471.      1547.      1866.       2371.
# 10 1992. 1646. 1716. 1754. 2497.      1471.      1548.      1866.       2372.
# # ℹ 1,390 more rows
# # ℹ Use `print(n = ...)` to see more rows
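
Note that the result above only contains the 1,400 rows used for the fit. If the fitted values should sit next to all rows of the original df, one possible approach is to join them back by Index. This is only a sketch: the names fitted_cols and df_full are made up for illustration, and rows outside the estimation window simply get NA.

```r
library(dplyr)

df <- zoo::fortify.zoo(EuStockMarkets)

# Fit on the subset and keep only the key (Index) plus the new *_fitted columns.
fitted_cols <- df %>%
  filter(Index < 1996.877) %>%
  mutate(across(-Index, ~ lm(.x ~ Index)$fitted.values,
                .names = "{.col}_fitted")) %>%
  select(Index, ends_with("_fitted"))

# Join back onto the full data frame; rows that were not part of the
# regression sample have no match and therefore get NA in the fitted columns.
df_full <- left_join(df, fitted_cols, by = "Index")
```

The join relies on Index being unique and on both sides holding exactly the same numeric values, which is the case here because fitted_cols is derived from df itself.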

To minimally modify the code you have tried, you just need set_names() and map_dfc():

library(purrr)

df %>%
  select(-Index) %>%
  names() %>%
  set_names(paste0, "_fitted") %>%
  map_dfc(~ lm(as.formula(paste(.x, "~ Index")), filter(df, Index < 1996.877))$fitted.values)

# # A tibble: 1,400 × 4
#    DAX_fitted SMI_fitted CAC_fitted FTSE_fitted
#         <dbl>      <dbl>      <dbl>       <dbl>
#  1      1465.      1535.      1865.       2363.
#  2      1465.      1536.      1865.       2364.
#  3      1466.      1538.      1865.       2365.
#  4      1467.      1539.      1865.       2366.
#  5      1468.      1541.      1865.       2367.
#  6      1468.      1542.      1865.       2368.
#  7      1469.      1544.      1865.       2369.
#  8      1470.      1545.      1866.       2370.
#  9      1471.      1547.      1866.       2371.
# 10      1471.      1548.      1866.       2372.
# # ℹ 1,390 more rows
# # ℹ Use `print(n = ...)` to see more rows
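
The same idea can also be built on the second option from the question, which keeps the fitted models around. The following is a rough sketch rather than a definitive solution: the names sub, cols, models, fitted_cols and result are placeholders, and reformulate() from base R is swapped in for the paste() + as.formula() step.

```r
library(dplyr)
library(purrr)

df  <- zoo::fortify.zoo(EuStockMarkets)
sub <- filter(df, Index < 1996.877)          # estimation sample

cols <- setdiff(names(df), "Index")          # "DAX" "SMI" "CAC" "FTSE"

# One lm per column, stored in a list named after the response column.
models <- cols %>%
  set_names() %>%
  map(~ lm(reformulate("Index", response = .x), data = sub))

# Extract the fitted values, rename, and turn the list into a tibble.
fitted_cols <- models %>%
  map(~ unname(fitted(.x))) %>%
  set_names(paste0(cols, "_fitted")) %>%
  as_tibble()

# Attach the fitted columns to the rows they were estimated on.
result <- bind_cols(sub, fitted_cols)
```

Keeping models as a named list means coefficients or residuals can still be inspected later, for example with map(models, coef).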

Answer 2

Score: 3

Because only the response (y) changes while the predictor stays the same, you can run the regression once with a matrix of responses instead of fitting it once per column:

values <- fitted(lm(as.matrix(subset(df,,-Index)) ~ Index, df, Index < 1996.877))
colnames(values) <- paste0(colnames(values), "_fitted")
cbind(subset(df, Index < 1996.877), values)

       Index     DAX    SMI    CAC   FTSE DAX_fitted SMI_fitted CAC_fitted
1   1991.496 1628.75 1678.1 1772.8 2443.6   1464.602   1535.031   1864.686
2   1991.500 1613.63 1688.5 1750.5 2460.2   1465.356   1536.489   1864.810
3   1991.504 1606.51 1678.6 1718.0 2448.2   1466.110   1537.948   1864.934
4   1991.508 1621.04 1684.1 1708.1 2470.4   1466.864   1539.407   1865.059
5   1991.512 1618.16 1686.6 1723.1 2484.7   1467.618   1540.866   1865.183
6   1991.515 1610.61 1671.6 1714.3 2466.8   1468.373   1542.325   1865.307
7   1991.519 1630.75 1682.9 1734.5 2487.9   1469.127   1543.784   1865.431
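
As a sanity check, the single multi-response fit can be compared with a separate per-column regression, and predict() with newdata can be used to get a value for every row of df. This is a sketch under assumptions: the object names (fit_all, fit_dax, pred, out) are invented here, and the values for rows with Index >= 1996.877 are out-of-sample extrapolations of the fitted line, which may or may not be what is wanted.

```r
df <- zoo::fortify.zoo(EuStockMarkets)

# One fit with a matrix response: each column gets its own intercept and slope.
fit_all <- lm(cbind(DAX, SMI, CAC, FTSE) ~ Index, data = df,
              subset = Index < 1996.877)

# The usual single-response fit for one of the columns, for comparison.
fit_dax <- lm(DAX ~ Index, data = df, subset = Index < 1996.877)

# The DAX column of the joint fit should match the separate regression.
same_fit <- all.equal(unname(fitted(fit_all)[, "DAX"]), unname(fitted(fit_dax)))

# Predict for every row of df (in-sample rows reproduce the fitted values,
# later rows are extrapolated), then append with *_fitted names.
pred <- predict(fit_all, newdata = df)
colnames(pred) <- paste0(colnames(pred), "_fitted")
out <- cbind(df, pred)
```

Here out has as many rows as df, so the fitted columns line up with the original data without a join.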

huangapple
  • Posted on July 23, 2023, 19:41:22
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76748030.html