Use map() and map2() to run regressions and add fitted values to data frame
Question
I want to regress each of a subset of columns of a data frame on another column of the same data frame (using only a subset of the observations), and then append the fitted values as new, named columns to the original data frame. As an example, I'll use the EuStockMarkets data that comes with R, converted to a data frame df.
df <- zoo::fortify.zoo(EuStockMarkets)
I would like to regress the columns DAX, SMI, CAC, and FTSE on Index
> colnames(df)
[1] "Index" "DAX" "SMI" "CAC" "FTSE"
for values of Index up to Index == 1996.877 using {purrr} functions to avoid a loop. Then, add the fitted values as new columns to df with names DAX_fitted, SMI_fitted, CAC_fitted, and FTSE_fitted.
I have come up with two options so far:
library(dplyr)
library(purrr)
fitted <- df %>%
  select(-Index) %>%
  names() %>%
  paste(., "~ Index") %>%
  map(as.formula) %>%
  map2(., .y = rep(list(df %>% filter(Index < 1996.877)), length(.)), ~ predict(lm(.x, data = .y)))
which gives me a list of the fitted values, or
# named models rather than lm, so the result does not shadow stats::lm()
models <- df %>%
  select(-Index) %>%
  names() %>%
  paste(., "~ Index") %>%
  map(as.formula) %>%
  map(lm, data = df %>% filter(Index < 1996.877))
which returns a list models with the regression results.
Any ideas on how to complete this code so that the named fitted values are added to the original data frame? Thanks!
Answer 1
Score: 3
With dplyr, you can use mutate() + across() to achieve that:
library(dplyr)
df %>%
filter(Index < 1996.877) %>%
mutate(across(-Index, ~ lm(.x ~ Index)$fitted.values,
.names = "{.col}_fitted"))
# # A tibble: 1,400 × 9
# Index DAX SMI CAC FTSE DAX_fitted SMI_fitted CAC_fitted FTSE_fitted
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 1991. 1629. 1678. 1773. 2444. 1465. 1535. 1865. 2363.
# 2 1992. 1614. 1688. 1750. 2460. 1465. 1536. 1865. 2364.
# 3 1992. 1607. 1679. 1718 2448. 1466. 1538. 1865. 2365.
# 4 1992. 1621. 1684. 1708. 2470. 1467. 1539. 1865. 2366.
# 5 1992. 1618. 1687. 1723. 2485. 1468. 1541. 1865. 2367.
# 6 1992. 1611. 1672. 1714. 2467. 1468. 1542. 1865. 2368.
# 7 1992. 1631. 1683. 1734. 2488. 1469. 1544. 1865. 2369.
# 8 1992. 1640. 1704. 1757. 2508. 1470. 1545. 1866. 2370.
# 9 1992. 1635. 1698. 1754 2510. 1471. 1547. 1866. 2371.
# 10 1992. 1646. 1716. 1754. 2497. 1471. 1548. 1866. 2372.
# # ℹ 1,390 more rows
# # ℹ Use `print(n = ...)` to see more rows
To minimally modify the code you have tried, you just need set_names() and map_dfc():
library(purrr)
df %>%
select(-Index) %>%
names() %>%
set_names(paste0, "_fitted") %>%
map_dfc(~ lm(as.formula(paste(.x, "~ Index")), filter(df, Index < 1996.877))$fitted.values)
# # A tibble: 1,400 × 4
# DAX_fitted SMI_fitted CAC_fitted FTSE_fitted
# <dbl> <dbl> <dbl> <dbl>
# 1 1465. 1535. 1865. 2363.
# 2 1465. 1536. 1865. 2364.
# 3 1466. 1538. 1865. 2365.
# 4 1467. 1539. 1865. 2366.
# 5 1468. 1541. 1865. 2367.
# 6 1468. 1542. 1865. 2368.
# 7 1469. 1544. 1865. 2369.
# 8 1470. 1545. 1866. 2370.
# 9 1471. 1547. 1866. 2371.
# 10 1471. 1548. 1866. 2372.
# # ℹ 1,390 more rows
# # ℹ Use `print(n = ...)` to see more rows
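Both snippets above return only the rows below the cutoff. If the fitted columns should sit next to every row of the original df, as the question asks, one possible sketch is to fit on the filtered rows and then predict for all rows. The names cutoff, train, targets, fitted_cols and df_fitted are illustrative and not part of the original answer, and values at or after the cutoff are extrapolations from the in-sample fit, not fitted values in the strict sense.

```r
library(dplyr)
library(purrr)

df <- zoo::fortify.zoo(EuStockMarkets)
cutoff <- 1996.877                      # cutoff value taken from the question
train  <- filter(df, Index < cutoff)    # rows used to fit the models

targets <- setdiff(names(df), "Index")  # every column except the regressor

fitted_cols <- targets %>%
  set_names(paste0(targets, "_fitted")) %>%
  map(function(col) {
    # Build the formula `col ~ Index` and fit on the in-sample rows only
    fit <- lm(reformulate("Index", response = col), data = train)
    # Predict for every row of df so the result lines up with the original data
    unname(predict(fit, newdata = df))
  }) %>%
  as_tibble()

# Original columns followed by the four *_fitted columns
df_fitted <- bind_cols(df, fitted_cols)
```

If the out-of-sample rows should be NA rather than extrapolated, wrapping the prediction in replace(..., df$Index >= cutoff, NA) inside the function is one way to do it.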
Answer 2
Score: 3
Because only the response (y) changes while the predictor stays the same, you can fit the regression once with a matrix of responses instead of fitting it once per column:
values <- fitted(lm(as.matrix(subset(df,,-Index)) ~ Index, df, Index < 1996.877))
colnames(values) <- paste0(colnames(values), "_fitted")
cbind(subset(df, Index < 1996.877), values)
Index DAX SMI CAC FTSE DAX_fitted SMI_fitted CAC_fitted
1 1991.496 1628.75 1678.1 1772.8 2443.6 1464.602 1535.031 1864.686
2 1991.500 1613.63 1688.5 1750.5 2460.2 1465.356 1536.489 1864.810
3 1991.504 1606.51 1678.6 1718.0 2448.2 1466.110 1537.948 1864.934
4 1991.508 1621.04 1684.1 1708.1 2470.4 1466.864 1539.407 1865.059
5 1991.512 1618.16 1686.6 1723.1 2484.7 1467.618 1540.866 1865.183
6 1991.515 1610.61 1671.6 1714.3 2466.8 1468.373 1542.325 1865.307
7 1991.519 1630.75 1682.9 1734.5 2487.9 1469.127 1543.784 1865.431
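As a quick sanity check on the single-call approach, the sketch below compares one column of the matrix-response fit with a separate lm() for the same response. The names cutoff, targets, multi, single, same_coef and same_fit are illustrative and not part of the original answer.

```r
df <- zoo::fortify.zoo(EuStockMarkets)
cutoff  <- 1996.877                      # cutoff value taken from the question
targets <- setdiff(names(df), "Index")   # response columns

# One call: a matrix on the left-hand side fits one regression per column
multi <- lm(as.matrix(df[targets]) ~ Index, data = df, subset = Index < cutoff)

# A separate fit for a single response, for comparison
single <- lm(DAX ~ Index, data = df, subset = Index < cutoff)

# Coefficients and fitted values of the DAX column should agree with the single fit
same_coef <- all.equal(unname(coef(multi)[, "DAX"]), unname(coef(single)))
same_fit  <- all.equal(unname(fitted(multi)[, "DAX"]), unname(fitted(single)))
```

Both same_coef and same_fit should come back TRUE, since each column of a matrix response is fitted against the same design matrix independently of the others.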