How to multiply all possible combinations of columns in R and use them in a multiple linear regression model?

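Question

Suppose I have a data frame with three columns, A, B and C, each with 10 rows. I want to multiply every pair of columns, taking two at a time, so the columns I want are A*B, A*C and B*C. How can I do this in R? I also want to use these product columns in successive iterations of a multiple linear regression: Y ~ A + B + C + A*B, then the same model with A*C, then with B*C as the last term. The model iterations have to be computed in a for loop.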
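A minimal sketch of the products on their own, assuming a hypothetical data frame `data` that holds only the numeric columns `A`, `B` and `C` (random values stand in for real data). `combn()` walks over the column names in pairs and returns one product column per pair:

```R
set.seed(1)
# Hypothetical example data: three numeric columns with 10 rows each
data <- data.frame(A = rnorm(10), B = rnorm(10), C = rnorm(10))

# One product per pair of columns; combn() simplifies to a 10 x 3 matrix
prods <- combn(names(data), 2, function(p) data[[p[1]]] * data[[p[2]]])

# Label each column after the pair it came from: "A*B", "A*C", "B*C"
colnames(prods) <- combn(names(data), 2, paste, collapse = "*")

head(prods, 3)
```

Because both `combn()` calls enumerate the pairs in the same order, the labels line up with the product columns.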

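I have tried this code:

```R
# Pairwise products of the columns of data (one column per pair)
cc <- combn(data, 2, FUN = Reduce, f = `*`)
n = choose(ncol(data), 2)
model <- list()
for (i in 1:n) {
    model[[i]] <- lm(Y ~ A + B + C + cc[, i])
}
```

The problem is that `cc` is a numeric (double) matrix holding the products for all the column pairs, and its columns have no names. So in the model summary the interaction variable is just labelled "cc[, i]". I want it to be named "A*B", "B*C" or "A*C" instead. How can I do that?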
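One possible way around the unnamed `cc[, i]` term, sketched under the assumption that `Y`, `A`, `B` and `C` are columns of a single data frame (here a hypothetical `df` filled with random numbers): instead of precomputing the products, write each product as an interaction term such as `A:B` and build the formula with `reformulate()`. For numeric predictors, `A:B` is the elementwise product, and `lm()` labels its coefficient `A:B`:

```R
set.seed(1)
# Hypothetical example data: response Y and three numeric predictors
df <- data.frame(Y = rnorm(10), A = rnorm(10), B = rnorm(10), C = rnorm(10))

predictors <- c("A", "B", "C")
pairs <- combn(predictors, 2, simplify = FALSE)  # list: (A,B), (A,C), (B,C)

models <- list()
for (p in pairs) {
  term <- paste(p, collapse = ":")                       # e.g. "A:B"
  f <- reformulate(c(predictors, term), response = "Y")  # Y ~ A + B + C + A:B
  models[[term]] <- lm(f, data = df)                     # element named after the term
}

# The extra coefficient is labelled "A:B", "A:C" or "B:C"
lapply(models, coef)
```

`A:B` is used rather than `A*B` because in an R formula `A*B` expands to `A + B + A:B`. Since `A` and `B` are already in the model, `A:B` adds only the product term.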




# Answer 1
**Score**: 0

See if this code structure works for you:

```R
set.seed(7)
# Three random predictors and a random response
df <- data.frame(A = rexp(10, 1), B = rexp(10, 2), C = rexp(10, 3))
Y <- rnorm(10)
df <- data.frame(cbind(Y, df))   # columns: Y, A, B, C

# Each column of m is one pair of predictor indices: (1,2), (1,3), (2,3)
m <- combn(3, 2)

mylist <- list()

for (i in 1:3) {
    # Product of the i-th pair; the +1 skips the Y column of df
    new_col <- df[ , m[1, i] + 1] * df[ , m[2, i] + 1]
    df2 <- cbind(df, new_col)
    # Y ~ . regresses Y on A, B, C and the product column
    mylist[[i]] <- lm(Y ~ ., data = df2)
}
mylist
```

Output:

```
[[1]]

Call:
lm(formula = Y ~ ., data = df2)

Coefficients:
(Intercept)            A            B            C      new_col  
    -1.9290       0.4046       2.0715       0.8587      -0.1156  


[[2]]

Call:
lm(formula = Y ~ ., data = df2)

Coefficients:
(Intercept)            A            B            C      new_col  
   -1.81765      0.33800      1.89724      0.79561      0.03353  


[[3]]

Call:
lm(formula = Y ~ ., data = df2)

Coefficients:
(Intercept)            A            B            C      new_col  
     -1.402        0.321        1.058       -1.445        5.114
```
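A small variation on the answer's loop, offered as a sketch rather than the answerer's own code: naming the product column after its pair makes each model's extra coefficient identifiable instead of `new_col`. The `A_B`, `A_C`, `B_C` naming scheme is an assumption chosen for illustration because it is a syntactic R name, so it needs no backticks in formulas or output:

```R
set.seed(7)
df <- data.frame(A = rexp(10, 1), B = rexp(10, 2), C = rexp(10, 3))
Y <- rnorm(10)
df <- data.frame(cbind(Y, df))   # columns: Y, A, B, C

m <- combn(3, 2)
mylist <- list()

for (i in seq_len(ncol(m))) {
  # Predictor names for this pair; the +1 skips the Y column, as in the answer
  pair <- names(df)[m[, i] + 1]
  new_name <- paste(pair, collapse = "_")   # e.g. "A_B"
  df2 <- df
  df2[[new_name]] <- df[[pair[1]]] * df[[pair[2]]]
  mylist[[new_name]] <- lm(Y ~ ., data = df2)
}

lapply(mylist, coef)
```

The list elements are also named after the pair, so a single model can be pulled out with, for example, `mylist[["B_C"]]`.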

huangapple
  • This article was posted on 25 May 2023 at 19:41:36
  • If you repost it, please keep the link to this article: https://go.coder-hub.com/76331882.html