How can I multiply all pairwise combinations of columns in R and use them in a multiple linear regression model?
Question
Suppose I have a data frame with three columns, A, B and C, each with 10 rows. I want to multiply all possible combinations of the columns, taking two at a time, so the columns I want are `A*B`, `A*C` and `B*C`. How can I do this in R?

I also want to use these product columns in several iterations of a multiple linear regression of the form `Y ~ A + B + C + <product>`, where `<product>` is `A*B`, `A*C` or `B*C` in turn. The models have to be fitted in a `for` loop.

I have tried this code:

```R
cc <- combn(data, 2, FUN = Reduce, f = `*`)
n = choose(ncol(data),2)
model <- list()
for (i in 1:n) {
  model[[i]] <- lm(Y ~ A+B+C+cc[,i])
}
```

Now, the problem is that `cc` is a numeric (double) matrix holding all the pairwise products, and its columns have no names. So in the model summary the interaction variable is labelled only "cc[, i]". I want it to be labelled "A*B", "B*C" or "A*C" instead. How can I do that?
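A minimal sketch of the pairwise-product step, assuming `data` contains only the predictors A, B and C (the random values below are a made-up stand-in). Working over the column *names* instead of the data frame lets the same pairs both select the columns to multiply and label the results. The name `pairs` is illustrative.

```R
# Hypothetical stand-in for the asker's `data`: three numeric columns A, B, C
set.seed(1)
data <- data.frame(A = rnorm(10), B = rnorm(10), C = rnorm(10))

# All pairs of column names as a list: c("A","B"), c("A","C"), c("B","C")
pairs <- combn(names(data), 2, simplify = FALSE)

# One product column per pair; sapply() binds them into a 10 x 3 matrix
cc <- sapply(pairs, function(p) data[[p[1]]] * data[[p[2]]])

# Label each column after the pair that produced it: "A*B", "A*C", "B*C"
colnames(cc) <- vapply(pairs, paste, character(1), collapse = "*")

head(cc, 3)
```

Because of the `*`, a name like `A*B` is non-syntactic, so a column with that name has to be backquoted (`` `A*B` ``) if it is referenced inside a model formula.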
# Answer 1
**Score**: 0
See if this code structure works for you:

```R
set.seed(7)
# Three example predictors and a response
df <- data.frame(A = rexp(10, 1), B = rexp(10, 2), C = rexp(10, 3))
Y <- rnorm(10)
df <- data.frame(cbind(Y, df))  # column 1 is Y, columns 2-4 are A, B, C

# Each column of m is one pair of predictor indices: (1,2), (1,3), (2,3)
m <- combn(3, 2)

mylist <- list()
for (i in 1:3) {
  # Product of the i-th pair; "+ 1" skips over the Y column
  new_col <- df[ , m[1, i] + 1] * df[ , m[2, i] + 1]
  # Built from the original df each time, so each model gets exactly one product
  df2 <- cbind(df, new_col)
  mylist[[i]] <- lm(Y ~ ., data = df2)
}
mylist
```

```
[[1]]
Call:
lm(formula = Y ~ ., data = df2)
Coefficients:
(Intercept) A B C new_col
-1.9290 0.4046 2.0715 0.8587 -0.1156
[[2]]
Call:
lm(formula = Y ~ ., data = df2)
Coefficients:
(Intercept) A B C new_col
-1.81765 0.33800 1.89724 0.79561 0.03353
[[3]]
Call:
lm(formula = Y ~ ., data = df2)
Coefficients:
(Intercept) A B C new_col
-1.402 0.321 1.058 -1.445 5.114
```
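The loop above labels every product `new_col`. The sketch below shows an alternative that avoids precomputing product columns. It swaps in R's formula interaction operator `:`, which is not the answer's method. For two numeric predictors, `A:B` adds the column `A * B` to the model matrix, and `lm()` reports its coefficient as `A:B`. The toy data mirror the answer's set-up, and `reformulate()` builds each formula from strings. The names `pairs`, `term` and `models` are illustrative.

```R
# Toy data mirroring the answer's set-up (same seed and draw order)
set.seed(7)
df <- data.frame(A = rexp(10, 1), B = rexp(10, 2), C = rexp(10, 3))
df$Y <- rnorm(10)

# 2 x 3 character matrix; each column is one pair, e.g. c("A", "B")
pairs <- combn(c("A", "B", "C"), 2)

models <- list()
for (i in seq_len(ncol(pairs))) {
  # "A:B" is the interaction term; for numeric A and B it equals the product A*B
  term <- paste(pairs[, i], collapse = ":")
  # Builds Y ~ A + B + C + A:B (and likewise for the other pairs)
  f <- reformulate(c("A", "B", "C", term), response = "Y")
  # Store each fit under its term label so models can be looked up by name
  models[[term]] <- lm(f, data = df)
}

names(models)
coef(models[["A:B"]])
```

With the same seed and draw order as the answer's code, the coefficients should match its output, except that the extra term is labelled `A:B`, `A:C` or `B:C` instead of `new_col`. In a formula, `A*B` is shorthand for `A + B + A:B`. Since A and B are already in the model, it yields the same fit.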