How to multiply all possible combinations of columns in R and use them in a multiple linear regression model?

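Question

Suppose I have three columns in a data frame: A, B, and C. Each column has 10 rows. I want to multiply every pair of columns, so the columns I want are A*B, A*C, and B*C. How can I do this in R? I also want to use these product columns in several iterations of a multiple linear regression, as follows: Y ~ A + B + C + A*B/A*C/B*C (one product term per model). The model iterations have to be computed in a for loop.

I have tried this code:

```R
cc <- combn(data, 2, FUN = Reduce, f = `*`)
n = choose(ncol(data),2)
model <- list()
for (i in 1:n) {
  model[[i]] <- lm(Y ~ A+B+C+cc[,i])
}
```

The problem is that cc[,i] is a double taken from all the possible pairwise combinations of the three columns, and it has no specific name. So in the model summary the interaction variable is simply named "cc[,i]". I want the variable to be named "A*B", "B*C", or "A*C" instead. How can I do that?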

# Answer 1

**Score**: 0

See if this code structure works for you:

```R
set.seed(7)
df <- data.frame(A = rexp(10, 1), B = rexp(10, 2), C = rexp(10, 3))
Y <- rnorm(10)
df <- data.frame(cbind(Y, df))  # Y is now column 1; A, B, C are columns 2 to 4

m <- combn(3, 2)                # each column of m holds one pair of predictor indices
mylist <- list()
for (i in 1:3) {
  # the + 1 skips the Y column when indexing into df
  new_col <- df[ , m[1, i] + 1] * df[ , m[2, i] + 1]
  df2 <- cbind(df, new_col)
  mylist[[i]] <- lm(Y ~ ., data = df2)
}
mylist
#> [[1]]
#> Call:
#> lm(formula = Y ~ ., data = df2)
#> Coefficients:
#> (Intercept)            A            B            C      new_col
#>     -1.9290       0.4046       2.0715       0.8587      -0.1156
#>
#> [[2]]
#> Call:
#> lm(formula = Y ~ ., data = df2)
#> Coefficients:
#> (Intercept)            A            B            C      new_col
#>    -1.81765      0.33800      1.89724      0.79561      0.03353
#>
#> [[3]]
#> Call:
#> lm(formula = Y ~ ., data = df2)
#> Coefficients:
#> (Intercept)            A            B            C      new_col
#>      -1.402        0.321        1.058       -1.445        5.114
```
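The loop above labels every product term `new_col`, so the question about coefficient names is still open. One possible way to get labelled terms, offered here as a minimal sketch rather than as the answerer's method, is to name the product columns from the column pairs and to build each formula as text with `reformulate()`, using R's `:` interaction operator. For numeric predictors, `A:B` is the elementwise product of A and B, so each model should match the corresponding `new_col` fit, but the coefficient is reported as `A:B`. The names `predictors`, `prods`, `pairs`, and `models` are illustrative and not from the original post.

```r
set.seed(7)
df <- data.frame(A = rexp(10, 1), B = rexp(10, 2), C = rexp(10, 3))
df$Y <- rnorm(10)

predictors <- c("A", "B", "C")

# Pairwise products as a 10 x 3 matrix, with columns named after each pair
prods <- combn(predictors, 2, FUN = function(p) df[[p[1]]] * df[[p[2]]])
colnames(prods) <- combn(predictors, 2, FUN = paste, collapse = "*")

# One model per pair; the interaction term is written as "A:B" in the formula
pairs <- combn(predictors, 2)   # 2 x 3 character matrix, one pair per column
models <- list()
for (i in seq_len(ncol(pairs))) {
  term <- paste(pairs[, i], collapse = ":")
  f <- reformulate(c(predictors, term), response = "Y")  # Y ~ A + B + C + A:B
  models[[term]] <- lm(f, data = df)
}

names(coef(models[["A:B"]]))
```

Storing the models under the term label means a fit can be looked up by name, for example `summary(models[["B:C"]])`. If the literal label `A*B` is required in the output, note that inside a formula `A*B` expands to `A + B + A:B`, so the interaction coefficient would still be printed as `A:B`.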

huangapple
  • Posted on May 25, 2023, 19:41:36
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76331882.html