How can I iterate dataframe column names for ANOVA?

Question

I have a data frame:

    bbb <- as.data.frame(list(X1 = c(19, 12, 6, 17, 8, 14, 19, 22, 20, 21, 23, 19),
                              X2 = c(12, 6, 11, 9, 9, 9, 19, 18, 21, 22, 21, 23),
                              X3 = c(19, 12, 13, 13, 12, 5, 23, 19, 14, 19, 20, 20),
                              X4 = c(12, 12, 12, 16, 9, 10, 21, 19, 19, 21, 16, 21),
                              X5 = c(12, 10, 7, 6, 11, 10, 15, 20, 24, 19, 19, 24),
                              cluster = c(1,1,1,1,1,1,2,2,2,2,2,2)))

I would like to use ANOVA to get a p-value for each column. I can do it one column at a time, but how can I do it in a loop? aov() does not understand the column names I pass in from colnames(bbb).

    summary(aov(X1 ~ cluster, data = bbb))[[1]]$'Pr(>F)'[1]

I need to iterate over the columns of my data frame and extract the p-values into a vector.

Answer 1

Score: 1

    # Create the data frame
    bbb <- as.data.frame(list(X1 = c(19, 12, 6, 17, 8, 14, 19, 22, 20, 21, 23, 19),
                              X2 = c(12, 6, 11, 9, 9, 9, 19, 18, 21, 22, 21, 23),
                              X3 = c(19, 12, 13, 13, 12, 5, 23, 19, 14, 19, 20, 20),
                              X4 = c(12, 12, 12, 16, 9, 10, 21, 19, 19, 21, 16, 21),
                              X5 = c(12, 10, 7, 6, 11, 10, 15, 20, 24, 19, 19, 24),
                              cluster = c(1,1,1,1,1,1,2,2,2,2,2,2)))

    # Run an ANOVA for each column and collect the p-values in a
    # named numeric vector (vapply() returns a vector, not a list)
    p_values <- vapply(names(bbb)[1:5], function(col) {
      # Build the formula "<col> ~ cluster" from the column name string
      aov_result <- summary(aov(as.formula(paste(col, "~ cluster")), data = bbb))
      aov_result[[1]]$`Pr(>F)`[1]
    }, numeric(1))

    # Print the p-values
    print(p_values)
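
Since the question asks specifically for a loop, the same idea can be sketched with an explicit for loop. This is only a minimal sketch under two assumptions that are not in the original answer: reformulate() from base R's stats package is used instead of paste() plus as.formula() to build the formula, and cluster is converted to a factor because it looks like a group label. The names response_cols and fit are made up for illustration.

```r
# Same data as in the question
bbb <- data.frame(
  X1 = c(19, 12, 6, 17, 8, 14, 19, 22, 20, 21, 23, 19),
  X2 = c(12, 6, 11, 9, 9, 9, 19, 18, 21, 22, 21, 23),
  X3 = c(19, 12, 13, 13, 12, 5, 23, 19, 14, 19, 20, 20),
  X4 = c(12, 12, 12, 16, 9, 10, 21, 19, 19, 21, 16, 21),
  X5 = c(12, 10, 7, 6, 11, 10, 15, 20, 24, 19, 19, 24),
  cluster = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2)
)

# Assumption: cluster is a grouping label, so treat it as a factor
bbb$cluster <- factor(bbb$cluster)

# Every column except the grouping column is a response
response_cols <- setdiff(names(bbb), "cluster")

# Pre-allocate a named numeric vector, one slot per response column
p_values <- setNames(numeric(length(response_cols)), response_cols)

for (col in response_cols) {
  # reformulate() builds the formula `<col> ~ cluster` from strings
  fit <- aov(reformulate("cluster", response = col), data = bbb)
  # First row of the ANOVA table is the cluster term
  p_values[col] <- summary(fit)[[1]][["Pr(>F)"]][1]
}

print(p_values)
```

With only two clusters the factor conversion should not change the p-values, but with three or more groups a numeric cluster column would be fitted as a linear trend rather than as separate groups.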

Answer 2

Score: 0

One option is to use the map() function from the purrr package (part of the tidyverse), e.g.

    library(purrr)
    bbb <- as.data.frame(list(X1 = c(19, 12, 6, 17, 8, 14, 19, 22, 20, 21, 23, 19),
                              X2 = c(12, 6, 11, 9, 9, 9, 19, 18, 21, 22, 21, 23),
                              X3 = c(19, 12, 13, 13, 12, 5, 23, 19, 14, 19, 20, 20),
                              X4 = c(12, 12, 12, 16, 9, 10, 21, 19, 19, 21, 16, 21),
                              X5 = c(12, 10, 7, 6, 11, 10, 15, 20, 24, 19, 19, 24),
                              cluster = c(1,1,1,1,1,1,2,2,2,2,2,2)))
    summary(aov(X1 ~ cluster, data = bbb))[[1]]$'Pr(>F)'[1]
    #> [1] 0.004145981
    map(bbb, ~summary(aov(.x ~ cluster, data = bbb))[[1]]$'Pr(>F)'[1])
    #> $X1
    #> [1] 0.004145981
    #>
    #> $X2
    #> [1] 1.614913e-06
    #>
    #> $X3
    #> [1] 0.01052767
    #>
    #> $X4
    #> [1] 0.0001252443
    #>
    #> $X5
    #> [1] 7.91075e-05
    #>
    #> $cluster
    #> [1] 4.842692e-159

Created on 2023-05-25 with reprex v2.0.2

This outputs the results as a list, but you can unlist() them or coerce them to a data frame if required. Because map() iterates over every column of bbb, the list also contains a $cluster entry; it comes from fitting cluster against itself and is not a meaningful p-value, so drop it or exclude that column before mapping.
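
The conversion mentioned in the last sentence might look like the following. It is a rough sketch written in base R so that it runs without purrr: lapply() stands in for map(), the cluster column is excluded up front, and the names response_cols, p_list, p_vec and p_df are hypothetical.

```r
# Same data as in the question
bbb <- data.frame(
  X1 = c(19, 12, 6, 17, 8, 14, 19, 22, 20, 21, 23, 19),
  X2 = c(12, 6, 11, 9, 9, 9, 19, 18, 21, 22, 21, 23),
  X3 = c(19, 12, 13, 13, 12, 5, 23, 19, 14, 19, 20, 20),
  X4 = c(12, 12, 12, 16, 9, 10, 21, 19, 19, 21, 16, 21),
  X5 = c(12, 10, 7, 6, 11, 10, 15, 20, 24, 19, 19, 24),
  cluster = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2)
)

# Exclude the grouping column so it is not tested against itself
response_cols <- setdiff(names(bbb), "cluster")

# Base R stand-in for the map() call: one p-value per response column
p_list <- lapply(bbb[response_cols], function(x) {
  summary(aov(x ~ cluster, data = bbb))[[1]][["Pr(>F)"]][1]
})

# Flatten the list into a named numeric vector
p_vec <- unlist(p_list)

# Or arrange the same values as a two-column data frame
p_df <- data.frame(column = names(p_vec), p_value = unname(p_vec))

print(p_vec)
print(p_df)
```

If purrr is already loaded, map_dbl() is another way to get a numeric vector directly instead of a list.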

huangapple
  • Posted on 2023-05-25 10:46:01
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76328583.html