For loop in R: error in model.frame.default()
Question
I'm quite new to loops, so please be patient with me.
I have calculated alpha diversity indices (Observed, Shannon, InvSimpson, Evenness), and I want to run a Kruskal-Wallis test on each of them against the `Month` variable of my table.
My table (`df`) looks something like this:
Observed | Shannon | InvSimpson | Evenness | Month |
---|---|---|---|---|
688 | 4.5538 | 23.365814 | 0.696963 | February |
749 | 4.3815 | 15.162467 | 0.661992 | February |
610 | 3.8291 | 11.178981 | 0.597054 | February |
665 | 4.2011 | 16.284009 | 0.646343 | March |
839 | 5.1855 | 43.198709 | 0.770260 | March |
516 | 3.2393 | 4.765211 | 0.518611 | April |
470 | 3.9677 | 11.614851 | 0.644873 | April |
539 | 4.2995 | 15.593572 | 0.683583 | April |
... | ... | ... | ... | ... |
Before trying a loop, I ran the test one index at a time, like so:

```r
library(dplyr)    # for %>%
library(rstatix)  # for kruskal_test()

obs <- df %>% kruskal_test(Observed ~ Month)
sha <- df %>% kruskal_test(Shannon ~ Month)
inv <- df %>% kruskal_test(InvSimpson ~ Month)
eve <- df %>% kruskal_test(Evenness ~ Month)
res.kruskal <- rbind(obs, sha, inv, eve)
res.kruskal
```
This works, and it is exactly the result I want to reproduce with a for loop:

```
# A tibble: 4 × 6
.y. n statistic df p method
<chr> <int> <dbl> <int> <dbl> <chr>
1 Observed 45 20.6 9 0.0144 Kruskal-Wallis
2 Shannon 45 24.0 9 0.00434 Kruskal-Wallis
3 InvSimpson 45 20.3 9 0.0159 Kruskal-Wallis
4 Evenness 45 22.0 9 0.00899 Kruskal-Wallis
```
However, when I try it with a for loop like so:

```r
Indices <- c("Observed", "Shannon", "InvSimpson", "Evenness")
result.kruskal <- data_frame()
for (i in Indices) {
  kruskal <- df %>% kruskal_test(i ~ Month)
  result.kruskal <- rbind(result.kruskal, kruskal)
}
```
I get the following error:

```
Error in model.frame.default(formula = formula, data = data) :
  variable lengths differ (found for 'Month')
```
Based on similar errors reported on the forum, I don't think the problem comes from the `Month` variable, even though the error message points to it, and my table `df` has no `NA` values either. Am I writing the for loop wrong?
I would be grateful for any insight.
Sophie
Answer 1
Score: 0
Using the first rows of your dataset as an example, both `lapply()` and `apply()` can be used to iterate over the columns. The individual test results can then be combined into a single data frame with `bind_rows()`:
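```r
library(tidyverse)
library(rstatix)

Indices <- c("Observed", "Shannon", "InvSimpson", "Evenness")

# Using lapply(): each column of df[Indices] is passed to the function as x
result.kruskal <- bind_rows(
  lapply(df[Indices], FUN = function(x) kruskal_test(df, x ~ Month)),
  .id = "variable"   # the list names (the index names) become a column
) %>%
  select(-2) %>%     # drop the .y. column, which only holds the placeholder name "x"
  as.data.frame()
result.kruskal
```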
```
variable n statistic df p method
1 Observed 8 5.14 2 0.0766 Kruskal-Wallis
2 Shannon 8 2 2 0.368 Kruskal-Wallis
3 InvSimpson 8 3.22 2 0.2 Kruskal-Wallis
4 Evenness 8 1.44 2 0.486 Kruskal-Wallis
```
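Or, equivalently, with `apply()`:

```r
# apply() converts df[Indices] to a numeric matrix and passes each
# column (MARGIN = 2) to the function as x
result.kruskal <- bind_rows(
  apply(df[Indices], 2, FUN = function(x) kruskal_test(df, x ~ Month)),
  .id = "variable"
) %>%
  select(-2) %>%
  as.data.frame()
result.kruskal
```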
```
variable n statistic df p method
1 Observed 8 5.14 2 0.0766 Kruskal-Wallis
2 Shannon 8 2 2 0.368 Kruskal-Wallis
3 InvSimpson 8 3.22 2 0.2 Kruskal-Wallis
4 Evenness 8 1.44 2 0.486 Kruskal-Wallis
```
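Why does the `lapply()` version work when the for loop does not? In a formula, each name on the left-hand side is looked up as a variable, first in `data` and then in the formula's environment. In the loop, `i` is not a column of `df` but a length-1 character string such as `"Observed"`, so `model.frame()` sees one value on the left and one month per row on the right and stops with "variable lengths differ". Inside the anonymous function, `x` is the column vector itself, so the lengths match. The small sketch below illustrates this with base R's `stats::kruskal.test()` (used here instead of `rstatix::kruskal_test()`, which builds on it) on a few rows from the question; the data frame name `d` is purely illustrative.

```r
# A few rows from the question's table (February and April only);
# the name d is illustrative
d <- data.frame(
  Observed = c(688, 749, 610, 516, 470, 539),
  Month    = c("February", "February", "February", "April", "April", "April")
)

# Loop-style: i holds the *name* of the column, a length-1 character vector,
# so the formula i ~ Month pairs 1 value with 6 months
i <- "Observed"
err <- tryCatch(
  kruskal.test(i ~ Month, data = d),
  error = function(e) conditionMessage(e)
)
print(err)  # the same "variable lengths differ" error as in the question

# lapply-style: x holds the column *values*, so its length matches Month
x <- d$Observed
res_x <- kruskal.test(x ~ Month, data = d)
print(res_x)
```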
Example data (the first rows of the table in the question):

```r
df <- read.table(text = "Observed Shannon InvSimpson Evenness Month
688 4.5538 23.365814 0.696963 February
749 4.3815 15.162467 0.661992 February
610 3.8291 11.178981 0.597054 February
665 4.2011 16.284009 0.646343 March
839 5.1855 43.198709 0.770260 March
516 3.2393 4.765211 0.518611 April
470 3.9677 11.614851 0.644873 April
539 4.2995 15.593572 0.683583 April", header = TRUE)
```
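If you prefer to keep an explicit for loop, one option (a sketch, not part of the original answer) is to build the formula from the column name with base R's `reformulate()`, so the left-hand side is the column itself (e.g. `Observed ~ Month`) rather than the loop variable `i`. The sketch also collects the results in a list and binds them once with `bind_rows()`, instead of growing a data frame with `rbind()` on every iteration. It assumes the `dplyr` and `rstatix` packages and repeats the example data so it runs on its own.

```r
library(dplyr)    # bind_rows()
library(rstatix)  # kruskal_test()

# Example data: the first rows of the table in the question
df <- read.table(text = "Observed Shannon InvSimpson Evenness Month
688 4.5538 23.365814 0.696963 February
749 4.3815 15.162467 0.661992 February
610 3.8291 11.178981 0.597054 February
665 4.2011 16.284009 0.646343 March
839 5.1855 43.198709 0.770260 March
516 3.2393 4.765211 0.518611 April
470 3.9677 11.614851 0.644873 April
539 4.2995 15.593572 0.683583 April", header = TRUE)

Indices <- c("Observed", "Shannon", "InvSimpson", "Evenness")

# Pre-allocate one list slot per index
results <- vector("list", length(Indices))
names(results) <- Indices

for (i in Indices) {
  # reformulate("Month", response = i) builds e.g. Observed ~ Month,
  # so the response is the column named by i, not the string i itself
  f <- reformulate("Month", response = i)
  results[[i]] <- kruskal_test(df, f)
}

# Combine the per-index tibbles into one table
result.kruskal <- bind_rows(results)
result.kruskal
```

Because the formula now names the real column, the `.y.` column already holds the index names, so no `select(-2)` is needed. `as.formula(paste(i, "~ Month"))` would build the same formula if you prefer string pasting.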