2023-04-07 03:05:29

For loop in R: error in model.frame.default()

Question

I'm quite new to loops, so please be patient with me.

I have calculated alpha indices (Observed, Shannon, InvSimpson, Evenness) for which I want to perform a Kruskal-Wallis statistical test with the variable Month of my table.

My table (df) looks something like this:

Observed	Shannon	InvSimpson	Evenness	Month
688	4.5538	23.365814	0.696963	February
749	4.3815	15.162467	0.661992	February
610	3.8291	11.178981	0.597054	February
665	4.2011	16.284009	0.646343	March
839	5.1855	43.198709	0.770260	March
516	3.2393	4.765211	0.518611	April
470	3.9677	11.614851	0.644873	April
539	4.2995	15.593572	0.683583	April
...	...	...	...	...

Before trying a loop, I performed the test one index at a time, like so:

obs <- df %>% kruskal_test(Observed ~ Month)
sha <- df %>% kruskal_test(Shannon ~ Month)
inv <- df %>% kruskal_test(InvSimpson ~ Month)
eve <- df %>% kruskal_test(Evenness ~ Month)
res.kruskal <- rbind(obs, sha, inv, eve)
res.kruskal

That worked, and it is the same result I want to get with the for loop:

# A tibble: 4 × 6
  .y.            n statistic    df       p method        
  <chr>      <int>     <dbl> <int>   <dbl> <chr>
1 Observed      45      20.6     9 0.0144  Kruskal-Wallis
2 Shannon       45      24.0     9 0.00434 Kruskal-Wallis
3 InvSimpson    45      20.3     9 0.0159  Kruskal-Wallis
4 Evenness      45      22.0     9 0.00899 Kruskal-Wallis

However, when I try it with a for loop like so:

Indices <- c("Observed", "Shannon", "InvSimpson", "Evenness")
result.kruskal <- data_frame()
for (i in Indices) {
  kruskal <- df %>% kruskal_test(i ~ Month)
  result.kruskal <- rbind(result.kruskal, kruskal)
}

I get the following error:

Error in model.frame.default(formula = formula, data = data) : 
  variable lengths differ (found for 'Month')

Based on similar errors I found on the forum, I don't think the problem actually comes from the Month variable, despite what the error message says. I don't have any NA values in my table df either. Am I writing the for loop wrong?

I would be thankful for any insight you might have.

Sophie

Answer 1

Score: 0

Using the first rows of your dataset as an example, both lapply() and apply() can be used to iterate over the columns. The results of the individual tests can then be combined into a single data frame with bind_rows():

library(tidyverse)
library(rstatix)
Indices <- c("Observed", "Shannon", "InvSimpson", "Evenness")

using lapply

result.kruskal <- bind_rows(
  lapply(df[Indices], FUN = function(x) kruskal_test(df, x ~ Month)),
  .id = "variable") %>%
  select(-2) %>% as.data.frame()
result.kruskal
 variable        n statistic    df   p    method        
1 Observed       8      5.14     2 0.0766 Kruskal-Wallis
2 Shannon        8      2        2 0.368  Kruskal-Wallis
3 InvSimpson     8      3.22     2 0.2    Kruskal-Wallis
4 Evenness       8      1.44     2 0.486  Kruskal-Wallis

or with apply

result.kruskal <- bind_rows(
  apply(df[Indices], 2, FUN = function(x) kruskal_test(df, x ~ Month)),
  .id = "variable") %>% select(-2) %>% as.data.frame()
result.kruskal
 variable        n statistic    df   p    method        
1 Observed       8      5.14     2 0.0766 Kruskal-Wallis
2 Shannon        8      2        2 0.368  Kruskal-Wallis
3 InvSimpson     8      3.22     2 0.2    Kruskal-Wallis
4 Evenness       8      1.44     2 0.486  Kruskal-Wallis

Example data

df <- read.table(text = "Observed	Shannon	InvSimpson	Evenness	Month
688	4.5538	23.365814	0.696963	February
749	4.3815	15.162467	0.661992	February
610	3.8291	11.178981	0.597054	February
665	4.2011	16.284009	0.646343	March
839	5.1855	43.198709	0.770260	March
516	3.2393	4.765211	0.518611	April
470	3.9677	11.614851	0.644873	April
539	4.2995	15.593572	0.683583	April", header = TRUE)
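
As for why the original for loop fails: in `i ~ Month` the name `i` is not replaced by the string it holds. The formula refers to a variable literally called `i`, which R finds as the loop variable (a character vector of length 1), while `Month` has one value per row, hence "variable lengths differ". A minimal sketch of a loop that builds the formula from the column name instead is shown below. It uses base R's `kruskal.test()` rather than rstatix so that it runs without extra packages, and the column names of the result (`variable`, `n`, ...) are my own choice to mirror the output above. The same formula object can presumably be passed to `kruskal_test()` as well, but I have not checked that here.

```r
# Example data: the first rows of the table from the question.
df <- read.table(text = "Observed Shannon InvSimpson Evenness Month
688 4.5538 23.365814 0.696963 February
749 4.3815 15.162467 0.661992 February
610 3.8291 11.178981 0.597054 February
665 4.2011 16.284009 0.646343 March
839 5.1855 43.198709 0.770260 March
516 3.2393 4.765211 0.518611 April
470 3.9677 11.614851 0.644873 April
539 4.2995 15.593572 0.683583 April", header = TRUE)

Indices <- c("Observed", "Shannon", "InvSimpson", "Evenness")

# Pre-allocate a list with one slot per index; bind the rows once at the end.
rows <- vector("list", length(Indices))
names(rows) <- Indices

for (i in Indices) {
  # reformulate() turns the string in i into a real formula, e.g. Observed ~ Month.
  f <- reformulate("Month", response = i)
  test <- kruskal.test(f, data = df)
  rows[[i]] <- data.frame(
    variable  = i,
    n         = nrow(df),  # assumes no missing values, as stated in the question
    statistic = unname(test$statistic),
    df        = unname(test$parameter),
    p         = test$p.value,
    method    = test$method
  )
}

result.kruskal <- do.call(rbind, rows)
rownames(result.kruskal) <- NULL
result.kruskal
```

On the eight sample rows this gives the same statistics as the lapply() output above, for example 5.14 for Observed and 2 for Shannon. `as.formula(paste(i, "~ Month"))` is an equivalent way to build the formula. Note also that `data_frame()` is deprecated in favor of `tibble()`.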
