Optimize data for t.test to avoid "data are essentially constant" error
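Question
There are several Stack Overflow posts about `t.test()` in R failing with the error "data are essentially constant". As I understand it, this happens because the groups being compared do not have enough variation for the test to run. (Please correct me if there is another cause.)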
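A minimal sketch of when this error is raised, using base R only and made-up values. As far as the `t.test.default()` source suggests, the check is on the standard error of the difference, so with the default Welch test the error appears when the values within both groups are constant, not when the two group means are merely close.

```r
# Made-up values for illustration: every observation within each group is identical.
a <- c(0.301029995663981, 0.301029995663981)
b <- c(0.301029995663981, 0.301029995663981)

# Capture the error message instead of stopping the script.
msg <- tryCatch(t.test(a, b), error = function(e) conditionMessage(e))
msg
#> [1] "data are essentially constant"

# If only one group is constant, the default Welch test still runs.
p_one_constant <- t.test(a, c(0.301029995663981, 0.903089986991944))$p.value
```

This matches the failing `Date.2021.12` group in the data below, where both species have the same repeated value.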
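I am in this situation and would like to fix it by altering my data in a way that does not drastically change its statistical properties, so that the t-test results remain valid. Would it work to add a very small amount of variation to the data (e.g. change 0.301029995663981 to 0.301029995663990), or what else can I do?
For example, this is my data: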
# Create the data frame
data <- data.frame(Date = c("2021.08","2021.08","2021.09","2021.09","2021.09","2021.10","2021.10","2021.10","2021.11","2021.11","2021.11","2021.11","2021.11","2021.12","2021.12","2022.01","2022.01","2022.01","2022.01","2022.08","2022.08","2022.08","2022.08","2022.08","2022.09","2022.09","2022.10","2022.10","2022.10","2022.11","2022.11","2022.11","2022.11","2022.11","2022.12","2022.12","2022.12","2022.12","2023.01","2023.01","2023.01","2023.01","2021.08","2021.08","2021.09","2021.09","2021.09","2021.10","2021.10","2021.10","2021.11","2021.11","2021.11","2021.11","2021.11","2021.12","2021.12","2022.01","2022.01","2022.01","2022.01","2022.08","2022.08","2022.08","2022.08","2022.08","2022.09","2022.09","2022.09","2022.09","2022.10","2022.10","2022.10","2022.10","2022.11","2022.11","2022.11","2022.11","2022.11","2022.12","2022.12","2022.12","2022.12","2023.01","2023.01","2023.01","2023.01"),
Species = c("A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A",
"A","A","A","A","A","A","A","A","A","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B",
"B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B"),
Site = c("Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something",
"Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something",
"Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something",
"Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something",
"Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something",
"Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something",
"Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something",
"Something","Something","Something","Something"),
Mean = c("0.301029995663981","1.07918124604762","0.698970004336019","1.23044892137827","1.53147891704226","1.41497334797082","1.7160033436348",
"0.698970004336019","1.39794000867204","1","0.301029995663981","0.301029995663981","0.477121254719662","0.301029995663981","0.301029995663981",
"0.301029995663981","0.477121254719662","0.301029995663981","0.301029995663981","0.845098040014257","0.301029995663981","0.301029995663981",
"0.477121254719662","0.698970004336019","1.23044892137827","1.41497334797082","1.95904139232109","1.5910646070265","1.53147891704226",
"1.14612803567824","1.57978359661681","1.34242268082221","0.778151250383644","0.301029995663981","0.301029995663981","0.477121254719662",
"0.301029995663981","1.20411998265592","0.845098040014257","1.17609125905568","1.20411998265592","0.698970004336019","0.301029995663981",
"0.698970004336019","0.698970004336019","0.903089986991944","1.14612803567824","0.301029995663981","0.602059991327962","0.301029995663981",
"0.845098040014257","0.698970004336019","0.698970004336019","0.301029995663981","0.698970004336019","0.301029995663981","0.301029995663981",
"0.301029995663981","0.477121254719662","0.301029995663981","0.301029995663981","0.301029995663981","0.301029995663981","0.301029995663981",
"0.602059991327962","0.301029995663981","0.845098040014257","1.92941892571429","1.27875360095283","0.698970004336019","1.38021124171161",
"1.20411998265592","1.38021124171161","1.14612803567824","1","1.07918124604762","1.17609125905568","0.845098040014257","0.698970004336019",
"0.778151250383644","0.301029995663981","0.845098040014257","1.64345267648619","1.46239799789896","1.34242268082221","1.34242268082221",
"0.778151250383644"))
Then I set the factors:
# Set factors
str(data)
data$Date<-as.factor(data$Date)
data$Site<-as.factor(data$Site)
data$Species<-as.factor(data$Species)
data$Mean<-as.numeric(data$Mean)
str(data)
When I try t.test():
compare_means(Mean ~ Species, data = data, group.by = "Date", method = "t.test")
This is the error:
Error in `mutate()`:
ℹ In argument: `p = purrr::map(...)`.
Caused by error in `purrr::map()`:
ℹ In index: 5.
ℹ With name: Date.2021.12.
Caused by error in `t.test.default()`:
! data are essentially constant
Run `rlang::last_trace()` to see where the error occurred.
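Similarly, when I use this in ggplot:
ggplot(data, aes(x = Date, y = Mean, fill=Species)) +
geom_boxplot()+
stat_compare_means(data=data,method="t.test", label = "p.signif") +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))
Warning message:
Computation failed in `stat_compare_means()`
Caused by error in `mutate()`:
ℹ In argument: `p = purrr::map(...)`.
Caused by error in `purrr::map()`:
ℹ In index: 5.
ℹ With name: x.5.
Caused by error in `t.test.default()`:
! data are essentially constant
What is the best solution that keeps the data usable in a t-test?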
<details>
<summary>English:</summary>
There are several Stack Overflow posts about situations where t.test() in R produces an error saying "data are essentially constant". This happens because the groups do not have enough variation (the values are effectively constant) to run t.test(). (Correct me if there is another cause.)
I'm in this situation, and I would like to fix it by altering my data in a way that does not drastically change its statistical properties, so that the t-test stays valid. What if I add a very small amount of variation to the data (e.g. change 0.301029995663981 to 0.301029995663990)? Or what else can I do?
For example, this is my data:
# Create the data frame
data <- data.frame(Date = c("2021.08","2021.08","2021.09","2021.09","2021.09","2021.10","2021.10","2021.10","2021.11","2021.11","2021.11","2021.11","2021.11","2021.12","2021.12","2022.01","2022.01","2022.01","2022.01","2022.08","2022.08","2022.08","2022.08","2022.08","2022.09","2022.09","2022.10","2022.10","2022.10","2022.11","2022.11","2022.11","2022.11","2022.11","2022.12","2022.12","2022.12","2022.12","2023.01","2023.01","2023.01","2023.01","2021.08","2021.08","2021.09","2021.09","2021.09","2021.10","2021.10","2021.10","2021.11","2021.11","2021.11","2021.11","2021.11","2021.12","2021.12","2022.01","2022.01","2022.01","2022.01","2022.08","2022.08","2022.08","2022.08","2022.08","2022.09","2022.09","2022.09","2022.09","2022.10","2022.10","2022.10","2022.10","2022.11","2022.11","2022.11","2022.11","2022.11","2022.12","2022.12","2022.12","2022.12","2023.01","2023.01","2023.01","2023.01"),
Species = c("A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A",
"A","A","A","A","A","A","A","A","A","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B",
"B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B"),
Site = c("Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something",
"Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something",
"Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something",
"Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something",
"Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something",
"Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something",
"Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something",
"Something","Something","Something","Something"),
Mean = c("0.301029995663981","1.07918124604762","0.698970004336019","1.23044892137827","1.53147891704226","1.41497334797082","1.7160033436348",
"0.698970004336019","1.39794000867204","1","0.301029995663981","0.301029995663981","0.477121254719662","0.301029995663981","0.301029995663981",
"0.301029995663981","0.477121254719662","0.301029995663981","0.301029995663981","0.845098040014257","0.301029995663981","0.301029995663981",
"0.477121254719662","0.698970004336019","1.23044892137827","1.41497334797082","1.95904139232109","1.5910646070265","1.53147891704226",
"1.14612803567824","1.57978359661681","1.34242268082221","0.778151250383644","0.301029995663981","0.301029995663981","0.477121254719662",
"0.301029995663981","1.20411998265592","0.845098040014257","1.17609125905568","1.20411998265592","0.698970004336019","0.301029995663981",
"0.698970004336019","0.698970004336019","0.903089986991944","1.14612803567824","0.301029995663981","0.602059991327962","0.301029995663981",
"0.845098040014257","0.698970004336019","0.698970004336019","0.301029995663981","0.698970004336019","0.301029995663981","0.301029995663981",
"0.301029995663981","0.477121254719662","0.301029995663981","0.301029995663981","0.301029995663981","0.301029995663981","0.301029995663981",
"0.602059991327962","0.301029995663981","0.845098040014257","1.92941892571429","1.27875360095283","0.698970004336019","1.38021124171161",
"1.20411998265592","1.38021124171161","1.14612803567824","1","1.07918124604762","1.17609125905568","0.845098040014257","0.698970004336019",
"0.778151250383644","0.301029995663981","0.845098040014257","1.64345267648619","1.46239799789896","1.34242268082221","1.34242268082221",
"0.778151250383644"))
Afterwards, I set the factors:
# Set factors
str(data)
data$Date<-as.factor(data$Date)
data$Site<-as.factor(data$Site)
data$Species<-as.factor(data$Species)
data$Mean<-as.numeric(data$Mean)
str(data)
When I try t.test():
compare_means(Mean ~ Species, data = data, group.by = "Date", method = "t.test")
This is the error:
Error in `mutate()`:
ℹ In argument: `p = purrr::map(...)`.
Caused by error in `purrr::map()`:
ℹ In index: 5.
ℹ With name: Date.2021.12.
Caused by error in `t.test.default()`:
! data are essentially constant
Run `rlang::last_trace()` to see where the error occurred.
Similarly, when I use this in ggplot:
ggplot(data, aes(x = Date, y = Mean, fill=Species)) +
geom_boxplot()+
stat_compare_means(data=data,method="t.test", label = "p.signif") +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))
Warning message:
Computation failed in `stat_compare_means()`
Caused by error in `mutate()`:
ℹ In argument: `p = purrr::map(...)`.
Caused by error in `purrr::map()`:
ℹ In index: 5.
ℹ With name: x.5.
Caused by error in `t.test.default()`:
! data are essentially constant
What is the best solution that keeps the data usable in a t-test?
</details>
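# Answer 1
**Score**: 1
Find the standard deviation of `Mean` for each Date-Species combination, then filter out any dates where any of those standard deviations is 0. You can even pipe the filtered data straight into `compare_means()`: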
``` r
library(dplyr)
library(ggpubr)
data <- data.frame(Date = c("2021.08","2021.08","2021.09","2021.09","2021.09","2021.10","2021.10","2021.10","2021.11","2021.11","2021.11","2021.11","2021.11","2021.12","2021.12","2022.01","2022.01","2022.01","2022.01","2022.08","2022.08","2022.08","2022.08","2022.08","2022.09","2022.09","2022.10","2022.10","2022.10","2022.11","2022.11","2022.11","2022.11","2022.11","2022.12","2022.12","2022.12","2022.12","2023.01","2023.01","2023.01","2023.01","2021.08","2021.08","2021.09","2021.09","2021.09","2021.10","2021.10","2021.10","2021.11","2021.11","2021.11","2021.11","2021.11","2021.12","2021.12","2022.01","2022.01","2022.01","2022.01","2022.08","2022.08","2022.08","2022.08","2022.08","2022.09","2022.09","2022.09","2022.09","2022.10","2022.10","2022.10","2022.10","2022.11","2022.11","2022.11","2022.11","2022.11","2022.12","2022.12","2022.12","2022.12","2023.01","2023.01","2023.01","2023.01"),
Species = c("A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A",
"A","A","A","A","A","A","A","A","A","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B",
"B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B"),
Site = c("Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something",
"Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something",
"Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something",
"Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something",
"Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something",
"Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something",
"Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something",
"Something","Something","Something","Something"),
Mean = c("0.301029995663981","1.07918124604762","0.698970004336019","1.23044892137827","1.53147891704226","1.41497334797082","1.7160033436348",
"0.698970004336019","1.39794000867204","1","0.301029995663981","0.301029995663981","0.477121254719662","0.301029995663981","0.301029995663981",
"0.301029995663981","0.477121254719662","0.301029995663981","0.301029995663981","0.845098040014257","0.301029995663981","0.301029995663981",
"0.477121254719662","0.698970004336019","1.23044892137827","1.41497334797082","1.95904139232109","1.5910646070265","1.53147891704226",
"1.14612803567824","1.57978359661681","1.34242268082221","0.778151250383644","0.301029995663981","0.301029995663981","0.477121254719662",
"0.301029995663981","1.20411998265592","0.845098040014257","1.17609125905568","1.20411998265592","0.698970004336019","0.301029995663981",
"0.698970004336019","0.698970004336019","0.903089986991944","1.14612803567824","0.301029995663981","0.602059991327962","0.301029995663981",
"0.845098040014257","0.698970004336019","0.698970004336019","0.301029995663981","0.698970004336019","0.301029995663981","0.301029995663981",
"0.301029995663981","0.477121254719662","0.301029995663981","0.301029995663981","0.301029995663981","0.301029995663981","0.301029995663981",
"0.602059991327962","0.301029995663981","0.845098040014257","1.92941892571429","1.27875360095283","0.698970004336019","1.38021124171161",
"1.20411998265592","1.38021124171161","1.14612803567824","1","1.07918124604762","1.17609125905568","0.845098040014257","0.698970004336019",
"0.778151250383644","0.301029995663981","0.845098040014257","1.64345267648619","1.46239799789896","1.34242268082221","1.34242268082221",
"0.778151250383644"))
data$Date <- as.factor(data$Date)
data$Site <- as.factor(data$Site)
data$Species <- as.factor(data$Species)
data$Mean <- as.numeric(data$Mean)
data %>%
group_by(Date, Species) %>%
mutate(s = sd(Mean)) %>%
group_by(Date) %>%
filter(!any(s == 0)) %>%
compare_means(Mean ~ Species, data = ., group.by = "Date", method = "t.test")
#> # A tibble: 11 × 9
#> Date .y. group1 group2 p p.adj p.format p.signif method
#> <fct> <chr> <chr> <chr> <dbl> <dbl> <chr> <chr> <chr>
#> 1 2021.08 Mean A B 0.718 1 0.718 ns T-test
#> 2 2021.09 Mean A B 0.451 1 0.451 ns T-test
#> 3 2021.10 Mean A B 0.0889 0.89 0.089 ns T-test
#> 4 2021.11 Mean A B 0.850 1 0.850 ns T-test
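#> 5 2022.01 Mean A B 1 1 1.000 ns T-test
#> 6 2022.08 Mean A B 0.234 1 0.234 ns T-test
#> 7 2022.09 Mean A B 0.670 1 0.670 ns T-test
#> 8 2022.10 Mean A B 0.0707 0.78 0.071 ns T-test
#> 9 2022.11 Mean A B 0.783 1 0.783 ns T-test
#> 10 2022.12 Mean A B 0.399 1 0.399 ns T-test
#> 11 2023.01 Mean A B 0.255 1 0.255 ns T-test
```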
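The same filtering idea can be sketched in base R on a small made-up data frame, which may help show which dates get dropped. The names `toy`, `cell_sd`, `ok_dates` and `kept` are chosen here for illustration and are not from the original answer.

```r
# Toy data (made up for illustration): date "d2" is constant in both species.
toy <- data.frame(
  Date    = rep(c("d1", "d2"), each = 4),
  Species = rep(c("A", "A", "B", "B"), times = 2),
  Mean    = c(0.30, 1.08, 0.70, 1.23,   # d1: both groups vary
              0.30, 0.30, 0.30, 0.30),  # d2: zero variance everywhere
  stringsAsFactors = FALSE
)

# Standard deviation of Mean for every Date x Species cell.
cell_sd <- tapply(toy$Mean, list(toy$Date, toy$Species), sd)

# Keep only dates in which every species has a strictly positive sd.
keep_row <- apply(cell_sd, 1, function(s) all(!is.na(s) & s > 0))
ok_dates <- rownames(cell_sd)[keep_row]
kept <- toy[toy$Date %in% ok_dates, ]

# Welch t-test per remaining date; returns one p-value per date.
p_values <- sapply(split(kept, kept$Date),
                   function(d) t.test(Mean ~ Species, data = d)$p.value)
p_values
```

Like the dplyr version, this drops a date as soon as one species has zero standard deviation, which is a conservative choice.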
<details>
<summary>English:</summary>
Finding the sd of `Mean` for each Date-Species combination and then filtering out any Dates where any sd is 0 will do the trick. You could even just pipe the filtered data to `compare_means()`:
``` r
library(dplyr)
library(ggpubr)
data <- data.frame(Date = c("2021.08","2021.08","2021.09","2021.09","2021.09","2021.10","2021.10","2021.10","2021.11","2021.11","2021.11","2021.11","2021.11","2021.12","2021.12","2022.01","2022.01","2022.01","2022.01","2022.08","2022.08","2022.08","2022.08","2022.08","2022.09","2022.09","2022.10","2022.10","2022.10","2022.11","2022.11","2022.11","2022.11","2022.11","2022.12","2022.12","2022.12","2022.12","2023.01","2023.01","2023.01","2023.01","2021.08","2021.08","2021.09","2021.09","2021.09","2021.10","2021.10","2021.10","2021.11","2021.11","2021.11","2021.11","2021.11","2021.12","2021.12","2022.01","2022.01","2022.01","2022.01","2022.08","2022.08","2022.08","2022.08","2022.08","2022.09","2022.09","2022.09","2022.09","2022.10","2022.10","2022.10","2022.10","2022.11","2022.11","2022.11","2022.11","2022.11","2022.12","2022.12","2022.12","2022.12","2023.01","2023.01","2023.01","2023.01"),
Species = c("A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A","A",
"A","A","A","A","A","A","A","A","A","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B",
"B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B","B"),
Site = c("Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something",
"Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something",
"Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something",
"Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something",
"Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something",
"Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something",
"Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something","Something",
"Something","Something","Something","Something"),
Mean = c("0.301029995663981","1.07918124604762","0.698970004336019","1.23044892137827","1.53147891704226","1.41497334797082","1.7160033436348",
"0.698970004336019","1.39794000867204","1","0.301029995663981","0.301029995663981","0.477121254719662","0.301029995663981","0.301029995663981",
"0.301029995663981","0.477121254719662","0.301029995663981","0.301029995663981","0.845098040014257","0.301029995663981","0.301029995663981",
"0.477121254719662","0.698970004336019","1.23044892137827","1.41497334797082","1.95904139232109","1.5910646070265","1.53147891704226",
"1.14612803567824","1.57978359661681","1.34242268082221","0.778151250383644","0.301029995663981","0.301029995663981","0.477121254719662",
"0.301029995663981","1.20411998265592","0.845098040014257","1.17609125905568","1.20411998265592","0.698970004336019","0.301029995663981",
"0.698970004336019","0.698970004336019","0.903089986991944","1.14612803567824","0.301029995663981","0.602059991327962","0.301029995663981",
"0.845098040014257","0.698970004336019","0.698970004336019","0.301029995663981","0.698970004336019","0.301029995663981","0.301029995663981",
"0.301029995663981","0.477121254719662","0.301029995663981","0.301029995663981","0.301029995663981","0.301029995663981","0.301029995663981",
"0.602059991327962","0.301029995663981","0.845098040014257","1.92941892571429","1.27875360095283","0.698970004336019","1.38021124171161",
"1.20411998265592","1.38021124171161","1.14612803567824","1","1.07918124604762","1.17609125905568","0.845098040014257","0.698970004336019",
"0.778151250383644","0.301029995663981","0.845098040014257","1.64345267648619","1.46239799789896","1.34242268082221","1.34242268082221",
"0.778151250383644"))
data$Date<-as.factor(data$Date)
data$Site<-as.factor(data$Site)
data$Species<-as.factor(data$Species)
data$Mean<-as.numeric(data$Mean)
data %>%
group_by(Date, Species) %>%
mutate(s = sd(Mean)) %>%
group_by(Date) %>%
filter(!any(s == 0)) %>%
compare_means(Mean ~ Species, data = ., group.by = "Date", method = "t.test")
#> # A tibble: 11 × 9
#> Date .y. group1 group2 p p.adj p.format p.signif method
#> <fct> <chr> <chr> <chr> <dbl> <dbl> <chr> <chr> <chr>
#> 1 2021.08 Mean A B 0.718 1 0.718 ns T-test
#> 2 2021.09 Mean A B 0.451 1 0.451 ns T-test
#> 3 2021.10 Mean A B 0.0889 0.89 0.089 ns T-test
#> 4 2021.11 Mean A B 0.850 1 0.850 ns T-test
#> 5 2022.01 Mean A B 1 1 1.000 ns T-test
#> 6 2022.08 Mean A B 0.234 1 0.234 ns T-test
#> 7 2022.09 Mean A B 0.670 1 0.670 ns T-test
#> 8 2022.10 Mean A B 0.0707 0.78 0.071 ns T-test
#> 9 2022.11 Mean A B 0.783 1 0.783 ns T-test
#> 10 2022.12 Mean A B 0.399 1 0.399 ns T-test
#> 11 2023.01 Mean A B 0.255 1 0.255 ns T-test
```

<sup>Created on 2023-06-01 with reprex v2.0.2</sup>
</details>
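If dropping dates is not desirable, another option (an assumption on my part, not something the answer proposes) is to leave the data untouched and record `NA` for any comparison that `t.test()` refuses to run. A minimal base R sketch, where `safe_p` is a hypothetical helper name:

```r
# Hypothetical helper (not part of ggpubr): p-value of a Welch t-test,
# or NA when t.test() stops with an error such as
# "data are essentially constant".
safe_p <- function(x, y) {
  tryCatch(t.test(x, y)$p.value, error = function(e) NA_real_)
}

# Groups with variation: returns a numeric p-value.
p_varying <- safe_p(c(0.30, 1.08, 0.70), c(1.23, 0.48, 0.85))

# Constant groups: returns NA instead of raising an error.
p_constant <- safe_p(c(0.30, 0.30), c(0.30, 0.30))
```

This avoids perturbing the measurements, and it shows explicitly which comparisons could not be tested.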