What are these dots in this boxplot?
Question
I created a boxplot for a data set without using the geom_jitter function, yet dots still appear in the plot. Are they statistical values? Why are they appearing?
The code I use is below.
pacman::p_load(tidyverse, readxl, janitor, emmeans, multcomp, magrittr,
parameters, effectsize, multcompView, see, performance,
conflicted, ggpubr, rstatix)
conflict_prefer("select", "dplyr")
conflict_prefer("filter", "dplyr")
conflict_prefer("summarise", "dplyr")
conflict_prefer("extract", "magrittr")
cbPalette <- c("#999999", "#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#D55E00", "#CC79A7") ## Color blind friendly palette
##--------------------------------------------------------------------------------------------------------------------------
## Function to open an Excel file with multiple sheets and select one of them
library(readxl)
read_excel_allsheets <- function(filename) {
  sheets <- readxl::excel_sheets(filename)
  x <- lapply(sheets, function(x) readxl::read_excel(filename, sheet = x))
  return(x)
}
big_tbl <- read_excel_allsheets("Mesocosms_R.xlsx")
big_tbl
phyto_plankton_tbl <- big_tbl[[14]]
##--------------------------------------------------------------------------------------------------------------------------
## Data transformation
## The mutate() result is assigned back; without the assignment it is discarded
phyto_plankton_tbl <- phyto_plankton_tbl %>%
  mutate(
    block = as.factor(block),
    trt = factor(trt, labels = c("-P&-F", "+P/-F", "+P/+F", "-P/+F"))
  )
phyto_plankton_tbl <- phyto_plankton_tbl %>%
  gather(key = "time", value = "PelaChl", t0, t1, t2, t3, t4, t5) %>% ## Reshapes the table from wide format into long format
  convert_as_factor(trt, time)
print(phyto_plankton_tbl, n = 40)
##--------------------------------------------------------------------------------------------------------------------------
## Visualization
pelaChl_bxp <- ggplot(data = phyto_plankton_tbl, aes(x = time, y = PelaChl, fill = trt)) +
  geom_boxplot() +
  ylim(0, 50) +
  scale_fill_manual(values = cbPalette) + ## Adds color blind friendly palette
  ## geom_jitter() +
  theme_bw()
Answer 1
Score: -1
From the documentation:
>The boxplot compactly displays the distribution of a continuous variable. It visualises five summary statistics (the median, two hinges and two whiskers), and all "outlying" points individually.
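The individual points you are seeing are "outliers": geom_boxplot() draws as a separate point every observation that lies more than 1.5 times the interquartile range beyond a hinge, so they are real observations, not summary statistics. As Roland has helpfully pointed out, though, "outlier" is a loaded term. People often think they can just remove any unusual or extreme values from a dataset, when these may be real data points that reflect the weirdness of the underlying data.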
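To make the rule concrete, here is a minimal base R sketch of how the "outlying" points can be identified by hand. The values in `pela_chl` are made up for illustration and are not taken from the question's spreadsheet; the 1.5 × IQR fences follow the rule described in the geom_boxplot() documentation.

```r
# Hypothetical chlorophyll values for one time/treatment group
# (invented for illustration; not the question's data).
pela_chl <- c(4.1, 5.3, 5.8, 6.2, 6.9, 7.4, 8.0, 8.6, 9.1, 31.5)

# Hinges: first and third quartiles of the group
q1 <- unname(quantile(pela_chl, 0.25))
q3 <- unname(quantile(pela_chl, 0.75))
iqr <- q3 - q1

# Whiskers may reach at most 1.5 * IQR beyond each hinge
lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr

# Observations beyond the fences are the ones drawn as individual dots
is_outlying <- pela_chl < lower_fence | pela_chl > upper_fence
outlying_points <- pela_chl[is_outlying]
print(outlying_points)  # only 31.5 lies beyond the fences
```

If the dots are not wanted in the figure, `geom_boxplot(outlier.shape = NA)` hides them without dropping the observations from the computed statistics. This is also worth doing when `geom_jitter()` is added on top, since otherwise those observations would be drawn twice.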