What are these dots in this boxplot?

Question

[Image: Boxplot]

I created a boxplot for a data set without using geom_jitter(), yet there are still dots in the plot. Are they statistical values, and why do they appear?

The code I used is below.

pacman::p_load(tidyverse, readxl, janitor, emmeans, multcomp, magrittr,
               parameters, effectsize, multcompView, see, performance,
               conflicted, ggpubr, rstatix)
conflict_prefer("select", "dplyr")
conflict_prefer("filter", "dplyr")
conflict_prefer("summarise", "dplyr")
conflict_prefer("extract", "magrittr")

cbPalette <- c("#999999", "#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2", "#D55E00", "#CC79A7") ## Color-blind-friendly palette
##--------------------------------------------------------------------------------------------------------------------------
## Function to open an Excel file with multiple sheets and select one of them
library(readxl)    
read_excel_allsheets <- function(filename) {
  sheets <- readxl::excel_sheets(filename)
  x <- lapply(sheets, function(x) readxl::read_excel(filename, sheet = x))
  return(x)
}

big_tbl <- read_excel_allsheets("Mesocosms_R.xlsx")
big_tbl

phyto_plankton_tbl <- big_tbl[[14]]
##--------------------------------------------------------------------------------------------------------------------------
## Data transformation
## Assign the result; otherwise the recoded factors are discarded
phyto_plankton_tbl <- phyto_plankton_tbl %>%
  mutate(
    block = as.factor(block),
    trt = factor(trt, labels = c("-P&-F", "+P/-F", "+P/+F", "-P/+F")))

phyto_plankton_tbl <- phyto_plankton_tbl %>%
  gather(key = "time", value = "PelaChl", t0, t1, t2, t3, t4, t5) %>%  ## Converts the table from wide format into long format
  convert_as_factor(trt, time)
print(phyto_plankton_tbl, n = 40)
##--------------------------------------------------------------------------------------------------------------------------
## Visualization

pelaChl_bxp <- ggplot(data = phyto_plankton_tbl, aes(x = time, y = PelaChl, fill = trt)) +
  geom_boxplot() +
  ylim(0, 50) +
  scale_fill_manual(values = cbPalette) + ## Adds color-blind-friendly palette
  ##  geom_jitter() +
  theme_bw()

Answer 1

Score: -1

From the documentation:
>The boxplot compactly displays the distribution of a continuous variable. It visualises five summary statistics (the median, two hinges and two whiskers), and all "outlying" points individually.

The individual points you are seeing are "outliers". As Roland has helpfully pointed out, though, "outlier" is a loaded term: people often assume they can simply remove any unusual or extreme values from a dataset, when those values may be real data points that reflect the weirdness of the underlying data.
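
To make the "outlying" rule concrete: by default, ggplot2 draws as individual points any values lying more than 1.5 times the interquartile range beyond the first or third quartile. The base R sketch below reproduces that rule by hand. The vector `chl` and the helper `find_boxplot_outliers()` are hypothetical names with made-up values, not the question's data, and the sketch assumes the default `coef = 1.5` and R's default quantile type.

```r
# Hypothetical example values; not taken from the question's data set.
chl <- c(2.1, 2.8, 3.0, 3.4, 3.9, 4.2, 4.5, 5.1, 19.7)

# Return the values lying more than coef * IQR below the first quartile
# or above the third quartile (the points a boxplot would draw as dots).
find_boxplot_outliers <- function(x, coef = 1.5) {
  x <- x[!is.na(x)]
  q <- quantile(x, probs = c(0.25, 0.75), names = FALSE)
  iqr <- q[2] - q[1]
  x[x < q[1] - coef * iqr | x > q[2] + coef * iqr]
}

find_boxplot_outliers(chl)  # returns 19.7
```

If the dots are unwanted, for example because geom_jitter() would draw the raw observations anyway, `geom_boxplot(outlier.shape = NA)` hides them. The observations themselves still enter the box and whisker statistics.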

huangapple
  • Posted on July 27, 2023 at 17:22:24
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76778293.html