Boxplot with additional lines for 10th and 90th percentile in R

Question

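Unfortunately, I am not very experienced in R, but I need to solve a problem that seems quite difficult to me, though it is probably easy for anyone who knows how to work with boxplots in R. I would be really grateful for your help:

I need to add **additional horizontal lines or dots** to a grouped boxplot to mark the 10th and 90th percentiles. Besides this, the boxplot should include the usual features: min, max, the box with the 25th percentile, median and 75th percentile, and outliers.

I tried to adapt several of the solutions posted here, but none of them works for my case. One promising attempt is similar to the function-based solution below, but I need the median rather than the mean, and I need the 10th and 90th percentiles displayed in addition to the usual statistics, not in place of them. It is also important to group the boxes by the variable *Col* (see the sample code below):

If you could give me some ideas on how to solve this, I would be really grateful!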

```R
dataset_stack <- structure(list(Col = c("Blue", "Blue", "Blue", "Blue", "Blue", 
"Blue", "Blue", "Blue", "Blue", "Blue", "Blue", "Blue", "Blue", 
"Green", "Green", "Green", "Green", "Green", "Green", "Green", 
"Green", "Green", "Green", "Green", "Green", "Green", "Green", 
"Green", "Red", "Red", "Red", "Red", "Red", "Red", "Red", "Red", 
"Red", "Red", "Red", "Red", "Red", "Red", "Red"), TTC = c(0.9, 
0.7, 0, 0.1, 0.1, 0.4, 0.9, 0.8, 0.1, 0, 0.7, 0.2, 0.7, 0.2, 
0, 0.8, 0.7, 0.8, 0.9, 0.3, 0.9, 0.8, 0.3, 1, 0.6, 0.4, 0.3, 
0.3, 0.3, 0.2, 0.2, 0.7, 0.9, 0.9, 0.6, 0.4, 0.1, 0.4, 0.8, 0, 
0.7, 0.4, 0.7)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-43L))

bp.vals <- function(x, probs=c(0.1, 0.25, 0.75, .9)) {
    r <- quantile(x, probs=probs, na.rm=TRUE)
    r = c(r[1:2], exp(mean(log(x))), r[3:4])
    names(r) <- c("ymin", "lower", "middle", "upper", "ymax")
    r
}

# Sample usage of the function with the sample data above
library(ggplot2)
ggplot(dataset_stack, aes(x=factor(Col), y=TTC)) +
    stat_summary(fun.data=bp.vals, geom="boxplot")
```

Answer 1

Score: 1

You could use the stat_summary() function and pass it a function via fun to mark a specific quantile() and the median as colored points. Outliers in your data are still drawn by geom_boxplot(), here in orange:

ggplot(dataset_stack, aes(x=factor(Col), y=TTC)) +
    geom_boxplot(outlier.color = "orange3", outlier.size = 4) +
    # median (fun replaces the deprecated fun.y argument)
    stat_summary(fun="median", geom="point", shape=16, size=4, color="darkred") +
    # 10th percentile in red, 90th percentile in blue
    stat_summary(geom="point", fun = \(x) quantile(x, 0.1, na.rm = TRUE), shape = 16, size = 4, color = "red") +
    stat_summary(geom="point", fun = \(x) quantile(x, 0.9, na.rm = TRUE), shape = 16, size = 4, color = "blue") +
    theme_bw()
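
Since the question also allows horizontal lines instead of dots, here is a minimal sketch of one way to draw them. It assumes ggplot2 3.3.0 or later (for the fun, fun.min and fun.max arguments). The helper name quantile_line is made up for this illustration, and the sample data is rebuilt compactly with rep() rather than copied from the question's dput() output. Setting ymin and ymax to the same quantile collapses the error bar into a single horizontal line.

```R
library(ggplot2)

# Sample data from the question, rebuilt compactly (13 Blue, 15 Green, 15 Red)
dataset_stack <- data.frame(
  Col = rep(c("Blue", "Green", "Red"), times = c(13, 15, 15)),
  TTC = c(0.9, 0.7, 0, 0.1, 0.1, 0.4, 0.9, 0.8, 0.1, 0, 0.7, 0.2, 0.7,
          0.2, 0, 0.8, 0.7, 0.8, 0.9, 0.3, 0.9, 0.8, 0.3, 1, 0.6, 0.4, 0.3, 0.3,
          0.3, 0.2, 0.2, 0.7, 0.9, 0.9, 0.6, 0.4, 0.1, 0.4, 0.8, 0, 0.7, 0.4, 0.7)
)

# Hypothetical helper: a stat_summary layer that draws a short horizontal
# line at the p-th quantile of each group (ymin == ymax == y).
quantile_line <- function(p, colour) {
  force(p)
  q <- function(x) unname(quantile(x, probs = p, na.rm = TRUE))
  stat_summary(fun = q, fun.min = q, fun.max = q,
               geom = "errorbar", width = 0.5, colour = colour)
}

p <- ggplot(dataset_stack, aes(x = factor(Col), y = TTC)) +
  geom_boxplot(outlier.color = "orange3", outlier.size = 4) +
  quantile_line(0.1, "red") +   # 10th percentile
  quantile_line(0.9, "blue") +  # 90th percentile
  theme_bw()
print(p)
```

The width argument controls how far the line extends across each box, so it can be tuned to match the box width.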

If you only want to add, for example, the mean in black, the median in dark red, and the min and max values in grey, you could again use stat_summary():

ggplot(dataset_stack, aes(x=factor(Col), y=TTC)) +
    geom_boxplot(outlier.color = "orange3", outlier.size = 4) +
    # fun replaces the deprecated fun.y argument
    stat_summary(fun="mean", geom="point", shape=16, size=4, color="black") +
    stat_summary(fun="median", geom="point", shape=16, size=4, color="darkred") +
    stat_summary(fun="min", geom="point", shape=16, size=4, color="grey") +
    stat_summary(fun="max", geom="point", shape=16, size=4, color="grey") +
    theme_bw()

Putting it all together:

ggplot(dataset_stack, aes(x=factor(Col), y=TTC)) +
    geom_boxplot(outlier.color = "orange3", outlier.size = 4) +
    # mean, median, min and max as points
    stat_summary(fun="mean", geom="point", shape=16, size=4, color="black") +
    stat_summary(fun="median", geom="point", shape=16, size=4, color="darkred") +
    stat_summary(fun="min", geom="point", shape=16, size=4, color="grey") +
    stat_summary(fun="max", geom="point", shape=16, size=4, color="grey") +
    # 10th percentile in red, 90th percentile in blue
    stat_summary(geom="point", fun = \(x) quantile(x, 0.1, na.rm = TRUE), shape = 16, size = 4, color = "red") +
    stat_summary(geom="point", fun = \(x) quantile(x, 0.9, na.rm = TRUE), shape = 16, size = 4, color = "blue") +
    theme_bw()
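
To cross-check the plotted markers numerically, the same per-group statistics can be computed in base R. This is only a sketch: the object name pct is arbitrary, the sample data is rebuilt compactly with rep() instead of the question's dput() output, and quantile() is left at its default settings, as in the stat_summary() calls above.

```R
# Sample data from the question, rebuilt compactly (13 Blue, 15 Green, 15 Red)
dataset_stack <- data.frame(
  Col = rep(c("Blue", "Green", "Red"), times = c(13, 15, 15)),
  TTC = c(0.9, 0.7, 0, 0.1, 0.1, 0.4, 0.9, 0.8, 0.1, 0, 0.7, 0.2, 0.7,
          0.2, 0, 0.8, 0.7, 0.8, 0.9, 0.3, 0.9, 0.8, 0.3, 1, 0.6, 0.4, 0.3, 0.3,
          0.3, 0.2, 0.2, 0.7, 0.9, 0.9, 0.6, 0.4, 0.1, 0.4, 0.8, 0, 0.7, 0.4, 0.7)
)

# One row per group; columns are the 10th, 25th, 50th, 75th and 90th percentiles
pct <- t(sapply(split(dataset_stack$TTC, dataset_stack$Col),
                quantile, probs = c(0.1, 0.25, 0.5, 0.75, 0.9), na.rm = TRUE))
print(pct)
```

Each row of pct should line up with the points drawn for that group: the "10%" and "90%" columns with the red and blue points, and the "50%" column with the dark red median.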

huangapple
  • Published on March 21, 2023, 02:53:08
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75794216.html