Combine rows to create a new "other" row to reduce data size
Question
There are 100 variables, but the screenshot only shows 22 of them. I want to plot the data as a readable bar graph, but with 100 variables I cannot read it well. So I want to create an "other" row in the brand column that combines all rows with a total count of less than 5. Is this possible? Or does anyone have another way of plotting 100 categorical variables in a bar plot that stays readable? I attached a screenshot of the bar plot as well.
The image I attached shows what the bar plot looks like with all 100 variables; I cannot read it.
Answer 1
Score: 2
If this is your data
library(dplyr)
library(ggplot2)
set.seed(42)
df <- data.frame(brand = LETTERS[1:20], Count = sample(10, 20, replace = TRUE))
Using bind_rows to get the modified data set
val <- 5
bind_rows(df %>% filter(Count >= val),
          df %>% summarize(brand = "other", Count = sum(Count[Count < val])))
   brand Count
1      B     5
2      D     9
3      E    10
4      H    10
5      J     8
6      K     7
7      M     9
8      N     5
9      P    10
10     S     9
11     T     9
12 other    22
Note that you can additionally rotate the axis labels to make them more readable, e.g. ... + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
Final plot, including the ordering of the bars: first "other", then ascending by Count
bind_rows(df %>% filter(Count >= val),
          df %>% summarize(brand = "other", Count = sum(Count[Count < val]))) %>%
  mutate(brand = factor(brand, levels =
    c(brand[grep("other", brand)],
      brand[grep("other", brand, invert = TRUE)][order(Count)]))) %>%
  ggplot() +
  geom_col(aes(brand, Count)) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
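As far as I can tell, the level ordering in the pipeline above only lines up because "other" happens to be the last row: order(Count) is computed over all rows, including the "other" row. A minimal sketch of a more explicit variant follows. The helper name lump_small() is hypothetical (it is not part of dplyr or of the answer), and it assumes the brand and Count column names used above.

```r
library(dplyr)

# Hypothetical helper (not from the answer): collapse every row whose Count
# is below `val` into one "other" row, then fix the factor levels so that
# "other" comes first and the remaining brands follow in ascending Count.
lump_small <- function(df, val = 5) {
  kept <- df %>%
    filter(Count >= val) %>%
    arrange(Count)
  other <- df %>%
    filter(Count < val) %>%
    summarize(brand = "other", Count = sum(Count))
  bind_rows(other, kept) %>%
    # unique() keeps the row order, so the levels follow the order built above
    mutate(brand = factor(brand, levels = unique(brand)))
}

set.seed(42)
df <- data.frame(brand = LETTERS[1:20], Count = sample(10, 20, replace = TRUE))
res <- lump_small(df)
levels(res$brand)
```

The result can be passed to the same plotting code, e.g. ggplot(res, aes(brand, Count)) + geom_col(). Because the order is encoded in the factor levels, it does not depend on where the "other" row sits in the data.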
Answer 2
Score: 1
I'll demonstrate how to reduce a subset of categoricals into an "other" category for plotting.
library(dplyr)
library(ggplot2)
data("starwars", package="dplyr")
ggplot(starwars, aes(eye_color)) + geom_bar()
count(starwars, eye_color) %>%
  arrange(desc(n))
# # A tibble: 15 × 2
# eye_color n
# <chr> <int>
# 1 brown 21
# 2 blue 19
# 3 yellow 11
# 4 black 10
# 5 orange 8
# 6 red 5
# 7 hazel 3
# 8 unknown 3
# 9 blue-gray 1
# 10 dark 1
# 11 gold 1
# 12 green, yellow 1
# 13 pink 1
# 14 red, blue 1
# 15 white 1
count(starwars, eye_color) %>%
  mutate(eye2 = if_else(n <= 5, "other", eye_color)) %>%
  left_join(starwars, by = "eye_color") %>%
  ggplot(aes(eye2)) +
  geom_bar()
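The count-then-join step above can also be handed to the forcats package. This is a sketch, not part of the original answer: fct_lump_min() lumps every level that appears fewer than min times, so min = 6 should give the same grouping as n <= 5 above. The column name eye2 is reused from the answer; plot_df is just an illustrative name.

```r
library(dplyr)
library(forcats)
library(ggplot2)

data("starwars", package = "dplyr")

# Lump all eye colors that occur fewer than 6 times into "other".
# By default fct_lump_min() would call the new level "Other";
# other_level is set here to match the answer's lowercase label.
plot_df <- starwars %>%
  mutate(eye2 = fct_lump_min(eye_color, min = 6, other_level = "other"))

ggplot(plot_df, aes(eye2)) +
  geom_bar()
```

For data that is already counted (one row per brand with a Count column, as in the question), fct_lump_min() also accepts a weights argument, e.g. fct_lump_min(brand, min = 5, w = Count).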
Answer 3
Score: 0
Another option is to use ifelse to mutate the data, summarize it, then plot:
library(tidyverse)
df <- starwars
df$brand <- df$eye_color
df1 <- df %>%
  count(brand, name = 'total_count') %>%
  mutate(brand = ifelse(total_count > 5, brand, "Other")) %>%
  # the .by argument of summarise() requires dplyr >= 1.1.0
  summarise(total_count = sum(total_count), .by = brand)
df1
#> # A tibble: 6 × 2
#> brand total_count
#> <chr> <int>
#> 1 black 10
#> 2 blue 19
#> 3 Other 18
#> 4 brown 21
#> 5 orange 8
#> 6 yellow 11
ggplot(df1, aes(brand, total_count)) +
  geom_bar(stat = 'identity')
<sup>Created on 2023-04-16 with reprex v2.0.2</sup>
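On the question's second part (keeping many categories readable without lumping), one common option is to sort the bars by count and flip the axes so the labels read horizontally. This is a sketch of that idea, not something taken from the answers above; it uses the starwars eye_color column as stand-in data, and counts and p are illustrative names.

```r
library(dplyr)
library(ggplot2)

data("starwars", package = "dplyr")

# Count each category, then turn it into a factor whose levels are
# ordered by count, so the bars are drawn in sorted order.
counts <- starwars %>%
  count(eye_color, name = "total_count") %>%
  mutate(eye_color = reorder(eye_color, total_count))

# coord_flip() puts the categories on the vertical axis, so long or
# numerous labels do not overlap.
p <- ggplot(counts, aes(eye_color, total_count)) +
  geom_col() +
  coord_flip()
p
```

With around 100 categories the plot will likely still need a tall figure, for example through the height argument of ggsave().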