Combine rows to create a new "other" row to reduce data size

Question

There are 100 variables, but the screenshot only shows 22 of them. I want to plot the data as a readable bar graph, but with 100 variables I cannot read it well. So I want to make an "other" row under the brand column that combines all rows with a total count of less than 5. Is this possible? Or does anyone have another way of plotting 100 categorical variables in a bar plot that is still readable? I attached a screenshot of the bar plot as well.

The image I attached shows what the bar plot looks like with all 100 variables; I cannot read it.

Answer 1

Score: 2

If this is your data

library(dplyr)
library(ggplot2)

set.seed(42)

df <- data.frame(brand = LETTERS[1:20], Count = sample(10, 20, replace = TRUE))

Using bind_rows to get the modified data set

val <- 5

bind_rows(df %>% filter(Count >= val), 
          df %>% summarize(brand = "other", Count = sum(Count[Count < val])))
   brand Count
1      B     5
2      D     9
3      E    10
4      H    10
5      J     8
6      K     7
7      M     9
8      N     5
9      P    10
10     S     9
11     T     9
12 other    22
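
The same threshold logic can be wrapped in a small reusable helper. The following is a minimal base R sketch, not the answerer's code: the function name `lump_small_counts` and its arguments are hypothetical, and it assumes a data frame with one row per brand and columns named `brand` and `Count`, like the example data.

```r
# Hypothetical helper (not from the original answer): collapse every row whose
# Count is below `threshold` into a single "other" row.
lump_small_counts <- function(df, threshold = 5, other_label = "other") {
  small <- df$Count < threshold
  kept <- df[!small, c("brand", "Count")]
  if (!any(small)) {
    rownames(kept) <- NULL
    return(kept)  # nothing to collapse, so no "other" row is added
  }
  other <- data.frame(brand = other_label, Count = sum(df$Count[small]))
  out <- rbind(kept, other)
  rownames(out) <- NULL
  out
}

# Small made-up example: B and D fall below the threshold
toy <- data.frame(brand = c("A", "B", "C", "D"), Count = c(12, 3, 7, 1))
result <- lump_small_counts(toy, threshold = 5)
print(result)
```

Unlike the bind_rows version, this sketch skips the "other" row when no brand falls below the threshold, which avoids an empty bar in the plot.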

Note that you can also rotate the axis labels to make them more readable, e.g. ... + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))

Final plot, including the ordering of the bars: "other" first, then ascending by Count

bind_rows(df %>% filter(Count >= val), 
          df %>% summarize(brand = "other", Count = sum(Count[Count < val]))) %>%
  mutate(brand = factor(brand, levels = 
                   c(brand[grep("other", brand)], 
                     brand[grep("other", brand, invert = TRUE)][order(Count)]))) %>%
  ggplot() + 
    geom_col(aes(brand, Count)) + 
    theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
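
For the second part of the question (another way to keep a 100-category bar plot readable), one possible alternative to a count threshold is to keep only the largest N brands and sum the rest. This is a rough sketch under stated assumptions rather than anything from the answer: `keep_top_n` is a hypothetical helper, the data are made up, and the plot is only built if ggplot2 happens to be installed.

```r
# Hypothetical helper: keep the `n` largest brands and sum the rest into "other".
keep_top_n <- function(df, n = 10, other_label = "other") {
  ranked <- df[order(df$Count, decreasing = TRUE), c("brand", "Count")]
  is_top <- seq_len(nrow(ranked)) <= n
  out <- ranked[is_top, ]
  if (any(!is_top)) {
    rest_total <- sum(ranked$Count[!is_top])
    out <- rbind(out, data.frame(brand = other_label, Count = rest_total))
  }
  rownames(out) <- NULL
  out
}

# Made-up data: 100 brands with random counts
set.seed(1)
brands <- data.frame(brand = paste0("brand_", 1:100),
                     Count = sample(50, 100, replace = TRUE))
top10 <- keep_top_n(brands, n = 10)

# Optional: build a horizontal bar plot, which leaves room for long labels.
# Guarded so the data step above still runs without ggplot2.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(top10, aes(reorder(brand, Count), Count)) +
    geom_col() +
    coord_flip() +
    labs(x = "brand")
  # print(p) draws the plot
}
```

A fixed N bounds the number of bars regardless of how the counts are distributed, whereas a threshold bounds the smallest bar shown; which is preferable depends on the data.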

Answer 2

Score: 1

I'll demonstrate how to collapse the infrequent levels of a categorical variable into an "other" category for plotting.

library(dplyr)
library(ggplot2)
data("starwars", package = "dplyr")
ggplot(starwars, aes(eye_color)) + geom_bar()

count(starwars, eye_color) %>%
  arrange(desc(n))
# # A tibble: 15 × 2
#    eye_color         n
#    <chr>         <int>
#  1 brown            21
#  2 blue             19
#  3 yellow           11
#  4 black            10
#  5 orange            8
#  6 red               5
#  7 hazel             3
#  8 unknown           3
#  9 blue-gray         1
# 10 dark              1
# 11 gold              1
# 12 green, yellow     1
# 13 pink              1
# 14 red, blue         1
# 15 white             1
count(starwars, eye_color) %>%
  mutate(eye2 = if_else(n <= 5, "other", eye_color)) %>%
  left_join(starwars, by = "eye_color") %>%
  ggplot(aes(eye2)) +
  geom_bar()
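
When the data have one row per observation, as starwars does, the relabelling can also be done directly on the raw vector without a count-and-join step. The snippet below is a base R sketch of that idea on made-up values; the variable names are illustrative and not taken from the answer.

```r
# Made-up row-level data: one element per observation, not one per category
eye <- c(rep("brown", 7), rep("blue", 6), "pink", "gold", "gold", "white")

# ave() returns, for each element, the size of the group it belongs to
group_size <- ave(seq_along(eye), eye, FUN = length)

# Same rule as in the answer: categories seen 5 times or fewer become "other"
eye2 <- ifelse(group_size <= 5, "other", eye)

print(table(eye2))
```

The result has the same length as the input, so it can be assigned straight back as a new column and passed to geom_bar().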

Answer 3

Score: 0

Another option is to use ifelse() inside mutate() to relabel the small categories, then summarise and plot:

library(tidyverse)
df <- starwars
df$brand <- df$eye_color

# .by in summarise() requires dplyr >= 1.1.0
df1 <- df %>%
  count(brand, name = 'total_count') %>%
  mutate(brand = ifelse(total_count > 5, brand, "Other")) %>%
  summarise(total_count = sum(total_count), .by = brand)
df1
#> # A tibble: 6 × 2
#>   brand  total_count
#>   <chr>        <int>
#> 1 black           10
#> 2 blue            19
#> 3 Other           18
#> 4 brown           21
#> 5 orange           8
#> 6 yellow          11

ggplot(df1, aes(brand, total_count)) +
  geom_bar(stat = 'identity')
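
Because brand is a character column here, ggplot2 draws the bars in alphabetical order. If sorted bars are preferred, the column can be turned into a factor with explicit levels first. The sketch below is one way to do that in base R; it rebuilds the summary table by hand from the output shown in this answer, and placing "Other" last is an assumption about the desired layout.

```r
# Summary table copied from the output shown in the answer
df1 <- data.frame(
  brand = c("black", "blue", "Other", "brown", "orange", "yellow"),
  total_count = c(10, 19, 18, 21, 8, 11)
)

# Levels sorted by descending count, with "Other" moved to the end
by_count <- df1$brand[order(df1$total_count, decreasing = TRUE)]
ordered_levels <- c(setdiff(by_count, "Other"), "Other")
df1$brand <- factor(df1$brand, levels = ordered_levels)

print(levels(df1$brand))
```

Passing this df1 to the same ggplot() call then draws the bars from largest to smallest, followed by "Other".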

Created on 2023-04-16 with reprex v2.0.2

huangapple
  • Posted on 2023-04-17 03:47:35
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76029999.html