Combine rows to create a new "other" row to reduce data size

Question

There are 100 variables, but the screenshot only shows 22 of them. I want to plot the data as a readable bar graph, but with 100 variables I cannot read it well. So I want to create an "other" row in the brand column that combines all rows with a total count of less than 5. Is this possible? Or does anyone have another way of plotting 100 categorical variables in a bar plot that stays readable? I attached a screenshot of the bar plot as well.

The image I attached shows what the bar plot looks like with all 100 variables; I cannot read it.

Answer 1

Score: 2

If this is your data:

    library(dplyr)

    set.seed(42)
    df <- data.frame(brand = LETTERS[1:20], Count = sample(10, 20, replace = TRUE))

Use bind_rows() to get the modified data set:

    val <- 5
    bind_rows(df %>% filter(Count >= val),
              df %>% summarize(brand = "other", Count = sum(Count[Count < val])))
    #>    brand Count
    #> 1      B     5
    #> 2      D     9
    #> 3      E    10
    #> 4      H    10
    #> 5      J     8
    #> 6      K     7
    #> 7      M     9
    #> 8      N     5
    #> 9      P    10
    #> 10     S     9
    #> 11     T     9
    #> 12 other    22
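
The same bind_rows() pattern can be wrapped in a small helper so the threshold is easy to change. The sketch below is only an illustration: lump_small() and its arguments are hypothetical names (not part of dplyr), the toy data is made up, and it assumes the columns are called brand and Count as in the example above.

```r
library(dplyr)

# Hypothetical helper (not a dplyr function): keep rows with Count >= min_count
# and collapse all remaining rows into a single row labelled `other_label`.
lump_small <- function(data, min_count, other_label = "other") {
  bind_rows(
    filter(data, Count >= min_count),
    summarize(data,
              brand = other_label,
              Count = sum(Count[Count < min_count]))
  )
}

# Made-up toy data, so the result is easy to check by hand
toy <- data.frame(brand = c("A", "B", "C", "D"), Count = c(7, 2, 5, 1))

result <- lump_small(toy, min_count = 5)
print(result)
```

Note that this pattern always appends an "other" row, with a Count of 0 when no row falls below the threshold.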

Note that you can also rotate the axis labels to make them more readable, e.g. ... + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))

Final plot, including the ordering of the bars: "other" first, then ascending by Count.

    library(ggplot2)

    bind_rows(df %>% filter(Count >= val),
              df %>% summarize(brand = "other", Count = sum(Count[Count < val]))) %>%
      # Build the levels from the non-"other" rows only, so the ordering does not
      # depend on "other" being the last row
      mutate(brand = factor(brand, levels =
        c("other", brand[brand != "other"][order(Count[brand != "other"])]))) %>%
      ggplot() +
      geom_col(aes(brand, Count)) +
      theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
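
As an alternative to rotating the labels, long category lists are often easier to read as horizontal bars. Here is a minimal sketch, assuming a small made-up data frame (plot_df and its values are placeholders, not the asker's data):

```r
library(ggplot2)

# Made-up stand-in for the lumped brand counts
plot_df <- data.frame(
  brand = c("other", "B", "K", "J", "D", "E"),
  Count = c(22, 5, 7, 8, 9, 10)
)

# reorder() sorts the brand levels by Count (ascending);
# coord_flip() swaps the axes so the brand labels read horizontally.
p <- ggplot(plot_df, aes(x = reorder(brand, Count), y = Count)) +
  geom_col() +
  coord_flip() +
  labs(x = "brand")

print(p)
```

With flipped coordinates, the first factor level is drawn at the bottom, so the largest bar ends up at the top of the plot.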

Answer 2

Score: 1

I'll demonstrate how to reduce a subset of categoricals into an "other" category for plotting.

    library(dplyr)
    library(ggplot2)
    data("starwars", package = "dplyr")
    ggplot(starwars, aes(eye_color)) + geom_bar()

    count(starwars, eye_color) %>%
      arrange(desc(n))
    # # A tibble: 15 × 2
    #    eye_color         n
    #    <chr>         <int>
    #  1 brown            21
    #  2 blue             19
    #  3 yellow           11
    #  4 black            10
    #  5 orange            8
    #  6 red               5
    #  7 hazel             3
    #  8 unknown           3
    #  9 blue-gray         1
    # 10 dark              1
    # 11 gold              1
    # 12 green, yellow     1
    # 13 pink              1
    # 14 red, blue         1
    # 15 white             1

    count(starwars, eye_color) %>%
      mutate(eye2 = if_else(n <= 5, "other", eye_color)) %>%
      left_join(starwars, by = "eye_color") %>%
      ggplot(aes(eye2)) +
      geom_bar()
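
The count-then-join step can probably be shortened with dplyr's add_count(), which attaches the group size to every row. This is a sketch rather than the answer author's code; the column names n_eye and eye2 and the object name lumped are arbitrary choices.

```r
library(dplyr)
library(ggplot2)

# add_count() adds a column holding the number of rows per eye_color,
# so no separate count() + left_join() is needed.
lumped <- starwars %>%
  add_count(eye_color, name = "n_eye") %>%
  mutate(eye2 = if_else(n_eye <= 5, "other", eye_color))

p <- ggplot(lumped, aes(eye2)) +
  geom_bar()

print(p)
```

As in the answer above, the condition n_eye <= 5 also lumps categories with exactly 5 rows; use < 5 if those should stay separate.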

Answer 3

Score: 0

Another option is to use ifelse() inside mutate() to relabel the rare categories, then summarise and plot:

    library(tidyverse)

    df <- starwars
    df$brand <- df$eye_color

    df1 <- df %>%
      count(brand, name = 'total_count') %>%
      mutate(brand = ifelse(total_count > 5, brand, "Other")) %>%
      summarise(total_count = sum(total_count), .by = brand)
    df1
    #> # A tibble: 6 × 2
    #>   brand  total_count
    #>   <chr>        <int>
    #> 1 black           10
    #> 2 blue            19
    #> 3 Other           18
    #> 4 brown           21
    #> 5 orange           8
    #> 6 yellow          11

    ggplot(df1, aes(brand, total_count)) +
      geom_bar(stat = 'identity')
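
If the forcats package (attached by library(tidyverse)) is available, fct_lump_min() may do the relabelling in one step on the raw rows. The following is a hedged sketch, not the answer author's method; min = 6 is chosen here to mirror the total_count > 5 rule above, and the object name lumped is arbitrary.

```r
library(dplyr)
library(forcats)
library(ggplot2)

# Levels that occur fewer than `min` times are collapsed into "Other",
# which forcats places last among the factor levels.
lumped <- starwars %>%
  mutate(brand = fct_lump_min(eye_color, min = 6, other_level = "Other"))

count(lumped, brand)

ggplot(lumped, aes(brand)) +
  geom_bar()
```

Because the result is a factor with "Other" as the last level, the lumped bar is drawn at the end of the axis rather than in alphabetical position.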

Created on 2023-04-16 with reprex v2.0.2

huangapple
  • Posted on 2023-04-17 03:47:35
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76029999.html