Summarize in a column using a condition and return a new row with the summed value

Question

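    library(dplyr)

    df %>%
      # relabel every row with value < 10 as "cheap_stuff"
      mutate(item = ifelse(value < 10, "cheap_stuff", item)) %>%
      # the low-value rows are not filtered out: summarise() collapses them
      # into a single cheap_stuff row per group
      group_by(group, item) %>%
      summarise(value = sum(value), percentage = sum(percentage))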

I have a dataset and am trying to find a solution for it using dplyr. My goal is to sum the columns value and percentage, but only over the rows where value is smaller than 10, store that sum under a new item called "cheap_stuff" (one row per group), and remove the original low-value rows.

My data looks like this:

    df <- data.frame(group = c(rep("A", 4), rep("B", 4), rep("C", 4), rep("D", 4)),
                     value = c(1, 23, 15, 5, 3, 45, 7, 21, 4, 8, 26, 30, 3, 9, 37, 68),
                     percentage = c(2.27, 52.27, 34.09, 11.36, 3.95, 59.21, 9.21, 27.63,
                                    5.88, 11.76, 38.24, 44.12, 2.56, 7.69, 31.62, 58.12),
                     item = c("cheap1", "expensive1", "expensive2", "cheap2",
                              "cheap1", "expensive1", "cheap2", "expensive2",
                              "cheap1", "cheap2", "expensive1", "expensive2",
                              "cheap1", "cheap2", "expensive1", "expensive2"))
    print(df)
       group value percentage       item
    1      A     1       2.27     cheap1
    2      A    23      52.27 expensive1
    3      A    15      34.09 expensive2
    4      A     5      11.36     cheap2
    5      B     3       3.95     cheap1
    6      B    45      59.21 expensive1
    7      B     7       9.21     cheap2
    8      B    21      27.63 expensive2
    9      C     4       5.88     cheap1
    10     C     8      11.76     cheap2
    11     C    26      38.24 expensive1
    12     C    30      44.12 expensive2
    13     D     3       2.56     cheap1
    14     D     9       7.69     cheap2
    15     D    37      31.62 expensive1
    16     D    68      58.12 expensive2

My desired output looks like this:

       group value percentage        item
    1      A     6      13.64 cheap_stuff
    2      A    23      52.27  expensive1
    3      A    15      34.09  expensive2
    4      B    10      13.16 cheap_stuff
    5      B    45      59.21  expensive1
    6      B    21      27.63  expensive2
    7      C    12      17.65 cheap_stuff
    8      C    26      38.24  expensive1
    9      C    30      44.12  expensive2
    10     D    12      10.26 cheap_stuff
    11     D    37      31.62  expensive1
    12     D    68      58.12  expensive2
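Note that the desired cheap_stuff percentages are not exactly the sums of the listed percentages: for group A, 2.27 + 11.36 = 13.63, while 13.64 matches 6 / 44 * 100, where 44 is group A's total value. A quick base-R check, assuming the desired percentages were recomputed as each row's share of its group total (that interpretation is an inference from the numbers, not something stated in the question):

```r
# Data from the question (only the columns needed here)
df <- data.frame(
  group = rep(c("A", "B", "C", "D"), each = 4),
  value = c(1, 23, 15, 5, 3, 45, 7, 21, 4, 8, 26, 30, 3, 9, 37, 68),
  percentage = c(2.27, 52.27, 34.09, 11.36, 3.95, 59.21, 9.21, 27.63,
                 5.88, 11.76, 38.24, 44.12, 2.56, 7.69, 31.62, 58.12)
)
cheap <- df$value < 10

# Option 1: add up the already-rounded percentages of the cheap rows
summed <- tapply(df$percentage[cheap], df$group[cheap], sum)

# Option 2: recompute the cheap share from value and each group's total
cheap_value <- tapply(df$value[cheap], df$group[cheap], sum)
group_total <- tapply(df$value, df$group, sum)
recomputed  <- round(100 * cheap_value / group_total, 2)

# Compare both options side by side, one column per group
print(rbind(summed, recomputed))
```

If that assumption holds, summing percentage (as the answers below do) differs from the desired output in the second decimal place for groups A, C and D.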

This post points in the right direction:
https://stackoverflow.com/questions/59199273/summarize-with-mathematical-conditions-in-dplyr?noredirect=1&lq=1
However, there all values are summed and a new column is created.

I have tried something like this:

    library(dplyr)
    df %>%
      group_by(group) %>%
      mutate(item = replace(item, which(value < 10), "cheap_stuff")) %>%
      mutate(value = sum(value[value < 10]))

But that fails: I cannot remove the rows I want to drop, and it also overwrites value in the rows with expensive items.

    # A tibble: 16 × 4
    # Groups:   group [4]
       group value percentage item
       <chr> <dbl>      <dbl> <chr>
     1 A         6       2.27 cheap_stuff
     2 A         6      52.3  expensive1
     3 A         6      34.1  expensive2
     4 A         6      11.4  cheap_stuff
     5 B        10       3.95 cheap_stuff
     6 B        10      59.2  expensive1
     7 B        10       9.21 cheap_stuff
     8 B        10      27.6  expensive2
     9 C        12       5.88 cheap_stuff
    10 C        12      11.8  cheap_stuff
    11 C        12      38.2  expensive1
    12 C        12      44.1  expensive2
    13 D        12       2.56 cheap_stuff
    14 D        12       7.69 cheap_stuff
    15 D        12      31.6  expensive1
    16 D        12      58.1  expensive2
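The row count stays at 16 because mutate() always returns one row per input row: sum(value[value < 10]) is computed once per group and then recycled onto every row of that group, expensive rows included. Only summarise() (or a filter()) reduces the number of rows. A minimal sketch contrasting the two on group A alone, assuming dplyr is installed; the small data frame a is a hypothetical subset made up for illustration:

```r
library(dplyr)

# Hypothetical subset: group A from the question
a <- data.frame(group = "A",
                value = c(1, 23, 15, 5),
                item  = c("cheap1", "expensive1", "expensive2", "cheap2"))

# mutate(): one output row per input row; the group-wise sum is recycled
with_mutate <- a %>%
  group_by(group) %>%
  mutate(value = sum(value[value < 10]))

# summarise(): one output row per group
with_summarise <- a %>%
  group_by(group) %>%
  summarise(value = sum(value[value < 10]))

nrow(with_mutate)     # 4, and every row now has value 6
nrow(with_summarise)  # 1
```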

Answer 1

Score: 2


Using value < 10 instead of grepl():

    library(dplyr)

    df %>%
      # rows with value < 10 share the key "cheap_stuff"; all others keep their item
      group_by(group, item = case_when(value < 10 ~ "cheap_stuff",
                                       TRUE ~ item)) %>%
      summarise(value = sum(value),
                percentage = sum(percentage)) %>%
      ungroup()
       group item        value percentage
       <chr> <chr>       <dbl>      <dbl>
     1 A     cheap_stuff     6       13.6
     2 A     expensive1     23       52.3
     3 A     expensive2     15       34.1
     4 B     cheap_stuff    10       13.2
     5 B     expensive1     45       59.2
     6 B     expensive2     21       27.6
     7 C     cheap_stuff    12       17.6
     8 C     expensive1     26       38.2
     9 C     expensive2     30       44.1
    10 D     cheap_stuff    12       10.2
    11 D     expensive1     37       31.6
    12 D     expensive2     68       58.1
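This works because group_by() accepts a computed column: case_when() gives every low-value row the same key, "cheap_stuff", so summarise() collapses them into one row per group while the other items stay one row each. For comparison, here is a rough base-R sketch of the same relabel-then-aggregate idea; it swaps in ifelse() and aggregate() and is not part of the answer:

```r
# Data from the question
df <- data.frame(
  group = rep(c("A", "B", "C", "D"), each = 4),
  value = c(1, 23, 15, 5, 3, 45, 7, 21, 4, 8, 26, 30, 3, 9, 37, 68),
  percentage = c(2.27, 52.27, 34.09, 11.36, 3.95, 59.21, 9.21, 27.63,
                 5.88, 11.76, 38.24, 44.12, 2.56, 7.69, 31.62, 58.12),
  item = c("cheap1", "expensive1", "expensive2", "cheap2",
           "cheap1", "expensive1", "cheap2", "expensive2",
           "cheap1", "cheap2", "expensive1", "expensive2",
           "cheap1", "cheap2", "expensive1", "expensive2")
)

# Relabel every low-value row so the cheap rows share one key per group
df$item <- ifelse(df$value < 10, "cheap_stuff", df$item)

# Sum value and percentage for each (group, item) pair
res <- aggregate(cbind(value, percentage) ~ group + item, data = df, FUN = sum)

# aggregate() orders rows by item first; reorder by group, then item
res <- res[order(res$group, res$item), ]
print(res)
```

Like the dplyr version, this sums the already-rounded percentage values rather than recomputing them.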

Original answer:

    df %>%
      # rows whose item name contains "cheap" share the key "cheap_stuff"
      group_by(group, item = case_when(grepl("cheap", item, fixed = TRUE) ~ "cheap_stuff",
                                       TRUE ~ item)) %>%
      summarise(value = sum(value),
                percentage = sum(percentage))
       group item        value percentage
       <chr> <chr>       <dbl>      <dbl>
     1 A     cheap_stuff     6       13.6
     2 A     expensive1     23       52.3
     3 A     expensive2     15       34.1
     4 B     cheap_stuff    10       13.2
     5 B     expensive1     45       59.2
     6 B     expensive2     21       27.6
     7 C     cheap_stuff    12       17.6
     8 C     expensive1     26       38.2
     9 C     expensive2     30       44.1
    10 D     cheap_stuff    12       10.2
    11 D     expensive1     37       31.6
    12 D     expensive2     68       58.1
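The original answer groups on the item name (anything containing "cheap"), whereas the updated one groups on value < 10. They give the same result here only because, in this data, exactly the items named cheap* have a value below 10. A base-R check of that coincidence, plus a hypothetical row (not in the question) where the two rules would disagree:

```r
value <- c(1, 23, 15, 5, 3, 45, 7, 21, 4, 8, 26, 30, 3, 9, 37, 68)
item  <- c("cheap1", "expensive1", "expensive2", "cheap2",
           "cheap1", "expensive1", "cheap2", "expensive2",
           "cheap1", "cheap2", "expensive1", "expensive2",
           "cheap1", "cheap2", "expensive1", "expensive2")

by_name  <- grepl("cheap", item, fixed = TRUE)  # rule in the original answer
by_value <- value < 10                          # rule in the updated answer
print(identical(by_name, by_value))             # TRUE for the question's data

# Hypothetical extra row: an item named "cheap3" with value 12
print(grepl("cheap", "cheap3", fixed = TRUE))   # TRUE: name rule folds it into cheap_stuff
print(12 < 10)                                  # FALSE: value rule keeps it as its own item
```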

huangapple
  • Posted on 8 February 2023, 23:40:04
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75388181.html