使用条件对一列进行汇总,并返回一个新行,其中包含汇总后的值。

huangapple go评论75阅读模式
英文:

Summarize in a column using a condition and return a new row with the summed value

问题

df %>%
  group_by(group) %>%
  mutate(item = ifelse(value < 10, "cheap_stuff", item)) %>%
  filter(!(value < 10)) %>%
  group_by(group, item) %>%
  summarise(value = sum(value), percentage = sum(percentage))

英文:

I have a dataset and I am trying to find a solution for it using dplyr. My goal is to summarize the values in the columns value and percentage, but only for the value smaller than 10 and add this to a new item name called: "cheap_stuff", while removing the rows with the low values.

My data looks like this:

df &lt;- data.frame(group=c(rep(&quot;A&quot;,4), rep(&quot;B&quot;,4), rep(&quot;C&quot;,4), rep(&quot;D&quot;,4)),
                 value=c(1,	23,	15,	5,	3,	45,	7,	21,	4,	8,	26,	30,	3,	9,	37,	68),
                 percentage=c(2.27,	52.27,	34.09,	11.36	,3.95	,59.21	,9.21	,27.63	,5.88	,11.76	,38.24	,44.12	,2.56	,7.69, 31.62, 58.12),
                 item=c(&quot;cheap1&quot;,&quot;expensive1&quot;	,&quot;expensive2&quot;,	&quot;cheap2&quot;,
                 &quot;cheap1&quot;,	&quot;expensive1&quot;,&quot;cheap2&quot;,&quot;expensive2&quot;,
                 &quot;cheap1&quot;,&quot;cheap2&quot;,&quot;expensive1&quot;,&quot;expensive2&quot;,
                 &quot;cheap1&quot;,&quot;cheap2&quot;,&quot;expensive1&quot;,&quot;expensive2&quot;))

view(df)
   group value percentage       item
1      A     1       2.27     cheap1
2      A    23      52.27 expensive1
3      A    15      34.09 expensive2
4      A     5      11.36     cheap2
5      B     3       3.95     cheap1
6      B    45      59.21 expensive1
7      B     7       9.21     cheap2
8      B    21      27.63 expensive2
9      C     4       5.88     cheap1
10     C     8      11.76     cheap2
11     C    26      38.24 expensive1
12     C    30      44.12 expensive2
13     D     3       2.56     cheap1
14     D     9       7.69     cheap2
15     D    37      31.62 expensive1
16     D    68      58.12 expensive2

My desired output looks like this:

   group value percentage        item
1      A     6      13.64 cheap_stuff
2      A    23      52.27  expensive1
3      A    15      34.09  expensive2
4      B    10      13.16 cheap_stuff
5      B    45      59.21  expensive1
6      B    21      27.63  expensive2
7      C    12      17.65 cheap_stuff
8      C    26      38.24  expensive1
9      C    30      44.12  expensive2
10     D    12      10.26 cheap_stuff
11     D    37      31.62  expensive1
12     D    68      58.12  expensive2

This post comes in the right direction,
https://stackoverflow.com/questions/59199273/summarize-with-mathematical-conditions-in-dplyr?noredirect=1&amp;lq=1
But, there all values are summed, and a new column is created.

I have tried something like this:

library(dplyr)
df%&gt;%
  group_by(group) %&gt;%
  mutate(item= replace(item, which(value &lt;10),&quot;cheap_stuff&quot;)) %&gt;%
  mutate(value = sum(value[value &lt; 10]))

But that fails in the sense that I can not removed the rows that I want, and it write over the rows with expensive values.

# A tibble: 16 &#215; 4
# Groups:   group [4]
   group value percentage item       
   &lt;chr&gt; &lt;dbl&gt;      &lt;dbl&gt; &lt;chr&gt;      
 1 A         6       2.27 cheap_stuff
 2 A         6      52.3  expensive1 
 3 A         6      34.1  expensive2 
 4 A         6      11.4  cheap_stuff
 5 B        10       3.95 cheap_stuff
 6 B        10      59.2  expensive1 
 7 B        10       9.21 cheap_stuff
 8 B        10      27.6  expensive2 
 9 C        12       5.88 cheap_stuff
10 C        12      11.8  cheap_stuff
11 C        12      38.2  expensive1 
12 C        12      44.1  expensive2 
13 D        12       2.56 cheap_stuff
14 D        12       7.69 cheap_stuff
15 D        12      31.6  expensive1 
16 D        12      58.1  expensive2 

答案1

得分: 2

df %>%
  group_by(group, item = case_when(value < 10 ~ "cheap_stuff",
                                    TRUE ~ item)) %>%
  summarise(value = sum(value),
            percentage = sum(percentage)) %>%
  ungroup

   group item        value percentage
   <chr> <chr>       <dbl>      <dbl>
 1 A     cheap_stuff     6       13.6
 2 A     expensive1     23       52.3
 3 A     expensive2     15       34.1
 4 B     cheap_stuff    10       13.2
 5 B     expensive1     45       59.2
 6 B     expensive2     21       27.6
 7 C     cheap_stuff    12       17.6
 8 C     expensive1     26       38.2
 9 C     expensive2     30       44.1
10 D     cheap_stuff    12       10.2
11 D     expensive1     37       31.6
12 D     expensive2     68       58.1
英文:

Using value&lt;10 instead of grepl:

df %&gt;%
  group_by(group,item=case_when(value &lt; 10~&quot;cheap_stuff&quot;,
                                T~item)) %&gt;%
  summarise(value=sum(value),
            percentage=sum(percentage))%&gt;%
  ungroup

   group item        value percentage
   &lt;chr&gt; &lt;chr&gt;       &lt;dbl&gt;      &lt;dbl&gt;
 1 A     cheap_stuff     6       13.6
 2 A     expensive1     23       52.3
 3 A     expensive2     15       34.1
 4 B     cheap_stuff    10       13.2
 5 B     expensive1     45       59.2
 6 B     expensive2     21       27.6
 7 C     cheap_stuff    12       17.6
 8 C     expensive1     26       38.2
 9 C     expensive2     30       44.1
10 D     cheap_stuff    12       10.2
11 D     expensive1     37       31.6
12 D     expensive2     68       58.1

Original answer:

df %&gt;%
  group_by(group,item=case_when(grepl(&quot;cheap&quot;,item,fixed=T)~&quot;cheap_stuff&quot;,
                                T~item)) %&gt;%
  summarise(value=sum(value),
            percentage=sum(percentage))

   group item        value percentage
   &lt;chr&gt; &lt;chr&gt;       &lt;dbl&gt;      &lt;dbl&gt;
 1 A     cheap_stuff     6       13.6
 2 A     expensive1     23       52.3
 3 A     expensive2     15       34.1
 4 B     cheap_stuff    10       13.2
 5 B     expensive1     45       59.2
 6 B     expensive2     21       27.6
 7 C     cheap_stuff    12       17.6
 8 C     expensive1     26       38.2
 9 C     expensive2     30       44.1
10 D     cheap_stuff    12       10.2
11 D     expensive1     37       31.6
12 D     expensive2     68       58.1

huangapple
  • 本文由 发表于 2023年2月8日 23:40:04
  • 转载请务必保留本文链接:https://go.coder-hub.com/75388181.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定