Summarize in a column using a condition and return a new row with the summed value
Question
I have a dataset and am looking for a dplyr solution. Within each group, I want to sum the columns value and percentage, but only for rows where value is smaller than 10. The sums should go into a single new row whose item is "cheap_stuff", and that row should replace the low-value rows.
My data looks like this:
df <- data.frame(group = c(rep("A", 4), rep("B", 4), rep("C", 4), rep("D", 4)),
                 value = c(1, 23, 15, 5, 3, 45, 7, 21, 4, 8, 26, 30, 3, 9, 37, 68),
                 percentage = c(2.27, 52.27, 34.09, 11.36, 3.95, 59.21, 9.21, 27.63,
                                5.88, 11.76, 38.24, 44.12, 2.56, 7.69, 31.62, 58.12),
                 item = c("cheap1", "expensive1", "expensive2", "cheap2",
                          "cheap1", "expensive1", "cheap2", "expensive2",
                          "cheap1", "cheap2", "expensive1", "expensive2",
                          "cheap1", "cheap2", "expensive1", "expensive2"))
print(df)
   group value percentage       item
1      A     1       2.27     cheap1
2      A    23      52.27 expensive1
3      A    15      34.09 expensive2
4      A     5      11.36     cheap2
5      B     3       3.95     cheap1
6      B    45      59.21 expensive1
7      B     7       9.21     cheap2
8      B    21      27.63 expensive2
9      C     4       5.88     cheap1
10     C     8      11.76     cheap2
11     C    26      38.24 expensive1
12     C    30      44.12 expensive2
13     D     3       2.56     cheap1
14     D     9       7.69     cheap2
15     D    37      31.62 expensive1
16     D    68      58.12 expensive2
My desired output looks like this:
   group value percentage        item
1      A     6      13.64 cheap_stuff
2      A    23      52.27  expensive1
3      A    15      34.09  expensive2
4      B    10      13.16 cheap_stuff
5      B    45      59.21  expensive1
6      B    21      27.63  expensive2
7      C    12      17.65 cheap_stuff
8      C    26      38.24  expensive1
9      C    30      44.12  expensive2
10     D    12      10.26 cheap_stuff
11     D    37      31.62  expensive1
12     D    68      58.12  expensive2
This post goes in the right direction:
https://stackoverflow.com/questions/59199273/summarize-with-mathematical-conditions-in-dplyr?noredirect=1&lq=1
But there, all values are summed and a new column is created.
I have tried something like this:
library(dplyr)
df %>%
  group_by(group) %>%
  mutate(item = replace(item, which(value < 10), "cheap_stuff")) %>%
  mutate(value = sum(value[value < 10]))
But that fails: I cannot remove the rows I want to drop, and it overwrites the value column in the rows with expensive items as well.
# A tibble: 16 × 4
# Groups:   group [4]
   group value percentage item       
   <chr> <dbl>      <dbl> <chr>      
 1 A         6       2.27 cheap_stuff
 2 A         6      52.3  expensive1 
 3 A         6      34.1  expensive2 
 4 A         6      11.4  cheap_stuff
 5 B        10       3.95 cheap_stuff
 6 B        10      59.2  expensive1 
 7 B        10       9.21 cheap_stuff
 8 B        10      27.6  expensive2 
 9 C        12       5.88 cheap_stuff
10 C        12      11.8  cheap_stuff
11 C        12      38.2  expensive1 
12 C        12      44.1  expensive2 
13 D        12       2.56 cheap_stuff
14 D        12       7.69 cheap_stuff
15 D        12      31.6  expensive1 
16 D        12      58.1  expensive2 
Answer 1
Score: 2
Using value < 10 instead of grepl():
df %>%
  # Relabel rows with value < 10 inside group_by(), so they share one group
  group_by(group, item = case_when(value < 10 ~ "cheap_stuff",
                                   TRUE ~ item)) %>%
  summarise(value = sum(value),
            percentage = sum(percentage)) %>%
  ungroup()
   group item        value percentage
   <chr> <chr>       <dbl>      <dbl>
 1 A     cheap_stuff     6       13.6
 2 A     expensive1     23       52.3
 3 A     expensive2     15       34.1
 4 B     cheap_stuff    10       13.2
 5 B     expensive1     45       59.2
 6 B     expensive2     21       27.6
 7 C     cheap_stuff    12       17.6
 8 C     expensive1     26       38.2
 9 C     expensive2     30       44.1
10 D     cheap_stuff    12       10.2
11 D     expensive1     37       31.6
12 D     expensive2     68       58.1
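For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. It assumes dplyr 1.0.0 or later, because it uses across() and the .groups argument. The names threshold and result are only illustrative and do not come from the question.

```r
library(dplyr)

# Same data as in the question
df <- data.frame(
  group = rep(c("A", "B", "C", "D"), each = 4),
  value = c(1, 23, 15, 5, 3, 45, 7, 21, 4, 8, 26, 30, 3, 9, 37, 68),
  percentage = c(2.27, 52.27, 34.09, 11.36, 3.95, 59.21, 9.21, 27.63,
                 5.88, 11.76, 38.24, 44.12, 2.56, 7.69, 31.62, 58.12),
  item = c("cheap1", "expensive1", "expensive2", "cheap2",
           "cheap1", "expensive1", "cheap2", "expensive2",
           "cheap1", "cheap2", "expensive1", "expensive2",
           "cheap1", "cheap2", "expensive1", "expensive2"),
  stringsAsFactors = FALSE
)

threshold <- 10  # illustrative name for the cutoff used in the question

result <- df %>%
  # Relabel first, while value still holds the per-row amounts
  mutate(item = if_else(value < threshold, "cheap_stuff", item)) %>%
  group_by(group, item) %>%
  # Sum both numeric columns; .groups = "drop" returns an ungrouped tibble
  summarise(across(c(value, percentage), sum), .groups = "drop")

result
```

Here .groups = "drop" plays the role of the trailing ungroup() call, and across() avoids writing one sum() per column, which may be convenient if more numeric columns are added later.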
Original answer:
df %>%
  # Relabel by item name: every item containing "cheap" becomes "cheap_stuff"
  group_by(group, item = case_when(grepl("cheap", item, fixed = TRUE) ~ "cheap_stuff",
                                   TRUE ~ item)) %>%
  summarise(value = sum(value),
            percentage = sum(percentage))
   group item        value percentage
   <chr> <chr>       <dbl>      <dbl>
 1 A     cheap_stuff     6       13.6
 2 A     expensive1     23       52.3
 3 A     expensive2     15       34.1
 4 B     cheap_stuff    10       13.2
 5 B     expensive1     45       59.2
 6 B     expensive2     21       27.6
 7 C     cheap_stuff    12       17.6
 8 C     expensive1     26       38.2
 9 C     expensive2     30       44.1
10 D     cheap_stuff    12       10.2
11 D     expensive1     37       31.6
12 D     expensive2     68       58.1
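Both versions give the same result here only because every item named "cheap..." in the sample data also has a value below 10. The small sketch below uses made-up rows (the data frame edge and the columns by_value and by_name are hypothetical, not from the question) to show how the two conditions can disagree.

```r
library(dplyr)

# Hypothetical rows, not from the question: name and value disagree
edge <- data.frame(
  group = c("A", "A", "A"),
  value = c(4, 12, 30),
  item  = c("expensive3", "cheap3", "expensive1"),
  stringsAsFactors = FALSE
)

labels <- edge %>%
  mutate(
    # Condition on the number, as the question asks
    by_value = if_else(value < 10, "cheap_stuff", item),
    # Condition on the item name, as in the original answer
    by_name  = if_else(grepl("cheap", item, fixed = TRUE), "cheap_stuff", item)
  )

labels
```

With these rows, the value-based rule relabels "expensive3" (value 4) and the name-based rule relabels "cheap3" (value 12), so the value-based version is likely the safer match for the question as stated.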