Summarize in a column using a condition and return a new row with the summed value
Question
I have a dataset and I am trying to find a solution for it using dplyr. My goal is to sum the value and percentage columns, but only over the rows whose value is smaller than 10, and to put the result in a new row with the item name "cheap_stuff", while removing the original low-value rows. This should happen separately within each group.
My data looks like this:
df <- data.frame(group = c(rep("A", 4), rep("B", 4), rep("C", 4), rep("D", 4)),
                 value = c(1, 23, 15, 5, 3, 45, 7, 21, 4, 8, 26, 30, 3, 9, 37, 68),
                 percentage = c(2.27, 52.27, 34.09, 11.36, 3.95, 59.21, 9.21, 27.63,
                                5.88, 11.76, 38.24, 44.12, 2.56, 7.69, 31.62, 58.12),
                 item = c("cheap1", "expensive1", "expensive2", "cheap2",
                          "cheap1", "expensive1", "cheap2", "expensive2",
                          "cheap1", "cheap2", "expensive1", "expensive2",
                          "cheap1", "cheap2", "expensive1", "expensive2"),
                 stringsAsFactors = FALSE)  # keep item as character (default only since R 4.0)
print(df)  # view() is not exported by dplyr
group value percentage item
1 A 1 2.27 cheap1
2 A 23 52.27 expensive1
3 A 15 34.09 expensive2
4 A 5 11.36 cheap2
5 B 3 3.95 cheap1
6 B 45 59.21 expensive1
7 B 7 9.21 cheap2
8 B 21 27.63 expensive2
9 C 4 5.88 cheap1
10 C 8 11.76 cheap2
11 C 26 38.24 expensive1
12 C 30 44.12 expensive2
13 D 3 2.56 cheap1
14 D 9 7.69 cheap2
15 D 37 31.62 expensive1
16 D 68 58.12 expensive2
My desired output looks like this:
group value percentage item
1 A 6 13.64 cheap_stuff
2 A 23 52.27 expensive1
3 A 15 34.09 expensive2
4 B 10 13.16 cheap_stuff
5 B 45 59.21 expensive1
6 B 21 27.63 expensive2
7 C 12 17.65 cheap_stuff
8 C 26 38.24 expensive1
9 C 30 44.12 expensive2
10 D 12 10.26 cheap_stuff
11 D 37 31.62 expensive1
12 D 68 58.12 expensive2
This post points in the right direction:
https://stackoverflow.com/questions/59199273/summarize-with-mathematical-conditions-in-dplyr?noredirect=1&lq=1
However, there all values are summed and a new column is created.
I have tried something like this:
library(dplyr)
df %>%
  group_by(group) %>%
  mutate(item = replace(item, which(value < 10), "cheap_stuff")) %>%
  mutate(value = sum(value[value < 10]))
But that fails: I cannot remove the rows I want, and it overwrites the value of the expensive rows as well (every row in a group gets the sum of that group's cheap values).
# A tibble: 16 × 4
# Groups: group [4]
group value percentage item
<chr> <dbl> <dbl> <chr>
1 A 6 2.27 cheap_stuff
2 A 6 52.3 expensive1
3 A 6 34.1 expensive2
4 A 6 11.4 cheap_stuff
5 B 10 3.95 cheap_stuff
6 B 10 59.2 expensive1
7 B 10 9.21 cheap_stuff
8 B 10 27.6 expensive2
9 C 12 5.88 cheap_stuff
10 C 12 11.8 cheap_stuff
11 C 12 38.2 expensive1
12 C 12 44.1 expensive2
13 D 12 2.56 cheap_stuff
14 D 12 7.69 cheap_stuff
15 D 12 31.6 expensive1
16 D 12 58.1 expensive2
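The output above follows from how the two verbs treat a group-wise result: mutate() always returns one row per input row, so sum(value[value < 10]) is recycled onto every row of the group, overwriting the expensive values and removing nothing, whereas summarise() returns one row per combination of grouping values. A minimal sketch on a small hypothetical data frame (toy is an illustrative name, not from the question), assuming dplyr >= 1.0 for the .groups argument:

```r
library(dplyr)

# Hypothetical toy data: each group has two cheap rows and one expensive row
toy <- data.frame(group = rep(c("A", "B"), each = 3),
                  value = c(1, 5, 23, 3, 7, 45),
                  item  = c("cheap1", "cheap2", "expensive1",
                            "cheap1", "cheap2", "expensive1"),
                  stringsAsFactors = FALSE)

# mutate(): the single group-wise sum is recycled onto all rows of the group,
# so all 6 rows survive and every value becomes 6 (group A) or 10 (group B)
via_mutate <- toy %>%
  group_by(group) %>%
  mutate(value = sum(value[value < 10])) %>%
  ungroup()

# summarise(): relabel first, then collapse to one row per (group, item)
via_summarise <- toy %>%
  mutate(item = ifelse(value < 10, "cheap_stuff", item)) %>%
  group_by(group, item) %>%
  summarise(value = sum(value), .groups = "drop")

print(via_mutate)
print(via_summarise)
```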
Answer 1
Score: 2
Using value < 10 instead of grepl:
df %>%
  group_by(group, item = case_when(value < 10 ~ "cheap_stuff",
                                   TRUE ~ item)) %>%
  summarise(value = sum(value),
            percentage = sum(percentage)) %>%
  ungroup()
group item value percentage
<chr> <chr> <dbl> <dbl>
1 A cheap_stuff 6 13.6
2 A expensive1 23 52.3
3 A expensive2 15 34.1
4 B cheap_stuff 10 13.2
5 B expensive1 45 59.2
6 B expensive2 21 27.6
7 C cheap_stuff 12 17.6
8 C expensive1 26 38.2
9 C expensive2 30 44.1
10 D cheap_stuff 12 10.2
11 D expensive1 37 31.6
12 D expensive2 68 58.1
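group_by() accepts named expressions, so item = case_when(...) inside group_by() overwrites the item column before grouping, just as a separate mutate() step would. A sketch, assuming dplyr >= 1.0 (for the .groups argument, which here replaces the trailing ungroup()), that writes the two steps out and checks that they agree on the question's df:

```r
library(dplyr)

# The question's data
df <- data.frame(group = rep(c("A", "B", "C", "D"), each = 4),
                 value = c(1, 23, 15, 5, 3, 45, 7, 21, 4, 8, 26, 30, 3, 9, 37, 68),
                 percentage = c(2.27, 52.27, 34.09, 11.36, 3.95, 59.21, 9.21, 27.63,
                                5.88, 11.76, 38.24, 44.12, 2.56, 7.69, 31.62, 58.12),
                 item = c("cheap1", "expensive1", "expensive2", "cheap2",
                          "cheap1", "expensive1", "cheap2", "expensive2",
                          "cheap1", "cheap2", "expensive1", "expensive2",
                          "cheap1", "cheap2", "expensive1", "expensive2"),
                 stringsAsFactors = FALSE)

# The answer's form: relabel inside group_by(), then summarise
inline <- df %>%
  group_by(group, item = case_when(value < 10 ~ "cheap_stuff", TRUE ~ item)) %>%
  summarise(value = sum(value), percentage = sum(percentage)) %>%
  ungroup()

# The same logic as explicit mutate() + group_by() steps
two_step <- df %>%
  mutate(item = case_when(value < 10 ~ "cheap_stuff", TRUE ~ item)) %>%
  group_by(group, item) %>%
  summarise(value = sum(value), percentage = sum(percentage), .groups = "drop")

all.equal(as.data.frame(inline), as.data.frame(two_step))  # TRUE
```

Which form to prefer is mostly taste: the inline version is shorter, while the two-step version keeps the relabelling visible as its own step.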
Original answer:
df %>%
  group_by(group, item = case_when(grepl("cheap", item, fixed = TRUE) ~ "cheap_stuff",
                                   TRUE ~ item)) %>%
  summarise(value = sum(value),
            percentage = sum(percentage))
group item value percentage
<chr> <chr> <dbl> <dbl>
1 A cheap_stuff 6 13.6
2 A expensive1 23 52.3
3 A expensive2 15 34.1
4 B cheap_stuff 10 13.2
5 B expensive1 45 59.2
6 B expensive2 21 27.6
7 C cheap_stuff 12 17.6
8 C expensive1 26 38.2
9 C expensive2 30 44.1
10 D cheap_stuff 12 10.2
11 D expensive1 37 31.6
12 D expensive2 68 58.1
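The grepl() version keys on the item names rather than on value, so it reproduces the desired output only because, in this data set, the items whose names contain "cheap" are exactly the rows with value < 10. A base-R sketch of that check, plus one hypothetical extra row (an "expensive3" item with value 2, invented for illustration) where the two conditions disagree:

```r
# The question's item and value columns
item <- c("cheap1", "expensive1", "expensive2", "cheap2",
          "cheap1", "expensive1", "cheap2", "expensive2",
          "cheap1", "cheap2", "expensive1", "expensive2",
          "cheap1", "cheap2", "expensive1", "expensive2")
value <- c(1, 23, 15, 5, 3, 45, 7, 21, 4, 8, 26, 30, 3, 9, 37, 68)

by_name  <- grepl("cheap", item, fixed = TRUE)  # condition of the original answer
by_value <- value < 10                          # condition of the edited answer
identical(by_name, by_value)  # TRUE: both rules pick the same 8 rows here

# Hypothetical extra row: low value, but no "cheap" in its name
item2  <- c(item, "expensive3")
value2 <- c(value, 2)
identical(grepl("cheap", item2, fixed = TRUE), value2 < 10)  # FALSE
```

Since the question defines the cheap rows by value smaller than 10, the value < 10 condition is the one that matches the stated goal.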