英文:
Split and aggregating results without a loop
问题
以下是翻译的代码部分:
I have a dataframe where I would like to group the results by name and type, and if my scenario has an "up" or "down", report these as separate columns line (where the "value" is the minimum of the these two results).
Using string split and loops, I can sort through the columns, split the string, replace with N/A etc but this feels incredibly clumsy.
Can anyone suggest a better way to produce the preferred output that doesn't use a loop?
My input looks like:
input = structure(list(Name = c("Alice", "Alice ", "Tim", "Tim", "Greg",
"Greg"), Value = c("-5", "6", "5", "-2", "5", "7"), Type = c("Sales",
"Sales", "Returns", "Returns", "Promo", "Promo"), Scenario = c("Down",
"Up", "Down_RED_One", "Up_RED_One", "BLUE", "YELLOW")), row.names = c(NA,
6L), class = "data.frame")
And my preferred output is:
output = structure(list(Name = c("Alice", "Tim", "Greg", "Greg"), Value = c("-5",
"-2", "5", "7"), Type = c("Sales", "Returns", "Promo", "Promo"
), Scenario = c("N/A", "RED_One", "BLUE", "YELLOW"), Up = c("6",
"-2", "N/A", "N/A"), Down = c("-5", "5", "N/A", "N/A")), row.names = 1:4, class = "data.frame")
英文:
I have a dataframe where I would like to group the results by name and type, and if my scenario has an "up" or "down", report these as separate columns line (where the "value" is the minimum of the these two results).
Using string split and loops, I can sort through the columns, split the string, replace with N/A etc but this feels incredibly clumsy.
Can anyone suggest a better way to produce the preferred output that doesn't use a loop?
My input looks like:
input = structure(list(Name = c("Alice", "Alice ", "Tim", "Tim", "Greg",
"Greg"), Value = c("-5", "6", "5", "-2", "5", "7"), Type = c("Sales",
"Sales", "Returns", "Returns", "Promo", "Promo"), Scenario = c("Down",
"Up", "Down_RED_One", "Up_RED_One", "BLUE", "YELLOW")), row.names = c(NA,
6L), class = "data.frame")
And my preferred output is:
output = structure(list(Name = c("Alice", "Tim", "Greg", "Greg"), Value = c("-5",
"-2", "5", "7"), Type = c("Sales", "Returns", "Promo", "Promo"
), Scenario = c("N/A", "RED_One", "BLUE", "YELLOW"), Up = c("6",
"-2", "N/A", "N/A"), Down = c("-5", "5", "N/A", "N/A")), row.names = 1:4, class = "data.frame")
答案1
得分: 2
对"Up"
和"Down"
进行一些初始转换,获取最小值,然后使用tidyr::pivot_wider()
函数:
library(dplyr) # >= v1.1.0
library(stringr)
library(tidyr)
input %>%
mutate(
UpDown = str_extract(Scenario, "Up|Down"),
Scenario = na_if(str_remove(Scenario, "_?(Up|Down)_?"), ""),
ValuePivot = Value
) %>%
mutate(Value = min(Value), .by = c(Name, Scenario)) %>%
pivot_wider(names_from = UpDown, values_from = ValuePivot) %>%
select(!`NA`)
# 一个tibble:4 × 6
Name Value Type Scenario Down Up
1 Alice -5 Sales <NA> -5 6
2 Tim -2 Returns RED_One 5 -2
3 Greg 5 Promo BLUE <NA> <NA>
4 Greg 7 Promo YELLOW <NA> <NA>
请注意,"Alice"
的一个值末尾带有空格,我假设这是一个错误并已删除。
英文:
Do some initial transformation to extract "Up"
and "Down"
and get your minima, then tidyr::pivot_wider()
:
library(dplyr) # >= v1.1.0
library(stringr)
library(tidyr)
input %>%
mutate(
UpDown = str_extract(Scenario, "Up|Down"),
Scenario = na_if(str_remove(Scenario, "_?(Up|Down)_?"), ""),
ValuePivot = Value
) %>%
mutate(Value = min(Value), .by = c(Name, Scenario)) %>%
pivot_wider(names_from = UpDown, values_from = ValuePivot) %>%
select(!`NA`)
# A tibble: 4 × 6
Name Value Type Scenario Down Up
<chr> <chr> <chr> <chr> <chr> <chr>
1 Alice -5 Sales <NA> -5 6
2 Tim -2 Returns RED_One 5 -2
3 Greg 5 Promo BLUE <NA> <NA>
4 Greg 7 Promo YELLOW <NA> <NA>
Note, one value of "Alice"
had a trailing space, which I assumed was an error and removed.
答案2
得分: 1
我的答案有效,但我不禁想到一定有更好的方法。
我已经添加了几个注释来解释每个部分的作用和原因。如果您有任何问题,请告诉我。
library(tidyverse)
input %>%
separate_wider_regex(Scenario, # 将“Up”和“Down”分开
patterns = c(Up = "Up|Down|", Scenario = ".*"),
too_few = "error") %>%
mutate(Scenario = trimws(Scenario, which = "left", whitespace = "_"), # 去掉前导下划线
Name = trimws(Name, "both"), # 去掉导致不匹配的空格
Up = ifelse(Up == "", NA, Up)) %>% # 将空字符串更改为NA以进行数据透视
pivot_wider(values_from = Value, names_from = Up)
# # A tibble: 4 × 6
# Name Type Scenario Down Up `NA`
# <chr> <chr> <chr> <chr> <chr> <chr>
# 1 Alice Sales "" -5 6 <NA>
# 2 Tim Returns "RED_One" 5 -2 <NA>
# 3 Greg Promo "BLUE" <NA> <NA> 5
# 4 Greg Promo "YELLOW" <NA> <NA> 7
英文:
My answer works, but I can't help but think there has to be a better way.
I have added several comments to explain what does what and why. If you have any questions, let me know.
library(tidyverse)
input %>%
separate_wider_regex(Scenario, # separate up and down
patterns = c(Up = "Up|Down|", Scenario = ".*"),
too_few = "error") %>%
mutate(Scenario = trimws(Scenario, which = "left", whitespace = "_"), # drop leading _
Name = trimws(Name, "both"), # drop whitespace causing mismatch
Up = ifelse(Up == "", NA, Up)) %>% # change blank strings to NA for pivot
pivot_wider(values_from = Value, names_from = Up)
# # A tibble: 4 × 6
# Name Type Scenario Down Up `NA`
# <chr> <chr> <chr> <chr> <chr> <chr>
# 1 Alice Sales "" -5 6 <NA>
# 2 Tim Returns "RED_One" 5 -2 <NA>
# 3 Greg Promo "BLUE" <NA> <NA> 5
# 4 Greg Promo "YELLOW" <NA> <NA> 7
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论