分割和汇总结果,无需循环。

huangapple go评论58阅读模式
英文:

Split and aggregating results without a loop

问题

以下是翻译的代码部分:

I have a dataframe where I would like to group the results by name and type, and if my scenario has an "up" or "down", report these as separate columns line (where the "value" is the minimum of the these two results).

Using string split and loops, I can sort through the columns, split the string, replace with N/A etc but this feels incredibly clumsy.

Can anyone suggest a better way to produce the preferred output that doesn't use a loop?


My input looks like:

input = structure(list(Name = c("Alice", "Alice ", "Tim", "Tim", "Greg",
"Greg"), Value = c("-5", "6", "5", "-2", "5", "7"), Type = c("Sales",
"Sales", "Returns", "Returns", "Promo", "Promo"), Scenario = c("Down",
"Up", "Down_RED_One", "Up_RED_One", "BLUE", "YELLOW")), row.names = c(NA,
6L), class = "data.frame")


And my preferred output is:

output = structure(list(Name = c("Alice", "Tim", "Greg", "Greg"), Value = c("-5",
"-2", "5", "7"), Type = c("Sales", "Returns", "Promo", "Promo"
), Scenario = c("N/A", "RED_One", "BLUE", "YELLOW"), Up = c("6",
"-2", "N/A", "N/A"), Down = c("-5", "5", "N/A", "N/A")), row.names = 1:4, class = "data.frame")

英文:

I have a dataframe where I would like to group the results by name and type, and if my scenario has an "up" or "down", report these as separate columns line (where the "value" is the minimum of the these two results).

Using string split and loops, I can sort through the columns, split the string, replace with N/A etc but this feels incredibly clumsy.

Can anyone suggest a better way to produce the preferred output that doesn't use a loop?

My input looks like:

input = structure(list(Name = c("Alice", "Alice ", "Tim", "Tim", "Greg", 
"Greg"), Value = c("-5", "6", "5", "-2", "5", "7"), Type = c("Sales", 
"Sales", "Returns", "Returns", "Promo", "Promo"), Scenario = c("Down", 
"Up", "Down_RED_One", "Up_RED_One", "BLUE", "YELLOW")), row.names = c(NA, 
6L), class = "data.frame")

And my preferred output is:

output = structure(list(Name = c("Alice", "Tim", "Greg", "Greg"), Value = c("-5", 
"-2", "5", "7"), Type = c("Sales", "Returns", "Promo", "Promo"
), Scenario = c("N/A", "RED_One", "BLUE", "YELLOW"), Up = c("6", 
"-2", "N/A", "N/A"), Down = c("-5", "5", "N/A", "N/A")), row.names = 1:4, class = "data.frame")

答案1

得分: 2

"Up""Down"进行一些初始转换,获取最小值,然后使用tidyr::pivot_wider()函数:


library(dplyr)   # >= v1.1.0
library(stringr)
library(tidyr)

input %>%
  mutate(
    UpDown = str_extract(Scenario, "Up|Down"),
    Scenario = na_if(str_remove(Scenario, "_?(Up|Down)_?"), ""),
    ValuePivot = Value
  ) %>%
  mutate(Value = min(Value), .by = c(Name, Scenario)) %>%
  pivot_wider(names_from = UpDown, values_from = ValuePivot) %>%
  select(!`NA`)
# 一个tibble:4 × 6
  Name  Value Type    Scenario Down  Up   
1 Alice -5    Sales   <NA>     -5    6    
2 Tim   -2    Returns RED_One  5     -2   
3 Greg  5     Promo   BLUE     <NA>  <NA> 
4 Greg  7     Promo   YELLOW   <NA>  <NA>

请注意,&quot;Alice&quot;的一个值末尾带有空格,我假设这是一个错误并已删除。

英文:

Do some initial transformation to extract &quot;Up&quot; and &quot;Down&quot; and get your minima, then tidyr::pivot_wider():


library(dplyr)   # &gt;= v1.1.0
library(stringr)
library(tidyr)

input %&gt;%
  mutate(
    UpDown = str_extract(Scenario, &quot;Up|Down&quot;),
    Scenario = na_if(str_remove(Scenario, &quot;_?(Up|Down)_?&quot;), &quot;&quot;),
    ValuePivot = Value
  ) %&gt;%
  mutate(Value = min(Value), .by = c(Name, Scenario)) %&gt;%
  pivot_wider(names_from = UpDown, values_from = ValuePivot) %&gt;%
  select(!`NA`)
# A tibble: 4 &#215; 6
  Name  Value Type    Scenario Down  Up   
  &lt;chr&gt; &lt;chr&gt; &lt;chr&gt;   &lt;chr&gt;    &lt;chr&gt; &lt;chr&gt;
1 Alice -5    Sales   &lt;NA&gt;     -5    6    
2 Tim   -2    Returns RED_One  5     -2   
3 Greg  5     Promo   BLUE     &lt;NA&gt;  &lt;NA&gt; 
4 Greg  7     Promo   YELLOW   &lt;NA&gt;  &lt;NA&gt;

Note, one value of &quot;Alice&quot; had a trailing space, which I assumed was an error and removed.

答案2

得分: 1

我的答案有效,但我不禁想到一定有更好的方法。

我已经添加了几个注释来解释每个部分的作用和原因。如果您有任何问题,请告诉我。

library(tidyverse)

input %>%
  separate_wider_regex(Scenario,                                # 将“Up”和“Down”分开
                       patterns = c(Up = "Up|Down|", Scenario = ".*"),
                       too_few = "error") %>%
  mutate(Scenario = trimws(Scenario, which = "left", whitespace = "_"), # 去掉前导下划线
         Name = trimws(Name, "both"),       # 去掉导致不匹配的空格
         Up = ifelse(Up == "", NA, Up)) %>% # 将空字符串更改为NA以进行数据透视
  pivot_wider(values_from = Value, names_from = Up)
# # A tibble: 4 × 6
#   Name  Type    Scenario  Down  Up    `NA` 
#   <chr> <chr>   <chr>     <chr> <chr> <chr>
# 1 Alice Sales   ""        -5    6     <NA> 
# 2 Tim   Returns "RED_One" 5     -2    <NA> 
# 3 Greg  Promo   "BLUE"    <NA>  <NA>  5    
# 4 Greg  Promo   "YELLOW"  <NA>  <NA>  7   
英文:

My answer works, but I can't help but think there has to be a better way.

I have added several comments to explain what does what and why. If you have any questions, let me know.

library(tidyverse)

input %&gt;% 
  separate_wider_regex(Scenario,                                # separate up and down
                       patterns = c(Up = &quot;Up|Down|&quot;, Scenario = &quot;.*&quot;),
                       too_few = &quot;error&quot;) %&gt;% 
  mutate(Scenario = trimws(Scenario, which = &quot;left&quot;, whitespace = &quot;_&quot;), # drop leading _
         Name = trimws(Name, &quot;both&quot;),       # drop whitespace causing mismatch
         Up = ifelse(Up == &quot;&quot;, NA, Up)) %&gt;% # change blank strings to NA for pivot
  pivot_wider(values_from = Value, names_from = Up)
# # A tibble: 4 &#215; 6
#   Name  Type    Scenario  Down  Up    `NA` 
#   &lt;chr&gt; &lt;chr&gt;   &lt;chr&gt;     &lt;chr&gt; &lt;chr&gt; &lt;chr&gt;
# 1 Alice Sales   &quot;&quot;        -5    6     &lt;NA&gt; 
# 2 Tim   Returns &quot;RED_One&quot; 5     -2    &lt;NA&gt; 
# 3 Greg  Promo   &quot;BLUE&quot;    &lt;NA&gt;  &lt;NA&gt;  5    
# 4 Greg  Promo   &quot;YELLOW&quot;  &lt;NA&gt;  &lt;NA&gt;  7   

huangapple
  • 本文由 发表于 2023年3月7日 08:52:38
  • 转载请务必保留本文链接:https://go.coder-hub.com/75657139.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定