2020年1月6日 20:30:15go评论96阅读模式

英文:

How to use case_when based on two conditional min/max filters?

问题

以下是您要翻译的内容：

# 创建一个新列，其中给定了“Measure”中最新月份（2019-12-01）中的最小值和最大值以及所有其他行都给定值“Other”。请注意，我的实际脚本是自动化的，因此无法手动指定最新月份或最小/最大的“Measure”值。
直观地说，我考虑这样做：
test %>% mutate(
  for_label = case_when(
    Month == max(Month) & Measure == min(Measure) ~ Area,
    Month == max(Month) & Measure == max(Measure) ~ Area,
    TRUE ~ "Other"
  ))
  
但这只会返回一个每个值都是“Other”的列。我认为匹配过滤器正在寻找全局“Measure”值的最小值和最大值，而不是在选定的“max”月份内寻找。不确定最佳解决方案。
# 示例数据：
test <- structure(list(Area = c("Doncaster", "Hull", "Southampton", "Doncaster", 
"Hull", "Southampton", "Doncaster", "Hull", "Southampton", "Doncaster", 
"Hull", "Southampton"), Month = structure(c(18140, 18140, 18140, 
18170, 18170, 18170, 18201, 18201, 18201, 18231, 18231, 18231
), class = "Date"), Measure = c(22.1, 15.5, 28.2, 19.3, 17, 26.9, 
19.1, 18.2, 26.6, 19.5, 19.9, 26.8)), row.names = c(NA, -12L), class = c("tbl_df", 
"tbl", "data.frame"))

英文:

Sample data:

# A tibble: 12 x 3
   Area        Month      Measure
   &lt;chr&gt;       &lt;date&gt;       &lt;dbl&gt;
 1 Doncaster   2019-09-01    22.1
 2 Hull        2019-09-01    15.5
 3 Southampton 2019-09-01    28.2
 4 Doncaster   2019-10-01    19.3
 5 Hull        2019-10-01    17  
 6 Southampton 2019-10-01    26.9
 7 Doncaster   2019-11-01    19.1
 8 Hull        2019-11-01    18.2
 9 Southampton 2019-11-01    26.6
10 Doncaster   2019-12-01    19.5
11 Hull        2019-12-01    19.9
12 Southampton 2019-12-01    26.8

I want to mutate a new column where the minimum and maximum value in Measure for the latest month (2019-12-01) is given the value in Area, and all other rows are given the value "Other". Note my real script is automated so I can't reply on manually specifying the latest month or min/max Measure values.

Intuitively, I thought about doing something like:

test %&gt;% mutate(
  for_label = case_when(
    Month == max(Month) &amp; Measure == min(Measure) ~ Area,
    Month == max(Month) &amp; Measure == max(Measure) ~ Area,
    TRUE ~ &quot;Other&quot;
  ))

But this just returns a column where every value is "Other". I'm assuming that matching filters are looking for a minimum and maximum global Measure value, and not within the selected "max" Month. Not sure the best solution for this.

Sample data:

test &lt;- structure(list(Area = c(&quot;Doncaster&quot;, &quot;Hull&quot;, &quot;Southampton&quot;, &quot;Doncaster&quot;, 
&quot;Hull&quot;, &quot;Southampton&quot;, &quot;Doncaster&quot;, &quot;Hull&quot;, &quot;Southampton&quot;, &quot;Doncaster&quot;, 
&quot;Hull&quot;, &quot;Southampton&quot;), Month = structure(c(18140, 18140, 18140, 
18170, 18170, 18170, 18201, 18201, 18201, 18231, 18231, 18231
), class = &quot;Date&quot;), Measure = c(22.1, 15.5, 28.2, 19.3, 17, 26.9, 
19.1, 18.2, 26.6, 19.5, 19.9, 26.8)), row.names = c(NA, -12L), class = c(&quot;tbl_df&quot;, 
&quot;tbl&quot;, &quot;data.frame&quot;))

答案1

得分: 2

如果我理解您的问题正确，您需要检查数据子集中的min和max值，而不是整个数据框。

library(dplyr)
test %>%
  mutate(for_label = case_when(
         Month == max(Month) & Measure == min(Measure[Month == max(Month)]) ~ Area,
         Month == max(Month) & Measure == max(Measure[Month == max(Month)]) ~ Area,
         TRUE ~ "Other"))

一个数据框：12 x 4

Area Month Measure for_label

1 Doncaster 2019-09-01 22.1 Other

2 Hull 2019-09-01 15.5 Other

3 Southampton 2019-09-01 28.2 Other

4 Doncaster 2019-10-01 19.3 Other

5 Hull 2019-10-01 17 Other

6 Southampton 2019-10-01 26.9 Other

7 Doncaster 2019-11-01 19.1 Other

8 Hull 2019-11-01 18.2 Other

9 Southampton 2019-11-01 26.6 Other

#10 Doncaster 2019-12-01 19.5 Doncaster
#11 Hull 2019-12-01 19.9 Other
#12 Southampton 2019-12-01 26.8 Southampton


<details>
<summary>英文:</summary>
If I understand you correctly, you need to check the `min` and `max` value withing subset of data and not the entire dataframe. 
    library(dplyr)
    test %&gt;% 
      mutate(for_label = case_when(
             Month == max(Month) &amp; Measure == min(Measure[Month == max(Month)]) ~ Area,
             Month == max(Month) &amp; Measure == max(Measure[Month == max(Month)]) ~ Area,
             TRUE ~ &quot;Other&quot;))
  
    # A tibble: 12 x 4
    #   Area        Month      Measure for_label  
    #   &lt;chr&gt;       &lt;date&gt;       &lt;dbl&gt; &lt;chr&gt;      
    # 1 Doncaster   2019-09-01    22.1 Other      
    # 2 Hull        2019-09-01    15.5 Other      
    # 3 Southampton 2019-09-01    28.2 Other      
    # 4 Doncaster   2019-10-01    19.3 Other      
    # 5 Hull        2019-10-01    17   Other      
    # 6 Southampton 2019-10-01    26.9 Other      
    # 7 Doncaster   2019-11-01    19.1 Other      
    # 8 Hull        2019-11-01    18.2 Other      
    # 9 Southampton 2019-11-01    26.6 Other      
    #10 Doncaster   2019-12-01    19.5 Doncaster  
    #11 Hull        2019-12-01    19.9 Other      
    #12 Southampton 2019-12-01    26.8 Southampton
</details>
# 答案2
**得分**: 1
`case_when`语句也可以通过检查`Measure`是否是`range(Measure[Month == last(Month)])`的元素而折叠为单个`if_else`语句：
```r
library(dplyr)
test %>%
    mutate(for_label = if_else(Month == last(Month) & Measure %in% range(Measure[Month == last(Month)]), Area, "Other"))

一个tibble: 12 x 4

Area Month Measure for_label

1 Doncaster 2019-09-01 22.1 Other

2 Hull 2019-09-01 15.5 Other

3 Southampton 2019-09-01 28.2 Other

4 Doncaster 2019-10-01 19.3 Other

5 Hull 2019-10-01 17 Other

6 Southampton 2019-10-01 26.9 Other

7 Doncaster 2019-11-01 19.1 Other

8 Hull 2019-11-01 18.2 Other

9 Southampton 2019-11-01 26.6 Other

10 Doncaster 2019-12-01 19.5 Doncaster

11 Hull 2019-12-01 19.9 Other

12 Southampton 2019-12-01 26.8 Southampton


<details>
<summary>英文:</summary>
The `case_when` statement could also be collapsed to a single `if_else` by checking if `Measure` is an element of `range(Measure[Month == last(Month)])`:
``` r
library(dplyr)
test %&gt;%
    mutate(for_label = if_else(Month == last(Month) &amp; Measure %in% range(Measure[Month == last(Month)]), Area, &quot;Other&quot;))
#&gt; # A tibble: 12 x 4
#&gt;    Area        Month      Measure for_label  
#&gt;    &lt;chr&gt;       &lt;date&gt;       &lt;dbl&gt; &lt;chr&gt;      
#&gt;  1 Doncaster   2019-09-01    22.1 Other      
#&gt;  2 Hull        2019-09-01    15.5 Other      
#&gt;  3 Southampton 2019-09-01    28.2 Other      
#&gt;  4 Doncaster   2019-10-01    19.3 Other      
#&gt;  5 Hull        2019-10-01    17   Other      
#&gt;  6 Southampton 2019-10-01    26.9 Other      
#&gt;  7 Doncaster   2019-11-01    19.1 Other      
#&gt;  8 Hull        2019-11-01    18.2 Other      
#&gt;  9 Southampton 2019-11-01    26.6 Other      
#&gt; 10 Doncaster   2019-12-01    19.5 Doncaster  
#&gt; 11 Hull        2019-12-01    19.9 Other      
#&gt; 12 Southampton 2019-12-01    26.8 Southampton

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

问题

答案1

一个数据框：12 x 4

Area Month Measure for_label

1 Doncaster 2019-09-01 22.1 Other

2 Hull 2019-09-01 15.5 Other

3 Southampton 2019-09-01 28.2 Other

4 Doncaster 2019-10-01 19.3 Other

5 Hull 2019-10-01 17 Other

6 Southampton 2019-10-01 26.9 Other

7 Doncaster 2019-11-01 19.1 Other

8 Hull 2019-11-01 18.2 Other

9 Southampton 2019-11-01 26.6 Other

一个tibble: 12 x 4

Area Month Measure for_label

1 Doncaster 2019-09-01 22.1 Other

2 Hull 2019-09-01 15.5 Other

3 Southampton 2019-09-01 28.2 Other

4 Doncaster 2019-10-01 19.3 Other

5 Hull 2019-10-01 17 Other

6 Southampton 2019-10-01 26.9 Other

7 Doncaster 2019-11-01 19.1 Other

8 Hull 2019-11-01 18.2 Other

9 Southampton 2019-11-01 26.6 Other

10 Doncaster 2019-12-01 19.5 Doncaster

11 Hull 2019-12-01 19.9 Other

12 Southampton 2019-12-01 26.8 Southampton

发表评论