如何根据两个条件的最小/最大过滤器来使用 case_when?

huangapple go评论96阅读模式
英文:

How to use case_when based on two conditional min/max filters?

问题

以下是您要翻译的内容:

  1. # 创建一个新列,其中给定了“Measure”中最新月份(2019-12-01)中的最小值和最大值以及所有其他行都给定值“Other”。请注意,我的实际脚本是自动化的,因此无法手动指定最新月份或最小/最大的“Measure”值。
  2. 直观地说,我考虑这样做:
  3. test %>% mutate(
  4. for_label = case_when(
  5. Month == max(Month) & Measure == min(Measure) ~ Area,
  6. Month == max(Month) & Measure == max(Measure) ~ Area,
  7. TRUE ~ "Other"
  8. ))
  9. 但这只会返回一个每个值都是“Other”的列。我认为匹配过滤器正在寻找全局“Measure”值的最小值和最大值,而不是在选定的“max”月份内寻找。不确定最佳解决方案。
  10. # 示例数据:
  11. test <- structure(list(Area = c("Doncaster", "Hull", "Southampton", "Doncaster",
  12. "Hull", "Southampton", "Doncaster", "Hull", "Southampton", "Doncaster",
  13. "Hull", "Southampton"), Month = structure(c(18140, 18140, 18140,
  14. 18170, 18170, 18170, 18201, 18201, 18201, 18231, 18231, 18231
  15. ), class = "Date"), Measure = c(22.1, 15.5, 28.2, 19.3, 17, 26.9,
  16. 19.1, 18.2, 26.6, 19.5, 19.9, 26.8)), row.names = c(NA, -12L), class = c("tbl_df",
  17. "tbl", "data.frame"))
英文:

Sample data:

  1. # A tibble: 12 x 3
  2. Area Month Measure
  3. &lt;chr&gt; &lt;date&gt; &lt;dbl&gt;
  4. 1 Doncaster 2019-09-01 22.1
  5. 2 Hull 2019-09-01 15.5
  6. 3 Southampton 2019-09-01 28.2
  7. 4 Doncaster 2019-10-01 19.3
  8. 5 Hull 2019-10-01 17
  9. 6 Southampton 2019-10-01 26.9
  10. 7 Doncaster 2019-11-01 19.1
  11. 8 Hull 2019-11-01 18.2
  12. 9 Southampton 2019-11-01 26.6
  13. 10 Doncaster 2019-12-01 19.5
  14. 11 Hull 2019-12-01 19.9
  15. 12 Southampton 2019-12-01 26.8

I want to mutate a new column where the minimum and maximum value in Measure for the latest month (2019-12-01) is given the value in Area, and all other rows are given the value &quot;Other&quot;. Note my real script is automated so I can't reply on manually specifying the latest month or min/max Measure values.

Intuitively, I thought about doing something like:

  1. test %&gt;% mutate(
  2. for_label = case_when(
  3. Month == max(Month) &amp; Measure == min(Measure) ~ Area,
  4. Month == max(Month) &amp; Measure == max(Measure) ~ Area,
  5. TRUE ~ &quot;Other&quot;
  6. ))

But this just returns a column where every value is &quot;Other&quot;. I'm assuming that matching filters are looking for a minimum and maximum global Measure value, and not within the selected "max" Month. Not sure the best solution for this.

Sample data:

  1. test &lt;- structure(list(Area = c(&quot;Doncaster&quot;, &quot;Hull&quot;, &quot;Southampton&quot;, &quot;Doncaster&quot;,
  2. &quot;Hull&quot;, &quot;Southampton&quot;, &quot;Doncaster&quot;, &quot;Hull&quot;, &quot;Southampton&quot;, &quot;Doncaster&quot;,
  3. &quot;Hull&quot;, &quot;Southampton&quot;), Month = structure(c(18140, 18140, 18140,
  4. 18170, 18170, 18170, 18201, 18201, 18201, 18231, 18231, 18231
  5. ), class = &quot;Date&quot;), Measure = c(22.1, 15.5, 28.2, 19.3, 17, 26.9,
  6. 19.1, 18.2, 26.6, 19.5, 19.9, 26.8)), row.names = c(NA, -12L), class = c(&quot;tbl_df&quot;,
  7. &quot;tbl&quot;, &quot;data.frame&quot;))

答案1

得分: 2

如果我理解您的问题正确,您需要检查数据子集中的minmax值,而不是整个数据框。

  1. library(dplyr)
  2. test %>%
  3. mutate(for_label = case_when(
  4. Month == max(Month) & Measure == min(Measure[Month == max(Month)]) ~ Area,
  5. Month == max(Month) & Measure == max(Measure[Month == max(Month)]) ~ Area,
  6. TRUE ~ "Other"))

一个数据框:12 x 4

Area Month Measure for_label

1 Doncaster 2019-09-01 22.1 Other

2 Hull 2019-09-01 15.5 Other

3 Southampton 2019-09-01 28.2 Other

4 Doncaster 2019-10-01 19.3 Other

5 Hull 2019-10-01 17 Other

6 Southampton 2019-10-01 26.9 Other

7 Doncaster 2019-11-01 19.1 Other

8 Hull 2019-11-01 18.2 Other

9 Southampton 2019-11-01 26.6 Other

#10 Doncaster 2019-12-01 19.5 Doncaster
#11 Hull 2019-12-01 19.9 Other
#12 Southampton 2019-12-01 26.8 Southampton

  1. <details>
  2. <summary>英文:</summary>
  3. If I understand you correctly, you need to check the `min` and `max` value withing subset of data and not the entire dataframe.
  4. library(dplyr)
  5. test %&gt;%
  6. mutate(for_label = case_when(
  7. Month == max(Month) &amp; Measure == min(Measure[Month == max(Month)]) ~ Area,
  8. Month == max(Month) &amp; Measure == max(Measure[Month == max(Month)]) ~ Area,
  9. TRUE ~ &quot;Other&quot;))
  10. # A tibble: 12 x 4
  11. # Area Month Measure for_label
  12. # &lt;chr&gt; &lt;date&gt; &lt;dbl&gt; &lt;chr&gt;
  13. # 1 Doncaster 2019-09-01 22.1 Other
  14. # 2 Hull 2019-09-01 15.5 Other
  15. # 3 Southampton 2019-09-01 28.2 Other
  16. # 4 Doncaster 2019-10-01 19.3 Other
  17. # 5 Hull 2019-10-01 17 Other
  18. # 6 Southampton 2019-10-01 26.9 Other
  19. # 7 Doncaster 2019-11-01 19.1 Other
  20. # 8 Hull 2019-11-01 18.2 Other
  21. # 9 Southampton 2019-11-01 26.6 Other
  22. #10 Doncaster 2019-12-01 19.5 Doncaster
  23. #11 Hull 2019-12-01 19.9 Other
  24. #12 Southampton 2019-12-01 26.8 Southampton
  25. </details>
  26. # 答案2
  27. **得分**: 1
  28. `case_when`语句也可以通过检查`Measure`是否是`range(Measure[Month == last(Month)])`的元素而折叠为单个`if_else`语句:
  29. ```r
  30. library(dplyr)
  31. test %>%
  32. mutate(for_label = if_else(Month == last(Month) & Measure %in% range(Measure[Month == last(Month)]), Area, "Other"))

一个tibble: 12 x 4

Area Month Measure for_label

1 Doncaster 2019-09-01 22.1 Other

2 Hull 2019-09-01 15.5 Other

3 Southampton 2019-09-01 28.2 Other

4 Doncaster 2019-10-01 19.3 Other

5 Hull 2019-10-01 17 Other

6 Southampton 2019-10-01 26.9 Other

7 Doncaster 2019-11-01 19.1 Other

8 Hull 2019-11-01 18.2 Other

9 Southampton 2019-11-01 26.6 Other

10 Doncaster 2019-12-01 19.5 Doncaster

11 Hull 2019-12-01 19.9 Other

12 Southampton 2019-12-01 26.8 Southampton

  1. <details>
  2. <summary>英文:</summary>
  3. The `case_when` statement could also be collapsed to a single `if_else` by checking if `Measure` is an element of `range(Measure[Month == last(Month)])`:
  4. ``` r
  5. library(dplyr)
  6. test %&gt;%
  7. mutate(for_label = if_else(Month == last(Month) &amp; Measure %in% range(Measure[Month == last(Month)]), Area, &quot;Other&quot;))
  8. #&gt; # A tibble: 12 x 4
  9. #&gt; Area Month Measure for_label
  10. #&gt; &lt;chr&gt; &lt;date&gt; &lt;dbl&gt; &lt;chr&gt;
  11. #&gt; 1 Doncaster 2019-09-01 22.1 Other
  12. #&gt; 2 Hull 2019-09-01 15.5 Other
  13. #&gt; 3 Southampton 2019-09-01 28.2 Other
  14. #&gt; 4 Doncaster 2019-10-01 19.3 Other
  15. #&gt; 5 Hull 2019-10-01 17 Other
  16. #&gt; 6 Southampton 2019-10-01 26.9 Other
  17. #&gt; 7 Doncaster 2019-11-01 19.1 Other
  18. #&gt; 8 Hull 2019-11-01 18.2 Other
  19. #&gt; 9 Southampton 2019-11-01 26.6 Other
  20. #&gt; 10 Doncaster 2019-12-01 19.5 Doncaster
  21. #&gt; 11 Hull 2019-12-01 19.9 Other
  22. #&gt; 12 Southampton 2019-12-01 26.8 Southampton

huangapple
  • 本文由 发表于 2020年1月6日 20:30:15
  • 转载请务必保留本文链接:https://go.coder-hub.com/59612171.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定