英文:
How to use case_when based on two conditional min/max filters?
问题
以下是您要翻译的内容:
# 创建一个新列,其中给定了“Measure”中最新月份(2019-12-01)中的最小值和最大值以及所有其他行都给定值“Other”。请注意,我的实际脚本是自动化的,因此无法手动指定最新月份或最小/最大的“Measure”值。
直观地说,我考虑这样做:
test %>% mutate(
for_label = case_when(
Month == max(Month) & Measure == min(Measure) ~ Area,
Month == max(Month) & Measure == max(Measure) ~ Area,
TRUE ~ "Other"
))
但这只会返回一个每个值都是“Other”的列。我认为匹配过滤器正在寻找全局“Measure”值的最小值和最大值,而不是在选定的“max”月份内寻找。不确定最佳解决方案。
# 示例数据:
test <- structure(list(Area = c("Doncaster", "Hull", "Southampton", "Doncaster",
"Hull", "Southampton", "Doncaster", "Hull", "Southampton", "Doncaster",
"Hull", "Southampton"), Month = structure(c(18140, 18140, 18140,
18170, 18170, 18170, 18201, 18201, 18201, 18231, 18231, 18231
), class = "Date"), Measure = c(22.1, 15.5, 28.2, 19.3, 17, 26.9,
19.1, 18.2, 26.6, 19.5, 19.9, 26.8)), row.names = c(NA, -12L), class = c("tbl_df",
"tbl", "data.frame"))
英文:
Sample data:
# A tibble: 12 x 3
Area Month Measure
<chr> <date> <dbl>
1 Doncaster 2019-09-01 22.1
2 Hull 2019-09-01 15.5
3 Southampton 2019-09-01 28.2
4 Doncaster 2019-10-01 19.3
5 Hull 2019-10-01 17
6 Southampton 2019-10-01 26.9
7 Doncaster 2019-11-01 19.1
8 Hull 2019-11-01 18.2
9 Southampton 2019-11-01 26.6
10 Doncaster 2019-12-01 19.5
11 Hull 2019-12-01 19.9
12 Southampton 2019-12-01 26.8
I want to mutate a new column where the minimum and maximum value in Measure
for the latest month (2019-12-01) is given the value in Area
, and all other rows are given the value "Other"
. Note my real script is automated so I can't reply on manually specifying the latest month or min/max Measure
values.
Intuitively, I thought about doing something like:
test %>% mutate(
for_label = case_when(
Month == max(Month) & Measure == min(Measure) ~ Area,
Month == max(Month) & Measure == max(Measure) ~ Area,
TRUE ~ "Other"
))
But this just returns a column where every value is "Other"
. I'm assuming that matching filters are looking for a minimum and maximum global Measure
value, and not within the selected "max" Month
. Not sure the best solution for this.
Sample data:
test <- structure(list(Area = c("Doncaster", "Hull", "Southampton", "Doncaster",
"Hull", "Southampton", "Doncaster", "Hull", "Southampton", "Doncaster",
"Hull", "Southampton"), Month = structure(c(18140, 18140, 18140,
18170, 18170, 18170, 18201, 18201, 18201, 18231, 18231, 18231
), class = "Date"), Measure = c(22.1, 15.5, 28.2, 19.3, 17, 26.9,
19.1, 18.2, 26.6, 19.5, 19.9, 26.8)), row.names = c(NA, -12L), class = c("tbl_df",
"tbl", "data.frame"))
答案1
得分: 2
如果我理解您的问题正确,您需要检查数据子集中的min
和max
值,而不是整个数据框。
library(dplyr)
test %>%
mutate(for_label = case_when(
Month == max(Month) & Measure == min(Measure[Month == max(Month)]) ~ Area,
Month == max(Month) & Measure == max(Measure[Month == max(Month)]) ~ Area,
TRUE ~ "Other"))
一个数据框:12 x 4
Area Month Measure for_label
1 Doncaster 2019-09-01 22.1 Other
2 Hull 2019-09-01 15.5 Other
3 Southampton 2019-09-01 28.2 Other
4 Doncaster 2019-10-01 19.3 Other
5 Hull 2019-10-01 17 Other
6 Southampton 2019-10-01 26.9 Other
7 Doncaster 2019-11-01 19.1 Other
8 Hull 2019-11-01 18.2 Other
9 Southampton 2019-11-01 26.6 Other
#10 Doncaster 2019-12-01 19.5 Doncaster
#11 Hull 2019-12-01 19.9 Other
#12 Southampton 2019-12-01 26.8 Southampton
<details>
<summary>英文:</summary>
If I understand you correctly, you need to check the `min` and `max` value withing subset of data and not the entire dataframe.
library(dplyr)
test %>%
mutate(for_label = case_when(
Month == max(Month) & Measure == min(Measure[Month == max(Month)]) ~ Area,
Month == max(Month) & Measure == max(Measure[Month == max(Month)]) ~ Area,
TRUE ~ "Other"))
# A tibble: 12 x 4
# Area Month Measure for_label
# <chr> <date> <dbl> <chr>
# 1 Doncaster 2019-09-01 22.1 Other
# 2 Hull 2019-09-01 15.5 Other
# 3 Southampton 2019-09-01 28.2 Other
# 4 Doncaster 2019-10-01 19.3 Other
# 5 Hull 2019-10-01 17 Other
# 6 Southampton 2019-10-01 26.9 Other
# 7 Doncaster 2019-11-01 19.1 Other
# 8 Hull 2019-11-01 18.2 Other
# 9 Southampton 2019-11-01 26.6 Other
#10 Doncaster 2019-12-01 19.5 Doncaster
#11 Hull 2019-12-01 19.9 Other
#12 Southampton 2019-12-01 26.8 Southampton
</details>
# 答案2
**得分**: 1
`case_when`语句也可以通过检查`Measure`是否是`range(Measure[Month == last(Month)])`的元素而折叠为单个`if_else`语句:
```r
library(dplyr)
test %>%
mutate(for_label = if_else(Month == last(Month) & Measure %in% range(Measure[Month == last(Month)]), Area, "Other"))
一个tibble: 12 x 4
Area Month Measure for_label
1 Doncaster 2019-09-01 22.1 Other
2 Hull 2019-09-01 15.5 Other
3 Southampton 2019-09-01 28.2 Other
4 Doncaster 2019-10-01 19.3 Other
5 Hull 2019-10-01 17 Other
6 Southampton 2019-10-01 26.9 Other
7 Doncaster 2019-11-01 19.1 Other
8 Hull 2019-11-01 18.2 Other
9 Southampton 2019-11-01 26.6 Other
10 Doncaster 2019-12-01 19.5 Doncaster
11 Hull 2019-12-01 19.9 Other
12 Southampton 2019-12-01 26.8 Southampton
<details>
<summary>英文:</summary>
The `case_when` statement could also be collapsed to a single `if_else` by checking if `Measure` is an element of `range(Measure[Month == last(Month)])`:
``` r
library(dplyr)
test %>%
mutate(for_label = if_else(Month == last(Month) & Measure %in% range(Measure[Month == last(Month)]), Area, "Other"))
#> # A tibble: 12 x 4
#> Area Month Measure for_label
#> <chr> <date> <dbl> <chr>
#> 1 Doncaster 2019-09-01 22.1 Other
#> 2 Hull 2019-09-01 15.5 Other
#> 3 Southampton 2019-09-01 28.2 Other
#> 4 Doncaster 2019-10-01 19.3 Other
#> 5 Hull 2019-10-01 17 Other
#> 6 Southampton 2019-10-01 26.9 Other
#> 7 Doncaster 2019-11-01 19.1 Other
#> 8 Hull 2019-11-01 18.2 Other
#> 9 Southampton 2019-11-01 26.6 Other
#> 10 Doncaster 2019-12-01 19.5 Doncaster
#> 11 Hull 2019-12-01 19.9 Other
#> 12 Southampton 2019-12-01 26.8 Southampton
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论