Mutate values in a range of columns if the column name matches another column's value
Question
I have a wide df with columns representing the months of many given years and the changes of colour in each month:
df <- data.frame(id = as.integer(c(123,124,125,126)),
                 no_change = as.character(c("May.2010", NA, NA, "Sep.2010")),
                 `Jan.2010` = as.character(c("green", "black", "pink", "grey")),
                 `Feb.2010` = as.character(c("green", "black", "pink", "grey")),
                 `Mar.2010` = as.character(c("green", "red", "pink", "grey")),
                 `Apr.2010` = as.character(c("green", "red", "pink", "grey")),
                 `May.2010` = as.character(c("green", "red", "pink", "grey")),
                 `Jun.2010` = as.character(c("green", "red", "pink", "grey")),
                 `Jul.2010` = as.character(c("green", "white", "pink", "grey")),
                 `Ago.2010` = as.character(c("red", "white", "pink", "grey")),
                 `Sep.2010` = as.character(c("red", "white", "pink", "grey")),
                 `Oct.2010` = as.character(c("red", "white", "pink", "grey")),
                 `Nov.2010` = as.character(c("red", "white", "pink", "grey")),
                 `Dez.2010` = as.character(c("red", "white", "grey", "blue"))
)
df
   id no_change Jan.2010 Feb.2010 Mar.2010 Apr.2010 May.2010 Jun.2010 Jul.2010 Ago.2010 Sep.2010 Oct.2010 Nov.2010 Dez.2010
1 123  May.2010    green    green    green    green    green    green    green      red      red      red      red      red
2 124      <NA>    black    black      red      red      red      red    white    white    white    white    white    white
3 125      <NA>     pink     pink     pink     pink     pink     pink     pink     pink     pink     pink     pink     grey
4 126  Sep.2010     grey     grey     grey     grey     grey     grey     grey     grey     grey     grey     grey     blue
For each row, I want to set the value to NA in every month column that is equal to or later than the month given in the no_change column. Rows where no_change is NA should stay unchanged. This is the desired output:
   id no_change Jan.2010 Feb.2010 Mar.2010 Apr.2010 May.2010 Jun.2010 Jul.2010 Ago.2010 Sep.2010 Oct.2010 Nov.2010 Dez.2010
1 123  May.2010    green    green    green    green       NA       NA       NA       NA       NA       NA       NA       NA
2 124      <NA>    black    black      red      red      red      red    white    white    white    white    white    white
3 125      <NA>     pink     pink     pink     pink     pink     pink     pink     pink     pink     pink     pink     grey
4 126  Sep.2010     grey     grey     grey     grey     grey     grey     grey     grey       NA       NA       NA       NA
Answer 1
Score: 2
Here is one way to do it using dplyr::across() and a custom function:
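library(dplyr)
# Keep x where the column's month is on or before `dat` (or `dat` is NA); otherwise return NA
set_na <- function(x, dat) {
  col_nm <- cur_column()                            # name of the column across() is processing
  col_dat <- lubridate::dmy(paste0("01.", col_nm))  # e.g. "01.Jan.2010" parsed as a date
  if_else(col_dat <= dat | is.na(dat), x, NA)
}
df |>
  mutate(
    # parse no_change as the first day of that month (NA stays NA, with a parse warning)
    no_change_dat = lubridate::dmy(paste0("01.", no_change)),
    across(-c(id, no_change, no_change_dat), \(x) set_na(x, no_change_dat))
  )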
#> Warning: There were 2 warnings in `mutate()`.
#> The first warning was:
#> ℹ In argument: `no_change_dat = lubridate::dmy(paste0("01.", no_change))`.
#> Caused by warning:
#> !  2 failed to parse.
#> ℹ Run `dplyr::last_dplyr_warnings()` to see the 1 remaining warning.
#>    id no_change Jan.2010 Feb.2010 Mar.2010 Apr.2010 May.2010 Jun.2010 Jul.2010
#> 1 123  May.2010    green    green    green    green    green     <NA>     <NA>
#> 2 124      <NA>    black    black      red      red      red      red    white
#> 3 125      <NA>     pink     pink     pink     pink     pink     pink     pink
#> 4 126  Sep.2010     grey     grey     grey     grey     grey     grey     grey
#>   Ago.2010 Sep.2010 Oct.2010 Nov.2010 Dez.2010 no_change_dat
#> 1     <NA>     <NA>     <NA>     <NA>     <NA>    2010-05-01
#> 2    white    white    white    white    white          <NA>
#> 3     pink     pink     pink     pink     grey          <NA>
#> 4     <NA>     grey     <NA>     <NA>     <NA>    2010-09-01
<sup>Created on 2023-06-19 with reprex v2.0.2</sup>
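
In the printed output above, May.2010 is still green for id 123 and Ago.2010 is NA for id 126, so the result does not quite match the desired output. The first difference comes from the `<=` comparison, which keeps the no_change month itself. The second is probably because Ago and Dez are not English month abbreviations, so dmy() fails to parse them under an English locale. A minimal sketch that avoids date parsing altogether compares column positions instead. It assumes that every column other than id and no_change is a month column and that those columns are already in chronological order. The helper name set_na_by_position and the shortened df_small are made up for illustration:

```r
library(dplyr)

# Made-up, shortened data in the same shape as the question's df.
df_small <- data.frame(
  id        = c(123L, 124L, 126L),
  no_change = c("Mar.2010", NA, "Ago.2010"),
  Jan.2010  = c("green", "black", "grey"),
  Feb.2010  = c("green", "black", "grey"),
  Mar.2010  = c("green", "red", "grey"),
  Ago.2010  = c("red", "white", "grey"),
  Dez.2010  = c("red", "white", "blue")
)

# Assumption: all remaining columns are months, listed in chronological order.
month_cols <- setdiff(names(df_small), c("id", "no_change"))

# Hypothetical helper: return NA where the current column sits at or after
# the position of the row's no_change month; otherwise keep the value.
set_na_by_position <- function(x, no_change) {
  col_pos    <- match(cur_column(), month_cols)
  cutoff_pos <- match(no_change, month_cols)  # NA when no_change is NA or unknown
  if_else(!is.na(cutoff_pos) & col_pos >= cutoff_pos, NA_character_, x)
}

result <- df_small |>
  mutate(across(all_of(month_cols), \(x) set_na_by_position(x, no_change)))

print(result)
```

Because only the column order is used, the month abbreviations can be in any language, but the columns must be sorted chronologically beforehand.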
Answer 2
Score: 2
You can pivot the data into a "long" format, and find out which rows should be turned into NA.
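library(tidyverse)
df %>%
  # one row per id and month; rows follow the column order, so columns must be chronological
  pivot_longer(ends_with("2010")) %>%
  group_by(id) %>%
  # cumsum() is 0 before the no_change month and 1 from that month onward
  mutate(value = ifelse(cumsum(name == no_change & !is.na(no_change)), NA, value)) %>%
  pivot_wider() %>%
  ungroup()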
# A tibble: 4 × 14
     id no_change Jan.2010 Feb.2010 Mar.2010 Apr.2010 May.2010 Jun.2010 Jul.2010 Ago.2010 Sep.2010 Oct.2010 Nov.2010 Dez.2010
  <int> <chr>     <chr>    <chr>    <chr>    <chr>    <chr>    <chr>    <chr>    <chr>    <chr>    <chr>    <chr>    <chr>
1   123 May.2010  green    green    green    green    NA       NA       NA       NA       NA       NA       NA       NA
2   124 NA        black    black    red      red      red      red      white    white    white    white    white    white
3   125 NA        pink     pink     pink     pink     pink     pink     pink     pink     pink     pink     pink     grey
4   126 Sep.2010  grey     grey     grey     grey     grey     grey     grey     grey     NA       NA       NA       NA
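
The question mentions months of many years, while ends_with("2010") only picks up the 2010 columns. A hedged sketch of the same idea selects the month columns by exclusion instead. It assumes that everything except id and no_change is a month column and that the columns are in chronological order; df_years is made-up data spanning two years:

```r
library(dplyr)
library(tidyr)

# Made-up data covering two years, columns in chronological order.
df_years <- data.frame(
  id        = c(1L, 2L),
  no_change = c("Jan.2011", NA),
  Nov.2010  = c("green", "black"),
  Dez.2010  = c("green", "black"),
  Jan.2011  = c("red", "white"),
  Feb.2011  = c("red", "white")
)

result <- df_years %>%
  # select the month columns by exclusion so that every year is included
  pivot_longer(-c(id, no_change)) %>%
  group_by(id) %>%
  # TRUE from the no_change month onward; always FALSE when no_change is NA
  mutate(value = ifelse(cumsum(name == no_change & !is.na(no_change)) > 0, NA, value)) %>%
  pivot_wider() %>%
  ungroup()

print(result)
```

The explicit `> 0` turns the running count into a logical test, which makes the intent of the cumsum() step a little easier to read.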
</details>