更改一系列列的值,如果列名符合另一列的值。

huangapple go评论91阅读模式
英文:

Mutate value of a range of columns if columns name meets another column value

问题

我有一个wide df,其中columns表示许多给定yearsmonths和每个month中颜色的变化:

  1. df <- data.frame(id = as.integer(c(123,124,125,126)),
  2. no_change = as.character(c("May.2010", NA, NA, "Sep.2010")),
  3. `Jan.2010` = as.character(c("green", "black", "pink", "grey")),
  4. `Feb.2010` = as.character(c("green", "black", "pink", "grey")),
  5. `Mar.2010` = as.character(c("green", "red", "pink", "grey")),
  6. `Apr.2010` = as.character(c("green", "red", "pink", "grey")),
  7. `May.2010` = as.character(c("green", "red", "pink", "grey")),
  8. `Jun.2010` = as.character(c("green", "red", "pink", "grey")),
  9. `Jul.2010` = as.character(c("green", "white", "pink", "grey")),
  10. `Ago.2010` = as.character(c("red", "white", "pink", "grey")),
  11. `Sep.2010` = as.character(c("red", "white", "pink", "grey")),
  12. `Oct.2010` = as.character(c("red", "white", "pink", "grey")),
  13. `Nov.2010` = as.character(c("red", "white", "pink", "grey")),
  14. `Dez.2010` = as.character(c("red", "white", "grey", "blue"))
  15. )
  16. df
  17. id no_change Jan.2010 Feb.2010 Mar.2010 Apr.2010 May.2010 Jun.2010 Jul.2010 Ago.2010 Sep.2010 Oct.2010 Nov.2010 Dez.2010
  18. 1 123 May.2010 green green green green green green green red red red red red
  19. 2 124 <NA> black black red red red red white white white white white white
  20. 3 125 <NA> pink pink pink pink pink pink pink pink pink pink pink grey
  21. 4 126 Sep.2010 grey grey grey grey grey grey grey grey grey grey grey blue

我想要对每个column应用NA,如果包含的month等于或在column 'no_change' 中指定的month及以上。这是期望的output

  1. id no_change Jan.2010 Feb.2010 Mar.2010 Apr.2010 May.2010 Jun.2010 Jul.2010 Ago.2010 Sep.2010 Oct.2010 Nov.2010 Dez.2010
  2. 1 123 May.2010 green green green green NA NA NA NA NA NA NA NA
  3. 2 124 <NA> black black red red red red white white white white white white
  4. 3 125 <NA> pink pink pink pink pink pink pink pink pink pink pink grey
  5. 4 126 Sep.2010 grey grey grey grey grey grey grey grey NA NA NA NA
英文:

I have a wide df with columns representing the months of many given years and the changes of colour in each month:

  1. df &lt;- data.frame(id = as.integer(c(123,124,125,126)),
  2. no_change = as.character(c(&quot;May.2010&quot;, NA, NA, &quot;Sep.2010&quot;)),
  3. `Jan.2010` = as.character(c(&quot;green&quot;, &quot;black&quot;, &quot;pink&quot;, &quot;grey&quot;)),
  4. `Feb.2010` = as.character(c(&quot;green&quot;, &quot;black&quot;, &quot;pink&quot;, &quot;grey&quot;)),
  5. `Mar.2010` = as.character(c(&quot;green&quot;, &quot;red&quot;, &quot;pink&quot;, &quot;grey&quot;)),
  6. `Apr.2010` = as.character(c(&quot;green&quot;, &quot;red&quot;, &quot;pink&quot;, &quot;grey&quot;)),
  7. `May.2010` = as.character(c(&quot;green&quot;, &quot;red&quot;, &quot;pink&quot;, &quot;grey&quot;)),
  8. `Jun.2010` = as.character(c(&quot;green&quot;, &quot;red&quot;, &quot;pink&quot;, &quot;grey&quot;)),
  9. `Jul.2010` = as.character(c(&quot;green&quot;, &quot;white&quot;, &quot;pink&quot;, &quot;grey&quot;)),
  10. `Ago.2010` = as.character(c(&quot;red&quot;, &quot;white&quot;, &quot;pink&quot;, &quot;grey&quot;)),
  11. `Sep.2010` = as.character(c(&quot;red&quot;, &quot;white&quot;, &quot;pink&quot;, &quot;grey&quot;)),
  12. `Oct.2010` = as.character(c(&quot;red&quot;, &quot;white&quot;, &quot;pink&quot;, &quot;grey&quot;)),
  13. `Nov.2010` = as.character(c(&quot;red&quot;, &quot;white&quot;, &quot;pink&quot;, &quot;grey&quot;)),
  14. `Dez.2010` = as.character(c(&quot;red&quot;, &quot;white&quot;, &quot;grey&quot;, &quot;blue&quot;))
  15. )
  16. df
  17. id no_change Jan.2010 Feb.2010 Mar.2010 Apr.2010 May.2010 Jun.2010 Jul.2010 Ago.2010 Sep.2010 Oct.2010 Nov.2010 Dez.2010
  18. 1 123 May.2010 green green green green green green green red red red red red
  19. 2 124 &lt;NA&gt; black black red red red red white white white white white white
  20. 3 125 &lt;NA&gt; pink pink pink pink pink pink pink pink pink pink pink grey
  21. 4 126 Sep.2010 grey grey grey grey grey grey grey grey grey grey grey blue

I want to apply NA to each column that contains a month equal to and above that specified in column 'no_change'. This is the desired output:

  1. id no_change Jan.2010 Feb.2010 Mar.2010 Apr.2010 May.2010 Jun.2010 Jul.2010 Ago.2010 Sep.2010 Oct.2010 Nov.2010 Dez.2010
  2. 1 123 May.2010 green green green green NA NA NA NA NA NA NA NA
  3. 2 124 &lt;NA&gt; black black red red red red white white white white white white
  4. 3 125 &lt;NA&gt; pink pink pink pink pink pink pink pink pink pink pink grey
  5. 4 126 Sep.2010 grey grey grey grey grey grey grey grey NA NA NA NA

答案1

得分: 2

这是使用dplyr::across()和自定义函数的一种方法:

  1. library(dplyr)
  2. set_na <- function(x, dat) {
  3. col_nm <- cur_column()
  4. col_dat <- lubridate::dmy(paste0("01.", col_nm))
  5. if_else(col_dat <= dat | is.na(dat), x, NA)
  6. }
  7. df |&gt;
  8. mutate(
  9. no_change_dat = lubridate::dmy(paste0("01.", no_change)),
  10. across(-c(id, no_change, no_change_dat), \(x) set_na(x, no_change_dat))
  11. )

数据来自OP:

  1. df <- data.frame(id = as.integer(c(123,124,125,126)),
  2. no_change = as.character(c("May.2010", NA, NA, "Sep.2010")),
  3. `Jan.2010` = as.character(c("green", "black", "pink", "grey")),
  4. `Feb.2010` = as.character(c("green", "black", "pink", "grey")),
  5. `Mar.2010` = as.character(c("green", "red", "pink", "grey")),
  6. `Apr.2010` = as.character(c("green", "red", "pink", "grey")),
  7. `May.2010` = as.character(c("green", "red", "pink", "grey")),
  8. `Jun.2010` = as.character(c("green", "red", "pink", "grey")),
  9. `Jul.2010` = as.character(c("green", "white", "pink", "grey")),
  10. `Ago.2010` = as.character(c("red", "white", "pink", "grey")),
  11. `Sep.2010` = as.character(c("red", "white", "pink", "grey")),
  12. `Oct.2010` = as.character(c("red", "white", "pink", "grey")),
  13. `Nov.2010` = as.character(c("red", "white", "pink", "grey")),
  14. `Dez.2010` = as.character(c("red", "white", "grey", "blue"))
  15. )

在2023-06-19使用reprex v2.0.2创建

英文:

Here is one way to do it using dplyr::across() and a custom function:

  1. library(dplyr)
  2. set_na &lt;- function(x, dat) {
  3. col_nm &lt;- cur_column()
  4. col_dat &lt;- lubridate::dmy(paste0(&quot;01.&quot;, col_nm))
  5. if_else(col_dat &lt;= dat | is.na(dat), x, NA)
  6. }
  7. df |&gt;
  8. mutate(
  9. no_change_dat = lubridate::dmy(paste0(&quot;01.&quot;, no_change)),
  10. across(-c(id, no_change, no_change_dat), \(x) set_na(x, no_change_dat))
  11. )
  12. #&gt; Warning: There were 2 warnings in `mutate()`.
  13. #&gt; The first warning was:
  14. #&gt; ℹ In argument: `no_change_dat = lubridate::dmy(paste0(&quot;01.&quot;, no_change))`.
  15. #&gt; Caused by warning:
  16. #&gt; ! 2 failed to parse.
  17. #&gt; ℹ Run `dplyr::last_dplyr_warnings()` to see the 1 remaining warning.
  18. #&gt; id no_change Jan.2010 Feb.2010 Mar.2010 Apr.2010 May.2010 Jun.2010 Jul.2010
  19. #&gt; 1 123 May.2010 green green green green green &lt;NA&gt; &lt;NA&gt;
  20. #&gt; 2 124 &lt;NA&gt; black black red red red red white
  21. #&gt; 3 125 &lt;NA&gt; pink pink pink pink pink pink pink
  22. #&gt; 4 126 Sep.2010 grey grey grey grey grey grey grey
  23. #&gt; Ago.2010 Sep.2010 Oct.2010 Nov.2010 Dez.2010 no_change_dat
  24. #&gt; 1 &lt;NA&gt; &lt;NA&gt; &lt;NA&gt; &lt;NA&gt; &lt;NA&gt; 2010-05-01
  25. #&gt; 2 white white white white white &lt;NA&gt;
  26. #&gt; 3 pink pink pink pink grey &lt;NA&gt;
  27. #&gt; 4 &lt;NA&gt; grey &lt;NA&gt; &lt;NA&gt; &lt;NA&gt; 2010-09-01

Data from OP

  1. df &lt;- data.frame(id = as.integer(c(123,124,125,126)),
  2. no_change = as.character(c(&quot;May.2010&quot;, NA, NA, &quot;Sep.2010&quot;)),
  3. `Jan.2010` = as.character(c(&quot;green&quot;, &quot;black&quot;, &quot;pink&quot;, &quot;grey&quot;)),
  4. `Feb.2010` = as.character(c(&quot;green&quot;, &quot;black&quot;, &quot;pink&quot;, &quot;grey&quot;)),
  5. `Mar.2010` = as.character(c(&quot;green&quot;, &quot;red&quot;, &quot;pink&quot;, &quot;grey&quot;)),
  6. `Apr.2010` = as.character(c(&quot;green&quot;, &quot;red&quot;, &quot;pink&quot;, &quot;grey&quot;)),
  7. `May.2010` = as.character(c(&quot;green&quot;, &quot;red&quot;, &quot;pink&quot;, &quot;grey&quot;)),
  8. `Jun.2010` = as.character(c(&quot;green&quot;, &quot;red&quot;, &quot;pink&quot;, &quot;grey&quot;)),
  9. `Jul.2010` = as.character(c(&quot;green&quot;, &quot;white&quot;, &quot;pink&quot;, &quot;grey&quot;)),
  10. `Ago.2010` = as.character(c(&quot;red&quot;, &quot;white&quot;, &quot;pink&quot;, &quot;grey&quot;)),
  11. `Sep.2010` = as.character(c(&quot;red&quot;, &quot;white&quot;, &quot;pink&quot;, &quot;grey&quot;)),
  12. `Oct.2010` = as.character(c(&quot;red&quot;, &quot;white&quot;, &quot;pink&quot;, &quot;grey&quot;)),
  13. `Nov.2010` = as.character(c(&quot;red&quot;, &quot;white&quot;, &quot;pink&quot;, &quot;grey&quot;)),
  14. `Dez.2010` = as.character(c(&quot;red&quot;, &quot;white&quot;, &quot;grey&quot;, &quot;blue&quot;))
  15. )

<sup>Created on 2023-06-19 with reprex v2.0.2</sup>

答案2

得分: 2

你可以将格式“枢轴”为“长”格式,并找出哪些行应该变成“NA”。

  1. library(tidyverse)
  2. df %>%
  3. pivot_longer(ends_with("2010")) %>%
  4. group_by(id) %>%
  5. mutate(value = ifelse(cumsum(name == no_change & !is.na(no_change)), NA, value)) %>%
  6. pivot_wider() %>%
  7. ungroup()
  8. # A tibble: 4 × 14
  9. id no_change Jan.2010 Feb.2010 Mar.2010 Apr.2010 May.2010 Jun.2010 Jul.2010 Ago.2010 Sep.2010 Oct.2010 Nov.2010 Dez.2010
  10. <int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
  11. 1 123 May.2010 green green green green NA NA NA NA NA NA NA NA
  12. 2 124 NA black black red red red red white white white white white white
  13. 3 125 NA pink pink pink pink pink pink pink pink pink pink pink grey
  14. 4 126 Sep.2010 grey grey grey grey grey grey grey grey NA NA NA NA
英文:

You can pivot the format into a "long" format, and find out which rows should be turned into NA.

  1. library(tidyverse)
  2. df %&gt;%
  3. pivot_longer(ends_with(&quot;2010&quot;)) %&gt;%
  4. group_by(id) %&gt;%
  5. mutate(value = ifelse(cumsum(name == no_change &amp; !is.na(no_change)), NA, value)) %&gt;%
  6. pivot_wider() %&gt;%
  7. ungroup()
  8. # A tibble: 4 &#215; 14
  9. id no_change Jan.2010 Feb.2010 Mar.2010 Apr.2010 May.2010 Jun.2010 Jul.2010 Ago.2010 Sep.2010 Oct.2010 Nov.2010 Dez.2010
  10. &lt;int&gt; &lt;chr&gt; &lt;chr&gt; &lt;chr&gt; &lt;chr&gt; &lt;chr&gt; &lt;chr&gt; &lt;chr&gt; &lt;chr&gt; &lt;chr&gt; &lt;chr&gt; &lt;chr&gt; &lt;chr&gt; &lt;chr&gt;
  11. 1 123 May.2010 green green green green NA NA NA NA NA NA NA NA
  12. 2 124 NA black black red red red red white white white white white white
  13. 3 125 NA pink pink pink pink pink pink pink pink pink pink pink grey
  14. 4 126 Sep.2010 grey grey grey grey grey grey grey grey NA NA NA NA
  15. </details>

huangapple
  • 本文由 发表于 2023年6月19日 22:41:19
  • 转载请务必保留本文链接:https://go.coder-hub.com/76507694.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定