2023年6月19日 22:41:19go评论91阅读模式

英文:

Mutate value of a range of columns if columns name meets another column value

问题

我有一个wide df，其中columns表示许多给定years的months和每个month中颜色的变化：

df <- data.frame(id = as.integer(c(123,124,125,126)),
                 no_change = as.character(c("May.2010", NA, NA, "Sep.2010")),
                 `Jan.2010` = as.character(c("green", "black", "pink", "grey")),
                 `Feb.2010` = as.character(c("green", "black", "pink", "grey")),
                 `Mar.2010` = as.character(c("green", "red", "pink", "grey")),
                 `Apr.2010` = as.character(c("green", "red", "pink", "grey")),
                 `May.2010` = as.character(c("green", "red", "pink", "grey")),
                 `Jun.2010` = as.character(c("green", "red", "pink", "grey")),
                 `Jul.2010` = as.character(c("green", "white", "pink", "grey")),
                 `Ago.2010` = as.character(c("red", "white", "pink", "grey")),
                 `Sep.2010` = as.character(c("red", "white", "pink", "grey")),
                 `Oct.2010` = as.character(c("red", "white", "pink", "grey")),
                 `Nov.2010` = as.character(c("red", "white", "pink", "grey")),
                 `Dez.2010` = as.character(c("red", "white", "grey", "blue"))
                 )
df
   id no_change Jan.2010 Feb.2010 Mar.2010 Apr.2010 May.2010 Jun.2010 Jul.2010 Ago.2010 Sep.2010 Oct.2010 Nov.2010 Dez.2010
1 123  May.2010    green    green    green    green    green    green    green      red      red      red      red      red
2 124      <NA>    black    black      red      red      red      red    white    white    white    white    white    white
3 125      <NA>     pink     pink     pink     pink     pink     pink     pink     pink     pink     pink     pink     grey
4 126  Sep.2010     grey     grey     grey     grey     grey     grey     grey     grey     grey     grey     grey     blue

我想要对每个column应用NA，如果包含的month等于或在column 'no_change' 中指定的month及以上。这是期望的output：

   id no_change Jan.2010 Feb.2010 Mar.2010 Apr.2010 May.2010 Jun.2010 Jul.2010 Ago.2010 Sep.2010 Oct.2010 Nov.2010 Dez.2010
1 123  May.2010    green    green    green    green       NA       NA       NA       NA       NA       NA       NA       NA
2 124      <NA>    black    black      red      red      red      red    white    white    white    white    white    white
3 125      <NA>     pink     pink     pink     pink     pink     pink     pink     pink     pink     pink     pink     grey
4 126  Sep.2010     grey     grey     grey     grey     grey     grey     grey     grey       NA       NA       NA       NA

英文:

I have a wide df with columns representing the months of many given years and the changes of colour in each month:

df &lt;- data.frame(id = as.integer(c(123,124,125,126)),
                 no_change = as.character(c(&quot;May.2010&quot;, NA, NA, &quot;Sep.2010&quot;)),
                 `Jan.2010` = as.character(c(&quot;green&quot;, &quot;black&quot;, &quot;pink&quot;, &quot;grey&quot;)),
                 `Feb.2010` = as.character(c(&quot;green&quot;, &quot;black&quot;, &quot;pink&quot;, &quot;grey&quot;)),
                 `Mar.2010` = as.character(c(&quot;green&quot;, &quot;red&quot;, &quot;pink&quot;, &quot;grey&quot;)),
                 `Apr.2010` = as.character(c(&quot;green&quot;, &quot;red&quot;, &quot;pink&quot;, &quot;grey&quot;)),
                 `May.2010` = as.character(c(&quot;green&quot;, &quot;red&quot;, &quot;pink&quot;, &quot;grey&quot;)),
                 `Jun.2010` = as.character(c(&quot;green&quot;, &quot;red&quot;, &quot;pink&quot;, &quot;grey&quot;)),
                 `Jul.2010` = as.character(c(&quot;green&quot;, &quot;white&quot;, &quot;pink&quot;, &quot;grey&quot;)),
                 `Ago.2010` = as.character(c(&quot;red&quot;, &quot;white&quot;, &quot;pink&quot;, &quot;grey&quot;)),
                 `Sep.2010` = as.character(c(&quot;red&quot;, &quot;white&quot;, &quot;pink&quot;, &quot;grey&quot;)),
                 `Oct.2010` = as.character(c(&quot;red&quot;, &quot;white&quot;, &quot;pink&quot;, &quot;grey&quot;)),
                 `Nov.2010` = as.character(c(&quot;red&quot;, &quot;white&quot;, &quot;pink&quot;, &quot;grey&quot;)),
                 `Dez.2010` = as.character(c(&quot;red&quot;, &quot;white&quot;, &quot;grey&quot;, &quot;blue&quot;))
                 )
df     
   id no_change Jan.2010 Feb.2010 Mar.2010 Apr.2010 May.2010 Jun.2010 Jul.2010 Ago.2010 Sep.2010 Oct.2010 Nov.2010 Dez.2010
1 123  May.2010    green    green    green    green    green    green    green      red      red      red      red      red
2 124      &lt;NA&gt;    black    black      red      red      red      red    white    white    white    white    white    white
3 125      &lt;NA&gt;     pink     pink     pink     pink     pink     pink     pink     pink     pink     pink     pink     grey
4 126  Sep.2010     grey     grey     grey     grey     grey     grey     grey     grey     grey     grey     grey     blue

I want to apply NA to each column that contains a month equal to and above that specified in column 'no_change'. This is the desired output:

   id no_change Jan.2010 Feb.2010 Mar.2010 Apr.2010 May.2010 Jun.2010 Jul.2010 Ago.2010 Sep.2010 Oct.2010 Nov.2010 Dez.2010
1 123  May.2010    green    green    green    green       NA       NA       NA       NA       NA       NA       NA       NA
2 124      &lt;NA&gt;    black    black      red      red      red      red    white    white    white    white    white    white
3 125      &lt;NA&gt;     pink     pink     pink     pink     pink     pink     pink     pink     pink     pink     pink     grey
4 126  Sep.2010     grey     grey     grey     grey     grey     grey     grey     grey       NA       NA       NA       NA

答案1

得分: 2

这是使用dplyr::across()和自定义函数的一种方法：

library(dplyr)
set_na <- function(x, dat) {
  
  col_nm <- cur_column()
  col_dat <- lubridate::dmy(paste0("01.", col_nm))
  
  if_else(col_dat <= dat | is.na(dat), x, NA)
  
}
df |&gt; 
  mutate(
    no_change_dat = lubridate::dmy(paste0("01.", no_change)),
    across(-c(id, no_change, no_change_dat), \(x) set_na(x, no_change_dat))
    )

数据来自OP：

df <- data.frame(id = as.integer(c(123,124,125,126)),
                 no_change = as.character(c("May.2010", NA, NA, "Sep.2010")),
                 `Jan.2010` = as.character(c("green", "black", "pink", "grey")),
                 `Feb.2010` = as.character(c("green", "black", "pink", "grey")),
                 `Mar.2010` = as.character(c("green", "red", "pink", "grey")),
                 `Apr.2010` = as.character(c("green", "red", "pink", "grey")),
                 `May.2010` = as.character(c("green", "red", "pink", "grey")),
                 `Jun.2010` = as.character(c("green", "red", "pink", "grey")),
                 `Jul.2010` = as.character(c("green", "white", "pink", "grey")),
                 `Ago.2010` = as.character(c("red", "white", "pink", "grey")),
                 `Sep.2010` = as.character(c("red", "white", "pink", "grey")),
                 `Oct.2010` = as.character(c("red", "white", "pink", "grey")),
                 `Nov.2010` = as.character(c("red", "white", "pink", "grey")),
                 `Dez.2010` = as.character(c("red", "white", "grey", "blue"))
)

^{在2023-06-19使用reprex v2.0.2创建}

英文:

Here is one way to do it using dplyr::across() and a custom function:

library(dplyr)
set_na &lt;- function(x, dat) {
  
  col_nm &lt;- cur_column()
  col_dat &lt;- lubridate::dmy(paste0(&quot;01.&quot;, col_nm))
  
  if_else(col_dat &lt;= dat | is.na(dat), x, NA)
  
}
df |&gt; 
  mutate(
    no_change_dat = lubridate::dmy(paste0(&quot;01.&quot;, no_change)),
    across(-c(id, no_change, no_change_dat), \(x) set_na(x, no_change_dat))
    )
#&gt; Warning: There were 2 warnings in `mutate()`.
#&gt; The first warning was:
#&gt; ℹ In argument: `no_change_dat = lubridate::dmy(paste0(&quot;01.&quot;, no_change))`.
#&gt; Caused by warning:
#&gt; !  2 failed to parse.
#&gt; ℹ Run `dplyr::last_dplyr_warnings()` to see the 1 remaining warning.
#&gt;    id no_change Jan.2010 Feb.2010 Mar.2010 Apr.2010 May.2010 Jun.2010 Jul.2010
#&gt; 1 123  May.2010    green    green    green    green    green     &lt;NA&gt;     &lt;NA&gt;
#&gt; 2 124      &lt;NA&gt;    black    black      red      red      red      red    white
#&gt; 3 125      &lt;NA&gt;     pink     pink     pink     pink     pink     pink     pink
#&gt; 4 126  Sep.2010     grey     grey     grey     grey     grey     grey     grey
#&gt;   Ago.2010 Sep.2010 Oct.2010 Nov.2010 Dez.2010 no_change_dat
#&gt; 1     &lt;NA&gt;     &lt;NA&gt;     &lt;NA&gt;     &lt;NA&gt;     &lt;NA&gt;    2010-05-01
#&gt; 2    white    white    white    white    white          &lt;NA&gt;
#&gt; 3     pink     pink     pink     pink     grey          &lt;NA&gt;
#&gt; 4     &lt;NA&gt;     grey     &lt;NA&gt;     &lt;NA&gt;     &lt;NA&gt;    2010-09-01

Data from OP

df &lt;- data.frame(id = as.integer(c(123,124,125,126)),
                 no_change = as.character(c(&quot;May.2010&quot;, NA, NA, &quot;Sep.2010&quot;)),
                 `Jan.2010` = as.character(c(&quot;green&quot;, &quot;black&quot;, &quot;pink&quot;, &quot;grey&quot;)),
                 `Feb.2010` = as.character(c(&quot;green&quot;, &quot;black&quot;, &quot;pink&quot;, &quot;grey&quot;)),
                 `Mar.2010` = as.character(c(&quot;green&quot;, &quot;red&quot;, &quot;pink&quot;, &quot;grey&quot;)),
                 `Apr.2010` = as.character(c(&quot;green&quot;, &quot;red&quot;, &quot;pink&quot;, &quot;grey&quot;)),
                 `May.2010` = as.character(c(&quot;green&quot;, &quot;red&quot;, &quot;pink&quot;, &quot;grey&quot;)),
                 `Jun.2010` = as.character(c(&quot;green&quot;, &quot;red&quot;, &quot;pink&quot;, &quot;grey&quot;)),
                 `Jul.2010` = as.character(c(&quot;green&quot;, &quot;white&quot;, &quot;pink&quot;, &quot;grey&quot;)),
                 `Ago.2010` = as.character(c(&quot;red&quot;, &quot;white&quot;, &quot;pink&quot;, &quot;grey&quot;)),
                 `Sep.2010` = as.character(c(&quot;red&quot;, &quot;white&quot;, &quot;pink&quot;, &quot;grey&quot;)),
                 `Oct.2010` = as.character(c(&quot;red&quot;, &quot;white&quot;, &quot;pink&quot;, &quot;grey&quot;)),
                 `Nov.2010` = as.character(c(&quot;red&quot;, &quot;white&quot;, &quot;pink&quot;, &quot;grey&quot;)),
                 `Dez.2010` = as.character(c(&quot;red&quot;, &quot;white&quot;, &quot;grey&quot;, &quot;blue&quot;))
)

<sup>Created on 2023-06-19 with reprex v2.0.2</sup>

答案2

得分: 2

你可以将格式“枢轴”为“长”格式，并找出哪些行应该变成“NA”。

library(tidyverse)
df %>%
  pivot_longer(ends_with("2010")) %>%
  group_by(id) %>%
  mutate(value = ifelse(cumsum(name == no_change & !is.na(no_change)), NA, value)) %>%
  pivot_wider() %>%
  ungroup()
# A tibble: 4 × 14
     id no_change Jan.2010 Feb.2010 Mar.2010 Apr.2010 May.2010 Jun.2010 Jul.2010 Ago.2010 Sep.2010 Oct.2010 Nov.2010 Dez.2010
  <int> <chr>     <chr>    <chr>    <chr>    <chr>    <chr>    <chr>    <chr>    <chr>    <chr>    <chr>    <chr>    <chr>   
1   123 May.2010  green    green    green    green    NA       NA       NA       NA       NA       NA       NA       NA      
2   124 NA        black    black    red      red      red      red      white    white    white    white    white    white   
3   125 NA        pink     pink     pink     pink     pink     pink     pink     pink     pink     pink     pink     grey    
4   126 Sep.2010  grey     grey     grey     grey     grey     grey     grey     grey     NA       NA       NA       NA

英文:

You can pivot the format into a "long" format, and find out which rows should be turned into NA.

library(tidyverse)
df %&gt;% 
  pivot_longer(ends_with(&quot;2010&quot;)) %&gt;% 
  group_by(id) %&gt;% 
  mutate(value = ifelse(cumsum(name == no_change &amp; !is.na(no_change)), NA, value)) %&gt;% 
  pivot_wider() %&gt;% 
  ungroup()
# A tibble: 4 &#215; 14
     id no_change Jan.2010 Feb.2010 Mar.2010 Apr.2010 May.2010 Jun.2010 Jul.2010 Ago.2010 Sep.2010 Oct.2010 Nov.2010 Dez.2010
  &lt;int&gt; &lt;chr&gt;     &lt;chr&gt;    &lt;chr&gt;    &lt;chr&gt;    &lt;chr&gt;    &lt;chr&gt;    &lt;chr&gt;    &lt;chr&gt;    &lt;chr&gt;    &lt;chr&gt;    &lt;chr&gt;    &lt;chr&gt;    &lt;chr&gt;   
1   123 May.2010  green    green    green    green    NA       NA       NA       NA       NA       NA       NA       NA      
2   124 NA        black    black    red      red      red      red      white    white    white    white    white    white   
3   125 NA        pink     pink     pink     pink     pink     pink     pink     pink     pink     pink     pink     grey    
4   126 Sep.2010  grey     grey     grey     grey     grey     grey     grey     grey     NA       NA       NA       NA      
</details>

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

更改一系列列的值，如果列名符合另一列的值。

问题

答案1

答案2

ggplot2 在函数中无法识别列名

如何在使用paste()函数指定变量时测试变量的值是否为NA？

为什么在R中使用Likert包的100%堆叠条形图时，x轴上的条形被反转？

R: 使用love.plot()给变量名斜体化。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。