Assign column values based on condition and list with dplyr

Question

I have a tibble with a column of dates along with other data. I would like to add a column of labels that depends on which date range each date falls into. I could do this with mutate and nested conditionals, for instance:

data <- data %>%
    mutate(season = ifelse(date < "2020-03-01", "Winter 20",
                           ifelse(date < "2020-07-01", "Spring 20",
                                  ifelse(date < "2020-11-01", "Fall 20", "Winter 21"))))

But this seems inelegant and inflexible. Ideally, I would like to specify a named vector of cutoff dates, e.g.

season_breaks <- c("Winter 20" = "2020-03-01",
                   "Spring 20" = "2020-07-01",
                   "Fall 20" = "2020-11-01",
                   "Winter 21" = "2021-03-01")

and then use this separately specified vector to add the new season column to the tibble. (I use the same set of date cutoffs in various other places, which is why it helps to keep it as a separate object; it is also easier to modify in one place later if needed.)

Is there a way to create a new column like this?

Answer 1

Score: 2

Here is a solution using cut:

library(dplyr)

season_breaks <- c("Winter 20" = as.Date("2020-03-01"),
                   "Spring 20" = as.Date("2020-07-01"),
                   "Fall 20" = as.Date("2020-11-01"),
                   "Winter 21" = as.Date("2021-03-01"))

# 1970-01-01 is an arbitrary lower bound so that the first label covers
# every earlier date; dates on or after the last break become NA.
data %>%
  mutate(season = cut(date,
                      breaks = c(as.Date("1970-01-01"), unname(season_breaks)),
                      labels = names(season_breaks)))

  date       season   
  <date>     <fct>    
1 2020-01-01 Winter 20
2 2020-04-01 Spring 20
3 2020-08-01 Fall 20  
4 2020-12-01 Winter 21
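
If a character column is preferred over the factor that cut returns, the same lookup can be sketched with base R's findInterval. This is only an illustration under stated assumptions: the helper name label_season is hypothetical (it is not part of dplyr or base R), the cutoffs must be sorted in increasing order, and each name is taken to label the period that ends just before its date, as in the question.

```r
library(dplyr)

# Cutoffs exactly as written in the question (a named character vector).
season_breaks <- c("Winter 20" = "2020-03-01",
                   "Spring 20" = "2020-07-01",
                   "Fall 20"   = "2020-11-01",
                   "Winter 21" = "2021-03-01")

# Hypothetical helper: returns the name of the first cutoff that lies
# strictly after each date. Assumes `breaks` is sorted in increasing order.
label_season <- function(dates, breaks) {
  ends <- as.Date(unname(breaks))
  # findInterval() counts how many cutoffs are <= each date, so adding 1
  # gives the index of the first cutoff strictly after the date.
  idx <- findInterval(as.numeric(dates), as.numeric(ends)) + 1L
  # Indexing past the last name yields NA for dates on or after the final cutoff.
  names(breaks)[idx]
}

data <- tibble(date = as.Date(c("2020-01-01", "2020-03-01",
                                "2020-12-01", "2021-06-01")))

result <- data %>%
  mutate(season = label_season(date, season_breaks))
```

A date that falls exactly on a cutoff (2020-03-01 here) goes to the following season, which matches the strict `<` comparisons in the question's nested ifelse.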

Answer 2

Score: 1

Using dplyr's rolling join feature, new in v1.1.0 (see https://dplyr.tidyverse.org/reference/join_by.html):

library(dplyr)

# make table of season breaks
# (note: this overwrites the named vector season_breaks with a data frame)
season_breaks <- data.frame(
  season = names(season_breaks),
  end_date = as.Date(unname(season_breaks))
)

# for each date, match the closest end_date that is strictly later
data %>%
  left_join(season_breaks, join_by(closest(date < end_date))) %>%
  select(!end_date)

        date    season
1 2020-01-15 Winter 20
2 2020-04-15 Spring 20
3 2020-07-15   Fall 20
4 2020-10-15   Fall 20
5 2021-01-15 Winter 21

Example data:

data <- data.frame(
  date = seq(as.Date("2020-01-15"), length.out = 5, by = "3 months")
)
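
For reference, a self-contained variant of the rolling join might look like the sketch below. It assumes dplyr 1.1.0 or later and keeps the question's named vector intact by storing the lookup table under a separate, hypothetical name (season_lookup). The sample dates are made up to show a date that sits exactly on a cutoff and one that lies past the last cutoff.

```r
library(dplyr)  # join_by() and closest() need dplyr >= 1.1.0

# Cutoffs exactly as written in the question (a named character vector).
season_breaks <- c("Winter 20" = "2020-03-01",
                   "Spring 20" = "2020-07-01",
                   "Fall 20"   = "2020-11-01",
                   "Winter 21" = "2021-03-01")

# Lookup table built from the vector; season_breaks itself is left untouched
# so it can still be reused elsewhere.
season_lookup <- tibble(
  season   = names(season_breaks),
  end_date = as.Date(unname(season_breaks))
)

# Made-up sample dates for illustration.
data <- tibble(date = as.Date(c("2020-01-15", "2020-07-01",
                                "2021-01-15", "2021-03-01")))

# For each date, keep only the nearest end_date that is strictly later.
# Rows with no later end_date are kept by left_join() with season = NA.
result <- data %>%
  left_join(season_lookup, join_by(closest(date < end_date))) %>%
  select(!end_date)
```

Because the condition is a strict `<`, 2020-07-01 is assigned to the season ending on the next cutoff, and 2021-03-01 has no match at all.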

Answer 3

Score: 0

Example data:

library(tidyverse)
library(lubridate)  # quarter(), month(), year(); only attached automatically by tidyverse >= 2.0.0
data <- data.frame(date = as.Date(c("2020-02-01", "2020-06-01", "2020-10-01", "2020-11-01", "2021-02-01")))

I only show the code here, since you may define seasons differently.

Define season by quarter:

data %>% mutate(season = paste(quarter(date) %>% case_match(1 ~ "Spring", 2 ~ "Summer", 3 ~ "Fall", 4 ~ "Winter"), year(date)))

        date      season
1 2020-02-01 Spring 2020
2 2020-06-01 Summer 2020
3 2020-10-01 Winter 2020
4 2020-11-01 Winter 2020
5 2021-02-01 Spring 2021

Define season by month (you may need to adjust the values to fit your definition of seasons):

data %>%
  mutate(season = paste(month(date) %>% case_match(
    c(3:5) ~ "Spring",
    c(6:8) ~ "Summer",
    c(9:10) ~ "Fall",
    c(11, 12, 1, 2) ~ "Winter"
  ), year(date)))

        date      season
1 2020-02-01 Winter 2020
2 2020-06-01 Summer 2020
3 2020-10-01   Fall 2020
4 2020-11-01 Winter 2020
5 2021-02-01 Winter 2021
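
One thing to watch with the month-based version is that a winter spanning two calendar years gets two different labels (November 2020 is "Winter 2020", February 2021 is "Winter 2021"). A possible adjustment is sketched below. It assumes, as the question's "Winter 21" label suggests, that a winter is named after the year in which it ends. The intermediate columns season_name and season_year are hypothetical names, and the month-to-season mapping is the same one used in the month-based code.

```r
library(dplyr)      # case_match() needs dplyr >= 1.1.0
library(lubridate)  # month(), year()

data <- tibble(date = as.Date(c("2020-02-01", "2020-06-01", "2020-10-01",
                                "2020-11-01", "2021-02-01")))

result <- data %>%
  mutate(
    # Same month-to-season mapping as the month-based definition.
    season_name = case_match(
      month(date),
      3:5 ~ "Spring",
      6:8 ~ "Summer",
      9:10 ~ "Fall",
      c(11, 12, 1, 2) ~ "Winter"
    ),
    # Assumption: November and December belong to the winter that ends in
    # the following year, so their year is rolled forward by one.
    season_year = year(date) + (month(date) >= 11),
    # Two-digit year to mimic labels such as "Winter 21".
    season = paste(season_name, sprintf("%02d", season_year %% 100))
  ) %>%
  select(date, season)
```

With this rule, 2020-11-01 and 2021-02-01 both come out as "Winter 21", while 2020-02-01 remains "Winter 20".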

huangapple
  • Published on March 12, 2023 at 10:22:26
  • Please keep this link when reposting: https://go.coder-hub.com/75710740.html