Assign Missing End Dates based on another date in group

Question

I have a dataset similar to this:

library(dplyr)      # provides %>%, tibble(), mutate(), lead(), coalesce()
library(lubridate)  # provides years() and days(), used in the answers

id <- c("A1", "A1", "A1", "A1", "B2", "B2", "C1", "C2")
date_start <- as.Date(c("2022-01-01", "2022-01-06", "2022-01-10", "2022-05-23", "2023-01-02", "2022-07-19", "2022-12-07", "2023-03-23"))
date_end <- as.Date(c("2022-01-05", NA, "2022-04-13", NA, NA, "2023-10-06", "2023-01-18", "2023-08-15"))

df <- data.frame(id, date_start, date_end)

tibble(df)

The date_end column has NA values that I would like to fill in with an appropriate date. For each missing date_end, I want to look at the next date_start within the same id group and use the day before it. If there is no acceptable next date, I want to use date_start plus one year instead.

I cannot come up with a solution and was hoping for some insight.

Answer 1

Score: 2

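library(dplyr)
library(lubridate)  # for years()

df %>%
  group_by(id) %>%
  # keep date_end if present, else day before next start, else start + 1 year
  mutate(date_end = coalesce(date_end, lead(date_start - 1), date_start + years(1)))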

# A tibble: 8 × 3
# Groups:   id [4]
  id    date_start date_end  
  <chr> <date>     <date>    
1 A1    2022-01-01 2022-01-05
2 A1    2022-01-06 2022-01-09
3 A1    2022-01-10 2022-04-13
4 A1    2022-05-23 2023-05-23
5 B2    2023-01-02 2022-07-18
6 B2    2022-07-19 2023-10-06
7 C1    2022-12-07 2023-01-18
8 C2    2023-03-23 2023-08-15
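
For readers unfamiliar with the two helpers used above, here is a minimal sketch of how lead() and coalesce() behave on their own. The vectors x, first, second, and third are made up for illustration and are not part of the question's data.

```r
library(dplyr)

x <- as.Date(c("2022-01-01", "2022-01-06", "2022-01-10"))

# lead() shifts values up by one position; the last element has no
# successor, so it becomes NA
shifted <- lead(x)
shifted
# "2022-01-06" "2022-01-10" NA

first  <- c(NA, 2, NA)
second <- c(10, 20, NA)
third  <- c(100, 200, 300)

# coalesce() returns, element by element, the first non-missing value
picked <- coalesce(first, second, third)
picked
# 10 2 300
```

This is why the order of arguments in the answer matters: an existing date_end always wins, the day before the next start is tried second, and date_start plus one year is only a last resort.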

If you have dplyr 1.1.0 or later, you can use the .by argument instead of group_by():

df %>%
  mutate(date_end = coalesce(date_end, lead(date_start - 1), date_start + years(1)),
         .by = id)
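
Note that in the output above, row 5 (B2) ends up with a date_end of 2022-07-18, which is earlier than its date_start of 2023-01-02, because the B2 rows are not in chronological order. If "no acceptable date" in the question is taken to mean that the candidate must not precede the row's own start, one possible guard looks like the sketch below. That reading of the question is an assumption, and next_end and filled are names introduced here for illustration.

```r
library(dplyr)
library(lubridate)

df <- data.frame(
  id = c("A1", "A1", "A1", "A1", "B2", "B2", "C1", "C2"),
  date_start = as.Date(c("2022-01-01", "2022-01-06", "2022-01-10", "2022-05-23",
                         "2023-01-02", "2022-07-19", "2022-12-07", "2023-03-23")),
  date_end = as.Date(c("2022-01-05", NA, "2022-04-13", NA,
                       NA, "2023-10-06", "2023-01-18", "2023-08-15"))
)

filled <- df %>%
  group_by(id) %>%
  mutate(
    # candidate end date: the day before the next start in the group
    next_end = lead(date_start) - 1,
    # assumption: a candidate earlier than the row's own start is rejected
    next_end = if_else(next_end >= date_start, next_end, as.Date(NA)),
    # existing end date, then accepted candidate, then start + 1 year
    date_end = coalesce(date_end, next_end, date_start + years(1))
  ) %>%
  ungroup() %>%
  select(-next_end)

filled
```

With this guard, row 5 falls back to date_start plus one year (2024-01-02), and every other row is filled as before. If the order of rows within a group carries no meaning, another option would be to call arrange(id, date_start) before grouping.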

Answer 2

Score: 1

Here is a solution using if_else():

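library(dplyr)      # requires dplyr >= 1.1.0 for the .by argument
library(lubridate)  # for days() and years()

df %>%
  # where date_end is missing, use the day before the next start in the group;
  # the last row of a group defaults to its own date_start plus one year
  mutate(date_end = if_else(is.na(date_end),
                            lead(date_start - days(1), default = last(date_start) + years(1)),
                            date_end),
         .by = id)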

 id date_start   date_end
1 A1 2022-01-01 2022-01-05
2 A1 2022-01-06 2022-01-09
3 A1 2022-01-10 2022-04-13
4 A1 2022-05-23 2023-05-23
5 B2 2023-01-02 2022-07-18
6 B2 2022-07-19 2023-10-06
7 C1 2022-12-07 2023-01-18
8 C2 2023-03-23 2023-08-15
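
One edge case worth checking with either answer is a date_start that falls on February 29. With lubridate, adding years(1) to such a date yields NA because the resulting date does not exist, whereas the %m+% operator rolls back to the last valid day of the month. The sketch below uses a made-up date (leap_start) that does not appear in the question's data.

```r
library(lubridate)

leap_start <- as.Date("2024-02-29")

# plain period arithmetic: 2025-02-29 does not exist, so the result is NA
plain_add <- leap_start + years(1)

# %m+% rolls back to the last day of the month instead of returning NA
rolled_add <- leap_start %m+% years(1)

plain_add
# NA
rolled_add
# "2025-02-28"
```

If the real data can contain leap days, replacing date_start + years(1) with date_start %m+% years(1) in the fallback would avoid leaving those rows as NA.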

Posted by huangapple on May 26, 2023 at 01:00:59
Source: https://go.coder-hub.com/76334683.html