Assign missing end dates based on another date in group
Question

I have a dataset similar to this:

    library(dplyr)  # re-exports tibble() and provides the verbs used in the answers

    id <- c("A1", "A1", "A1", "A1", "B2", "B2", "C1", "C2")
    date_start <- as.Date(c("2022-01-01", "2022-01-06", "2022-01-10", "2022-05-23",
                            "2023-01-02", "2022-07-19", "2022-12-07", "2023-03-23"))
    date_end <- as.Date(c("2022-01-05", NA, "2022-04-13", NA,
                          NA, "2023-10-06", "2023-01-18", "2023-08-15"))
    df <- data.frame(id, date_start, date_end)
    tibble(df)

The date_end column has NA values that I would like to fill in with an appropriate date. The intention is to look at the next date_start within the same id group and set the missing date_end to the day before it. If there is no acceptable date to use, date_end should instead be date_start plus one year.

I cannot come up with a solution and was hoping for some insight.

Answer 1

Score: 2

    library(dplyr)
    library(lubridate)  # for years()

    df %>%
      group_by(id) %>%
      mutate(date_end = coalesce(date_end, lead(date_start - 1), date_start + years(1)))

    # A tibble: 8 × 3
    # Groups:   id [4]
      id    date_start date_end
      <chr> <date>     <date>
    1 A1    2022-01-01 2022-01-05
    2 A1    2022-01-06 2022-01-09
    3 A1    2022-01-10 2022-04-13
    4 A1    2022-05-23 2023-05-23
    5 B2    2023-01-02 2022-07-18
    6 B2    2022-07-19 2023-10-06
    7 C1    2022-12-07 2023-01-18
    8 C2    2023-03-23 2023-08-15

If you have dplyr 1.1.0 or later, you can use the .by argument instead of group_by():

    df %>%
      mutate(date_end = coalesce(date_end, lead(date_start - 1), date_start + years(1)),
             .by = id)
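Note that in the output above, row 5 (B2, starting 2023-01-02) receives date_end 2022-07-18, which falls before its own start, because the B2 rows are not in chronological order. The question asks for a fallback when there is no "acceptable" date. The sketch below is one possible reading of that, not part of the original answer: it assumes a candidate end date is acceptable only if it is not earlier than date_start, and otherwise falls back to date_start plus one year. The helper column next_end and the result name filled are made up for illustration.

```r
library(dplyr)
library(lubridate)

id <- c("A1", "A1", "A1", "A1", "B2", "B2", "C1", "C2")
date_start <- as.Date(c("2022-01-01", "2022-01-06", "2022-01-10", "2022-05-23",
                        "2023-01-02", "2022-07-19", "2022-12-07", "2023-03-23"))
date_end <- as.Date(c("2022-01-05", NA, "2022-04-13", NA,
                      NA, "2023-10-06", "2023-01-18", "2023-08-15"))
df <- data.frame(id, date_start, date_end)

filled <- df %>%
  group_by(id) %>%
  mutate(
    # Candidate end date: the day before the next start in the same group
    next_end = lead(date_start) - 1,
    # Assumption: discard candidates that fall before this row's own start
    next_end = if_else(next_end >= date_start, next_end, as.Date(NA)),
    # Keep existing end dates, then the candidate, then start + 1 year
    date_end = coalesce(date_end, next_end, date_start + years(1))
  ) %>%
  ungroup() %>%
  select(-next_end)

print(filled)
```

With this rule, row 5 gets 2024-01-02 rather than 2022-07-18, and the other rows are unchanged. If the intended meaning is instead "next start in chronological order", sorting by id and date_start before calling lead() would be the alternative to consider.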

Answer 2

Score: 1

Here is a solution with an if_else statement:

    library(dplyr)      # needs dplyr >= 1.1.0 for the .by argument
    library(lubridate)  # for days() and years()

    df %>%
      mutate(date_end = if_else(is.na(date_end),
                                lead(date_start - days(1),
                                     default = last(date_start) + years(1)),
                                date_end),
             .by = id)

      id date_start   date_end
    1 A1 2022-01-01 2022-01-05
    2 A1 2022-01-06 2022-01-09
    3 A1 2022-01-10 2022-04-13
    4 A1 2022-05-23 2023-05-23
    5 B2 2023-01-02 2022-07-18
    6 B2 2022-07-19 2023-10-06
    7 C1 2022-12-07 2023-01-18
    8 C2 2023-03-23 2023-08-15
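One caveat when adding years(1) to a date: lubridate period arithmetic returns NA when the resulting date does not exist, which happens for a start date of February 29. None of the sample rows hit this case, so the following is only a small sketch of the behaviour and of the %m+% operator as a possible replacement. The variable names are made up for illustration.

```r
library(lubridate)

leap_day <- as.Date("2024-02-29")

# Plain period arithmetic gives NA because 2025-02-29 does not exist
plain <- leap_day + years(1)

# %m+% rolls back to the last valid day of the month instead
rolled <- leap_day %m+% years(1)

print(plain)   # NA
print(rolled)  # "2025-02-28"
```

If leap-day starts can occur in the real data, swapping date_start + years(1) for date_start %m+% years(1) in either answer should avoid reintroducing NA values.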


huangapple
  • Published on May 26, 2023, 01:00:59
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76334683.html