Assign Missing End Dates based on another date in group
Question
I have a dataset similar to this:
library(tibble)

id <- c("A1", "A1", "A1", "A1", "B2", "B2", "C1", "C2")
date_start <- as.Date(c("2022-01-01", "2022-01-06", "2022-01-10", "2022-05-23", "2023-01-02", "2022-07-19", "2022-12-07", "2023-03-23"))
date_end <- as.Date(c("2022-01-05", NA, "2022-04-13", NA, NA, "2023-10-06", "2023-01-18", "2023-08-15"))
df <- data.frame(id, date_start, date_end)
tibble(df)
The date_end column has NA values that I would like to fill in with an appropriate date. The intention is to look at the next date_start in the same group and fill the missing date_end with that date minus one day. If there is no acceptable date to fill with, I want to add one year to the row's own date_start and use that as date_end instead.
I cannot come up with a solution and was hoping for some insight.
Answer 1
Score: 2
library(dplyr)
library(lubridate)  # for years()

df %>%
  group_by(id) %>%
  mutate(date_end = coalesce(date_end, lead(date_start - 1), date_start + years(1)))
# A tibble: 8 × 3
# Groups: id [4]
id date_start date_end
<chr> <date> <date>
1 A1 2022-01-01 2022-01-05
2 A1 2022-01-06 2022-01-09
3 A1 2022-01-10 2022-04-13
4 A1 2022-05-23 2023-05-23
5 B2 2023-01-02 2022-07-18
6 B2 2022-07-19 2023-10-06
7 C1 2022-12-07 2023-01-18
8 C2 2023-03-23 2023-08-15
If you have dplyr 1.1.0 or later, you can use the .by argument instead of group_by():
df %>%
  mutate(date_end = coalesce(date_end, lead(date_start - 1), date_start + years(1)),
         .by = id)
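Note that row 5 of the output above (B2, starting 2023-01-02) receives an end date of 2022-07-18, which is earlier than its own start, because the B2 rows are not in chronological order. The following is a minimal sketch of one way to avoid that, assuming that "acceptable" means the next start in the group is chronologically later. It sorts each group by date_start before applying the same coalesce() idea. The names next_start and filled are illustrative and do not come from the question.

```r
library(dplyr)
library(lubridate)

# Same data as in the question
df <- data.frame(
  id = c("A1", "A1", "A1", "A1", "B2", "B2", "C1", "C2"),
  date_start = as.Date(c("2022-01-01", "2022-01-06", "2022-01-10", "2022-05-23",
                         "2023-01-02", "2022-07-19", "2022-12-07", "2023-03-23")),
  date_end = as.Date(c("2022-01-05", NA, "2022-04-13", NA,
                       NA, "2023-10-06", "2023-01-18", "2023-08-15"))
)

filled <- df %>%
  arrange(id, date_start) %>%            # chronological order within each id
  group_by(id) %>%
  mutate(
    next_start = lead(date_start),       # NA for the last row of each group
    # Keep an existing end date; otherwise use the day before the next start;
    # otherwise fall back to one year after this row's own start.
    date_end = coalesce(date_end, next_start - 1, date_start + years(1))
  ) %>%
  ungroup() %>%
  select(-next_start)

filled
```

With this ordering, the B2 row starting 2023-01-02 becomes the last row of its group, so it falls through to the one-year fallback and gets 2024-01-02. Whether sorting is appropriate depends on whether the original row order carries meaning in your data.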
Answer 2
Score: 1
Here is a solution with an if_else statement:
library(dplyr)      # .by requires dplyr >= 1.1.0
library(lubridate)  # for days() and years()

df %>%
  mutate(date_end = if_else(is.na(date_end),
                            lead(date_start - days(1), default = last(date_start) + years(1)),
                            date_end),
         .by = id)
id date_start date_end
1 A1 2022-01-01 2022-01-05
2 A1 2022-01-06 2022-01-09
3 A1 2022-01-10 2022-04-13
4 A1 2022-05-23 2023-05-23
5 B2 2023-01-02 2022-07-18
6 B2 2022-07-19 2023-10-06
7 C1 2022-12-07 2023-01-18
8 C2 2023-03-23 2023-08-15
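Both answers build the fallback with date_start + years(1). One edge case may be worth checking, although it does not occur in the sample data: with lubridate, adding a year to 29 February produces NA, because the resulting date does not exist. A small sketch of the difference, using lubridate's %m+% operator, which rolls back to the last valid day of the month (the variable names are illustrative):

```r
library(lubridate)

leap_day <- as.Date("2024-02-29")

plain_add <- leap_day + years(1)     # NA: 2025-02-29 does not exist
rolled    <- leap_day %m+% years(1)  # "2025-02-28": rolled back to a valid date

plain_add
rolled
```

If your date_start values can fall on 29 February, replacing date_start + years(1) with date_start %m+% years(1) in either answer should keep the fallback from returning NA.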