Complete missing dates with minimum and maximum date per group
Question
I have the following dataframe df (dput below):
> df
group date1 date2 value
1 A 2023-01-04 2023-01-06 1
2 A 2023-01-06 2023-01-07 2
3 A 2023-01-08 2023-01-09 3
4 B 2023-01-05 2023-01-06 3
5 B 2023-01-06 2023-01-08 2
6 B 2023-01-08 2023-01-10 1
I would like to complete the missing dates between the minimum date1 and the maximum date2 per group. This means that for group A the interval from 2023-01-07 to 2023-01-08 should be filled. The desired output should look like this:
group date1 date2 value
1 A 2023-01-04 2023-01-06 1
2 A 2023-01-06 2023-01-07 2
3 A 2023-01-07 2023-01-08 NA
4 A 2023-01-08 2023-01-09 3
5 B 2023-01-05 2023-01-06 3
6 B 2023-01-06 2023-01-08 2
7 B 2023-01-08 2023-01-10 1
So I was wondering if anyone knows how to complete these dates based on the minimum and maximum date per group?
data:
df <- data.frame(
  group = rep(c("A", "B"), each = 3L),
  date1 = c(
    "2023-01-04", "2023-01-06", "2023-01-08", "2023-01-05", "2023-01-06",
    "2023-01-08"
  ),
  date2 = c(
    "2023-01-06", "2023-01-07", "2023-01-09", "2023-01-06", "2023-01-08",
    "2023-01-10"
  ),
  value = c(1, 2, 3, 3, 2, 1)
)
Answer 1
Score: 2
Effectively the same as my previous answer, though since we no longer need start and end dates, we need to shift (lead) one of them.
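library(dplyr)

out <- df %>%
  group_by(group) %>%
  # candidate gap: starts at this row's date2, ends at the next row's date1
  # (reframe() needs dplyr >= 1.1.0; older versions can use summarize() here)
  reframe(newdate1 = date2, newdate2 = lead(date1)) %>%
  rename(date1 = newdate1, date2 = newdate2) %>%
  # keep real gaps only; this also drops the NA from lead() on the last row
  filter(date2 > date1) %>%
  ungroup() %>%
  # add the original rows back; gap rows get value = NA
  bind_rows(df) %>%
  arrange(group, date1)
out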
# # A tibble: 7 × 4
# group date1 date2 value
# <chr> <chr> <chr> <dbl>
# 1 A 2023-01-04 2023-01-06 1
# 2 A 2023-01-06 2023-01-07 2
# 3 A 2023-01-07 2023-01-08 NA
# 4 A 2023-01-08 2023-01-09 3
# 5 B 2023-01-05 2023-01-06 3
# 6 B 2023-01-06 2023-01-08 2
# 7 B 2023-01-08 2023-01-10 1
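The pipeline above compares the dates as character strings, which works here because they are in ISO yyyy-mm-dd format. As a hedged variant (not part of the original answer), the sketch below converts the columns to Date first and builds the gap rows in a separate step. The names gaps, filled and next_start are made up for illustration.

```r
library(dplyr)

# Same data as in the question, but with real Date columns (an assumption:
# the original keeps them as character).
df <- data.frame(
  group = rep(c("A", "B"), each = 3L),
  date1 = as.Date(c("2023-01-04", "2023-01-06", "2023-01-08",
                    "2023-01-05", "2023-01-06", "2023-01-08")),
  date2 = as.Date(c("2023-01-06", "2023-01-07", "2023-01-09",
                    "2023-01-06", "2023-01-08", "2023-01-10")),
  value = c(1, 2, 3, 3, 2, 1)
)

# One row per gap: a gap exists where the next interval in the same group
# starts later than the current interval ends.
gaps <- df %>%
  arrange(group, date1) %>%
  group_by(group) %>%
  mutate(next_start = lead(date1)) %>%
  ungroup() %>%
  filter(!is.na(next_start), next_start > date2) %>%
  transmute(group, date1 = date2, date2 = next_start, value = NA_real_)

# Combine the original rows with the gap rows and restore the ordering.
filled <- bind_rows(df, gaps) %>%
  arrange(group, date1)

filled
```

Keeping the gap rows in their own data frame makes it easy to check which intervals were added before they are merged back in.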