Complete missing dates with minimum and maximum date per group

Question

I have the following data frame df (reproducible data below):

> df
  group      date1      date2 value
1     A 2023-01-04 2023-01-06     1
2     A 2023-01-06 2023-01-07     2
3     A 2023-01-08 2023-01-09     3
4     B 2023-01-05 2023-01-06     3
5     B 2023-01-06 2023-01-08     2
6     B 2023-01-08 2023-01-10     1

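I would like to complete the missing dates between the minimum date1 and the maximum date2 per group. For group A, this means the uncovered interval from 2023-01-07 (the date2 of row 2) to 2023-01-08 (the date1 of row 3) should be filled in as a new row with value NA. Group B has no such gap, so it stays unchanged. The desired output should look like this: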

  group      date1      date2 value
1     A 2023-01-04 2023-01-06     1
2     A 2023-01-06 2023-01-07     2
3     A 2023-01-07 2023-01-08    NA
4     A 2023-01-08 2023-01-09     3
5     B 2023-01-05 2023-01-06     3
6     B 2023-01-06 2023-01-08     2
7     B 2023-01-08 2023-01-10     1

So I was wondering if anyone knows how to complete these dates based on the minimum and maximum date per group?


data:

df <- data.frame(
  group = rep(c("A", "B"), each = 3L),
  date1 = c(
    "2023-01-04", "2023-01-06", "2023-01-08", "2023-01-05", "2023-01-06",
    "2023-01-08"
  ),
  date2 = c(
    "2023-01-06", "2023-01-07", "2023-01-09", "2023-01-06", "2023-01-08",
    "2023-01-10"
  ),
  value = c(1, 2, 3, 3, 2, 1)
)
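A minimal base R sketch of the gap condition described above, using group A only and converting the strings to the Date class (the variable names next_start, gap and gaps are illustrative, not from the question): a gap exists wherever the next row's date1 is later than the current row's date2, and the missing interval runs from that date2 to that next date1.

```r
# Group A from the question, converted from character to Date
date1 <- as.Date(c("2023-01-04", "2023-01-06", "2023-01-08"))
date2 <- as.Date(c("2023-01-06", "2023-01-07", "2023-01-09"))

# Start of the following interval; the last row has no successor
next_start <- c(date1[-1], as.Date(NA))

# A gap exists when the next interval starts after the current one ends
gap <- !is.na(next_start) & next_start > date2

# The missing interval runs from the current date2 to the next date1
gaps <- data.frame(date1 = date2[gap], date2 = next_start[gap])
print(gaps)
```

For group A this should give a single row, 2023-01-07 to 2023-01-08. Running the same steps on group B gives no rows, because each next_start equals the preceding date2.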

Answer 1

Score: 2

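Effectively the same as my previous answer, though since we no longer need start and end dates, we need to shift (lead) one of them: each candidate gap runs from a row's date2 to the next row's date1 within the same group, and only candidates where date2 > date1 (i.e. a real gap) are kept.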

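library(dplyr)

out <- df %>%
  group_by(group) %>%
  # Each candidate gap runs from a row's date2 to the next row's date1.
  # (In dplyr >= 1.1.0, summarize() returning several rows per group is
  # deprecated in favour of reframe(); summarize() still works but warns.)
  summarize(newdate1 = date2, newdate2 = lead(date1)) %>%
  rename(date1 = newdate1, date2 = newdate2) %>%
  # Keep real gaps only; the last row of each group has no next date1 (NA)
  # and is dropped. The dates are ISO "YYYY-MM-DD" strings, so this string
  # comparison matches chronological order.
  filter(date2 > date1) %>%
  ungroup() %>%
  # The gap rows have no value column, so bind_rows() fills it with NA
  bind_rows(df) %>%
  arrange(group, date1)
out
# # A tibble: 7 × 4
#   group date1      date2      value
#   <chr> <chr>      <chr>      <dbl>
# 1 A     2023-01-04 2023-01-06     1
# 2 A     2023-01-06 2023-01-07     2
# 3 A     2023-01-07 2023-01-08    NA
# 4 A     2023-01-08 2023-01-09     3
# 5 B     2023-01-05 2023-01-06     3
# 6 B     2023-01-06 2023-01-08     2
# 7 B     2023-01-08 2023-01-10     1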
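For readers who prefer not to depend on dplyr, the same idea can be sketched in base R. This is an alternative technique, not the answer's code: it stores the dates as Date (so comparisons are chronological rather than string-based), splits df by group, and appends one row with value = NA per gap. The helper name fill_gaps is hypothetical.

```r
df <- data.frame(
  group = rep(c("A", "B"), each = 3L),
  date1 = as.Date(c("2023-01-04", "2023-01-06", "2023-01-08",
                    "2023-01-05", "2023-01-06", "2023-01-08")),
  date2 = as.Date(c("2023-01-06", "2023-01-07", "2023-01-09",
                    "2023-01-06", "2023-01-08", "2023-01-10")),
  value = c(1, 2, 3, 3, 2, 1)
)

# Hypothetical helper: add one NA row for every gap within a single group
fill_gaps <- function(d) {
  d <- d[order(d$date1), ]
  # date1 of the following row; NA for the last row of the group
  next_start <- c(d$date1[-1], as.Date(NA))
  gap <- !is.na(next_start) & next_start > d$date2
  if (!any(gap)) return(d)
  new_rows <- data.frame(
    group = d$group[gap],
    date1 = d$date2[gap],    # gap starts where the current interval ends
    date2 = next_start[gap], # and ends where the next interval starts
    value = NA_real_
  )
  rbind(d, new_rows)
}

# Apply per group, recombine, and sort like the desired output
out <- do.call(rbind, lapply(split(df, df$group), fill_gaps))
out <- out[order(out$group, out$date1), ]
rownames(out) <- NULL
out
```

With the question's data this should return the seven rows of the desired output, with Date rather than character columns. Unlike the dplyr version, groups without gaps are passed through untouched.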

huangapple
  • Posted on 17 April 2023, 17:43:53
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76033756.html