使用R创建组,使用30天的窗口。

huangapple go评论58阅读模式
英文:

Create groups using a 30 day window in R

问题

df2 <- df %>%
  group_by(id) %>%
  arrange(date) %>%
  mutate(window_start = floor_date(date, "30 days"),
         instance2 = row_number() - row_number()[(date - window_start) >= 30][1] + 1)
英文:

I have data to show a transaction date and the company id.

I want to number each transaction within a 30 day window starting from the first date in that window.

So for example for id 1, their window starts at 2023-01-04 and will run until 2023-02-03. Then the window will start again from the next transaction after that which is 2023-02-15 and then run until 2023-03-17 and so on.

The transactions within each window need to be numbered starting at 1 for each new window.

Example data below, the instance column shows the desired result.

df2 is as far as I have got, I just need to work out how to add the 30 day window, at the moment, its just numbering the rows per id without a time element.

library(tidyverse)
id&lt;-c(&quot;1&quot;,&quot;1&quot;,&quot;1&quot;,&quot;1&quot;,&quot;1&quot;,&quot;1&quot;, &quot;2&quot;,&quot;2&quot;,&quot;2&quot;)
date&lt;-as.Date(c(&quot;2023-01-04&quot;,&quot;2023-01-15&quot;,&quot;2023-02-01&quot;,&quot;2023-02-15&quot;,&quot;2023-03-15&quot;, &quot;2023-04-01&quot;, &quot;2023-01-01&quot;, &quot;2023-04-01&quot;,&quot;2023-05-03&quot;))
instance&lt;-c(1,2,3,1,2,1,1,1,1)
df&lt;-data.frame(id, date, instance)

df2&lt;-df%&gt;%
  group_by(id)%&gt;%
  mutate(instance2=row_number())


Can anyone suggest any updates to achieve this?

答案1

得分: 1

以下是代码的中文翻译:

这是一种(虽然冗长)将日期分组成30天窗口的方法。

df %>%
  group_by(id) %>%
  arrange(id, date) %>%
  mutate(days = as.numeric(date - lag(date, default = first(date))),
         days2 = 1 + cumsum(if_else(accumulate(days, ~if_else(.x >= 31, .y, .x + .y)) >= 31, 1, 0))) %>%
  group_by(id, days2) %>%
  mutate(instance = row_number()) %>%
  ungroup() %>%
  select(-days, -days2)

一个 tibble: 9 x 3

id date instance

1 1 2023-01-04 1
2 1 2023-01-15 2
3 1 2023-02-01 3
4 1 2023-02-15 1
5 1 2023-03-15 2
6 1 2023-04-01 1
7 2 2023-01-01 1
8 2 2023-04-01 1
9 2 2023-05-03 1

英文:

Here‘s one (admittedly verbose) approach to group the dates into 30-day windows.

df %&gt;%
  group_by(id) %&gt;%
  arrange(id, date) %&gt;%
  mutate(days = as.numeric(date - lag(date, default = first(date))),
         days2 = 1 + cumsum(if_else(accumulate(days, ~if_else(.x &gt;= 31, .y, .x + .y)) &gt;= 31, 1, 0))) %&gt;%
  group_by(id, days2) %&gt;%
  mutate(instance = row_number()) %&gt;%
  ungroup() %&gt;%
  select(-days, -days2)

# A tibble: 9 x 3
  id    date       instance
  &lt;chr&gt; &lt;date&gt;        &lt;int&gt;
1 1     2023-01-04        1
2 1     2023-01-15        2
3 1     2023-02-01        3
4 1     2023-02-15        1
5 1     2023-03-15        2
6 1     2023-04-01        1
7 2     2023-01-01        1
8 2     2023-04-01        1
9 2     2023-05-03        1

huangapple
  • 本文由 发表于 2023年5月7日 06:22:01
  • 转载请务必保留本文链接:https://go.coder-hub.com/76191426.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定