使用R创建组,使用30天的窗口。

huangapple go评论87阅读模式
英文:

Create groups using a 30 day window in R

问题

  1. df2 <- df %>%
  2. group_by(id) %>%
  3. arrange(date) %>%
  4. mutate(window_start = floor_date(date, "30 days"),
  5. instance2 = row_number() - row_number()[(date - window_start) >= 30][1] + 1)
英文:

I have data to show a transaction date and the company id.

I want to number each transaction within a 30 day window starting from the first date in that window.

So for example for id 1, their window starts at 2023-01-04 and will run until 2023-02-03. Then the window will start again from the next transaction after that which is 2023-02-15 and then run until 2023-03-17 and so on.

The transactions within each window need to be numbered starting at 1 for each new window.

Example data below, the instance column shows the desired result.

df2 is as far as I have got, I just need to work out how to add the 30 day window, at the moment, its just numbering the rows per id without a time element.

  1. library(tidyverse)
  2. id&lt;-c(&quot;1&quot;,&quot;1&quot;,&quot;1&quot;,&quot;1&quot;,&quot;1&quot;,&quot;1&quot;, &quot;2&quot;,&quot;2&quot;,&quot;2&quot;)
  3. date&lt;-as.Date(c(&quot;2023-01-04&quot;,&quot;2023-01-15&quot;,&quot;2023-02-01&quot;,&quot;2023-02-15&quot;,&quot;2023-03-15&quot;, &quot;2023-04-01&quot;, &quot;2023-01-01&quot;, &quot;2023-04-01&quot;,&quot;2023-05-03&quot;))
  4. instance&lt;-c(1,2,3,1,2,1,1,1,1)
  5. df&lt;-data.frame(id, date, instance)
  6. df2&lt;-df%&gt;%
  7. group_by(id)%&gt;%
  8. mutate(instance2=row_number())

Can anyone suggest any updates to achieve this?

答案1

得分: 1

以下是代码的中文翻译:

这是一种(虽然冗长)将日期分组成30天窗口的方法。

  1. df %>%
  2. group_by(id) %>%
  3. arrange(id, date) %>%
  4. mutate(days = as.numeric(date - lag(date, default = first(date))),
  5. days2 = 1 + cumsum(if_else(accumulate(days, ~if_else(.x >= 31, .y, .x + .y)) >= 31, 1, 0))) %>%
  6. group_by(id, days2) %>%
  7. mutate(instance = row_number()) %>%
  8. ungroup() %>%
  9. select(-days, -days2)

一个 tibble: 9 x 3

id date instance

1 1 2023-01-04 1
2 1 2023-01-15 2
3 1 2023-02-01 3
4 1 2023-02-15 1
5 1 2023-03-15 2
6 1 2023-04-01 1
7 2 2023-01-01 1
8 2 2023-04-01 1
9 2 2023-05-03 1

英文:

Here‘s one (admittedly verbose) approach to group the dates into 30-day windows.

  1. df %&gt;%
  2. group_by(id) %&gt;%
  3. arrange(id, date) %&gt;%
  4. mutate(days = as.numeric(date - lag(date, default = first(date))),
  5. days2 = 1 + cumsum(if_else(accumulate(days, ~if_else(.x &gt;= 31, .y, .x + .y)) &gt;= 31, 1, 0))) %&gt;%
  6. group_by(id, days2) %&gt;%
  7. mutate(instance = row_number()) %&gt;%
  8. ungroup() %&gt;%
  9. select(-days, -days2)
  10. # A tibble: 9 x 3
  11. id date instance
  12. &lt;chr&gt; &lt;date&gt; &lt;int&gt;
  13. 1 1 2023-01-04 1
  14. 2 1 2023-01-15 2
  15. 3 1 2023-02-01 3
  16. 4 1 2023-02-15 1
  17. 5 1 2023-03-15 2
  18. 6 1 2023-04-01 1
  19. 7 2 2023-01-01 1
  20. 8 2 2023-04-01 1
  21. 9 2 2023-05-03 1

huangapple
  • 本文由 发表于 2023年5月7日 06:22:01
  • 转载请务必保留本文链接:https://go.coder-hub.com/76191426.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定