Create groups using a 30 day window in R
Question
df2 <- df %>%
group_by(id) %>%
arrange(date) %>%
mutate(window_start = floor_date(date, "30 days"),
instance2 = row_number() - row_number()[(date - window_start) >= 30][1] + 1)
I have data showing a transaction date and a company id.
I want to number each transaction within a 30-day window, starting from the first date in that window.
For example, for id 1 the window starts at 2023-01-04 and runs until 2023-02-03. The next window then starts at the first transaction after that, which is 2023-02-15, and runs until 2023-03-17, and so on.
The transactions within each window need to be numbered starting at 1 for each new window.
Example data is below; the instance column shows the desired result.
df2 is as far as I have got. I just need to work out how to add the 30-day window; at the moment, it's just numbering the rows per id without a time element.
library(tidyverse)
id <- c("1", "1", "1", "1", "1", "1", "2", "2", "2")
date <- as.Date(c("2023-01-04", "2023-01-15", "2023-02-01", "2023-02-15", "2023-03-15", "2023-04-01", "2023-01-01", "2023-04-01", "2023-05-03"))
instance <- c(1, 2, 3, 1, 2, 1, 1, 1, 1)
df <- data.frame(id, date, instance)
df2 <- df %>%
group_by(id) %>%
mutate(instance2 = row_number())
Can anyone suggest any updates to achieve this?
Answer 1
Score: 1
Here's one (admittedly verbose) approach to group the dates into 30-day windows.
df %>%
group_by(id) %>%
arrange(id, date) %>%
mutate(days = as.numeric(date - lag(date, default = first(date))),
days2 = 1 + cumsum(if_else(accumulate(days, ~if_else(.x >= 31, .y, .x + .y)) >= 31, 1, 0))) %>%
group_by(id, days2) %>%
mutate(instance = row_number()) %>%
ungroup() %>%
select(-days, -days2)
# A tibble: 9 x 3
id date instance
<chr> <date> <int>
1 1 2023-01-04 1
2 1 2023-01-15 2
3 1 2023-02-01 3
4 1 2023-02-15 1
5 1 2023-03-15 2
6 1 2023-04-01 1
7 2 2023-01-01 1
8 2 2023-04-01 1
9 2 2023-05-03 1
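The accumulate() step above is compact but can be hard to follow. As a rough sketch of the same idea with an explicit loop, the snippet below uses a helper named window_id, which is a hypothetical name introduced here and not part of the original answer. It assumes the dates are already sorted within each id and that a transaction exactly 30 days after the window start still belongs to that window, as in the question's example.

```r
library(dplyr)

# Hypothetical helper (not from the original answer): returns, for each date,
# the index of the window it falls in. Assumes `dates` is sorted ascending.
window_id <- function(dates, width = 30) {
  ids <- integer(length(dates))
  if (length(dates) == 0) return(ids)
  start <- dates[1]
  current <- 1L
  for (i in seq_along(dates)) {
    # More than `width` days after the window start: open a new window here
    if (as.numeric(dates[i] - start) > width) {
      start <- dates[i]
      current <- current + 1L
    }
    ids[i] <- current
  }
  ids
}

# Example data from the question
df <- data.frame(
  id = c("1", "1", "1", "1", "1", "1", "2", "2", "2"),
  date = as.Date(c("2023-01-04", "2023-01-15", "2023-02-01", "2023-02-15",
                   "2023-03-15", "2023-04-01", "2023-01-01", "2023-04-01",
                   "2023-05-03"))
)

result <- df %>%
  arrange(id, date) %>%
  group_by(id) %>%
  mutate(window = window_id(date)) %>%   # window index per id
  group_by(id, window) %>%
  mutate(instance2 = row_number()) %>%   # restart numbering in each window
  ungroup() %>%
  select(-window)
```

The loop trades brevity for readability: the window start is tracked in a named variable, so changing the window length only means passing a different width.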