Grouping Customer Sessions by Customer and Time until Next Transaction
Question
I need to bucket customer shopping sessions by time until next transaction. An example data frame is:
library(tidyverse)
cust_transactions_before <-
  tibble(
    customer_name = c("a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b"),
    time_until_next = c(41, 19, 5, 27, 49, 3, 10, 20, 13, NA_integer_, 25, 17, 8, 33, 25, 31, 19, 5, 27, NA_integer_)
  )
I would like to group by customer_name and have cust_session start at 1 for each customer's first transaction. For each subsequent observation, if the previous observation's time_until_next is <= 30, keep the same cust_session as the previous observation. If it is > 30, take the previous cust_session and add 1.
Lastly, if time_until_next is NA, have cust_session equal the previous cust_session.
A successful data frame after processing would look like this:
cust_transactions_after <-
  tibble(
    customer_name = c("a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b"),
    time_until_next = c(41, 19, 5, 27, 49, 3, 10, 20, 13, NA_integer_, 25, 17, 8, 33, 25, 31, 19, 5, 27, NA_integer_),
    cust_session = c(1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 1, 1, 1, 1, 2, 2, 3, 3, 3, 3)
  )
Answer 1
Score: 1
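library(dplyr)
cust_transactions_before %>%
  group_by(customer_name) %>%
  # A new session starts whenever the previous row's gap exceeds 30.
  # default = 31 makes each customer's first row open session 1.
  mutate(cust_session = cumsum(lag(time_until_next, default = 31) > 30))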
   customer_name time_until_next cust_session
   <chr>                   <dbl>        <int>
 1 a                          41            1
 2 a                          19            2
 3 a                           5            2
 4 a                          27            2
 5 a                          49            2
 6 a                           3            3
 7 a                          10            3
 8 a                          20            3
 9 a                          13            3
10 a                          NA            3
11 b                          25            1
12 b                          17            1
13 b                           8            1
14 b                          33            1
15 b                          25            2
16 b                          31            2
17 b                          19            3
18 b                           5            3
19 b                          27            3
20 b                          NA            3
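To see why the one-liner gives this result, it may help to split it into its three steps. The sketch below is only an illustration: the intermediate columns prev_gap and new_session and the objects steps and expected are names made up here, not part of the original answer. It rebuilds the example data from the question and checks the result against the expected cust_session values.

```r
library(dplyr)

# Same example data as in the question.
cust_transactions_before <- tibble(
  customer_name = rep(c("a", "b"), each = 10),
  time_until_next = c(41, 19, 5, 27, 49, 3, 10, 20, 13, NA,
                      25, 17, 8, 33, 25, 31, 19, 5, 27, NA)
)

steps <- cust_transactions_before %>%
  group_by(customer_name) %>%
  mutate(
    # Gap recorded on the previous row; 31 is a sentinel (> 30) so that
    # each customer's first row opens session 1. (hypothetical column)
    prev_gap = lag(time_until_next, default = 31),
    # TRUE where the previous gap exceeded 30, i.e. a new session begins.
    # (hypothetical column)
    new_session = prev_gap > 30,
    # Running count of session starts gives the session number.
    cust_session = cumsum(new_session)
  ) %>%
  ungroup()

# Expected values taken from the question.
expected <- c(1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 1, 1, 1, 1, 2, 2, 3, 3, 3, 3)
stopifnot(identical(as.numeric(steps$cust_session), expected))
```

In this data the only NA is on each customer's last row, and lag() shifts it out of the group, so it never reaches cumsum(). If an NA could appear in the middle of a customer's rows, cumsum() would return NA from that point on, so that case would need separate handling.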