Grouping Customer Sessions by Customer and Time until Next Transaction
Question
I need to bucket customer shopping sessions by time until next transaction. An example data frame is:
library(tidyverse)
cust_transactions_before <- 
  tibble(
    customer_name = c("a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b"),
    time_until_next = c(41, 19, 5, 27, 49, 3, 10, 20, 13, NA_integer_, 25, 17, 8, 33, 25, 31, 19, 5, 27, NA_integer_)
  )
I would like to group by customer_name and have each customer's first transaction start at 1 for the value cust_session. For each later observation I'd like an if/then rule: if the previous row's time_until_next is <= 30, keep the same cust_session as the previous observation. If the previous row's time_until_next is > 30, take the previous cust_session and add 1.
Lastly, if time_until_next is NA, cust_session should equal the previous cust_session.
A successful data frame after processing would look like this:
cust_transactions_after <- 
  tibble(
    customer_name = c("a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b"),
    time_until_next = c(41, 19, 5, 27, 49, 3, 10, 20, 13, NA_integer_, 25, 17, 8, 33, 25, 31, 19, 5, 27, NA_integer_),
    cust_session = c(1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 1, 1, 1, 1, 2, 2, 3, 3, 3, 3)
  )
Answer 1
Score: 1
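library(dplyr)
cust_transactions_before %>% 
  group_by(customer_name) %>% 
  # A new session starts when the previous row's gap exceeds 30;
  # default = 31 makes each customer's first row open session 1.
  mutate(cust_session = cumsum(lag(time_until_next, default = 31) > 30))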
   customer_name time_until_next cust_session
   <chr>                   <dbl>        <int>
 1 a                          41            1
 2 a                          19            2
 3 a                           5            2
 4 a                          27            2
 5 a                          49            2
 6 a                           3            3
 7 a                          10            3
 8 a                          20            3
 9 a                          13            3
10 a                          NA            3
11 b                          25            1
12 b                          17            1
13 b                           8            1
14 b                          33            1
15 b                          25            2
16 b                          31            2
17 b                          19            3
18 b                           5            3
19 b                          27            3
20 b                          NA            3
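To see why the lag/cumsum combination produces these numbers, it may help to unroll it for one customer. The following is only a sketch in base R (no packages), using customer "a" from the question; the intermediate names prev_gap and starts_new_session are made up for illustration and do not appear in the answer above.

```r
# Gaps for customer "a", copied from the question's example data.
time_until_next <- c(41, 19, 5, 27, 49, 3, 10, 20, 13, NA)

# Shift the gaps down one row (what lag() does). The value 31 stands in
# for the first row's missing "previous gap" so it always opens session 1.
prev_gap <- c(31, head(time_until_next, -1))

# TRUE wherever the previous transaction was followed by a gap above 30.
starts_new_session <- prev_gap > 30

# A running count of session starts gives the session number.
cust_session <- cumsum(starts_new_session)
cust_session
# [1] 1 2 2 2 2 3 3 3 3 3
```

Because the shift drops the trailing NA, it never reaches the comparison. An NA in the middle of a customer's rows would be a different story: cumsum() returns NA from that point on, so such data would need extra handling before applying this approach.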