Grouping Customer Sessions by Customer and Time until Next Transaction

Question

I need to bucket customer shopping sessions by time until next transaction. An example data frame is:

library(tidyverse)

cust_transactions_before <- 
  tibble(
    customer_name = c("a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b"),
    time_until_next = c(41, 19, 5, 27, 49, 3, 10, 20, 13, NA_integer_, 25, 17, 8, 33, 25, 31, 19, 5, 27, NA_integer_)
  )

I would like to group by customer_name and have the first transaction per customer start at 1 for the value cust_session. For the next observation I'd like to do an if/then where if time_until_next is <= 30 then keep the same session number for cust_session as the previous observation. If time_until_next is > 30 then take the previous cust_session and add 1 to it.

Lastly, if time_until_next is NA then have it equal the previous cust_session.

A successful data frame after processing would look like this:

cust_transactions_after <- 
  tibble(
    customer_name = c("a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b"),
    time_until_next = c(41, 19, 5, 27, 49, 3, 10, 20, 13, NA_integer_, 25, 17, 8, 33, 25, 31, 19, 5, 27, NA_integer_),
    cust_session = c(1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 1, 1, 1, 1, 2, 2, 3, 3, 3, 3)
  )

Answer 1

Score: 1

library(dplyr)
cust_transactions_before %>% 
  group_by(customer_name) %>% 
  mutate(cust_session = cumsum(lag(time_until_next, default = 31) > 30))

   customer_name time_until_next cust_session
   <chr>                   <dbl>        <int>
 1 a                          41            1
 2 a                          19            2
 3 a                           5            2
 4 a                          27            2
 5 a                          49            2
 6 a                           3            3
 7 a                          10            3
 8 a                          20            3
 9 a                          13            3
10 a                          NA            3
11 b                          25            1
12 b                          17            1
13 b                           8            1
14 b                          33            1
15 b                          25            2
16 b                          31            2
17 b                          19            3
18 b                           5            3
19 b                          27            3
20 b                          NA            3
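
To see why the one-liner works, it may help to split it into its intermediate steps. The following is only a sketch: the column names prev_gap and new_session and the object names steps and expected are made up for illustration and are not part of the original answer. The data is rebuilt so the block runs on its own.

```r
library(dplyr)

# Same data as in the question, rebuilt here so the block is self-contained.
cust_transactions_before <- tibble(
  customer_name = rep(c("a", "b"), each = 10),
  time_until_next = c(41, 19, 5, 27, 49, 3, 10, 20, 13, NA,
                      25, 17, 8, 33, 25, 31, 19, 5, 27, NA)
)

steps <- cust_transactions_before %>%
  group_by(customer_name) %>%
  mutate(
    # Gap recorded on the previous row of the same customer.
    # default = 31 is a stand-in so each customer's first row starts a session.
    prev_gap = lag(time_until_next, default = 31),
    # TRUE on rows that open a new session (previous gap was over 30).
    new_session = prev_gap > 30,
    # Running count of session starts within each customer.
    cust_session = cumsum(new_session)
  ) %>%
  ungroup()

# Expected session numbers copied from the question.
expected <- c(1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 1, 1, 1, 1, 2, 2, 3, 3, 3, 3)
stopifnot(all(steps$cust_session == expected))
```

In this data the NA sits only on each customer's last row, so it never reaches prev_gap. If an NA could appear mid-sequence, cumsum() would return NA from that row onward, and the lagged value would need handling first (for example with coalesce()), depending on how such rows should be treated.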

huangapple
  • Posted on 2023-02-09 01:02:09
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75389214.html