Grouping Customer Sessions by Customer and Time until Next Transaction

Question

I need to bucket customer shopping sessions by time until next transaction. An example data frame is:

    library(tidyverse)
    cust_transactions_before <-
      tibble(
        customer_name = c("a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b"),
        time_until_next = c(41, 19, 5, 27, 49, 3, 10, 20, 13, NA_integer_, 25, 17, 8, 33, 25, 31, 19, 5, 27, NA_integer_)
      )

I would like to group by customer_name and have cust_session start at 1 for each customer's first transaction. For each later observation, if the previous observation's time_until_next is <= 30, keep the same cust_session as the previous observation; if it is > 30, take the previous cust_session and add 1.

Lastly, if time_until_next is NA then have it equal the previous cust_session.
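
The rule can also be spelled out as an explicit loop in which each row looks at the previous row's time_until_next, which is the reading the expected output implies. This is only a minimal base R sketch: `assign_sessions`, its `threshold` argument, and `gaps_a` are hypothetical names that do not appear in the original post, and treating a missing previous gap as "same session" is an assumption.

```r
# Hypothetical helper (not from the original post): assigns session numbers
# for ONE customer's transactions, in order, using an explicit loop.
assign_sessions <- function(time_until_next, threshold = 30) {
  n <- length(time_until_next)
  session <- integer(n)
  if (n == 0) return(session)
  session[1] <- 1L                       # first transaction starts session 1
  for (i in seq_len(n)[-1]) {            # rows 2..n (empty when n == 1)
    prev_gap <- time_until_next[i - 1]
    # Assumption: a missing previous gap keeps the current session.
    starts_new <- !is.na(prev_gap) && prev_gap > threshold
    session[i] <- session[i - 1] + as.integer(starts_new)
  }
  session
}

# Customer "a" from the example data
gaps_a <- c(41, 19, 5, 27, 49, 3, 10, 20, 13, NA)
assign_sessions(gaps_a)
# 1 2 2 2 2 3 3 3 3 3
```

To apply it per customer, the function could be called inside a grouped mutate or via `ave()`; the loop is meant to make the rule readable, not to be the fastest option.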

A successful data frame after processing would look like this:

    cust_transactions_after <-
      tibble(
        customer_name = c("a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b"),
        time_until_next = c(41, 19, 5, 27, 49, 3, 10, 20, 13, NA_integer_, 25, 17, 8, 33, 25, 31, 19, 5, 27, NA_integer_),
        cust_session = c(1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 1, 1, 1, 1, 2, 2, 3, 3, 3, 3)
      )

Answer 1

Score: 1

    library(dplyr)
    cust_transactions_before %>%
      group_by(customer_name) %>%
      # A new session starts whenever the previous row's gap exceeds 30;
      # default = 31 makes each customer's first row count as session 1.
      mutate(cust_session = cumsum(lag(time_until_next, default = 31) > 30))

       customer_name time_until_next cust_session
       <chr>                   <dbl>        <int>
     1 a                          41            1
     2 a                          19            2
     3 a                           5            2
     4 a                          27            2
     5 a                          49            2
     6 a                           3            3
     7 a                          10            3
     8 a                          20            3
     9 a                          13            3
    10 a                          NA            3
    11 b                          25            1
    12 b                          17            1
    13 b                           8            1
    14 b                          33            1
    15 b                          25            2
    16 b                          31            2
    17 b                          19            3
    18 b                           5            3
    19 b                          27            3
    20 b                          NA            3
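
To see why the one-liner works, it may help to split it into intermediate columns. The sketch below rebuilds the example data and checks the result against the expected cust_session values from the question; `prev_gap`, `new_session`, `steps`, and `expected` are illustrative names that are not part of the original answer.

```r
library(dplyr)

# Example data from the question
cust_transactions_before <- tibble(
  customer_name = rep(c("a", "b"), each = 10),
  time_until_next = c(41, 19, 5, 27, 49, 3, 10, 20, 13, NA,
                      25, 17, 8, 33, 25, 31, 19, 5, 27, NA)
)

steps <- cust_transactions_before %>%
  group_by(customer_name) %>%
  mutate(
    prev_gap = lag(time_until_next, default = 31),  # first row per customer gets 31
    new_session = prev_gap > 30,                    # TRUE where a new session starts
    cust_session = cumsum(new_session)              # running count of session starts
  ) %>%
  ungroup()

# Expected values taken from the question
expected <- c(1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 1, 1, 1, 1, 2, 2, 3, 3, 3, 3)
all(steps$cust_session == expected)
# TRUE
```

In this data the NA is always a customer's last row, so it never reaches `prev_gap`. If an NA could occur mid-group, `cumsum()` would return NA from that row onward, and the missing gaps would likely need to be replaced first (for example with `coalesce()`).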

huangapple
  • Posted on February 9, 2023, 01:02:09
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75389214.html