根据R中的单独逻辑变量计算连续周数。

huangapple go评论113阅读模式
英文:

Count consecutive weeks based on separate Logical Variable in R

问题

在我的数据框 df 中,我需要构建列 count。它应该递增地计算在同一个 customer_id 的两个 Buy=TRUE 之间连续的 Buy=FALSE。然而,如果同一个 customer_id 没有先前的 Buy=TRUE,那么应该为 NA。一个 data.table 或 dplyr 的解决方案会很好。

  1. df <- data.frame(
  2. customer_id = rep(1:2, each = 6),
  3. Buy = c(FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, FALSE),
  4. count = c(NA, NA, 0, 1, 2, 0, NA, 0, 1, 0, 0, 1)
  5. )
英文:

In my data.frame df, I need to construct the column count. It should incrementally count the consecutive Buy=FALSE between two Buy=TRUE for the same customer_id. However, it should be NA if there is no previous Buy=TRUE for the same customer_id. A data.table or dplyr solution would be great.

  1. df &lt;- data.frame(
  2. customer_id = rep(1:2, each = 6),
  3. Buy = c(FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, FALSE),
  4. count = c(NA, NA, 0, 1, 2, 0, NA, 0, 1, 0, 0, 1)
  5. )

答案1

得分: 1

  1. library(dplyr)
  2. df |&gt;
  3. mutate(
  4. consec_id = consecutive_id(Buy),
  5. any_true = cumsum(Buy) &gt; 0,
  6. .by = customer_id
  7. ) |&gt;
  8. mutate(
  9. result = case_when(
  10. Buy ~ 0L,
  11. any_true ~ row_number(),
  12. TRUE ~ NA_integer_
  13. ),
  14. .by = c(customer_id, consec_id)
  15. )
  16. # customer_id Buy count consec_id any_true result
  17. # 1 1 FALSE NA 1 FALSE NA
  18. # 2 1 FALSE NA 1 FALSE NA
  19. # 3 1 TRUE 0 2 TRUE 0
  20. # 4 1 FALSE 1 3 TRUE 1
  21. # 5 1 FALSE 2 3 TRUE 2
  22. # 6 1 TRUE 0 4 TRUE 0
  23. # 7 2 FALSE NA 1 FALSE NA
  24. # 8 2 TRUE 0 2 TRUE 0
  25. # 9 2 FALSE 1 3 TRUE 1
  26. # 10 2 TRUE 0 4 TRUE 0
  27. # 11 2 TRUE 0 4 TRUE 0
  28. # 12 2 FALSE 1 5 TRUE 1
英文:
  1. library(dplyr)
  2. df |&gt;
  3. mutate(
  4. consec_id = consecutive_id(Buy),
  5. any_true = cumsum(Buy) &gt; 0,
  6. .by = customer_id
  7. ) |&gt;
  8. mutate(
  9. result = case_when(
  10. Buy ~ 0L,
  11. any_true ~ row_number(),
  12. TRUE ~ NA_integer_
  13. ),
  14. .by = c(customer_id, consec_id)
  15. )
  16. # customer_id Buy count consec_id any_true result
  17. # 1 1 FALSE NA 1 FALSE NA
  18. # 2 1 FALSE NA 1 FALSE NA
  19. # 3 1 TRUE 0 2 TRUE 0
  20. # 4 1 FALSE 1 3 TRUE 1
  21. # 5 1 FALSE 2 3 TRUE 2
  22. # 6 1 TRUE 0 4 TRUE 0
  23. # 7 2 FALSE NA 1 FALSE NA
  24. # 8 2 TRUE 0 2 TRUE 0
  25. # 9 2 FALSE 1 3 TRUE 1
  26. # 10 2 TRUE 0 4 TRUE 0
  27. # 11 2 TRUE 0 4 TRUE 0
  28. # 12 2 FALSE 1 5 TRUE 1

答案2

得分: 1

一种方法可能是

  1. df$count <- NULL
  1. library(dplyr)
  2. df %>%
  3. group_by(grp = cumsum(Buy), customer_id) %>%
  4. mutate(count = row_number() - 1, grp = Buy[1] == F) %>%
  5. ungroup() %>%
  6. mutate(count = ifelse(grp, NA, count), grp = NULL)
  7. # A tibble: 12 × 3
  8. customer_id Buy count
  9. <int> <lgl> <dbl>
  10. 1 1 FALSE NA
  11. 2 1 FALSE NA
  12. 3 1 TRUE 0
  13. 4 1 FALSE 1
  14. 5 1 FALSE 2
  15. 6 1 TRUE 0
  16. 7 2 FALSE NA
  17. 8 2 TRUE 0
  18. 9 2 FALSE 1
  19. 10 2 TRUE 0
  20. 11 2 TRUE 0
  21. 12 2 FALSE 1
英文:

One way could be

  1. df$count &lt;- NULL
  1. library(dplyr)
  2. df %&gt;%
  3. group_by(grp = cumsum(Buy), customer_id) %&gt;%
  4. mutate(count = row_number() - 1, grp = Buy[1] == F) %&gt;%
  5. ungroup() %&gt;%
  6. mutate(count = ifelse(grp, NA, count), grp = NULL)
  7. # A tibble: 12 &#215; 3
  8. customer_id Buy count
  9. &lt;int&gt; &lt;lgl&gt; &lt;dbl&gt;
  10. 1 1 FALSE NA
  11. 2 1 FALSE NA
  12. 3 1 TRUE 0
  13. 4 1 FALSE 1
  14. 5 1 FALSE 2
  15. 6 1 TRUE 0
  16. 7 2 FALSE NA
  17. 8 2 TRUE 0
  18. 9 2 FALSE 1
  19. 10 2 TRUE 0
  20. 11 2 TRUE 0
  21. 12 2 FALSE 1

huangapple
  • 本文由 发表于 2023年6月14日 23:30:55
  • 转载请务必保留本文链接:https://go.coder-hub.com/76475234.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定