Get all level combinations for each group

Question

I have a list of customer IDs, each with a list of unique products they used. In theory there can be up to ~150 unique products.

    library(tibble)

    df <- tibble(ID = c(1, 1, 1, 2, 2, 3, 3, 4),
                 prod = c("Prod1", "Prod2", "Prod3", "Prod1", "Prod4", "Prod3", "Prod5", "Prod2"))

From that, I need all possible combinations of products for each ID, not only at the highest level (grouped by ID). That is, I need the combination containing all of an ID's products, as expand_grid() would give, but also every combination of 1, ..., n elements, where n is the number of unique products the ID has.

The final dataset should therefore look like this:

    df_results <- tibble(ID = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4),
                         combo = c("Prod1", "Prod2", "Prod3", "Prod1|Prod2", "Prod1|Prod3", "Prod2|Prod3", "Prod1|Prod2|Prod3",
                                   "Prod1", "Prod4", "Prod1|Prod4",
                                   "Prod3", "Prod5", "Prod3|Prod5",
                                   "Prod2"))
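A quick size check may help before picking an approach. An ID with n unique products has 2^n - 1 non-empty combinations, so the result grows very fast. The sketch below is only an illustration; the vectors `ids`, `n_per_id`, `n_products` and `n_combos` are names made up here, with `ids` copied from the example data.

```r
# IDs from the example data in the question
ids <- c(1, 1, 1, 2, 2, 3, 3, 4)

# Number of unique products per ID (each row is one product here)
n_per_id <- table(ids)

# Expected number of result rows: sum over IDs of 2^n - 1
sum(2^n_per_id - 1)

# Growth of 2^n - 1 for a few values of n
n_products <- c(1, 2, 3, 10, 20, 30, 150)
n_combos <- 2^n_products - 1
data.frame(n_products, n_combos)
```

For the example data the sum is 14, which matches the 14 rows of `df_results`. At n = 150 the count is on the order of 1e45, so a full enumeration is only realistic for IDs with a small number of products.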

Answer 1

Score: 6

An extension of the canonical answer:

    library(dplyr)

    # For each ID, paste together every subset of size m = 1, ..., n
    df %>%
      group_by(ID) %>%
      reframe(combo = as.character(do.call(c, lapply(seq_along(prod), \(m) combn(x = prod, m = m, FUN = \(x) paste(x, collapse = "|"))))))

    # A tibble: 14 × 2
          ID combo
       <dbl> <chr>
     1     1 Prod1
     2     1 Prod2
     3     1 Prod3
     4     1 Prod1|Prod2
     5     1 Prod1|Prod3
     6     1 Prod2|Prod3
     7     1 Prod1|Prod2|Prod3
     8     2 Prod1
     9     2 Prod4
    10     2 Prod1|Prod4
    11     3 Prod3
    12     3 Prod5
    13     3 Prod3|Prod5
    14     4 Prod2

Or in base R:

    stack(tapply(df$prod, df$ID,
                 \(prod) do.call(c, lapply(seq_along(prod), \(m) combn(prod, m, FUN = \(x) paste(x, collapse = "|"))))))[2:1]
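Since the question mentions up to ~150 products, it may be worth capping the combination size. The following is a minimal sketch of the same `combn()` idea wrapped in a helper; `all_combos()` and its `max_size` and `sep` arguments are hypothetical names introduced here, not part of the answer above.

```r
# Hypothetical helper: all combinations of x with 1..max_size elements,
# each pasted into a single string.
all_combos <- function(x, max_size = length(x), sep = "|") {
  x <- as.character(x)  # keeps combn() from treating a single number as 1:x
  sizes <- seq_len(min(max_size, length(x)))
  unlist(lapply(sizes, function(m) combn(x, m, FUN = paste, collapse = sep)))
}

prods <- c("Prod1", "Prod2", "Prod3")

all_combos(prods)                # every non-empty subset
all_combos(prods, max_size = 2)  # singles and pairs only
```

The helper could be dropped into either version above, for example `reframe(combo = all_combos(prod, max_size = 3))` in the dplyr pipeline, assuming a size cap is acceptable for the analysis.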

Answer 2

Score: 2

Another `tidyverse` option could be:

    library(dplyr)
    library(purrr)
    library(tidyr)

    df %>%
      group_by(ID) %>%
      transmute(combo = map2(.x = list(prod),
                             .y = seq_along(prod),
                             ~ combn(.x, .y, FUN = paste, collapse = "|"))) %>%
      unnest_longer(combo)

          ID combo
       <dbl> <chr>
     1     1 Prod1
     2     1 Prod2
     3     1 Prod3
     4     1 Prod1|Prod2
     5     1 Prod1|Prod3
     6     1 Prod2|Prod3
     7     1 Prod1|Prod2|Prod3
     8     2 Prod1
     9     2 Prod4
    10     2 Prod1|Prod4
    11     3 Prod3
    12     3 Prod5
    13     3 Prod3|Prod5
    14     4 Prod2

Answer 3

Score: 1

Here is another base R option that uses `intToBits` to map every combination to the binary representation of an integer index:

    # `prod` is the product column of `df` from the question
    with(
      df,
      setNames(
        rev(
          stack(
            by(
              prod, ID,
              function(p) {
                sapply(
                  seq(2^length(p) - 1),
                  function(k) paste0(p[which(intToBits(k) > 0)], collapse = "|")
                )
              }
            )
          )
        ), names(df)
      )
    )

which gives

       ID              prod
    1   1             Prod1
    2   1             Prod2
    3   1       Prod1|Prod2
    4   1             Prod3
    5   1       Prod1|Prod3
    6   1       Prod2|Prod3
    7   1 Prod1|Prod2|Prod3
    8   2             Prod1
    9   2             Prod4
    10  2       Prod1|Prod4
    11  3             Prod3
    12  3             Prod5
    13  3       Prod3|Prod5
    14  4             Prod2
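To make the bit mapping concrete, here is a small sketch of how one integer index decodes into one combination. `subset_from_mask()` and `bits_of()` are hypothetical helpers written for this illustration; only `intToBits()` comes from base R.

```r
p <- c("Prod1", "Prod2", "Prod3")

# Hypothetical helper: the first length(p) bits of k, least significant bit first
bits_of <- function(k, p) as.integer(intToBits(k))[seq_along(p)]

# Hypothetical helper: decode integer k into the subset of p that it selects
subset_from_mask <- function(k, p) {
  paste(p[bits_of(k, p) == 1L], collapse = "|")
}

# Every index from 1 to 2^n - 1 selects a different non-empty subset
masks <- seq_len(2^length(p) - 1)
data.frame(
  k = masks,
  bits = vapply(masks, function(k) paste(bits_of(k, p), collapse = ""), character(1)),
  combo = vapply(masks, subset_from_mask, character(1), p = p)
)
```

For instance, k = 5 has bits 1, 0, 1 (lowest bit first), so it selects the first and third products, which is why row 5 of the output above is `Prod1|Prod3`. Because `intToBits()` works on 32-bit integers, this indexing scheme should only be expected to work for IDs with at most 31 products.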

If you want to explore the possibility of using `expand.grid` (not recommended, since it is rather inefficient), you can try the code below:

    # `prod` is the product column of `df` from the question
    with(
      df,
      setNames(
        rev(
          stack(
            lapply(
              split(prod, ID),
              function(x) {
                unique(
                  apply(
                    expand.grid(rep(list(x), length(x))),
                    1,
                    function(v) {
                      paste0(sort(unique(v)), collapse = "|")
                    }
                  )
                )
              }
            )
          )
        ), names(df)
      )
    )

which gives

       ID              prod
    1   1             Prod1
    2   1       Prod1|Prod2
    3   1       Prod1|Prod3
    4   1 Prod1|Prod2|Prod3
    5   1             Prod2
    6   1       Prod2|Prod3
    7   1             Prod3
    8   2             Prod1
    9   2       Prod1|Prod4
    10  2             Prod4
    11  3             Prod3
    12  3       Prod3|Prod5
    13  3             Prod5
    14  4             Prod2
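The inefficiency can be quantified: `expand.grid(rep(list(x), length(x)))` builds n^n rows for n products, and `unique()` then reduces them to the 2^n - 1 distinct combinations. A rough comparison, with the data frame `cmp` and its column names chosen here purely for illustration:

```r
n <- 1:8

cmp <- data.frame(
  n = n,
  expand_grid_rows = n^n,     # rows built before de-duplication
  unique_combos = 2^n - 1     # rows that survive unique()
)
cmp$waste_factor <- cmp$expand_grid_rows / cmp$unique_combos
cmp
```

With 3 products that is 27 rows for 7 results; the gap widens quickly as n grows, so this variant is best kept to very small groups.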

Posted by huangapple on 2023-04-04 17:34:09
When reposting, please keep the link to this article: https://go.coder-hub.com/75927775.html