Get all level combinations for each group

Question

I have a list of customer IDs, each with a list of unique products they used. There can theoretically be up to ~150 unique products.

    library(tibble)

    df <- tibble(ID = c(1,1,1,2,2,3,3,4),
                 prod = c("Prod1", "Prod2", "Prod3", "Prod1", "Prod4", "Prod3", "Prod5", "Prod2"))

From that, I need to get all possible combinations of products for each ID, not only at the highest level (grouped by ID). That is, I need the combination containing all products, as `expand_grid()` would give, but also all combinations of 1, ..., n elements, where n is the number of unique products the ID has.

The final dataset should therefore look like this:

    df_results <- tibble(ID = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4),
                         combo = c("Prod1", "Prod2", "Prod3", "Prod1|Prod2", "Prod1|Prod3", "Prod2|Prod3", "Prod1|Prod2|Prod3",
                                   "Prod1", "Prod4", "Prod1|Prod4",
                                   "Prod3", "Prod5", "Prod3|Prod5",
                                   "Prod2"))
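
As a rough size check (a sketch, not part of the original question): a set of n distinct products has 2^n - 1 non-empty combinations, so the result grows exponentially with the number of products per ID. The snippet below counts the expected rows for the example data using base R only; `n_prod` and `n_combo` are names introduced here for illustration.

```r
# Same example data as in the question, built as a plain data.frame
df <- data.frame(
  ID   = c(1, 1, 1, 2, 2, 3, 3, 4),
  prod = c("Prod1", "Prod2", "Prod3", "Prod1", "Prod4", "Prod3", "Prod5", "Prod2")
)

# Number of distinct products per ID
n_prod <- tapply(df$prod, df$ID, function(p) length(unique(p)))

# Non-empty subsets of n items: 2^n - 1
n_combo <- 2^n_prod - 1

n_combo       # 7, 3, 3, 1 for IDs 1 to 4
sum(n_combo)  # 14, the number of rows in df_results
```

For an ID with 150 products, 2^150 - 1 is roughly 1.4e45, so enumerating every combination is only feasible when the per-ID product count is small.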

Answer 1

Score: 6

An extension of the canonical answer:

    library(dplyr)

    # reframe() requires dplyr >= 1.1.0; the \(x) lambda syntax requires R >= 4.1
    df %>%
      group_by(ID) %>%
      reframe(combo = as.character(do.call(c, lapply(seq_along(prod), \(m) combn(x = prod, m = m, FUN = \(x) paste(x, collapse = "|"))))))
    # A tibble: 14 × 2
          ID combo
       <dbl> <chr>
     1     1 Prod1
     2     1 Prod2
     3     1 Prod3
     4     1 Prod1|Prod2
     5     1 Prod1|Prod3
     6     1 Prod2|Prod3
     7     1 Prod1|Prod2|Prod3
     8     2 Prod1
     9     2 Prod4
    10     2 Prod1|Prod4
    11     3 Prod3
    12     3 Prod5
    13     3 Prod3|Prod5
    14     4 Prod2

Or in base R:

    stack(tapply(df$prod, df$ID,
           \(prod) do.call(c, lapply(seq_along(prod), \(m) combn(prod, m, FUN = \(x) paste(x, collapse = "|"))))))[2:1]
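
The per-group step can also be pulled out into a small function, which makes it easier to test and to cap the combination size when an ID has many products. This is only a sketch: `all_combos()` and its `max_size` and `sep` arguments are hypothetical names introduced here, not part of the original answer.

```r
# Hypothetical helper: all combinations of `x` with 1..max_size elements,
# each collapsed into a single string
all_combos <- function(x, max_size = length(x), sep = "|") {
  # as.character() keeps combn() from reading a single number n as 1:n
  x <- unique(as.character(x))
  sizes <- seq_len(min(max_size, length(x)))
  unlist(lapply(sizes, function(m) combn(x, m, FUN = paste, collapse = sep)))
}

# All 7 combinations of three products
all_combos(c("Prod1", "Prod2", "Prod3"))

# Only singles and pairs (6 combinations)
all_combos(c("Prod1", "Prod2", "Prod3"), max_size = 2)
```

Under these assumptions it should drop into either pipeline, for example `reframe(combo = all_combos(prod))` in the dplyr version.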

Answer 2

Score: 2

Another `tidyverse` option could be:

    library(dplyr)
    library(purrr)
    library(tidyr)

    df %>%
     group_by(ID) %>%
     transmute(combo = map2(.x = list(prod),
                            .y = seq_along(prod),
                            ~ combn(.x, .y, FUN = paste, collapse = "|"))) %>%
     unnest_longer(combo)

          ID combo
       <dbl> <chr>
     1     1 Prod1
     2     1 Prod2
     3     1 Prod3
     4     1 Prod1|Prod2
     5     1 Prod1|Prod3
     6     1 Prod2|Prod3
     7     1 Prod1|Prod2|Prod3
     8     2 Prod1
     9     2 Prod4
    10     2 Prod1|Prod4
    11     3 Prod3
    12     3 Prod5
    13     3 Prod3|Prod5
    14     4 Prod2



# Answer 3
**Score**: 1


Here is another base R option that uses `intToBits` to enumerate the combinations: each integer k from 1 to 2^n - 1 is read as a bit mask that selects a subset of the n products.

    with(
      df,
      setNames(
        rev(
          stack(
            by(
              prod, ID,
              function(p) {
                # k = 1, ..., 2^n - 1; the set bits of k pick the products
                sapply(
                  seq(2^length(p) - 1),
                  function(k) paste0(p[which(intToBits(k) > 0)], collapse = "|")
                )
              }
            )
          )
        ), names(df)
      )
    )

which gives

       ID              prod
    1   1             Prod1
    2   1             Prod2
    3   1       Prod1|Prod2
    4   1             Prod3
    5   1       Prod1|Prod3
    6   1       Prod2|Prod3
    7   1 Prod1|Prod2|Prod3
    8   2             Prod1
    9   2             Prod4
    10  2       Prod1|Prod4
    11  3             Prod3
    12  3             Prod5
    13  3       Prod3|Prod5
    14  4             Prod2
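
To make the bit-mask idea concrete, here is a small sketch (the values of `p` and `k` are chosen for illustration): `intToBits()` returns the 32 bits of an integer, least significant bit first, and the positions of the set bits pick the products that form one combination.

```r
p <- c("Prod1", "Prod2", "Prod3")
k <- 5L  # binary 101

# intToBits() yields 32 raw values, least significant bit first;
# keep only as many bits as there are products
mask <- as.integer(intToBits(k))[seq_along(p)]
mask
# 1 0 1

# The set bits select the members of this combination
combo <- paste0(p[mask == 1], collapse = "|")
combo
# "Prod1|Prod3"
```

Running k from 1 to 2^n - 1 visits every non-empty subset exactly once, which is why the rows come out in counting order rather than grouped by combination size. Because `intToBits()` works on 32-bit integers, the approach only applies to fewer than 32 products per ID.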

If you want to explore using `expand.grid` (not recommended, since it is rather inefficient: it builds n^n rows for n products before deduplicating), you can try the code below

    with(
      df,
      setNames(
        rev(
          stack(
            lapply(
              split(prod, ID),
              function(x) {
                unique(
                  apply(
                    expand.grid(rep(list(x), length(x))),
                    1,
                    function(v) {
                      # reduce each row to its sorted set of distinct products
                      paste0(sort(unique(v)), collapse = "|")
                    }
                  )
                )
              }
            )
          )
        ), names(df)
      )
    )

which gives

       ID              prod
    1   1             Prod1
    2   1       Prod1|Prod2
    3   1       Prod1|Prod3
    4   1 Prod1|Prod2|Prod3
    5   1             Prod2
    6   1       Prod2|Prod3
    7   1             Prod3
    8   2             Prod1
    9   2       Prod1|Prod4
    10  2             Prod4
    11  3             Prod3
    12  3       Prod3|Prod5
    13  3             Prod5
    14  4             Prod2
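
The cost of the `expand.grid` route can be seen on a single group. The sketch below (variable names `x`, `grid` and `combos` are illustrative) builds the full grid for three products and then collapses it, showing how many intermediate rows are created for how few results.

```r
x <- c("Prod1", "Prod2", "Prod3")

# One column per position, every product allowed in every position: n^n rows
grid <- expand.grid(rep(list(x), length(x)), stringsAsFactors = FALSE)
nrow(grid)
# 27

# Collapsing each row to its sorted set of distinct products leaves 2^n - 1 values
combos <- unique(apply(grid, 1, function(v) paste0(sort(unique(v)), collapse = "|")))
length(combos)
# 7
```

With 10 products the grid would already have 10^10 rows for only 1023 distinct combinations, which is why the `combn`-based answers scale better.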

huangapple
  • Published on April 4, 2023, 17:34:09
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75927775.html