April 4, 2023 17:34:09

Get all level combinations for each group

Question

I have a list of customer IDs, each with a list of unique products they used. There can theoretically be up to ~150 unique products.

library(tibble)
df <- tibble(ID = c(1,1,1,2,2,3,3,4),
             prod = c("Prod1", "Prod2", "Prod3", "Prod1", "Prod4", "Prod3", "Prod5", "Prod2"))

From that, I need to get all possible combinations of products for each ID, not only on the highest level (grouped by ID). That is, include the combination with all products, as expand_grid() would do, but also all combinations of 1,...,n elements, where n is the number of unique products the ID has.

The final dataset should therefore look like this:

df_results <- tibble(ID = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4),
                     combo = c("Prod1", "Prod2", "Prod3", "Prod1|Prod2", "Prod1|Prod3", "Prod2|Prod3", "Prod1|Prod2|Prod3",
                               "Prod1", "Prod4", "Prod1|Prod4",
                               "Prod3", "Prod5", "Prod3|Prod5",
                               "Prod2"))
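
To get a feel for the size of the result before computing anything, the number of combinations per ID can be counted directly: a group with n unique products has 2^n - 1 non-empty combinations. The sketch below is only an illustration and not part of the original question; it rebuilds the toy data with data.frame() so that base R is enough, and the names n_prod and n_combos are made up here.

```r
# Toy data from the question, rebuilt with base R
df <- data.frame(
  ID   = c(1, 1, 1, 2, 2, 3, 3, 4),
  prod = c("Prod1", "Prod2", "Prod3", "Prod1", "Prod4", "Prod3", "Prod5", "Prod2")
)

# Number of unique products per ID
n_prod <- tapply(df$prod, df$ID, function(p) length(unique(p)))

# Non-empty subsets of n items: sum of choose(n, m) for m = 1..n, i.e. 2^n - 1
n_combos <- 2^n_prod - 1
n_combos
sum(n_combos)
```

For the toy data the counts add up to 14, matching the 14 rows of df_results. At the theoretical maximum of 150 products, a single ID would have 2^150 - 1 combinations (on the order of 10^45), so full enumeration is only practical for IDs with few products.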

Answer 1

Score: 6

An extension of the canonical answer:

library(dplyr)  # reframe() requires dplyr >= 1.1.0
df %>% 
  group_by(ID) %>% 
  reframe(combo = as.character(do.call(c, lapply(seq_along(prod), \(m) combn(x = prod, m = m, FUN = \(x) paste(x, collapse = "|"))))))

# A tibble: 14 × 2
      ID combo            
   <dbl> <chr>            
 1     1 Prod1            
 2     1 Prod2            
 3     1 Prod3            
 4     1 Prod1|Prod2      
 5     1 Prod1|Prod3      
 6     1 Prod2|Prod3      
 7     1 Prod1|Prod2|Prod3
 8     2 Prod1            
 9     2 Prod4            
10     2 Prod1|Prod4      
11     3 Prod3            
12     3 Prod5            
13     3 Prod3|Prod5      
14     4 Prod2

Or in base R:

stack(tapply(df$prod, df$ID, 
       \(prod) do.call(c, lapply(seq_along(prod), \(m) combn(prod, m, FUN = \(x) paste(x, collapse = "|"))))))[2:1]
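
Since the number of combinations doubles with every extra product, it may help to cap the combination size. The following is a rough sketch of the same combn() idea wrapped in a helper; combos_upto and its max_m argument are hypothetical names introduced here, not something from the answer above.

```r
# Toy data from the question, rebuilt with base R
df <- data.frame(
  ID   = c(1, 1, 1, 2, 2, 3, 3, 4),
  prod = c("Prod1", "Prod2", "Prod3", "Prod1", "Prod4", "Prod3", "Prod5", "Prod2")
)

# Hypothetical helper: all "|"-joined combinations of sizes 1..max_m
# taken from the unique values of x (default: every size up to n)
combos_upto <- function(x, max_m = length(unique(x))) {
  x <- unique(x)
  sizes <- seq_len(min(max_m, length(x)))
  unlist(lapply(sizes, function(m) {
    combn(x, m, FUN = paste, collapse = "|")
  }))
}

# Apply per ID, keeping only combinations of at most 2 products
res <- stack(lapply(split(df$prod, df$ID), combos_upto, max_m = 2))[2:1]
names(res) <- c("ID", "combo")
res
```

With max_m = 2 the three-product combination for ID 1 is dropped; leaving max_m at its default enumerates every size, as in the answer.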

Answer 2

Score: 2

Another `tidyverse` option could be:

library(dplyr)
library(purrr)  # map2()
library(tidyr)  # unnest_longer()

df %>%
 group_by(ID) %>%
 transmute(combo = map2(.x = list(prod), 
                        .y = seq_along(prod),
                        ~ combn(.x, .y, FUN = paste, collapse = "|"))) %>%
 unnest_longer(combo)

      ID combo            
   <dbl> <chr>            
 1     1 Prod1            
 2     1 Prod2            
 3     1 Prod3            
 4     1 Prod1|Prod2      
 5     1 Prod1|Prod3      
 6     1 Prod2|Prod3      
 7     1 Prod1|Prod2|Prod3
 8     2 Prod1            
 9     2 Prod4            
10     2 Prod1|Prod4      
11     3 Prod3            
12     3 Prod5            
13     3 Prod3|Prod5      
14     4 Prod2

Answer 3

Score: 1

Here is another base R option that uses intToBits to map each combination to the binary representation of an integer index:

with(
  df,
  setNames(
    rev(
      stack(
        by(
          prod, ID,
          function(p) {
            # each k in 1..(2^n - 1) encodes one subset of p through its set bits
            sapply(
              seq(2^length(p) - 1),
              function(k) paste0(p[which(intToBits(k) > 0)], collapse = "|")
            )
          }
        )
      )
    ), names(df)
  )
)

which gives

   ID              prod
1   1             Prod1
2   1             Prod2
3   1       Prod1|Prod2
4   1             Prod3
5   1       Prod1|Prod3
6   1       Prod2|Prod3
7   1 Prod1|Prod2|Prod3
8   2             Prod1
9   2             Prod4
10  2       Prod1|Prod4
11  3             Prod3
12  3             Prod5
13  3       Prod3|Prod5
14  4             Prod2
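
To make the bit trick more concrete, here is a small worked example for a single group. It is only a sketch; the names p, bits, subset_for and all_combos are made up for this illustration.

```r
p <- c("Prod1", "Prod2", "Prod3")

# intToBits() returns 32 raw bits, least significant bit first;
# bit i being set in k means p[i] belongs to combination k
k <- 5L                                          # binary 101
bits <- as.integer(intToBits(k))[seq_along(p)]
bits
paste(p[bits == 1], collapse = "|")

# Enumerating k = 1 .. 2^n - 1 visits every non-empty subset exactly once
subset_for <- function(k) {
  b <- as.integer(intToBits(k))[seq_along(p)]
  paste(p[b == 1], collapse = "|")
}
all_combos <- vapply(seq_len(2^length(p) - 1), subset_for, character(1))
all_combos
```

This also explains why the output above is ordered differently from the combn() answers: the rows follow the integer index k rather than the combination size. Because intToBits() works on 32-bit integers, this enumeration is limited to groups with at most 31 products.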

If you want to explore the possibility of using expand.grid (which I do not recommend, since it is rather inefficient), you can try the code below

with(
  df,
  setNames(
    rev(
      stack(
        lapply(
          split(prod, ID),
          function(x) {
            unique(
              apply(
                expand.grid(rep(list(x), length(x))),
                1,
                function(v) {
                  paste0(sort(unique(v)), collapse = "|")
                }
              )
            )
          }
        )
      )
    ), names(df)
  )
)

which gives

   ID              prod
1   1             Prod1
2   1       Prod1|Prod2
3   1       Prod1|Prod3
4   1 Prod1|Prod2|Prod3
5   1             Prod2
6   1       Prod2|Prod3
7   1             Prod3
8   2             Prod1
9   2       Prod1|Prod4
10  2             Prod4
11  3             Prod3
12  3       Prod3|Prod5
13  3             Prod5
14  4             Prod2
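
One way to see why the expand.grid route is inefficient: for a group with n products it first builds n^n rows and only afterwards collapses them to the 2^n - 1 distinct combinations. A rough sketch of that growth (the name cmp is made up for this illustration):

```r
n <- 1:8
cmp <- data.frame(
  n            = n,
  grid_rows    = n^n,      # rows built by expand.grid(rep(list(x), length(x)))
  combinations = 2^n - 1   # distinct non-empty combinations left after unique()
)
cmp

# Sanity check on a 3-element group: 3^3 = 27 rows for only 7 combinations
x <- c("Prod1", "Prod2", "Prod3")
nrow(expand.grid(rep(list(x), length(x))))
```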
