如何在SQL中创建客户拥有的产品组合。

huangapple go评论138阅读模式
英文:

How to make the combination of products that a client has in SQL

问题

Client Product
1 A
1 B
1 C
2 A
2 B
2 D
3 A
3 B

如何在AWS Athena中编写查询以获取这些结果? Athena不支持存储过程。

在这里,我们有以下组合:

AB:3个案例(客户1、2、3)
AC:1个案例(客户1)
AD:1个案例(客户2)
BC:1个案例(客户1)
BD:1个案例(客户2)
CD:0个案例
ABC:1个案例(客户1)
ABD:1个案例(客户2)

我有一个查询可以获取AB的组合,但要获取其他组合,我需要更改过滤器(WHERE)并运行N次。 如果有数百种组合,这是不可行的。

英文:

I have a table thar contains many row with clients and produts that they have.
It is like that:

Client Product
1 A
1 B
1 C
2 A
2 B
2 D
3 A
3 B

How can a write a query to get that resultrs in AWS Athena?
The Athena doesn't support procedure.

Here, we have the following combinations:

AB:  3 cases(clients 1, 2, 3)  
AC:  1 case (client 1)  
AD:  1 case (client 2)  
BC:  1 case (client 1)  
BD:  1 case(client 2)
CD:  0 cases 
ABC: 1 cases (client 1)
ABD: 1 case (client 2)

I have a query that get the combination of AB but to get the other I need to change de filters (WHERE) and run N times. If I have hundred of combinations it is unfeasible to do.

答案1

得分: 1

你可以尝试使用 combinations 函数,如果在 Athena 中可用的话(它在 Athena 基于的 Trino 中是存在的):

WITH dataset (Client, Product) AS (
    values (1, 'A'),
    (1, 'B'),
    (1, 'C'),
    (2, 'A'),
    (2, 'B'),
    (2, 'D'),
    (3, 'A'),
    (3, 'B')
)

select p.comb, count(Client) count
from (select comb
      from (select array_distinct(array_agg(Product)) arr
            from dataset) p,
           unnest(combinations(arr, 2) || combinations(arr, 3)) as t(comb)) p -- 构建所有产品组合
left join (select client, comb
           from (select client, array_distinct(array_agg(Product)) as arr
                 from dataset
                 group by client),
                unnest(combinations(arr, 2) || combinations(arr, 3)) as t(comb)) cp -- 为具体客户构建所有产品组合 
          on cp.comb = p.comb
group by p.comb
order by cardinality(p.comb), p.comb;

输出结果:

comb count
[A, B] 3
[A, C] 1
[A, D] 1
[B, C] 1
[B, D] 1
[C, D] 0
[A, B, C] 1
[A, B, D] 1
[A, C, D] 0
[B, C, D] 0

请注意,combinations 函数在输出中限制为最多 100000 条记录,因此如果实际的产品组合数量更高,这个方法可能不起作用(无论如何,在相对较大数量的产品上,这都会是一个相当昂贵的查询)。

英文:

You can try using combinations function if it is available in Athena (it is present in Trino on which Athena is based upon):

WITH dataset (Client, Product) AS (
    values (1,	'A'),
    (1,	'B'),
    (1,	'C'),
    (2,	'A'),
    (2,	'B'),
    (2,	'D'),
    (3,	'A'),
    (3,	'B')
)


select p.comb, count(Client) count
from (select comb
      from (select array_distinct(array_agg(Product)) arr
            from dataset) p,
           unnest(combinations(arr, 2) || combinations(arr, 3)) as t(comb)) p -- build all product combinations
left join (select client, comb
           from (select client, array_distinct(array_agg(Product)) as arr
                 from dataset
                 group by client),
                unnest(combinations(arr, 2) || combinations(arr, 3)) as t(comb)) cp -- build all product combinations for concrete client 
          on cp.comb = p.comb
group by p.comb
order by cardinality(p.comb), p.comb;

Output:

comb count
[A, B] 3
[A, C] 1
[A, D] 1
[B, C] 1
[B, D] 1
[C, D] 0
[A, B, C] 1
[A, B, D] 1
[A, C, D] 0
[B, C, D] 0

Note that combinations is limited to only 100000 records in output, so if actual number of product combination is higher this will not work (also either way this would be quite expensive query on any relatively big number of products).

答案2

得分: 0

根据您提供的信息,我理解您想在Amazon Athena中计算每个客户的所有可能的产品组合。这是一个组合问题,可以通过使用自连接来解决。但是,对于大量唯一产品,这种解决方案在计算上会变得非常昂贵,因为自连接的数量会随着要组合的产品数量呈指数增长。

对于2个产品:

WITH combinations_2 AS (
  SELECT a.client AS client, 
         a.product AS product1, 
         b.product AS product2
  FROM your_table_name a
  JOIN your_table_name b
  ON a.client = b.client
  WHERE a.product < b.product
)
SELECT product1, product2, COUNT(DISTINCT client) AS num_clients
FROM combinations_2
GROUP BY product1, product2;

对于3个产品:

WITH combinations_3 AS (
  SELECT a.client AS client, 
         a.product AS product1, 
         b.product AS product2,
         c.product AS product3
  FROM your_table_name a
  JOIN your_table_name b
  ON a.client = b.client
  JOIN your_table_name c
  ON a.client = c.client
  WHERE a.product < b.product 
  AND b.product < c.product
)
SELECT product1, product2, product3, COUNT(DISTINCT client) AS num_clients
FROM combinations_3
GROUP BY product1, product2, product3;

由于AWS Athena不支持存储过程或用户定义函数来处理递归或循环,因此使用纯SQL表达任意大小的组合非常具有挑战性。没有一种简单有效的方法来生成未知数量产品的所有组合。

英文:

Based on what you provided, I understood that you want compute all possible combinations of products per client in Amazon Athena. This is a combinatorial problem and can be approached using self-joins. However, this kind of solutions becomes computationally expensive for large numbers of unique products, as the number of self-joins would grow exponentially with the number of products you want to combine.

For 2 products :

WITH combinations_2 AS (
  SELECT a.client AS client, 
         a.product AS product1, 
         b.product AS product2
  FROM your_table_name a
  JOIN your_table_name b
  ON a.client = b.client
  WHERE a.product &lt; b.product
)
SELECT product1, product2, COUNT(DISTINCT client) AS num_clients
FROM combinations_2
GROUP BY product1, product2;

And for 3 products:

WITH combinations_3 AS (
  SELECT a.client AS client, 
         a.product AS product1, 
         b.product AS product2,
         c.product AS product3
  FROM your_table_name a
  JOIN your_table_name b
  ON a.client = b.client
  JOIN your_table_name c
  ON a.client = c.client
  WHERE a.product &lt; b.product 
  AND b.product &lt; c.product
)
SELECT product1, product2, product3, COUNT(DISTINCT client) AS num_clients
FROM combinations_3
GROUP BY product1, product2, product3;

AWS Athena which don't have the support for stored procedures or user-defined functions to handle recursion or loops, expressing combinations of an arbitrary size is quite challenging. There's no simple, efficient way to generate all combinations of an unknown number of products using pure SQL.

答案3

得分: 0

我不太了解Athena,但我想提供一个使用STRING_AGG和CTE的示例,以阐述我可能会如何处理这个问题,以防有所帮助。

WITH prod_mix AS (
  SELECT client,
  STRING_AGG(product,' ORDER BY product') as pmix
 FROM client_product
 GROUP BY client
  ),
combined AS (
  SELECT A.*, B.pmix
  FROM client_product A
  INNER JOIN prod_mix B
  ON A.client = B.client
  ) , 
 client_mix AS (
    SELECT pmix,
    STRING_AGG(DISTINCT client,' ORDER BY client') AS cmix
   FROM combined
   GROUP BY pmix
 )
 SELECT * FROM client_mix;
英文:

I don't know much about Athena, but I though I'd throw an example of how I might approach that using STRING_AGG and CTE's, just in case it helps.

WITH prod_mix AS (
  SELECT client,
  STRING_AGG(product,&#39;&#39; ORDER BY product) as pmix
 FROM client_product
 GROUP BY client
  ),
combined AS (
  SELECT A.*, B.pmix
  FROM client_product A
  INNER JOIN prod_mix B
  ON A.client = B.client
  ) , 
 client_mix AS (
    SELECT pmix,
    STRING_AGG(DISTINCT client,&#39;&#39; ORDER BY client) AS cmix
   FROM combined
   GROUP BY pmix
 )
 SELECT * FROM client_mix;

huangapple
  • 本文由 发表于 2023年8月10日 21:44:08
  • 转载请务必保留本文链接:https://go.coder-hub.com/76876322.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定