如何从列表中创建一个触点表格

huangapple go评论70阅读模式
英文:

How to Create a Touchpoint Table from a list

问题

我正在Azure Databricks环境中处理一个SQL查询,该查询涉及以下数据集:

CREATE OR REPLACE TABLE touchpoints_table
(
  List  STRING,
  Path_Lenght INT
);

INSERT INTO touchpoints_table VALUES
  ('BBB, AAA, CCC', 3),  
  ('BBB', 1),
  ('DDD, AAA', 2), 
  ('DDD, BBB, AAA, EEE, CCC', 5), 
  ('EEE, AAA, EEE, CCC', 4);
  
SELECT * FROM touchpoints_table

任务是生成以下表格:

|     |   Content       |   Unique    |  Started  |  Middleway  |  Finished  |
|  0  |     AAA         |     0       |     0     |      3      |      1     | 
|  1  |     BBB         |     0       |     1     |      1      |      0     |
|  2  |     CCC         |     1       |     0     |      0      |      3     |
|  3  |     DDD         |     0       |     2     |      0      |      0     |
|  4  |     EEE         |     0       |     1     |      2      |      0     |

其中各列包含以下内容:

  • Content:在List中找到的元素
  • Unique:元素单独出现的次数
  • Started:元素在开头出现的次数
  • Finished:元素在末尾出现的次数
  • Middleway:元素在开头和末尾之间出现的次数。

使用以下查询,我几乎可以得到结果,但是分组似乎没有正确工作:

WITH tb1 AS(
  SELECT 
    CAST(touch_array AS STRING) AS touch_list,
    EXPLODE(touch_array) AS explode_list,
    ROW_NUMBER()OVER(PARTITION BY CAST(touch_array AS STRING) ORDER BY (SELECT 1)) touch_count,
    COUNT(*)OVER(PARTITION BY touch_array) touch_lenght
  FROM (SELECT SPLIT(List, ',') AS touch_array FROM touchpoints_table) 
  )
  SELECT
     explode_list AS Content,
     SUM(CASE WHEN touch_lenght=1 THEN 1 ELSE 0 END) AS Unique,
     SUM(CASE WHEN touch_count=1 AND touch_lenght > 1 THEN 1 ELSE 0 END) AS Started,
     SUM(CASE WHEN touch_count>1 AND touch_count < touch_lenght THEN 1 ELSE 0 END) AS Middleway,
     SUM(CASE WHEN touch_count>1 AND touch_count = touch_lenght THEN 1 ELSE 0 END) AS Finished  
  FROM tb1 
  GROUP BY explode_list
  ORDER BY explode_list    

我可以通过提供以下代码来帮助你解决此任务:

英文:

I am working on a SQL query in the Azure Databricks environment that has the following dataset:

CREATE OR REPLACE TABLE touchpoints_table
(
  List  STRING,
  Path_Lenght INT
);

INSERT INTO touchpoints_table VALUES
  (&#39;BBB, AAA, CCC&#39;, 3),  
  (&#39;BBB&#39;, 1),
  (&#39;DDD, AAA&#39;, 2), 
  (&#39;DDD, BBB, AAA, EEE, CCC&#39;, 5), 
  (&#39;EEE, AAA, EEE, CCC&#39;, 4);
  
SELECT * FROM touchpoints_table
|     |             List           |   Path_length   | 
|  0  | BBB, AAA, CCC              |       3         |  
|  1  | CCC                        |       1         |   
|  2  | DDD, AAA                   |       2         |   
|  3  | DDD, BBB, AAA, EEE, CCC    |       5         |  
|  4  | EEE, AAA, EEE, CCC         |       4         |

and the task consists of generating the following table:

|     |   Content       |   Unique    |  Started  |  Middleway  |  Finished  |
|  0  |     AAA         |     0       |     0     |      3      |      1     | 
|  1  |     BBB         |     0       |     1     |      1      |      0     |
|  2  |     CCC         |     1       |     0     |      0      |      3     |
|  3  |     DDD         |     0       |     2     |      0      |      0     |
|  4  |     EEE         |     0       |     1     |      2      |      0     |

where the columns contain the following:

  • Content: the elements found in the List
  • Unique: the number of times that the element appears alone in the list
  • Started: the number of times that the element appears at the beginning
  • Finished: the number of times that the element appears at the end
  • Middleway: the number of times the element appears between the beginning and the end.

Using the following query I almost get the result but somehow the group by does not worked correctly

WITH tb1 AS(
  SELECT 
    CAST(touch_array AS STRING) AS touch_list,
    EXPLODE(touch_array) AS explode_list,
    ROW_NUMBER()OVER(PARTITION BY CAST(touch_array AS STRING) ORDER BY (SELECT 1)) touch_count,
    COUNT(*)OVER(PARTITION BY touch_array) touch_lenght
  FROM (SELECT SPLIT(List, &#39;,&#39;) AS touch_array FROM touchpoints_table) 
  )
  SELECT
     explode_list AS Content,
     SUM(CASE WHEN touch_lenght=1 THEN 1 ELSE 0 END) AS Unique,
     SUM(CASE WHEN touch_count=1 AND touch_lenght &gt; 1 THEN 1 ELSE 0 END) AS Started,
     SUM(CASE WHEN touch_count&gt;1 AND touch_count &lt; touch_lenght THEN 1 ELSE 0 END) AS Middleway,
     SUM(CASE WHEN touch_count&gt;1 AND touch_count = touch_lenght THEN 1 ELSE 0 END) AS Finished  
  FROM tb1 
  GROUP BY explode_list
  ORDER BY explode_list    
|     |   Content       |   Unique    |  Started  |  Middleway  |  Finished  |
|  0  |     AAA         |     0       |     0     |      3      |      1     | 
|  1  |     BBB         |     0       |     0     |      1      |      0     |
|  2  |     CCC         |     0       |     0     |      0      |      3     |
|  3  |     EEE         |     0       |     0     |      2      |      0     |
|  4  |     BBB         |     1       |     1     |      0      |      0     |
|  5  |     DDD         |     0       |     2     |      0      |      0     |
|  6  |     EEE         |     0       |     1     |      0      |      0     |

Could you help me by suggesting a code that solves this task?

答案1

得分: 1

 SQL Server 中的查询示例:

```sql
with allElements as(
  select list ,el,elN,elQty
  from touchpoints_table tp 
  cross apply (select trim(value) as el,row_number()over(order by (select 1)) elN 
                ,count(*)over() elQty
                from string_split(tp.list,',')
              ) t
)
select el
   ,sum(case when elQty=1 then 1 else 0 end) as 'unique'
   ,sum(case when elN=1 and elQty>1 then 1 else 0 end) as 'strated'
   ,sum(case when elN>1 and elN<elQty then 1 else 0 end) as 'middleway'
   ,sum(case when elN>1 and elN=elQty then 1 else 0 end) as 'finished'
from allElements
group by el
order by el

演示链接


<details>
<summary>英文:</summary>

Query example for SQL Server

with allElements as(
select list ,el,elN,elQty
from touchpoints_table tp
cross apply (select trim(value) as el,row_number()over(order by (select 1)) elN
,count(*)over() elQty
from string_split(tp.list,',')
) t
)
select el
,sum(case when elQty=1 then 1 else 0 end) as 'unique'
,sum(case when elN=1 and elQty>1 then 1 else 0 end) as 'strated'
,sum(case when elN>1 and elN<elQty then 1 else 0 end) as 'middleway'
,sum(case when elN>1 and elN=elQty then 1 else 0 end) as 'finished'
from allElements
group by el
order by el

[Demo](https://dbfiddle.uk/jNgQ__F6)


</details>



huangapple
  • 本文由 发表于 2023年1月9日 06:04:34
  • 转载请务必保留本文链接:https://go.coder-hub.com/75051599.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定