How to Create a Touchpoint Table from a list
Question
I am working on a SQL query in Azure Databricks against the following dataset:
```sql
CREATE OR REPLACE TABLE touchpoints_table
(
    List STRING,
    Path_Lenght INT
);
INSERT INTO touchpoints_table VALUES
('BBB, AAA, CCC', 3),
('BBB', 1),
('DDD, AAA', 2),
('DDD, BBB, AAA, EEE, CCC', 5),
('EEE, AAA, EEE, CCC', 4);
SELECT * FROM touchpoints_table;
```
| | List | Path_length |
|---|---|---|
| 0 | BBB, AAA, CCC | 3 |
| 1 | CCC | 1 |
| 2 | DDD, AAA | 2 |
| 3 | DDD, BBB, AAA, EEE, CCC | 5 |
| 4 | EEE, AAA, EEE, CCC | 4 |
The task is to generate the following table:
| | Content | Unique | Started | Middleway | Finished |
|---|---|---|---|---|---|
| 0 | AAA | 0 | 0 | 3 | 1 |
| 1 | BBB | 0 | 1 | 1 | 0 |
| 2 | CCC | 1 | 0 | 0 | 3 |
| 3 | DDD | 0 | 2 | 0 | 0 |
| 4 | EEE | 0 | 1 | 2 | 0 |
where the columns contain the following:
- Content: the elements found in the List
- Unique: the number of times that the element appears alone in the list
- Started: the number of times that the element appears at the beginning
- Finished: the number of times that the element appears at the end
- Middleway: the number of times the element appears between the beginning and the end.
Using the following query I almost get the result, but somehow the GROUP BY does not work correctly:
```sql
WITH tb1 AS(
    SELECT
        CAST(touch_array AS STRING) AS touch_list,
        EXPLODE(touch_array) AS explode_list,
        ROW_NUMBER()OVER(PARTITION BY CAST(touch_array AS STRING) ORDER BY (SELECT 1)) touch_count,
        COUNT(*)OVER(PARTITION BY touch_array) touch_lenght
    FROM (SELECT SPLIT(List, ',') AS touch_array FROM touchpoints_table)
)
SELECT
    explode_list AS Content,
    SUM(CASE WHEN touch_lenght=1 THEN 1 ELSE 0 END) AS Unique,
    SUM(CASE WHEN touch_count=1 AND touch_lenght > 1 THEN 1 ELSE 0 END) AS Started,
    SUM(CASE WHEN touch_count>1 AND touch_count < touch_lenght THEN 1 ELSE 0 END) AS Middleway,
    SUM(CASE WHEN touch_count>1 AND touch_count = touch_lenght THEN 1 ELSE 0 END) AS Finished
FROM tb1
GROUP BY explode_list
ORDER BY explode_list
```
| | Content | Unique | Started | Middleway | Finished |
|---|---|---|---|---|---|
| 0 | AAA | 0 | 0 | 3 | 1 |
| 1 | BBB | 0 | 0 | 1 | 0 |
| 2 | CCC | 0 | 0 | 0 | 3 |
| 3 | EEE | 0 | 0 | 2 | 0 |
| 4 | BBB | 1 | 1 | 0 | 0 |
| 5 | DDD | 0 | 2 | 0 | 0 |
| 6 | EEE | 0 | 1 | 0 | 0 |
Could you help me by suggesting code that solves this task?
Answer 1
Score: 1
Query example for SQL Server:
```sql
-- elN   = position of the element within its list
-- elQty = number of elements in that list
-- Note: row_number() over a constant relies on string_split returning the
-- values in list order, which SQL Server does not guarantee.
with allElements as(
    select list, el, elN, elQty
    from touchpoints_table tp
    cross apply (select trim(value) as el, row_number()over(order by (select 1)) elN
                       ,count(*)over() elQty
                 from string_split(tp.list,',')
    ) t
)
select el
      ,sum(case when elQty=1 then 1 else 0 end) as 'unique'
      ,sum(case when elN=1 and elQty>1 then 1 else 0 end) as 'started'
      ,sum(case when elN>1 and elN<elQty then 1 else 0 end) as 'middleway'
      ,sum(case when elN>1 and elN=elQty then 1 else 0 end) as 'finished'
from allElements
group by el
order by el
```
[Demo](https://dbfiddle.uk/jNgQ__F6)
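Since the question targets Azure Databricks, where `cross apply` and `string_split` are not available, the same idea can be sketched in Spark SQL. The duplicated `BBB` and `EEE` rows in the question's output are consistent with `SPLIT(List, ',')` leaving a leading space on every element after the first, so `' BBB'` and `'BBB'` land in separate groups; trimming each element removes that. The sketch below also takes the position from `posexplode` instead of `ROW_NUMBER() ... ORDER BY (SELECT 1)`, and the list length from `size`, so neither depends on window ordering or on two rows having identical lists. The CTE name `elements` and the aliases `t`, `e`, `pos`, `el`, `elN`, `elQty` are illustrative choices, and the query has not been run against the asker's workspace.

```sql
-- Sketch for Databricks (Spark SQL), under the assumptions stated above.
WITH elements AS (
    SELECT
        trim(e.el)          AS el,     -- strip the space left after each comma
        e.pos + 1           AS elN,    -- posexplode positions start at 0
        size(t.touch_array) AS elQty   -- number of elements in this row's list
    FROM (
        SELECT split(List, ',') AS touch_array
        FROM touchpoints_table
    ) t
    LATERAL VIEW posexplode(t.touch_array) e AS pos, el
)
SELECT
    el AS Content,
    SUM(CASE WHEN elQty = 1 THEN 1 ELSE 0 END)               AS `Unique`,
    SUM(CASE WHEN elN = 1 AND elQty > 1 THEN 1 ELSE 0 END)   AS Started,
    SUM(CASE WHEN elN > 1 AND elN < elQty THEN 1 ELSE 0 END) AS Middleway,
    SUM(CASE WHEN elN > 1 AND elN = elQty THEN 1 ELSE 0 END) AS Finished
FROM elements
GROUP BY el
ORDER BY el;
```

Note that with the `INSERT` statement as posted, the single-element row is `'BBB'`, so the `Unique` count would land on `BBB`; the expected table in the question corresponds to the displayed data, where that row is `CCC`.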