Data concatenation issue in Azure Databricks
Question
Source Data
Expected Results
Explanation:
Scenario 1: If an ID has 3 records (2 names and one Savings record), we expect the 'Expected Results' column to look like this:
Scenario 2: If an ID has 2 records (2 names and no Savings record), we expect the 'Expected Results' column to look like this:
Scenario 3: If an ID has 2 records (1 name and one Savings record), we expect the 'Expected Results' column to look like this:
Scenario 4: If an ID has only one record, and it is a single name, we expect the 'Expected Results' column to look like this:
Scenario 5: If an ID has only one record, and it is either Savings or null/empty, we expect the 'Expected Results' column to look like this:
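The five scenarios reduce to one rule per ID: count its distinct real names, ignoring Savings rows and null/empty values. Below is a minimal plain-Python sketch of that rule. It is not Databricks code. `expected_result` is a hypothetical helper, the `" & "` separator and the "only" / "Not Required" texts are borrowed from the answer's output, and treating a repeated name as one name is an assumption.

```python
# Plain-Python restatement of the five scenarios (not Databricks code).
# Assumptions: "Savings" rows and NULL/blank names never count as names,
# and a repeated name counts once.

def expected_result(names):
    """Return the 'Expected Results' value for one ID, given all its Name values."""
    # Keep real names only, de-duplicated, in first-seen order
    real = list(dict.fromkeys(
        n for n in names
        if n is not None and n.strip() != "" and n != "Savings"
    ))
    if not real:              # scenario 5: only Savings or NULL/empty
        return "Not Required"
    if len(real) == 1:        # scenarios 3 and 4: exactly one name
        return f"{real[0]} only"
    return " & ".join(real)   # scenarios 1 and 2: two or more names


# One sample ID per scenario, using names from the answer's output table
print(expected_result(["Savings", "Praveen", "Anil"]))  # Praveen & Anil
print(expected_result(["Kumar", "Ravi"]))               # Kumar & Ravi
print(expected_result(["Nandu", "Savings"]))            # Nandu only
print(expected_result(["Santhi"]))                      # Santhi only
print(expected_result(["Savings"]))                     # Not Required
```

Counting names, rather than searching the joined string for a separator, keeps the rule independent of what characters the names themselves contain.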
Answer 1
Score: 1
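Databricks SQL:

```sql
SELECT a.*,
  CASE
    -- two or more names were joined with ' & '
    WHEN b.Expected_results LIKE '%&%' THEN b.Expected_results
    -- no usable name for this ID (only Savings, NULL or empty)
    WHEN b.Expected_results IS NULL THEN 'Not Required'
    -- exactly one name
    ELSE CONCAT(b.Expected_results, ' only')
  END AS Expected_results
FROM Input_Table a
LEFT JOIN (
  SELECT id, ARRAY_JOIN(COLLECT_SET(Name), ' & ') AS Expected_results
  FROM Input_Table
  -- drop Savings rows and blank names; NULL names are dropped as well,
  -- because NULL <> 'Savings' does not evaluate to true
  WHERE Name <> 'Savings' AND TRIM(Name) <> ''
  GROUP BY id
) b ON a.id = b.id
ORDER BY a.id;
```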
The inner query groups the records by ID and joins the distinct names for each ID into one string, separated by `' & '`. `COLLECT_SET` collects the names into an array (dropping duplicates), and `ARRAY_JOIN` concatenates the array elements into a string. Records whose name is "Savings" are filtered out before `COLLECT_SET` and `ARRAY_JOIN` are applied.
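To make the grouping step concrete, here is a minimal plain-Python model of the inner query (not Spark code). It mirrors the Savings filter, `COLLECT_SET` and `ARRAY_JOIN`; the `group_names` helper is hypothetical and the sample rows are taken from this answer's output table.

```python
# Python model of the inner query:
#   SELECT id, ARRAY_JOIN(COLLECT_SET(Name), ' & ') FROM Input_Table
#   WHERE Name <> 'Savings' GROUP BY id
# Sample rows are taken from the answer's output table.

rows = [
    (1, "Savings"), (1, "Praveen"), (1, "Anil"),
    (2, "Kumar"), (2, "Ravi"),
    (3, "Santhi"), (4, "Priya"), (5, "Savings"),
    (6, "Nandu"), (6, "Savings"), (7, "Balu"),
]

def group_names(rows):
    sets = {}
    for id_, name in rows:
        # WHERE Name <> 'Savings' (in SQL a NULL name fails this test too)
        if name is None or name == "Savings":
            continue
        # dict keys act as a de-duplicating set that remembers insertion order
        sets.setdefault(id_, {})[name] = None
    # ARRAY_JOIN(..., ' & ')
    return {id_: " & ".join(names) for id_, names in sets.items()}

print(group_names(rows))
# {1: 'Praveen & Anil', 2: 'Kumar & Ravi', 3: 'Santhi', 4: 'Priya', 6: 'Nandu', 7: 'Balu'}
```

ID 5 has no entry because its only record is Savings; that missing entry is what later becomes NULL after the `LEFT JOIN`. One caveat the Python model hides: Spark documents `COLLECT_SET` as non-deterministic, because element order depends on row order after a shuffle, so ID 1 could come back as "Anil & Praveen". If an alphabetical order is acceptable, `ARRAY_JOIN(ARRAY_SORT(COLLECT_SET(Name)), ' & ')` makes the result stable.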
The outer query joins the original table to the concatenated names (the inner query) with a `LEFT JOIN` on the ID column, and uses a `CASE` expression to handle the different scenarios you described. The `LEFT JOIN` keeps every record from the original table in the result, even when the subquery has no matching row for that ID.
The `CASE` expression inspects the concatenated names from the subquery. If they contain `&` (that is, two or more names), it returns them as is. If there is no concatenated name (the subquery has no row for that ID, so the value is NULL), it returns "Not Required". Otherwise there is exactly one name, and it appends " only" to it.
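The `LEFT JOIN` and `CASE` steps can be mirrored in a few lines of plain Python as a sanity check. This is a sketch, not Spark code: `joined` is a hypothetical dict standing in for the inner query's result (values derived from this answer's output table), `dict.get` plays the role of the `LEFT JOIN` by returning `None` when an ID has no match, and `case_expr` is a hypothetical helper mirroring the three `WHEN`/`ELSE` branches.

```python
# Python model of the outer query's LEFT JOIN + CASE (not Spark code).
# `joined` stands in for the inner query's result; ID 5 has no row there.

joined = {1: "Praveen & Anil", 2: "Kumar & Ravi", 3: "Santhi",
          4: "Priya", 6: "Nandu", 7: "Balu"}

def case_expr(value):
    if value is not None and "&" in value:   # WHEN ... LIKE '%&%'
        return value
    if value is None:                         # WHEN ... IS NULL
        return "Not Required"
    return value + " only"                    # ELSE CONCAT(..., ' only')

rows = [(1, "Savings"), (1, "Praveen"), (1, "Anil"), (2, "Kumar"), (2, "Ravi"),
        (3, "Santhi"), (4, "Priya"), (5, "Savings"), (6, "Nandu"),
        (6, "Savings"), (7, "Balu")]

# LEFT JOIN: every input row survives; dict.get yields None when there is no match
result = [(id_, name, case_expr(joined.get(id_))) for id_, name in rows]
for r in result:
    print(r)
```

In SQL, `NULL LIKE '%&%'` is not true, so a NULL falls through to the `IS NULL` branch; the explicit `value is not None` check reproduces that ordering in Python.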
Output:
ID | Name | Expected_results |
---|---|---|
1 | Savings | Praveen & Anil |
1 | Praveen | Praveen & Anil |
1 | Anil | Praveen & Anil |
2 | Kumar | Kumar & Ravi |
2 | Ravi | Kumar & Ravi |
3 | Santhi | Santhi only |
4 | Priya | Priya only |
5 | Savings | Not Required |
6 | Nandu | Nandu only |
6 | Savings | Nandu only |
7 | Balu | Balu only |