Data concatenation issue in Azure Databricks

Question

Source Data

[image: source data table]

Expected Results

[image: expected results table]

Explanation:
Scenario 1: If one ID has 3 records, made up of 2 names and 1 Savings record, the 'Expected Results' column should look like this:

[image: expected result for scenario 1]

Scenario 2:
If one ID has 2 records, made up of 2 names and no Savings record, the 'Expected Results' column should look like this:

[image: expected result for scenario 2]

Scenario 3:
If one ID has 2 records, made up of 1 name and 1 Savings record, the 'Expected Results' column should look like this:

[image: expected result for scenario 3]

Scenario 4:
If one ID has only 1 record and it is a name, the 'Expected Results' column should look like this:

[image: expected result for scenario 4]

Scenario 5:
If one ID has only 1 record and it is either Savings or null/empty, the 'Expected Results' column should look like this:

[image: expected result for scenario 5]

Answer 1

Score: 1

Databricks SQL:

SELECT a.*,
       CASE
         -- Two or more names were joined with ' & ': return the joined string as is
         WHEN b.Expected_results LIKE '%&%' THEN b.Expected_results
         -- No non-Savings name exists for this id
         WHEN b.Expected_results IS NULL THEN 'Not Required'
         -- Exactly one name
         ELSE CONCAT(b.Expected_results, ' only')
       END AS Expected_results
FROM Input_Table a
LEFT JOIN (
  -- One row per id: distinct non-Savings names joined with ' & '
  SELECT id, ARRAY_JOIN(COLLECT_SET(Name), ' & ') AS Expected_results
  FROM Input_Table
  WHERE Name <> 'Savings'
  GROUP BY id
) b ON a.id = b.id
ORDER BY a.id;

The inner query groups the records by ID and concatenates the distinct names that share an ID, separated by ' & '. COLLECT_SET collects the names into an array without duplicates, and ARRAY_JOIN concatenates the array elements into a string. Rows whose name is "Savings" are filtered out by the WHERE clause before aggregation, so they never reach COLLECT_SET or ARRAY_JOIN.
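To make the aggregation step easier to follow, here is a minimal plain-Python sketch of what the subquery computes. It is only an illustration, not Spark code: the function name `joined_names_by_id` and the sample rows are assumptions made up for this example, and the sketch keeps names in first-seen order, whereas COLLECT_SET does not guarantee any particular order.

```python
from collections import defaultdict

# Sample (id, name) rows, taken from the output table in this answer.
rows = [
    (1, "Savings"), (1, "Praveen"), (1, "Anil"),
    (2, "Kumar"), (2, "Ravi"),
    (5, "Savings"),
]

def joined_names_by_id(rows, excluded="Savings", sep=" & "):
    """Mimic the subquery: drop `excluded`, de-duplicate per id, join with `sep`."""
    names = defaultdict(dict)  # dict keys act as an insertion-ordered set
    for row_id, name in rows:
        # In SQL, `Name <> 'Savings'` is not true for NULL, so None is skipped too.
        if name is not None and name != excluded:
            names[row_id][name] = None
    return {row_id: sep.join(seen) for row_id, seen in names.items()}

joined = joined_names_by_id(rows)
print(joined)  # {1: 'Praveen & Anil', 2: 'Kumar & Ravi'}
```

ID 5 has only a Savings row, so it does not appear in the result at all, just as the subquery returns no row for it.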

The outer query joins the original table to the concatenated names from the inner query, and uses a CASE expression to handle the different scenarios you mentioned. The join is a LEFT JOIN on the ID column, which keeps every record from the original table in the result even when the subquery has no matching row for that ID.
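The effect of the LEFT JOIN can be sketched in plain Python as a dictionary lookup that falls back to `None`. The helper `left_join` and the hard-coded `joined` mapping are hypothetical, chosen only to illustrate the behavior; they stand in for the subquery result for these sample rows.

```python
# Sample (id, name) rows, taken from the output table in this answer.
rows = [(3, "Santhi"), (5, "Savings"), (6, "Nandu"), (6, "Savings")]

# What the subquery would return for these rows (ID 5 has no non-Savings name).
joined = {3: "Santhi", 6: "Nandu"}

def left_join(rows, joined):
    """Keep every source row; attach the joined names, or None when there is no match."""
    return [(row_id, name, joined.get(row_id)) for row_id, name in rows]

for row in left_join(rows, joined):
    print(row)
```

The row for ID 5 survives with `None` in the third position, which corresponds to the NULL that the CASE expression later turns into "Not Required".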

The CASE expression inspects the concatenated names from the subquery. If they contain an ampersand (&), meaning at least two names were joined, it returns them as is. If there is no concatenated name (the subquery has no row for that ID, so the LEFT JOIN yields NULL), it returns "Not Required". Otherwise there is exactly one name, and it appends " only" to it.
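The three branches can be written as a small Python function for illustration. The name `label` is an assumption for this sketch. Note one difference in ordering: in SQL, `NULL LIKE '%&%'` evaluates to NULL and simply falls through to the next branch, while in Python the `None` check has to come first because `"&" in None` would raise a `TypeError`.

```python
def label(joined):
    """Mirror the CASE expression for one joined-names value (None stands for NULL)."""
    if joined is None:
        return "Not Required"   # no non-Savings name for this id
    if "&" in joined:
        return joined           # two or more names were joined
    return joined + " only"     # exactly one name

print(label("Praveen & Anil"))  # Praveen & Anil
print(label(None))              # Not Required
print(label("Santhi"))          # Santhi only
```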

Output:

ID  Name     Expected_results
1   Savings  Praveen & Anil
1   Praveen  Praveen & Anil
1   Anil     Praveen & Anil
2   Kumar    Kumar & Ravi
2   Ravi     Kumar & Ravi
3   Santhi   Santhi only
4   Priya    Priya only
5   Savings  Not Required
6   Nandu    Nandu only
6   Savings  Nandu only
7   Balu     Balu only

huangapple
  • Posted on June 5, 2023, 18:37:08
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76405595.html