英文:
How to Join tables flat with fixed params with Spark SQL
问题
我有一个在Spark中的数据集,其中有一些列。我想要通过保留一些列作为参考(id、date、name)来扁平地连接这些数据集。
数据集1:(d1)
id | date | name | val1 |
---|---|---|---|
1 | 15-06-2022 | n1 | 10 |
2 | 15-06-2022 | n1 | 12 |
3 | 15-06-2022 | n1 | 24 |
4 | 16-09-2021 | n2 | 27 |
数据集2:(d2)
id | date | name | val2 |
---|---|---|---|
1 | 15-06-2022 | n1 | 15 |
1 | 12-06-2021 | n1 | 123 |
2 | 15-06-2022 | n2 | 56 |
15 | 21-03-2022 | n1 | 43 |
4 | 16-09-2021 | n2 | 65 |
输出
id | date | name | val1 | val2 |
---|---|---|---|---|
1 | 15-06-2022 | n1 | 10 | 15 |
1 | 12-06-2021 | n1 | 0 | 123 |
2 | 15-06-2022 | n1 | 12 | 0 |
2 | 15-06-2022 | n2 | 0 | 56 |
3 | 15-06-2022 | n1 | 24 | 0 |
4 | 16-09-2021 | n2 | 27 | 65 |
15 | 21-03-2022 | n1 | 0 | 43 |
我将使id、date、name列保持不变,如果这三列都相同,那么我将进行扁平连接,否则在空值的情况下创建新行,并将值设置为0。
请帮助我使用Spark中的适当SQL或数据集逻辑来获得所需的输出。
英文:
I have datesets in spark with certain columns. I want to join those dataset flat by keeping some columns as reference(id, date, name).
DATASET 1: (d1)
id | date | name | val1 |
---|---|---|---|
1 | 15-06-2022 | n1 | 10 |
2 | 15-06-2022 | n1 | 12 |
3 | 15-06-2022 | n1 | 24 |
4 | 16-09-2021 | n2 | 27 |
DATASET 1: (d2)
id | date | name | val2 |
---|---|---|---|
1 | 15-06-2022 | n1 | 15 |
1 | 12-06-2021 | n1 | 123 |
2 | 15-06-2022 | n2 | 56 |
15 | 21-03-2022 | n1 | 43 |
4 | 16-09-2021 | n2 | 65 |
Output
id | date | name | val1 | val2 |
---|---|---|---|---|
1 | 15-06-2022 | n1 | 10 | 15 |
1 | 12-06-2021 | n1 | 0 | 123 |
2 | 15-06-2022 | n1 | 12 | 0 |
2 | 15-06-2022 | n2 | 0 | 56 |
3 | 15-06-2022 | n1 | 24 | 0 |
4 | 16-09-2021 | n2 | 27 | 65 |
15 | 21-03-2022 | n1 | 0 | 43 |
I will make the id, data, name columns fixed, If all three are same then I will do flat join, else create new row with the values being 0 incase of null
Please help me with appropriate sql or dataset logic in spark to get the desired output.
答案1
得分: 1
使用基于Id,date,name
的full outer join
来将两个数据集的所有数据合并,如果有匹配的记录,则根据Id、date和name进行连接。使用coalesce
来避免空值。
declare @Datset1 table (
Id int,
date varchar(120),
name varchar(10),
val1 int
);
insert into @Datset1
values(1,'15-06-2022','n1',10),
(2,'15-06-2022','n1',12),
(3,'15-06-2022','n1',24),
(4,'16-09-2021','n2',27);
declare @Datset2 table (
Id int,
date varchar(120),
name varchar(10),
val2 int
);
insert into @Datset2
values(1 ,'15-06-2022','n1', 15),
(1 ,'12-06-2021','n1', 123),
(2 ,'15-06-2022','n2', 56),
(15 ,'21-03-2022','n1', 43),
(4 ,'16-09-2021','n2', 65);
select coalesce(dt1.Id,dt2.Id) as Id,
coalesce(dt1.date,dt2.date) as date,
coalesce(dt1.name,dt2.name) as name,
coalesce(dt1.val1,0) as val1,
coalesce(dt2.val2,0) as val2
from @Datset1 dt1 full outer join @Datset2 dt2 on dt1.Id=dt2.Id and dt1.date=dt2.date and dt1.name=dt2.name;
输出如下:
Id | date | name | val1 | val2 |
---|---|---|---|---|
1 | 15-06-2022 | n1 | 10 | 15 |
2 | 15-06-2022 | n1 | 12 | 0 |
3 | 15-06-2022 | n1 | 24 | 0 |
4 | 16-09-2021 | n2 | 27 | 65 |
1 | 12-06-2021 | n1 | 0 | 123 |
2 | 15-06-2022 | n2 | 0 | 56 |
15 | 21-03-2022 | n1 | 0 | 43 |
英文:
use full outer join
based on Id,date,name
to Result all data of two data sets if matching Records are there it will join based on Id,date,name. use coalesce
to avoid null's
declare @Datset1 table (
Id int,
date varchar(120),
name varchar(10),
val1 int
);
insert into @Datset1
values(1,'15-06-2022','n1',10),
(2,'15-06-2022','n1',12),
(3,'15-06-2022','n1',24),
(4,'16-09-2021','n2',27);
declare @Datset2 table (
Id int,
date varchar(120),
name varchar(10),
val2 int
);
insert into @Datset2
values(1 ,'15-06-2022','n1', 15),
(1 ,'12-06-2021','n1', 123),
(2 ,'15-06-2022','n2', 56),
(15 ,'21-03-2022','n1', 43),
(4 ,'16-09-2021','n2', 65);
select coalesce(dt1.Id,dt2.Id) as Id,
coalesce(dt1.date,dt2.date) as date,
coalesce(dt1.name,dt2.name) as name,
coalesce(dt1.val1,0) as val1,
coalesce(dt2.val2,0) as val2
from @Datset1 dt1 full outer join @Datset2 dt2 on dt1.Id=dt2.Id and dt1.date=dt2.date and dt1.name=dt2.name;
Output will be as follows
Id | date | name | val1 | val2 |
---|---|---|---|---|
1 | 15-06-2022 | n1 | 10 | 15 |
2 | 15-06-2022 | n1 | 12 | 0 |
3 | 15-06-2022 | n1 | 24 | 0 |
4 | 16-09-2021 | n2 | 27 | 65 |
1 | 12-06-2021 | n1 | 0 | 123 |
2 | 15-06-2022 | n2 | 0 | 56 |
15 | 21-03-2022 | n1 | 0 | 43 |
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论