Check which columns have different values after grouping
Question
I have a dataset like this:
host_id  binary_id  service  binary_date  region  binary_path  ......
1        2          hello    05-06-2012   US      /bin/usr
1        2          hello    08-09-2022   US      /bin/usr
2        3          another  08-09-2026   UK      /bin/java
host_id, binary_id, service is the group I'm interested in. binary_date, region and binary_path are a subset of the columns in this table, which has approximately 50 columns.
I would like to check which of the 50 columns have different values after grouping by host_id, binary_id, service. For example, in this scenario I would get the binary_date column, since it is the only column whose values differ within a group.
The expected result would be:
host_id  binary_id  service  binary_date  region  binary_path  ......
1        2          hello    05-06-2012   null    null
1        2          hello    08-09-2022   null    null
You can see that in this case only binary_date contains different values. The other columns, region and binary_path, are null because their values are exactly the same across the rows in the group.
Answer 1
Score: 1
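Here is one way to do it:
row_number() numbers the rows within each partition. Partitioning by the group keys plus one column restarts the numbering for every distinct value of that column, so every row gets row number 1 only when each value of the column is unique within the group. If the row count of the group equals the sum of those row numbers, the column's values differ across the group and the value is kept; otherwise null is returned.
Note that this reports a column only when all of its values in the group are distinct: values such as A, A, B give row numbers 1, 2, 1, which sum to 4 rather than 3, so that column would come back as null.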
with cte as (
    select *,
           -- number of rows in each (host_id, binary_id, service) group
           count(1) over (partition by host_id, binary_id, service) as count_,
           -- numbering restarts for every distinct value of the column within the group
           row_number() over (partition by host_id, binary_id, service, binary_date) as rn_binary_date,
           row_number() over (partition by host_id, binary_id, service, region) as rn_region,
           row_number() over (partition by host_id, binary_id, service, binary_path) as rn_binary_path
    from mytable
)
select host_id, binary_id, service,
       -- keep the value only when every row number in the group is 1
       case
           when count_ = sum(rn_binary_date) over (partition by host_id, binary_id, service)
           then binary_date
       end as binary_date,
       case
           when count_ = sum(rn_region) over (partition by host_id, binary_id, service)
           then region
       end as region,
       case
           when count_ = sum(rn_binary_path) over (partition by host_id, binary_id, service)
           then binary_path
       end as binary_path
from cte
where count_ > 1  -- single-row groups cannot contain differing values
Result:
host_id  binary_id  service  binary_date  region  binary_path
1        2          hello    2012-06-05   null    null
1        2          hello    2022-09-08   null    null
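The row_number() query above keeps a column only when every value in the group is unique. As a possible alternative (not the answerer's method), a column can be flagged whenever its minimum and maximum within the group differ, which also covers groups where some values repeat. The sketch below is only an illustration under stated assumptions: it runs the query against an in-memory SQLite database through Python's sqlite3 module so it can be executed as-is, it stores the dates as text, and the helper names build_sample_db and varying_columns are hypothetical. Because min and max ignore nulls, a column that mixes null with a single repeated value would not be flagged.

```python
import sqlite3

# Sample rows copied from the question. SQLite and text-typed dates are
# assumptions made only so this sketch is runnable.
SAMPLE_ROWS = [
    (1, 2, "hello", "05-06-2012", "US", "/bin/usr"),
    (1, 2, "hello", "08-09-2022", "US", "/bin/usr"),
    (2, 3, "another", "08-09-2026", "UK", "/bin/java"),
]

# A column varies within a group when its min and max over the group differ.
QUERY = """
with cte as (
    select *,
           count(*) over w as count_,
           min(binary_date) over w <> max(binary_date) over w as diff_binary_date,
           min(region)      over w <> max(region)      over w as diff_region,
           min(binary_path) over w <> max(binary_path) over w as diff_binary_path
    from mytable
    window w as (partition by host_id, binary_id, service)
)
select host_id, binary_id, service,
       case when diff_binary_date then binary_date end as binary_date,
       case when diff_region      then region      end as region,
       case when diff_binary_path then binary_path end as binary_path
from cte
where count_ > 1
order by host_id, binary_id, service, binary_date
"""


def build_sample_db(rows=SAMPLE_ROWS):
    """Create an in-memory table named mytable (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table mytable (host_id integer, binary_id integer, service text,"
        " binary_date text, region text, binary_path text)"
    )
    conn.executemany("insert into mytable values (?, ?, ?, ?, ?, ?)", rows)
    return conn


def varying_columns(conn):
    """Return multi-row groups with non-varying columns replaced by null."""
    return conn.execute(QUERY).fetchall()


if __name__ == "__main__":
    for row in varying_columns(build_sample_db()):
        print(row)
```

With roughly 50 columns, the three min/max and case expressions would need to be repeated per column, for example by generating the SQL text from the table's column list.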