June 8, 2023, 10:37:13

Check which columns have different values after a GROUP BY

Question

I have a dataset like this:

host_id binary_id service binary_date region binary_path ......
1       2         hello   05-06-2012  US     /bin/usr
1       2         hello   08-09-2022  US     /bin/usr
2       3         another 08-09-2026  UK     /bin/java

host_id, binary_id, service is the group I'm interested in. binary_date, region and binary_path are a subset of the columns in this table, which has approximately 50 columns.
I would like to check which of the 50 columns have different values within each group after grouping by host_id, binary_id, service. In this example I would get the binary_date column, since it is the only column whose values differ within a group.

Expected result would be:

host_id binary_id service binary_date region binary_path ......
1       2         hello   05-06-2012  null   null
1       2         hello   08-09-2022  null   null

You can see that in this case only binary_date contains different values; the other columns, region and binary_path, are null because their values are exactly the same across the rows in the group.

Answer 1

Score: 1

Here is one way to do it:

row_number() numbers the rows within each partition. Partitioning by the group keys plus one extra column restarts the numbering for every distinct value of that column. When every value of the column is different within the group, each row gets row number 1, so the sum of the row numbers over the group equals the group's row count; in that case the column's value is shown. Otherwise null is returned.

with cte as (
   select *, count(1) over (partition by host_id, binary_id, service) as count_,
             row_number() over (partition by host_id, binary_id, service, binary_date) as rn_binary_date,
             row_number() over (partition by host_id, binary_id, service, region) as rn_region,
             row_number() over (partition by host_id, binary_id, service, binary_path) as rn_binary_path
   from mytable
)
select host_id, binary_id, service, 
      case 
        when count_ = sum(rn_binary_date) over(partition by host_id, binary_id, service) 
        then binary_date 
      end as binary_date,
      case 
        when count_ = sum(rn_region) over(partition by host_id, binary_id, service)
        then region 
      end as region,
      case 
        when count_ = sum(rn_binary_path) over(partition by host_id, binary_id, service)
        then binary_path 
      end as binary_path
from cte
where count_ > 1

Result:

host_id	binary_id	service	    binary_date	region	binary_path
1	    2	        hello	    2012-06-05	null	null
1	    2	        hello	    2022-09-08	null	null

Demo here
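
One caveat about the query above: the sum of the row numbers equals the row count only when every value in the group is distinct, so a column holding, say, A, A, B within one group would still come back as null. Since the table has around 50 columns, writing each CASE by hand is also tedious. The sketch below is a different technique, not the answerer's method: it compares MIN and MAX per group, which flags a column as soon as any two non-null values differ, and it generates the SQL from a list of column names. The helper name build_diff_query, the use of Python's sqlite3 module and the ISO date strings are assumptions made only for illustration. MIN and MAX ignore NULLs, so a column mixing NULL with a single value is not flagged here.

```python
import sqlite3


def build_diff_query(table, keys, columns):
    """Build SQL that shows a column's value only in groups where it varies.

    Hypothetical helper: identifiers are interpolated as-is, so they must
    come from trusted code, never from user input.
    """
    key_list = ", ".join(keys)
    t_keys = ", ".join(f"t.{k}" for k in keys)
    t_cols = ", ".join(f"t.{c}" for c in columns)
    # One flag per column: true when the group holds at least two
    # different non-null values (MIN and MAX skip NULLs).
    flags = ",\n           ".join(
        f"MIN({c}) <> MAX({c}) AS {c}_differs" for c in columns
    )
    shown = ",\n       ".join(
        f"CASE WHEN f.{c}_differs THEN t.{c} END AS {c}" for c in columns
    )
    join_on = " AND ".join(f"t.{k} = f.{k}" for k in keys)
    return f"""
WITH flags AS (
    SELECT {key_list}, COUNT(*) AS n,
           {flags}
    FROM {table}
    GROUP BY {key_list}
)
SELECT {t_keys},
       {shown}
FROM {table} AS t
JOIN flags AS f ON {join_on}
WHERE f.n > 1
ORDER BY {t_keys}, {t_cols}
"""


# Sample rows from the question, with dates written as ISO strings.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (host_id INTEGER, binary_id INTEGER, service TEXT,"
    " binary_date TEXT, region TEXT, binary_path TEXT)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 2, "hello", "2012-06-05", "US", "/bin/usr"),
        (1, 2, "hello", "2022-09-08", "US", "/bin/usr"),
        (2, 3, "another", "2026-09-08", "UK", "/bin/java"),
    ],
)

sql = build_diff_query(
    "mytable",
    keys=["host_id", "binary_id", "service"],
    columns=["binary_date", "region", "binary_path"],
)
result = conn.execute(sql).fetchall()
for row in result:
    print(row)
```

On the sample data this returns the two rows of the (1, 2, hello) group with binary_date filled in and region and binary_path as NULL (None in Python). The generated SQL uses only a CTE, GROUP BY and a join, so it should port to other databases with little change, though that is untested here.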
