2023年2月14日 04:16:48go评论59阅读模式

英文:

count (distinct ...) VS count from (select distinct ...) VS group by

问题

在Snowflake上，我在从相同的表中计算不同值的唯一值时，会根据计算方法得到不同的结果。我曾经认为它们是等效的。鉴于存在差异，首先我想知道在哪些情况下这些策略不能互换，其次，如何判断哪种策略返回了正确的数量。

我包括了我用来测试的查询：

select 'count_distinct_subquery' as strat, count(*) from (
  select distinct 
  plan_code,
  fis_we_dt,
  sku_no,
  pog_segment_name,
  shelf_no,
  position_id
  from src 
)
union all
select 'count_distinct' as strat, count(
  distinct 
  plan_code,
  fis_we_dt,
  sku_no,
  pog_segment_name,
  shelf_no,
  position_id
)
from src 
union all 
select 'group_by_subquery' as strat, count(*) from (
  select *
  from src 
  group by 
  plan_code,
  fis_we_dt,
  sku_no,
  pog_segment_name,
  shelf_no,
  position_id
)

输出结果如图所示。

英文:

On snowflake, I'm getting different results depending on how I count the distinct values from the same table. I used to think them as equivalent. Given the discrepancies, first I'd like to know in which scenarios these strategies can not be interchanged, and second, how to tell what strategy is returning the right number.

I include the query I'm using to test this

select &#39;count_distinct_subquery&#39; as strat,count(*) from (
  select distinct 
  plan_code,
  fis_we_dt,
  sku_no,
  pog_segment_name,
  shelf_no,
  position_id
  from src 
)
union all
select &#39;count_distinct&#39; as strat,count(
  distinct 
  plan_code,
  fis_we_dt,
  sku_no,
  pog_segment_name,
  shelf_no,
  position_id
)
from src 
union all 
select &#39;group_by_subquery&#39; as strat, count(*) from (
  select *
  from src 
  group by 
  plan_code,
  fis_we_dt,
  sku_no,
  pog_segment_name,
  shelf_no,
  position_id
)

The output is as in the image

答案1

得分: 1

第二个版本的 count (distinct expr1, ...) 会跳过 NULL 值。

CREATE OR REPLACE TABLE src
AS 
SELECT NULL AS plan_code, 1 AS fis_we_dt;

select 'count_distinct_subquery' as strat,count(*) from (
  select distinct 
  plan_code,
  fis_we_dt
  from src 
)
union all
select 'count_distinct' as strat,count(
  distinct 
  plan_code,
  fis_we_dt
)
from src 
union all 
select 'group_by_subquery' as strat, count(*) from (
  select *
  from src 
  group by 
  plan_code,
  fis_we_dt
);

COUNT(*)，表示函数应返回不包含任何 NULL 值的行数。请参阅 COUNT(*) 和聚合函数和 NULL 值以获取示例和更多信息。

英文:

The second version count (distinct expr1, ...) skips NULLs.

CREATE OR REPLACE TABLE src
AS 
SELECT NULL AS plan_code, 1 AS fis_we_dt;

select &#39;count_distinct_subquery&#39; as strat,count(*) from (
  select distinct 
  plan_code,
  fis_we_dt
  from src 
)
union all
select &#39;count_distinct&#39; as strat,count(
  distinct 
  plan_code,
  fis_we_dt
)
from src 
union all 
select &#39;group_by_subquery&#39; as strat, count(*) from (
  select *
  from src 
  group by 
  plan_code,
  fis_we_dt
);

>COUNT(*)
>
> COUNT( [ DISTINCT ] <expr1> [ , <expr2> ... ] )
>
> Alias.*, which indicates that the function should return the number of rows that do not contain any NULLs. See Examples for an example.

> Aggregate Functions and NULL Values
>
>Some aggregate functions can be passed more than one column. For example:
>
> SELECT COUNT(col1, col2) FROM table1;
>
> In these instances, the aggregate function ignores a row if any individual column is NULL.

答案2

得分: 0

根据Lukasz的回答：

with data(col1, col2) as (
    select * from values
    (1, 10),
    (1, 10),
    (1, null),
    (null, null),
    (null, null)
), unions as (
    select 
        'count_distinct_subquery' as strat,
        count(*) as count
    from (
        select distinct col1, col2
        from data 
    )
    
    union all
    
    select 
        'count_distinct' as strat,
        count(distinct col1, col2)
    from data
    
    union all 
    
    select 
        'group_by_subquery' as strat, 
        count(*) 
    from (
        select *
        from data
        group by 1,2
    )
)
select * from unions;

结果如下：

STRAT	COUNT
count_distinct_subquery	3
count_distinct	1
group_by_subquery	3

第一个和最后一个是相同的。子选择的DISTINCT和GROUP BY会找出排列组合，并将NULL视为一个独立的元素，然后计算出现的行数，即3。

中间的部分是要求没有NULL的行数，即1。

expr1
这应该是要么：

列名，可以是合格的名称（例如 database.schema.table.column_name）。

Alias.*，表示函数应返回不包含任何NULL的行数。请参考示例以了解示例。

通过从第一个和第三个查询中提取子查询，然后将其包装在count(distinct col1, col2)而不是count(*)中，可以进一步看出这一点：

    select 
        'group_by_subquery_count_cols' as strat, 
        count(distinct col1, col2)
    from (
        select *
        from data
        group by 1,2
    )

    union all

    select 
        'count_distinct_subquery_count_cols' as strat,
        count(distinct col1, col2)
    from (
        select distinct col1, col2
        from data 
    )

现在我们再次得到1，而不是3，来自相同的数据。

STRAT	COUNT
count_distinct_subquery	3
count_distinct	1
group_by_subquery	3
group_by_subquery_count_cols	1
count_distinct_subquery_count_cols	1

英文:

As per Lukasz answer:

with data(col1, col2) as (
    select * from values
    (1, 10),
    (1, 10),
    (1, null),
    (null, null),
    (null, null)
), unions as (
    select 
        &#39;count_distinct_subquery&#39; as strat,
        count(*) as count
    from (
        select distinct col1, col2
        from data 
    )
    
    union all
    
    select 
        &#39;count_distinct&#39; as strat,
        count(distinct col1, col2)
    from data
    
    union all 
    
    select 
        &#39;group_by_subquery&#39; as strat, 
        count(*) 
    from (
        select *
        from data
        group by 1,2
    )
)
select * from unions;

gives:

STRAT	COUNT
count_distinct_subquery	3
count_distinct	1
group_by_subquery	3

the first and last are the same thing. The sub-select DISTINCT and GROUP BY are find the permutations, AND they respect NULLs as things, and then you count the number of rows present, which is 3.

The middle is asking for the count of row with no-nulls, which is 1.

> expr1
> This should be either:
>
> A column name, which can be a qualified name (e.g. > database.schema.table.column_name).
>
> Alias.*, which indicates that the function should return the number of rows that do not contain any NULLs. See Examples for an example.

This can be further seen by taking the sub-selects from the first and third queries, and wrap those in a count(distinct col1, col2) instead of the count(*)

    select 
        &#39;group_by_subquery_count_cols&#39; as strat, 
        count(distinct col1, col2)
    from (
        select *
        from data
        group by 1,2
    )

    union all

    select 
        &#39;count_distinct_subquery_count_cols&#39; as strat,
        count(distinct col1, col2)
    from (
        select distinct col1, col2
        from data 
    )

and now we get 1's again not three's, from the same data.

STRAT	COUNT
count_distinct_subquery	3
count_distinct	1
group_by_subquery	3
group_by_subquery_count_cols	1
count_distinct_subquery_count_cols	1

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

count (distinct …) VS count from (select distinct …) VS group by

问题

答案1

答案2

Jooq 查询嵌套对象，其名称不同于表格。

从单列表中获取结果集在SQL中。

如何使用 GORM 中的原始 SQL 构建器获取更新后的输出。

寻找有关如何构建具有多个映射的SQL查询的信息。

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

发表评论