使用count()和count(distinct)来计算重复项的数量。

huangapple go评论72阅读模式
英文:

Count duplicates using count() and count(distinct)

问题

我不清楚为什么这个SQL查询可以用来计算特定字段的重复值数量:

select sum(country_count)
from (
select Country, count(Country) as country_count
from Customers
group by Country
having count(Country) > 1
);

返回结果为 88

而这个查询却不能:

SELECT count(Country) - count(Distinct Country) 
FROM Customers;

返回结果为 70

另外,为什么它们会产生不同的结果呢?

*FYI,这些查询在W3Schools的trySQL界面中可行。

https://www.w3schools.com/sql/trysql.asp?filename=trysql_asc#:~:text=If%20you%20switch%20to%20a,your%20browser%2C%20for%20testing%20purposes.

我尝试过的方法:

我运行了这些查询,它们给出了不同的结果(分别返回88和70)。我预期它们应该产生相同的结果。

英文:

I am unclear why this SQL query can be used to count duplicate values for a particular field:

select sum(country_count)
from (
select Country, count(Country) as country_count
from Customers
group by Country
having count(Country) > 1
);

returns 88**

while this query cannot:

SELECT count(Country) - count(Distinct Country) 
FROM Customers;

returns 70**

As a side note, why do they give different results?

*FYI, these queries work in the W3Schools trySQL interface.

https://www.w3schools.com/sql/trysql.asp?filename=trysql_asc#:~:text=If%20you%20switch%20to%20a,your%20browser%2C%20for%20testing%20purposes.

What I have tried:

I have run these queries and they give different results (returning 88 vs 70). I expected them to produce the same result.

答案1

得分: 0

考虑你的Customers数据库中的这些行,它们位于Country列中:

  1. 印度
  2. 印度
  3. 印度
  4. 美国
  5. 美国
  6. 澳大利亚

现在考虑你的第一个查询:

select sum(country_count)
from (
select Country, count(Country) as country_count
from Customers
group by Country
having count(Country) > 1
);

首先让我们考虑你正在使用的内部查询:

select Country, count(Country) as country_count
from Customers
group by Country
having count(Country) > 1

这个查询的输出(Output1)将是:

Country country_count
印度 3
美国 2

现在考虑将外部查询应用于这个输出:

select sum(country_count)
FROM Output1

答案应该是5。

但是对于你的第二个查询:

SELECT count(Country) - count(Distinct Country) 
FROM Customers;

输出应该是6-3 = 3。

所以,这两个查询做了不同的事情。这就是你得到不同结果的原因。

第一个查询:忽略只有1次计数的国家,然后将其他国家的计数相加。

第二个查询:忽略所有不同国家的第一次计数,然后将其余计数相加。

希望这有所帮助。

英文:

Consider these rows in your Customers database in Country column

  1. India
  2. India
  3. India
  4. USA
  5. USA
  6. Australia

Now consider your 1st query:-

select sum(country_count)
from (
select Country, count(Country) as country_count
from Customers
group by Country
having count(Country) > 1
);

First let's consider the inner query that you are using:-

select Country, count(Country) as country_count
from Customers
group by Country
having count(Country) > 1

The output (Output1) of this query will be:-

Country | country_count

India | 3

USA | 2

Now consider applying the outer query on this output:-

select sum(country_count)
FROM Output1

The answer should be 5 for this

But for your 2nd query:-

SELECT count(Country) - count(Distinct Country) 
FROM Customers;

The output should be 6-3 = 3

So, both are doing different things. That's the reason you are getting different outputs.

1st query : Ignore the Countries which have only 1 count and then add up all the counts for the other ones.

2nd query : Ignore the first count from all the distinct Countries and add up rest of the counts.

Hope this helps.

huangapple
  • 本文由 发表于 2023年7月13日 23:07:55
  • 转载请务必保留本文链接:https://go.coder-hub.com/76680918.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定