英文:
Count duplicates using count() and count(distinct)
问题
我不清楚为什么这个SQL查询可以用来计算特定字段的重复值数量:
select sum(country_count)
from (
select Country, count(Country) as country_count
from Customers
group by Country
having count(Country) > 1
);
返回结果为 88
而这个查询却不能:
SELECT count(Country) - count(Distinct Country)
FROM Customers;
返回结果为 70
另外,为什么它们会产生不同的结果呢?
*FYI,这些查询在W3Schools的trySQL界面中可行。
我尝试过的方法:
我运行了这些查询,它们给出了不同的结果(分别返回88和70)。我预期它们应该产生相同的结果。
英文:
I am unclear why this SQL query can be used to count duplicate values for a particular field:
select sum(country_count)
from (
select Country, count(Country) as country_count
from Customers
group by Country
having count(Country) > 1
);
returns 88**
while this query cannot:
SELECT count(Country) - count(Distinct Country)
FROM Customers;
returns 70**
As a side note, why do they give different results?
*FYI, these queries work in the W3Schools trySQL interface.
What I have tried:
I have run these queries and they give different results (returning 88 vs 70). I expected them to produce the same result.
答案1
得分: 0
考虑你的Customers
数据库中的这些行,它们位于Country
列中:
- 印度
- 印度
- 印度
- 美国
- 美国
- 澳大利亚
现在考虑你的第一个查询:
select sum(country_count)
from (
select Country, count(Country) as country_count
from Customers
group by Country
having count(Country) > 1
);
首先让我们考虑你正在使用的内部查询:
select Country, count(Country) as country_count
from Customers
group by Country
having count(Country) > 1
这个查询的输出(Output1
)将是:
Country | country_count |
---|---|
印度 | 3 |
美国 | 2 |
现在考虑将外部查询应用于这个输出:
select sum(country_count)
FROM Output1
答案应该是5。
但是对于你的第二个查询:
SELECT count(Country) - count(Distinct Country)
FROM Customers;
输出应该是6-3 = 3。
所以,这两个查询做了不同的事情。这就是你得到不同结果的原因。
第一个查询:忽略只有1次计数的国家,然后将其他国家的计数相加。
第二个查询:忽略所有不同国家的第一次计数,然后将其余计数相加。
希望这有所帮助。
英文:
Consider these rows in your Customers
database in Country
column
- India
- India
- India
- USA
- USA
- Australia
Now consider your 1st query:-
select sum(country_count)
from (
select Country, count(Country) as country_count
from Customers
group by Country
having count(Country) > 1
);
First let's consider the inner query that you are using:-
select Country, count(Country) as country_count
from Customers
group by Country
having count(Country) > 1
The output (Output1
) of this query will be:-
Country | country_count
India | 3
USA | 2
Now consider applying the outer query on this output:-
select sum(country_count)
FROM Output1
The answer should be 5 for this
But for your 2nd query:-
SELECT count(Country) - count(Distinct Country)
FROM Customers;
The output should be 6-3 = 3
So, both are doing different things. That's the reason you are getting different outputs.
1st query : Ignore the Countries which have only 1 count and then add up all the counts for the other ones.
2nd query : Ignore the first count from all the distinct Countries and add up rest of the counts.
Hope this helps.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论