英文:
Avoid Multiple subqueries in SQL with the same conditions
问题
我有一个类似这样的调查数据数据库:
```sql
create table Products(
Id int primary key,
Name nvarchar(max)
)
create table Opinions(
Id int primary key,
ProductId int foreign key references Products(Id),
IsEasyToUse bit default((0)),
IsGoodPrice bit default((0)),
IsDurable bit default((0)),
-- 还有其他列...
)
我想创建一个报告,显示每个产品以及来自Opinions
表中所有意见列的百分比。
到目前为止,我一直以天真的方式进行操作:
select
p.Name,
1.0 *
(select count(*) from Opinions o where o.ProductId = p.Id and o.IsEasyToUse = 1) /
(select count(*) from Opinions o where o.ProductId = p.Id) as EasyToUse,
1.0 *
(select count(*) from Opinions o where o.ProductId = p.Id and o.IsGoodPrice = 1) /
(select count(*) from Opinions o where o.ProductId = p.Id) as GoodPrice,
1.0 *
(select count(*) from Opinions o where o.ProductId = p.Id and o.IsDurable = 1) /
(select count(*) from Opinions o where o.ProductId = p.Id) as Durable
from
(select Id, Name from Products) p
这在技术上可以工作,但我不喜欢条件的重复。而且Opinions表的大小正在增长,性能问题开始变得明显(现在在100万行时需要大约1分钟)。
是否有一种更快速(以及更容易修改以便将来应用更多过滤条件)的重写方法?
<details>
<summary>英文:</summary>
I have a DB of survey data that looks like this:
create table Products(
Id int primary key,
Name nvarchar(max)
)
create table Opinions(
Id int primary key,
ProductId int foreign key references Products(Id),
IsEasyToUse bit default((0)),
IsGoodPrice bit default((0)),
IsDurable bit default((0)),
-- and so on...
)
I want to create a report that shows every product and a percentage of all the opinion columns from the `Opinions` table.
So far, I have been doing it the naive way:
select
p.Name,
1.0 *
(select count() from Opinions o where o.ProductId = p.Id and o.IsEasyToUse = 1) /
(select count() from Opinions o where o.ProductId = p.Id) as EasyToUse,
1.0 *
(select count() from Opinions o where o.ProductId = p.Id and o.IsGoodPrice = 1) /
(select count() from Opinions o where o.ProductId = p.Id) as GoodPrice,
1.0 *
(select count() from Opinions o where o.ProductId = p.Id and o.IsDurable = 1) /
(select count() from Opinions o where o.ProductId = p.Id) as Durable
from
(select Id, Name from Products) p
This technically works, but I am not happy with the repetition of the conditions. Also the Opinions table is growing in size so the performance issues are starting to become noticeable (it now takes around a minute with 1m rows).
Is there a way to rewrite it in a way that's faster (and easier to modify in the future in case I want to apply more filters)?
</details>
# 答案1
**得分**: 1
The problem isn't just the repetition of conditions. Each subquery will be executed separately.
A better option would be to join the two tables, group by product then calculate the counts. It's not possible to count based on a condition. The same thing can be done using `SUM(IIF(condition,1,0))` though:
```sql
SELECT p.ID,
MAX(p.Name) as Name,
SUM(IIF(IsEasyToUse,1.0,0))/COUNT(*) as EasyToUse,
SUM(IIF(IsGoodPrice,1.0,0))/COUNT(*) as GoodPrice,
SUM(IIF(IsDurable,1.0,0))/COUNT(*) as Durable
FROM Products p
INNER JOIN Opinions o on p.ID=o.ProductID
GROUP BY p.ID
英文:
The problem isn't just the repetition of conditions. Each subquery will be executed separately.
A better option would be to join the two tables, group by product then calculate the counts. It's not possible to count based on a condition. The same thing can be done using SUM(IIF(condition,1,0))
though:
SELECT p.ID,
MAX(p.Name) as Name,
SUM(IIF(IsEasyToUse,1.0,0))/COUNT(*) as EasyToUse,
SUM(IIF(IsGoodPrice,1.0,0))/COUNT(*) as GoodPrice,
SUM(IIF(IsDurable,1.0,0))/COUNT(*) as Durable
FROM Products p
INNER JOIN Opinions o on p.ID=o.ProductID
GROUP BY p.ID
答案2
得分: 1
以下是翻译好的部分:
作为入门:我们可以连接和聚合,而不是执行多个相关子查询。
然后:我们不需要计算总数;相反,我们可以只使用 avg()
;减少聚合函数的调用应该提高查询的效率:
从产品 p
内部连接意见 o
按 p.id,p.name 分组
根据你的数据如何分布在两个表之间,apply
可能提供了一个很好的替代方法,因为它避免了外部聚合:
从产品 p
交叉应用 (
选择
avg(case when IsEasyToUse = 1 then 1.0 else 0 end) as EasyToUse,
avg(case when IsGoodPrice = 1 then 1.0 else 0 end) as GoodPrice,
avg(case when IsDurable = 1 then 1.0 else 0 end) as Durable
从意见 o
在 o.productID = p.id
) x
英文:
As a starter: we can join and aggregate instead of executing multiples correlated subqueries.
Then: we don't need to compute the total count; instead, we can just juse avg()
; less calls to aggregate functions should improve the efficiency of the query:
select p.id, p.name
avg(case when IsEasyToUse = 1 then 1.0 else 0 end) as EasyToUse,
avg(case when IsGoodPrice = 1 then 1.0 else 0 end) as GoodPrice,
avg(case when IsDurable = 1 then 1.0 else 0 end) as Durable
from products p
inner join opinions o on o.productID = p.id
group by p.id, p.name
Depending on how your data is spread across the two tables, apply
might offer a good alternative, since it avoids outer aggregation:
select p.id, p.name, x.*
from products p
cross apply (
select
avg(case when IsEasyToUse = 1 then 1.0 else 0 end) as EasyToUse,
avg(case when IsGoodPrice = 1 then 1.0 else 0 end) as GoodPrice,
avg(case when IsDurable = 1 then 1.0 else 0 end) as Durable
from opinions o
where o.productID = p.id
) x
答案3
得分: 0
你可以尝试这个查询
SELECT
ProductId = p.Id
, ProductName = p.Name
, op.GoodPrice
, op.EasyToUse
, op.Durable
FROM dbo.Products p
INNER JOIN
(
SELECT o.ProductId
, EasyToUse = 1.0 * SUM(CASE o.IsEasyToUse WHEN 0 THEN 0 ELSE 1 END ) / COUNT(o.IsEasyToUse)
, GoodPrice = 1.0 * SUM(CASE o.IsGoodPrice WHEN 0 THEN 0 ELSE 1 END ) / COUNT(o.IsGoodPrice)
, Durable = 1.0 * SUM(CASE o.IsDurable WHEN 0 THEN 0 ELSE 1 END ) / COUNT(o.IsDurable)
FROM dbo.Opinions o
GROUP BY o.ProductId
) op
ON p.Id = op.ProductId
英文:
you could try this query
SELECT
ProductId = p.Id
, ProductName = p.Name
, op.GoodPrice
, op.EasyToUse
, op.Durable
FROM dbo.Products p
INNER JOIN
(
SELECT o.ProductId
, EasyToUse = 1.0 * SUM(CASE o.IsEasyToUse WHEN 0 THEN 0 ELSE 1 END ) / COUNT(o.IsEasyToUse)
, GoodPrice = 1.0 * SUM(CASE o.IsGoodPrice WHEN 0 THEN 0 ELSE 1 END ) / COUNT(o.IsGoodPrice)
, Durable = 1.0 * SUM(CASE o.IsDurable WHEN 0 THEN 0 ELSE 1 END ) / COUNT(o.IsDurable)
FROM dbo.Opinions o
GROUP BY o.ProductId
) op
ON p.Id = op.ProductId
答案4
得分: 0
另一种计算百分比的方法,其中总数只能计算一次:
SELECT Name, 1.0 * EasyToUse/total AS EasyToUse,
1.0 * GoodPrice/total AS GoodPrice,
1.0 * Durable/total AS Durable
FROM (
SELECT p.Name,
COUNT(CASE WHEN IsEasyToUse = 1 THEN 1 END) AS EasyToUse,
COUNT(CASE WHEN IsGoodPrice = 1 THEN 1 END) AS GoodPrice,
COUNT(CASE WHEN IsDurable = 1 THEN 1 END) AS Durable,
COUNT(p.Id) AS total
FROM Products p
INNER JOIN Opinions o on o.ProductID = p.Id
GROUP BY p.Id, p.Name
) AS S
英文:
Another method for calculating percentages in which the total count can only be calculated once:
SELECT Name, 1.0 * EasyToUse/total AS EasyToUse,
1.0 * GoodPrice/total AS GoodPrice,
1.0 * Durable/total AS Durable
FROM (
SELECT p.Name,
COUNT(CASE WHEN IsEasyToUse = 1 THEN 1 END) AS EasyToUse,
COUNT(CASE WHEN IsGoodPrice = 1 THEN 1 END) AS GoodPrice,
COUNT(CASE WHEN IsDurable = 1 THEN 1 END) AS Durable,
COUNT(p.Id) AS total
FROM Products p
INNER JOIN Opinions o on o.ProductID = p.Id
GROUP BY p.Id, p.Name
) AS S
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论