英文:
Performance problem with query that is solved with a cursor in SQL Server
问题
I understand that you want a translation of the provided code and explanation. Here's the translated code:
我被要求在我们的系统中找到5个人(pfileid),这些人在我们的系统中具有移动电话号码,并且只有一种类型(dfiletypeid)的多个文件。他们还必须是文件上的特定类型的人(pfiletypeid = 2)。
我想出了以下查询:
```sql
SELECT TOP 5
y.PFileId,
(SELECT TOP 1 pfilecommdescr FROM _PFileComm
WHERE PFileId = y.PFileId AND pfilecommtypeid = 2
AND LEN(pfilecommdescr) = 10 AND pfilecommdescr LIKE '04%') AS gsm,
y.c AS c
FROM
(SELECT
x.pfileid,
COUNT(*) AS c
FROM
(SELECT
dflpf.pfileid,
df.dfiletypeid,
COUNT(*) AS c
FROM
_dfile df
JOIN
_dfilelinkpfile dflpf ON df.dfileid = dflpf.dfileid
AND dflpf.pfiletypeid = 2
GROUP BY
dflpf.pfileid, df.dfiletypeid) x
GROUP BY
x.pfileid) y
WHERE
EXISTS (SELECT 1 FROM _pfilecomm
WHERE pfileid = y.pfileid AND pfilecommtypeid = 2
AND LEN(pfilecommdescr) = 10
AND pfilecommdescr LIKE '04%')
AND (SELECT COUNT(*) FROM _dfilelinkpfile
WHERE pfileid = y.pfileid AND pfiletypeid = 2) > 1
AND NOT EXISTS (SELECT 1 FROM _DFileLinkPFile
WHERE PFileId = y.PFileId AND PFileTypeId <> 2)
AND y.c = 1
我尝试运行它几个小时,但没有得到结果。我不明白为什么它需要这么长时间,所以我写了一个游标:
IF NOT EXISTS (SELECT 1 FROM sys.tables WHERE name = 'tblTemp_test')
BEGIN
CREATE TABLE tblTemp_test (PFileId INT, GSM varchar(255), c INT)
END
DECLARE @PFileId BIGINT
DECLARE CheckPFileId CURSOR FOR SELECT PFileId FROM _PFile ORDER BY PFileId DESC
OPEN CheckPFileId
FETCH NEXT FROM CheckPFileId INTO @PFileId
WHILE @@Fetch_Status = 0 AND (SELECT COUNT(*) FROM dbo.tblTemp_Axel_test) <> 5
BEGIN
insert into dbo.tblTemp_Axel_test (PFileId, GSM, c)
select
y.PFileId,
(select top 1 pfilecommdescr from _PFileComm where PFileId = y.PFileId and pfilecommtypeid = 2 and len(pfilecommdescr) = 10 and pfilecommdescr like '04%') as gsm,
y.c as c
from
(
select
x.pfileid,
count(*) as c
from
(
select
dflpf.pfileid,
df.dfiletypeid,
count(*) as c
from _dfile df
join _dfilelinkpfile dflpf on df.dfileid = dflpf.dfileid and dflpf.pfiletypeid = 2
where dflpf.PFileId = @PFileId
group by dflpf.pfileid,df.dfiletypeid
) x
group by x.pfileid
) y
where exists (select 1 from _pfilecomm where pfileid = y.pfileid and pfilecommtypeid = 2 and len(pfilecommdescr) = 10 and pfilecommdescr like '04%')
and (select count(*) from _dfilelinkpfile where pfileid = y.pfileid and pfiletypeid = 2) > 1
and not exists (select 1 from _DFileLinkPFile where PFileId = y.PFileId and PFileTypeId <> 2)
and y.c = 1
FETCH NEXT FROM CheckPFileId INTO @PFileId
END
CLOSE CheckPFileId
DEALLOCATE CheckPFileId
select * from dbo.tblTemp_test
它在几秒钟内给出了答案。有人知道为什么第一个查询不能在类似的时间内给我结果吗?我以为应该尽量避免在SQL Server中使用游标以避免性能问题。这可能是我没有写出最优化的查询。我问这个问题是因为我之前也遇到过这个问题,找不到答案。
英文:
I got asked the question to find 5 people (pfileid) in our system that have a mobile phone number in our system and that have multiple files that can be of only one type (dfiletypeid). They also have to be a specific type of person on the file (pfiletypeid = 2).
I came up with this query:
SELECT TOP 5
y.PFileId,
(SELECT TOP 1 pfilecommdescr FROM _PFileComm
WHERE PFileId = y.PFileId AND pfilecommtypeid = 2
AND LEN(pfilecommdescr) = 10 AND pfilecommdescr LIKE '04%') AS gsm,
y.c AS c
FROM
(SELECT
x.pfileid,
COUNT(*) AS c
FROM
(SELECT
dflpf.pfileid,
df.dfiletypeid,
COUNT(*) AS c
FROM
_dfile df
JOIN
_dfilelinkpfile dflpf ON df.dfileid = dflpf.dfileid
AND dflpf.pfiletypeid = 2
GROUP BY
dflpf.pfileid, df.dfiletypeid) x
GROUP BY
x.pfileid) y
WHERE
EXISTS (SELECT 1 FROM _pfilecomm
WHERE pfileid = y.pfileid AND pfilecommtypeid = 2
AND LEN(pfilecommdescr) = 10
AND pfilecommdescr LIKE '04%')
AND (SELECT COUNT(*) FROM _dfilelinkpfile
WHERE pfileid = y.pfileid AND pfiletypeid = 2) > 1
AND NOT EXISTS (SELECT 1 FROM _DFileLinkPFile
WHERE PFileId = y.PFileId AND PFileTypeId <> 2)
AND y.c = 1
I tried to run it for hours and I didn't get a result. I don't understand why it takes so long so I wrote a cursor instead:
IF NOT EXISTS (SELECT 1 FROM sys.tables WHERE name = 'tblTemp_test')
BEGIN
CREATE TABLE tblTemp_test (PFileId INT, GSM varchar(255), c INT)
END
DECLARE @PFileId BIGINT
DECLARE CheckPFileId CURSOR FOR SELECT PFileId FROM _PFile ORDER BY PFileId DESC
OPEN CheckPFileId
FETCH NEXT FROM CheckPFileId INTO @PFileId
WHILE @@Fetch_Status = 0 AND (SELECT COUNT(*) FROM dbo.tblTemp_Axel_test) <> 5
BEGIN
insert into dbo.tblTemp_Axel_test (PFileId, GSM, c)
select
y.PFileId,
(select top 1 pfilecommdescr from _PFileComm where PFileId = y.PFileId and pfilecommtypeid = 2 and len(pfilecommdescr) = 10 and pfilecommdescr like '04%') as gsm,
y.c as c
from
(
select
x.pfileid,
count(*) as c
from
(
select
dflpf.pfileid,
df.dfiletypeid,
count(*) as c
from _dfile df
join _dfilelinkpfile dflpf on df.dfileid = dflpf.dfileid and dflpf.pfiletypeid = 2
where dflpf.PFileId = @PFileId
group by dflpf.pfileid,df.dfiletypeid
) x
group by x.pfileid
) y
where exists (select 1 from _pfilecomm where pfileid = y.pfileid and pfilecommtypeid = 2 and len(pfilecommdescr) = 10 and pfilecommdescr like '04%')
and (select count(*) from _dfilelinkpfile where pfileid = y.pfileid and pfiletypeid = 2) > 1
and not exists (select 1 from _DFileLinkPFile where PFileId = y.PFileId and PFileTypeId <> 2)
and y.c = 1
FETCH NEXT FROM CheckPFileId INTO @PFileId
END
CLOSE CheckPFileId
DEALLOCATE CheckPFileId
select * from dbo.tblTemp_test
It came up with the answer in seconds. Does anyone know why the first query can't give me a result in a similar time? I thought cursors were supposed to be the avoided at all costs in sql server for performance issues. It could be I didn't write the query optimally. I'm asking because I've had this problem before and I can't find an answer to it.
I tried to look in the estimated execution plan but it doesn't give me an answer to why it takes so long.
答案1
得分: 1
你在将查询包装成游标时所做的唯一更改是在最内部的查询中添加了一个WHERE
子句,因此它必定是速度提升的原因。
我会假设至少有一个表_dfile
、_dfilelinkpfile
存储了大量数据,并且您使用的连接 - join _dfilelinkpfile dflpf on df.dfileid = dflpf.dfileid
- 产生了一个非常大的结果集。
使用游标,您将该结果集限制为当前处理的PFileID
记录,这可能只是您的JOIN
可以返回的所有记录的一个小子集,这使得整个查询运行速度快。如果您的数据库中有很多人满足您描述的条件,那么SQL Server只需要评估游标的少数迭代,而不是计算完整的连接结果集并对其执行整个查询。
您可以通过以下方式验证上述内容:
- 使用附加的
WHERE
子句运行您的原始查询,指定单个ID - 就像在游标中一样(它可能会运行得很快); - 计算游标中的迭代次数并将数字存储在某个地方(它可能远远低于您的数据库中不同的PFileID数量)。
话虽如此,在这里使用游标仍然不是一个好主意 - 它仅表示您的查询未以最佳方式编写或优化器无法正确处理它。例如,在您的查询中存在一些不必要的聚合(最内部查询中的COUNT(*),您从未使用过)。
没有您的数据库结构,很难进行任何测试,但我建议稍微简化查询:
SELECT TOP 5 PFileId, pcomm.PFileCommDescr, dflpf.FileCount
FROM _dfile df
-- only people having mobile phone number
CROSS APPLY(
SELECT TOP 1 PFileCommDescr
FROM _pfilecomm
WHERE PFileId = df.PFileId AND pfilecommtypeid = 2
AND LEN(pfilecommdescr) = 10 AND pfilecommdescr LIKE '04%'
) pcomm
-- only people who have more than one file with pfiletypeid = 2
CROSS APPLY(
SELECT COUNT(*) as FileCount
FROM _dfilelinkpfile
WHERE df.PFileId = PFileId AND pfiletypeid = 2
GROUP BY PFileId
HAVING COUNT(*) > 1
) dflpf
-- only people whose all files have the same dfiletypeid
CROSS APPLY(
SELECT COUNT(DISTINCT dfiletypeid) as DFileTypeCount
FROM _dfilelinkpfile
WHERE df.PFileId = PFileId
GROUP BY PFileId
HAVING COUNT(*) = 1
) dflpf_ntypes
英文:
The only thing you have changed in your query when wrapping it in cursor was adding a WHERE
clause in the innermost query, so it must be the cause for the speedup.
I would assume that at least one of tables _dfile
, _dfilelinkpfile
stores a lot of data and that join you have used - join _dfilelinkpfile dflpf on df.dfileid = dflpf.dfileid
- produces a very big result set.
Using cursor, you constrain that result set to records with currenly processed PFileID
, which is probably a tiny subset of all records that your JOIN
can return, which makes the whole query run fast. If there are a lot of people in your database that satisfy conditions you described, there is a high chance that Sql Server will only have to evaluate a few iterations of cursor, as opposed to computing the full joining result set and performing the whole query on it.
You can verify the above by:
- running your original query with additional
WHERE
clause, specifying a single ID - just like in your cursor (it will probably run fast); - counting iterations in your cursor and storing the number somewhere (it will probably be much lower than number of distinct PFileIDs in your database).
That being said, using a cursor here is still not a great idea - the fact that it helped with performance only means that your query is not written in optimal way or that the optimizer cannot handle it correctly. For instance, there is some unneccessary aggregation in your query (COUNT(*) in the innermost query that you're never using).
It's impossible to test anything without your database structure, but I would try to simplify the query a bit:
SELECT TOP 5 PFileId, pcomm.PFileCommDescr, dflpf.FileCount
FROM _dfile df
-- only people having mobile phone number
CROSS APPLY(
SELECT TOP 1 PFileCommDescr
FROM _pfilecomm
WHERE PFileId = df.PFileId AND pfilecommtypeid = 2
AND LEN(pfilecommdescr) = 10 AND pfilecommdescr LIKE '04%'
) pcomm
-- only people who have more than one file with pfiletypeid = 2
CROSS APPLY(
SELECT COUNT(*) as FileCount
FROM _dfilelinkpfile
WHERE df.PFileId = PFileId AND pfiletypeid = 2
GROUP BY PFileId
HAVING COUNT(*) > 1
) dflpf
-- only people whose all files have tha same dfiletypeid
CROSS APPLY(
SELECT COUNT(DISTINCT dfiletypeid) as DFileTypeCount
FROM _dfilelinkpfile
WHERE df.PFileId = PFileId
GROUP BY PFileId
HAVING COUNT(*) = 1
) dflpf_ntypes
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论