2023年5月11日 19:18:39go评论54阅读模式

英文:

Performance problem with query that is solved with a cursor in SQL Server

问题

I understand that you want a translation of the provided code and explanation. Here's the translated code:

我被要求在我们的系统中找到5个人（pfileid），这些人在我们的系统中具有移动电话号码，并且只有一种类型（dfiletypeid）的多个文件。他们还必须是文件上的特定类型的人（pfiletypeid = 2）。

我想出了以下查询：

```sql
SELECT TOP 5
    y.PFileId,
    (SELECT TOP 1 pfilecommdescr FROM _PFileComm 
     WHERE PFileId = y.PFileId AND pfilecommtypeid = 2 
       AND LEN(pfilecommdescr) = 10 AND pfilecommdescr LIKE '04%') AS gsm,
    y.c AS c
FROM
	(SELECT
		 x.pfileid,
		 COUNT(*) AS c
	 FROM
		 (SELECT
			  dflpf.pfileid,
			  df.dfiletypeid,
			  COUNT(*) AS c
		  FROM 
              _dfile df
		  JOIN
              _dfilelinkpfile dflpf ON df.dfileid = dflpf.dfileid 
                                    AND dflpf.pfiletypeid = 2
		  GROUP BY
              dflpf.pfileid, df.dfiletypeid) x
     GROUP BY
         x.pfileid) y
WHERE
    EXISTS (SELECT 1 FROM _pfilecomm 
            WHERE pfileid = y.pfileid AND pfilecommtypeid = 2 
              AND LEN(pfilecommdescr) = 10 
              AND pfilecommdescr LIKE '04%')
    AND (SELECT COUNT(*) FROM _dfilelinkpfile 
         WHERE pfileid = y.pfileid AND pfiletypeid = 2) > 1
    AND NOT EXISTS (SELECT 1 FROM _DFileLinkPFile 
                    WHERE PFileId = y.PFileId AND PFileTypeId <> 2)
    AND y.c = 1

我尝试运行它几个小时，但没有得到结果。我不明白为什么它需要这么长时间，所以我写了一个游标：

IF NOT EXISTS (SELECT 1 FROM sys.tables WHERE name = 'tblTemp_test')
BEGIN
	CREATE TABLE tblTemp_test (PFileId INT, GSM varchar(255), c INT)
END

DECLARE @PFileId BIGINT

DECLARE CheckPFileId CURSOR FOR SELECT PFileId FROM _PFile ORDER BY PFileId DESC

OPEN CheckPFileId

FETCH NEXT FROM CheckPFileId INTO @PFileId

WHILE @@Fetch_Status = 0 AND (SELECT COUNT(*) FROM dbo.tblTemp_Axel_test) <> 5
BEGIN
	insert into dbo.tblTemp_Axel_test (PFileId, GSM, c)
    select
    y.PFileId,
    (select top 1 pfilecommdescr from _PFileComm where PFileId = y.PFileId and pfilecommtypeid = 2 and len(pfilecommdescr) = 10 and pfilecommdescr like '04%') as gsm,
    y.c as c
    from
		(
			select
			x.pfileid,
			count(*) as c
			from
				(
					select
					dflpf.pfileid,
					df.dfiletypeid,
					count(*) as c
					from _dfile df
					join _dfilelinkpfile dflpf on df.dfileid = dflpf.dfileid and     dflpf.pfiletypeid = 2
					where dflpf.PFileId = @PFileId
					group by dflpf.pfileid,df.dfiletypeid
				) x
			group by x.pfileid
		) y
    where exists (select 1 from _pfilecomm where pfileid = y.pfileid and pfilecommtypeid = 2 and len(pfilecommdescr) = 10 and pfilecommdescr like '04%')
    and (select count(*) from _dfilelinkpfile where pfileid = y.pfileid and pfiletypeid = 2) > 1
    and not exists (select 1 from _DFileLinkPFile where PFileId = y.PFileId and PFileTypeId <> 2)
    and y.c = 1
    
    FETCH NEXT FROM CheckPFileId INTO @PFileId
END

CLOSE CheckPFileId

DEALLOCATE CheckPFileId 

select * from dbo.tblTemp_test

它在几秒钟内给出了答案。有人知道为什么第一个查询不能在类似的时间内给我结果吗？我以为应该尽量避免在SQL Server中使用游标以避免性能问题。这可能是我没有写出最优化的查询。我问这个问题是因为我之前也遇到过这个问题，找不到答案。

英文:

I got asked the question to find 5 people (pfileid) in our system that have a mobile phone number in our system and that have multiple files that can be of only one type (dfiletypeid). They also have to be a specific type of person on the file (pfiletypeid = 2).

I came up with this query:

SELECT TOP 5
    y.PFileId,
    (SELECT TOP 1 pfilecommdescr FROM _PFileComm 
     WHERE PFileId = y.PFileId AND pfilecommtypeid = 2 
       AND LEN(pfilecommdescr) = 10 AND pfilecommdescr LIKE &#39;04%&#39;) AS gsm,
    y.c AS c
FROM
	(SELECT
		 x.pfileid,
		 COUNT(*) AS c
	 FROM
		 (SELECT
			  dflpf.pfileid,
			  df.dfiletypeid,
			  COUNT(*) AS c
		  FROM 
              _dfile df
		  JOIN
              _dfilelinkpfile dflpf ON df.dfileid = dflpf.dfileid 
                                    AND dflpf.pfiletypeid = 2
		  GROUP BY
              dflpf.pfileid, df.dfiletypeid) x
     GROUP BY
         x.pfileid) y
WHERE
    EXISTS (SELECT 1 FROM _pfilecomm 
            WHERE pfileid = y.pfileid AND pfilecommtypeid = 2 
              AND LEN(pfilecommdescr) = 10 
              AND pfilecommdescr LIKE &#39;04%&#39;)
    AND (SELECT COUNT(*) FROM _dfilelinkpfile 
         WHERE pfileid = y.pfileid AND pfiletypeid = 2) &gt; 1
    AND NOT EXISTS (SELECT 1 FROM _DFileLinkPFile 
                    WHERE PFileId = y.PFileId AND PFileTypeId &lt;&gt; 2)
    AND y.c = 1

I tried to run it for hours and I didn't get a result. I don't understand why it takes so long so I wrote a cursor instead:

IF NOT EXISTS (SELECT 1 FROM sys.tables WHERE name = &#39;tblTemp_test&#39;)
BEGIN
CREATE TABLE tblTemp_test (PFileId INT, GSM varchar(255), c INT)
END
DECLARE @PFileId BIGINT
DECLARE CheckPFileId CURSOR FOR SELECT PFileId FROM _PFile ORDER BY PFileId DESC
OPEN CheckPFileId
FETCH NEXT FROM CheckPFileId INTO @PFileId
WHILE @@Fetch_Status = 0 AND (SELECT COUNT(*) FROM dbo.tblTemp_Axel_test) &lt;&gt; 5
BEGIN
insert into dbo.tblTemp_Axel_test (PFileId, GSM, c)
select
y.PFileId,
(select top 1 pfilecommdescr from _PFileComm where PFileId = y.PFileId and pfilecommtypeid = 2 and len(pfilecommdescr) = 10 and pfilecommdescr like &#39;04%&#39;) as gsm,
y.c as c
from
(
select
x.pfileid,
count(*) as c
from
(
select
dflpf.pfileid,
df.dfiletypeid,
count(*) as c
from _dfile df
join _dfilelinkpfile dflpf on df.dfileid = dflpf.dfileid and     dflpf.pfiletypeid = 2
where dflpf.PFileId = @PFileId
group by dflpf.pfileid,df.dfiletypeid
) x
group by x.pfileid
) y
where exists (select 1 from _pfilecomm where pfileid = y.pfileid and pfilecommtypeid = 2 and len(pfilecommdescr) = 10 and pfilecommdescr like &#39;04%&#39;)
and (select count(*) from _dfilelinkpfile where pfileid = y.pfileid and pfiletypeid = 2) &gt; 1
and not exists (select 1 from _DFileLinkPFile where PFileId = y.PFileId and PFileTypeId &lt;&gt; 2)
and y.c = 1
FETCH NEXT FROM CheckPFileId INTO @PFileId
END
CLOSE CheckPFileId
DEALLOCATE CheckPFileId 
select * from dbo.tblTemp_test

It came up with the answer in seconds. Does anyone know why the first query can't give me a result in a similar time? I thought cursors were supposed to be the avoided at all costs in sql server for performance issues. It could be I didn't write the query optimally. I'm asking because I've had this problem before and I can't find an answer to it.

I tried to look in the estimated execution plan but it doesn't give me an answer to why it takes so long.

答案1

得分: 1

你在将查询包装成游标时所做的唯一更改是在最内部的查询中添加了一个WHERE子句，因此它必定是速度提升的原因。

我会假设至少有一个表_dfile、_dfilelinkpfile存储了大量数据，并且您使用的连接 - join _dfilelinkpfile dflpf on df.dfileid = dflpf.dfileid - 产生了一个非常大的结果集。

使用游标，您将该结果集限制为当前处理的PFileID记录，这可能只是您的JOIN可以返回的所有记录的一个小子集，这使得整个查询运行速度快。如果您的数据库中有很多人满足您描述的条件，那么SQL Server只需要评估游标的少数迭代，而不是计算完整的连接结果集并对其执行整个查询。

您可以通过以下方式验证上述内容：

使用附加的WHERE子句运行您的原始查询，指定单个ID - 就像在游标中一样（它可能会运行得很快）；
计算游标中的迭代次数并将数字存储在某个地方（它可能远远低于您的数据库中不同的PFileID数量）。

话虽如此，在这里使用游标仍然不是一个好主意 - 它仅表示您的查询未以最佳方式编写或优化器无法正确处理它。例如，在您的查询中存在一些不必要的聚合（最内部查询中的COUNT(*)，您从未使用过）。

没有您的数据库结构，很难进行任何测试，但我建议稍微简化查询：

SELECT TOP 5 PFileId, pcomm.PFileCommDescr, dflpf.FileCount
FROM _dfile df

-- only people having mobile phone number
CROSS APPLY(
    SELECT TOP 1 PFileCommDescr 
    FROM _pfilecomm 
    WHERE PFileId = df.PFileId AND pfilecommtypeid = 2 
        AND LEN(pfilecommdescr) = 10 AND pfilecommdescr LIKE '04%'
) pcomm

-- only people who have more than one file with pfiletypeid = 2
CROSS APPLY(
    SELECT COUNT(*) as FileCount
    FROM _dfilelinkpfile
    WHERE df.PFileId = PFileId AND pfiletypeid = 2
    GROUP BY PFileId
    HAVING COUNT(*) > 1
) dflpf

-- only people whose all files have the same dfiletypeid
CROSS APPLY(
    SELECT COUNT(DISTINCT dfiletypeid) as DFileTypeCount
    FROM _dfilelinkpfile
    WHERE df.PFileId = PFileId
    GROUP BY PFileId
    HAVING COUNT(*) = 1
) dflpf_ntypes

英文:

The only thing you have changed in your query when wrapping it in cursor was adding a WHERE clause in the innermost query, so it must be the cause for the speedup.

I would assume that at least one of tables _dfile, _dfilelinkpfile stores a lot of data and that join you have used - join _dfilelinkpfile dflpf on df.dfileid = dflpf.dfileid - produces a very big result set.

Using cursor, you constrain that result set to records with currenly processed PFileID, which is probably a tiny subset of all records that your JOIN can return, which makes the whole query run fast. If there are a lot of people in your database that satisfy conditions you described, there is a high chance that Sql Server will only have to evaluate a few iterations of cursor, as opposed to computing the full joining result set and performing the whole query on it.

You can verify the above by:

running your original query with additional WHERE clause, specifying a single ID - just like in your cursor (it will probably run fast);
counting iterations in your cursor and storing the number somewhere (it will probably be much lower than number of distinct PFileIDs in your database).

That being said, using a cursor here is still not a great idea - the fact that it helped with performance only means that your query is not written in optimal way or that the optimizer cannot handle it correctly. For instance, there is some unneccessary aggregation in your query (COUNT(*) in the innermost query that you're never using).

It's impossible to test anything without your database structure, but I would try to simplify the query a bit:

SELECT TOP 5 PFileId, pcomm.PFileCommDescr, dflpf.FileCount
FROM _dfile df
-- only people having mobile phone number
CROSS APPLY(
SELECT TOP 1 PFileCommDescr 
FROM _pfilecomm 
WHERE PFileId = df.PFileId AND pfilecommtypeid = 2 
AND LEN(pfilecommdescr) = 10 AND pfilecommdescr LIKE &#39;04%&#39;
) pcomm
-- only people who have more than one file with pfiletypeid = 2
CROSS APPLY(
SELECT COUNT(*) as FileCount
FROM _dfilelinkpfile
WHERE df.PFileId = PFileId AND pfiletypeid = 2
GROUP BY PFileId
HAVING COUNT(*) &gt; 1
) dflpf
-- only people whose all files have tha same dfiletypeid
CROSS APPLY(
SELECT COUNT(DISTINCT dfiletypeid) as DFileTypeCount
FROM _dfilelinkpfile
WHERE df.PFileId = PFileId
GROUP BY PFileId
HAVING COUNT(*) = 1
) dflpf_ntypes

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

SQL Server中使用游标解决的查询性能问题

问题

答案1

随机化查询结果顺序，带分页。

在SQL中，出现了带有CASE WHEN表达式的ORDER BY错误。

MSSQL的ORDER BY与OFFSET和FETCH NEXT组合使用时无法正常工作

如何转换一个SQL表？

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

发表评论