More performant SQL Delete than using NOT EXISTS?
Question
I have a query that deletes a lot of data from a table, as follows. It uses a WHILE loop to delete in batches so that it does not blow up the transaction log, but the Customers table has about 200 million records and the query deletes approximately 2 million of them. I was wondering whether replacing the NOT EXISTS would help at all.
WHILE (1=1)
BEGIN
    DELETE TOP(10000) FROM Customers
    WHERE NOT EXISTS (SELECT *
                      FROM CustomerInvoices
                      WHERE CustomerInvoices.CustomerId = Customers.CustomerId)
    IF (@@ROWCOUNT = 0)
        BREAK
END
Answer 1
Score: 1
Your problem is that the number of rows in Customers that must be checked before finding 10,000 that match the NOT EXISTS grows with every batch.
The ratio of matching rows will steadily drop until, by the final batch, you are scanning all 198 million remaining rows to find the last 10,000.
You are doing 200 batches. On average each batch reads 100 million rows in Customers (the earliest batches far fewer and the later ones far more). This totals 20 billion rows read overall just from that table, and a similar amount in CustomerInvoices.
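As a rough check of that estimate, the arithmetic can be written out in T-SQL. This is only a back-of-envelope sketch: the variable names are illustrative, and it assumes the eligible rows are spread evenly through the table, so that an average batch reads about half of it.

```sql
-- Back-of-envelope estimate using the figures from the question.
-- BIGINT is used because the total exceeds the range of INT.
DECLARE @TotalRows    BIGINT = 200000000; -- rows in Customers
DECLARE @RowsToDelete BIGINT = 2000000;   -- rows with no matching invoice
DECLARE @BatchSize    BIGINT = 10000;     -- TOP(10000) per DELETE

DECLARE @Batches             BIGINT = @RowsToDelete / @BatchSize; -- 200 batches
DECLARE @AvgRowsReadPerBatch BIGINT = @TotalRows / 2;             -- assumed average: half the table

SELECT @Batches                        AS Batches,            -- 200
       @Batches * @AvgRowsReadPerBatch AS EstimatedRowsRead;  -- 20000000000
```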
If the execution plan is a serial scan, then every batch will likely go back over all the rows that were already examined and found ineligible in every previous batch before finally reaching the rows of interest.
You can create a temp table with a sequential integer column...
DECLARE @LastRow INT

CREATE TABLE #DeleteCandidates(Id int PRIMARY KEY, CustomerId INT);

-- Identify the delete candidates once and number them sequentially.
INSERT #DeleteCandidates
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 0)) AS Id,
       Customers.CustomerId
FROM   Customers
WHERE  NOT EXISTS (SELECT *
                   FROM   CustomerInvoices
                   WHERE  CustomerInvoices.CustomerId = Customers.CustomerId)

-- The number of candidates is also the highest Id in the temp table.
SET @LastRow = @@ROWCOUNT
Then write some code to process that temp table in <batch_size> chunks of Id ranges, e.g. as below.
DECLARE @BatchSize INT = 10000
DECLARE @MinId INT = 1

WHILE @MinId <= @LastRow
BEGIN
    -- Delete only the candidates whose Id falls in the current range.
    DELETE FROM Customers
    WHERE  Customers.CustomerId IN (SELECT dc.CustomerId
                                    FROM   #DeleteCandidates dc
                                    WHERE  dc.Id >= @MinId
                                           AND dc.Id < @MinId + @BatchSize)
           AND NOT EXISTS (SELECT *
                           FROM   CustomerInvoices /*WITH (HOLDLOCK)*/
                           WHERE  CustomerInvoices.CustomerId = Customers.CustomerId)

    SET @MinId = @MinId + @BatchSize
END
You still need a NOT EXISTS on the actual DELETE in case inserts made since the identification step mean that a delete candidate is no longer eligible.
You might also consider the HOLDLOCK hint to deal with the possibility of truly concurrent inserts while the DELETE query itself is running.
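For illustration, this is roughly how the DELETE inside the loop would look with the hint enabled instead of commented out. It is a sketch rather than a tested recommendation: it assumes #DeleteCandidates, @MinId and @BatchSize already exist as set up earlier in this answer, so it is not runnable on its own. HOLDLOCK applies SERIALIZABLE semantics to that table reference, so it may increase blocking of concurrent inserts into CustomerInvoices.

```sql
-- Intended to replace the DELETE statement inside the WHILE loop.
-- Assumes #DeleteCandidates, @MinId and @BatchSize are defined as above.
DELETE FROM Customers
WHERE  Customers.CustomerId IN (SELECT dc.CustomerId
                                FROM   #DeleteCandidates dc
                                WHERE  dc.Id >= @MinId
                                       AND dc.Id < @MinId + @BatchSize)
       -- HOLDLOCK keeps the "no invoice exists" check protected until the
       -- statement's transaction ends (equivalent to SERIALIZABLE here).
       AND NOT EXISTS (SELECT *
                       FROM   CustomerInvoices WITH (HOLDLOCK)
                       WHERE  CustomerInvoices.CustomerId = Customers.CustomerId);
```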