2023年2月27日 13:46:50go评论89阅读模式

英文:

How to make SQL only read CTE referenced in IN clause once?

问题

WITH RootItemIDs AS 
(
    SELECT DISTINCT ParentID 
    FROM C_Item 
    WHERE ParentID NOT IN (SELECT ItemID FROM C_Item)
)
SELECT COUNT(*)
FROM C_OrderHeader oh
JOIN C_OrderDetail od ON oh.OrderID = od.OrderID
JOIN C_Item i ON od.ItemID = i.ItemID
JOIN C_Item rootItem ON i.ParentID = rootItem.ItemID 
                     OR (i.ParentID IN (SELECT ParentID FROM RootItemIDs) 
                         AND i.ItemID = rootItem.ItemID)

英文:

WITH RootItemIDs AS 
(
	SELECT DISTINCT ParentID 
    FROM C_Item 
    WHERE ParentID NOT IN (SELECT ItemID FROM C_Item)
)
SELECT COUNT(*)
FROM C_OrderHeader oh
JOIN C_OrderDetail od ON oh.OrderID = od.OrderID
JOIN C_Item i ON od.ItemID = i.ItemID
JOIN C_Item rootItem ON i.ParentID = rootItem.ItemID 
                     OR (i.ParentID IN (SELECT * FROM RootItemIDs) 
                         AND i.ItemID = rootItem.ItemID)

I'm using COUNT(*) for convenience here, but eventually the point will be to create a view with columns for the order item and show information about its parent item next to it.

Some C_Item rows have ParentID values that don't point to anything though, in which case it should be considered the "parent" (this is the reason I used the term "root" instead of "parent" in the CTE to be a little more accurate).

Here's the plan: https://www.brentozar.com/pastetheplan/?id=ByD4R3F0s. Notice the part in the attached picture where it is returning millions of rows for a table that only has 83 records because it is reading the table for every order detail. FYI, I tried with a temp table instead of a CTE and got the same issue.

Hardcoding values in the IN clause (in place of SELECT * FROM RootItemIDs) fixes the issue, of course, but obviously I'd prefer not to do that so I don't have to update the query in the future if more values or added.

How would you rewrite this to not scan C_Items repeatedly? (and preferably without temp tables if possible as I'd like to be able to use it in a view.)

EDIT: Forgot to add the aforementioned picture; added now.

答案1

得分: 1

Here is the translated code part:

**ORACLE**
对于我们不了解实际数据（或RDBMS）的情况，我们只能尝试创建一些想象中的示例数据，并尝试回答问题。假设您的cte RootItemIDs 应该用于过滤具有不存在的PARENT_ID的ITEM_ID，示例数据（包括您的cte）类似于以下内容：

The code provided is quite complex and seems to involve SQL database operations. If you have any specific questions or need further assistance with this code, please feel free to ask.

英文:

ORACLE
Not having any insight into actual data (or RDBMS) you are working with, we could only try to create some imaginary sample data and try to answer the question. Assuming that your cte RootItemIDs should be used to filter just the ITEM_IDs with non existant PARENT_ID with the sample data (including your cte) similar to this:

WITH
    order_header(ORDER_ID, ORDER_DATE, CUSTOMER_ID) AS 
        (
            Select 1, To_Date(&#39;01.02.2023&#39;, &#39;dd.mm.yyyy&#39;), 101 From Dual Union ALL
            Select 2, To_Date(&#39;01.02.2023&#39;, &#39;dd.mm.yyyy&#39;), 101 From Dual Union ALL
            Select 3, To_Date(&#39;01.02.2023&#39;, &#39;dd.mm.yyyy&#39;), 101 From Dual 
        ),
    order_detail (ORDER_ID, ITEM_ID, QUANTITY, PRICE ) AS
        (
            Select 1,  &#39;ITM_11&#39;, 10, 15 From Dual Union All
            Select 1,  &#39;ITM_12&#39;, 12, 10 From Dual Union All
            -- 
            Select 2,  &#39;ITM_21&#39;, 15, 12 From Dual Union All
            Select 2,  &#39;ITM_22&#39;, 12, 19 From Dual Union All
            Select 2,  &#39;ITM_23&#39;, 25, 17 From Dual Union All
            --
            Select 3,  &#39;ITM_31&#39;, 15, 12 From Dual 
        ),
    item (ITEM_ID, PARENT_ID) AS
        (
            Select &#39;ITM_10&#39;, &#39;ITM_10&#39; From Dual Union All
            Select &#39;ITM_11&#39;, &#39;ITM_10&#39; From Dual Union All
            Select &#39;ITM_12&#39;, &#39;ITM_10&#39; From Dual Union All
            --
            Select &#39;ITM_20&#39;, &#39;ITM_20&#39; From Dual Union All
            Select &#39;ITM_21&#39;, &#39;ITM_20&#39; From Dual Union All
            Select &#39;ITM_22&#39;, &#39;ITM_20&#39; From Dual Union All
            Select &#39;ITM_23&#39;, &#39;ITM_20&#39; From Dual Union All
            --
            Select &#39;ITM_30&#39;, &#39;ITM_30&#39; From Dual Union All
            Select &#39;ITM_31&#39;, &#39;ITM_35&#39; From Dual   -- ITM_35 non existing PARENT_ID
        ),
--
    RootItemIDs AS 
        (
            SELECT DISTINCT PARENT_ID 
            FROM item 
            WHERE PARENT_ID NOT IN (SELECT ITEM_ID FROM item)
        )

With this kind of data you don't need the last join in your SQL - just move the select condition using your cte to the previous JOIN. The code would be like below:

SELECT  Count(*) &quot;COUNT_OF&quot;
FROM    order_header oh
JOIN    order_detail od ON (oh.ORDER_ID = od.ORDER_ID)
JOIN    item i ON (od.ITEM_ID = i.ITEM_ID And i.PARENT_ID IN(SELECT * FROM RootItemIDs) )

... with sample data from above this would return:

  COUNT_OF
----------
         1

... and if you change your Select clause then:

--  SELECT  i.ITEM_ID, i.PARENT_ID
ITEM_ID PARENT_ID
------- ---------
ITM_31  ITM_35   
--  Select * 
  ORDER_ID ORDER_DATE CUSTOMER_ID   ORDER_ID ITEM_ID   QUANTITY      PRICE ITEM_ID PARENT_ID
---------- ---------- ----------- ---------- ------- ---------- ---------- ------- ---------
         3 01-FEB-23          101          3 ITM_31          15         12 ITM_31  ITM_35

Addition after comments
still missing sample data and expected result, so still using above sample data and maybe this could help (pretty sure it could be adjusted to sql-server too):

SELECT  i.ITEM_ID, 
        CASE WHEN i.PARENT_ID IN(SELECT * FROM RootItemIDs) THEN i.ITEM_ID || &#39; - self id&#39; ELSE i.PARENT_ID END &quot;PARENT_OR_SELF_ID&quot;
FROM    order_header oh
JOIN    order_detail od ON (oh.ORDER_ID = od.ORDER_ID)
JOIN    item i ON (od.ITEM_ID = i.ITEM_ID)
--  R e s u l t :
ITEM_ID PARENT_OR_SELF_ID
------- -----------------
ITM_11  ITM_10            
ITM_12  ITM_10            
ITM_21  ITM_20            
ITM_22  ITM_20            
ITM_23  ITM_20            
ITM_31  ITM_31 - self id

ADDITION 2
I added column DESCRIPTION in item table:

    item (ORDER_ID, ITEM_ID, PARENT_ID, DESCRIPTION) AS
        (
            Select 1, &#39;ITM_10&#39;, Null, &#39;I am a parent 10&#39; From Dual Union All
            Select 1, &#39;ITM_11&#39;, &#39;ITM_10&#39;, &#39;Child 11&#39; From Dual Union All
            Select 1, &#39;ITM_12&#39;, &#39;ITM_10&#39;, &#39;Child 12&#39; From Dual Union All
            --
            Select 2, &#39;ITM_20&#39;, Null, &#39;I am a parent 20&#39; From Dual Union All
            Select 2, &#39;ITM_21&#39;, &#39;ITM_20&#39;, &#39;Child 21&#39; From Dual Union All
            Select 2, &#39;ITM_22&#39;, &#39;ITM_20&#39;, &#39;Child 22&#39; From Dual Union All
            Select 2, &#39;ITM_23&#39;, &#39;ITM_20&#39;, &#39;Child 23&#39; From Dual Union All
            --
            Select 3, &#39;ITM_30&#39;, Null, &#39;I am a parent 30&#39; From Dual Union All
            Select 3, &#39;ITM_31&#39;, &#39;ITM_35&#39;, &#39;Child 31&#39; From Dual 
        )

... and now making another join to item table to get the DESCRIPTION

SELECT  i.ITEM_ID, 
        CASE WHEN i.PARENT_ID IN(SELECT * FROM RootItemIDs) THEN i.ITEM_ID || &#39; - self id&#39; ELSE i.PARENT_ID END &quot;PARENT_OR_SELF_ID&quot;,
        i2.DESCRIPTION
FROM    order_header oh
JOIN    order_detail od ON (oh.ORDER_ID = od.ORDER_ID)
JOIN    item i ON (od.ITEM_ID = i.ITEM_ID)
JOIN    item i2 ON(i2.ITEM_ID = CASE WHEN i.PARENT_ID IN(SELECT * FROM RootItemIDs) THEN i.ITEM_ID ELSE i.PARENT_ID END)
ITEM_ID PARENT_OR_SELF_ID DESCRIPTION    
------- ----------------- ----------------
ITM_11  ITM_10            I am a parent 10 
ITM_12  ITM_10            I am a parent 10 
ITM_21  ITM_20            I am a parent 20 
ITM_22  ITM_20            I am a parent 20 
ITM_23  ITM_20            I am a parent 20 
ITM_31  ITM_31 - self id  Child 31

Does this help?

答案2

得分: 0

以下是翻译好的部分：

Interestingly, I had already tried d r's final approach which, in essence, just converts the relevant join condition to a CASE statement.
有趣的是，我已经尝试过d r的最终方法，本质上只是将相关的连接条件转换为CASE语句。
In fact, though I didn't update my post, I had actually updated my own code to look just like this but it didn't change anything.
事实上，虽然我没有更新我的帖子，但我实际上已经更新了我的代码，使它看起来像这样，但没有改变任何东西。
The way I ended up fixing my issue was to hardcode specific value ranges that we wouldn't violate.
我最终解决问题的方法是硬编码特定的值范围，以确保我们不会违反。
The first big lesson I learned was that having "Include Actual Execution Plan" on slowed the query down BIG time.
我学到的第一个重要教训是，打开“包括实际执行计划”会显著减慢查询速度。
The second big lesson I learned is that the order of JOINS can matter.
我学到的第二个重要教训是连接的顺序可能很重要。
Finally, I tried d r's final solution. After rewriting it to work in SQL Server and renaming the tables and such appropriately to match mine, I realized it was identical to a query I had already tried which, at the time, did not perform any better than the original one I posted.
最后，我尝试了d r的最终解决方案。在将其重写以在SQL Server中运行并适当地重命名表以匹配我的表后，我意识到它与我之前尝试过的查询完全相同，在当时的情况下，性能并不比我发布的原始查询好。

英文:

There ended up being multiple answers to this question.

Interestingly, I had already tried d r's final approach which, in essence, just converts the relevant join condition to a CASE statement. In fact, though I didn't update my post, I had actually updated my own code to look just like this but it didn't change anything.

The way I ended up fixing my issue was to hardcode specific value ranges that we wouldn't violate. Not ideal, but it was a reasonable solution that brought the execution time under a second. But I continued working on the problem anyway just because I wanted to learn what the issue was so...

The first big lesson I learned was that having "Include Actual Execution Plan" on slowed the query down BIG time. With it on it took 36 seconds; with it off it took 6 seconds. On this note, I also found an article by Jeff Moden where he had a similar problem when SET STATISTICS TIME ON was significantly slowing down queries with scalar functions. So takeaway is that when you are writing new queries and testing the performance, be sure to test without these tools on too because they can dramatically affect the performance at times.

The second big lesson I learned is that the order of JOINS can matter. Just by flipping the join order around, the query ran in under a second. The reason for this, I found, is likely because SQL will only spend so long trying to find the best plan. So ordering your JOINS such that they are already in or close to the best order can help it find the optimal plan faster before it runs out of time.

Finally, I tried d r's final solution. After rewriting it to work in SQL Server and renaming the tables and such appropriately to match mine, I realized it was identical to a query I had already tried which, at the time, did not perform any better than the original one I posted. Yet, now, it runs in under a second. I can't explain this, unfortunately, except that maybe this is one is "closer" to an optimal plan and whereas previously SQL Server wasn't able to cross the finish line to find that right plan, now it does. Crazy stuff.

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

如何使 SQL 仅在 IN 子句中引用的公共表表达式 (CTE) 中读取一次？

问题

答案1

答案2

使用MySQL中的AS关键字从表格的另一列中选择一个名称。

使用两个相同值的ORDER BY列的DENSE_RANK()

重置所有 WooCommerce 产品库存为 0

Azure Synapse Analytics SQL 查询与变量

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。