Different indexes being used for the same query in two databases

Question

I have the query below. When I run it in two different databases that hold different data, the explain plans differ.

      SELECT *
      FROM   voucher_row_tab g,
             voucher_type_tab v
      WHERE  g.company            = '12'
      AND    (g.voucher_date BETWEEN to_date('2022-12-10', 'YYYY-MM-DD')  AND to_date('2023-12-10', 'YYYY-MM-DD'))
      AND    (g.account      BETWEEN  '100'    AND '200')
      AND    g.company            = v.company
      AND    g.voucher_type       = v.voucher_type
      AND    v.simulation_voucher  != 'TRUE'
      AND    'TRUE' = (SELECT TEST_API.CHECK (g.company, g.pc_id) FROM DUAL);

I will call the two databases A and B.

When I check the explain plan in A, it uses an index defined on the columns <COMPANY, ACCOUNTING_PERIOD, ACCOUNTING_YEAR, VOUCHER_TYPE>, while B uses an index defined on the columns <PC_ID, COMPANY, ACCOUNTING_YEAR>. Both databases have both indexes defined, but when executing the same query they behave differently.

Does anyone have any idea why this happens? Does the data in the tables have any impact on this?

Answer 1

Score: 2

Yes, the different data present in the databases influences the use of indexes.

For databases with cost-based query optimisers, this is exactly the desired behaviour as using an index is not necessarily the best (=fastest) option to compute the query in all situations.

One example of such a situation is when the table itself is fairly small and reading the whole table turns out to be quicker than first reading the index, finding which part of the table to read, and only then reading the table.

Explaining the details in sufficient depth is beyond the scope of this answer; the Oracle Database documentation has a reasonably good introduction to this broad topic.

A good place to start understanding the different execution plans is to review the EXPLAIN PLAN output on both databases and compare the expected rows/costs for each step.
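
As a minimal sketch of that comparison, assuming Oracle and the tables from the question, the statements below could be run on each database in turn. `EXPLAIN PLAN` and `DBMS_XPLAN.DISPLAY` are standard Oracle features. The query text follows the question (with the second `to_date` call written out), and `TEST_API.CHECK` is taken as-is from the question, so adjust both to the real schema.

```sql
-- Run on database A, then on database B, and compare the two outputs.
EXPLAIN PLAN FOR
SELECT *
FROM   voucher_row_tab g,
       voucher_type_tab v
WHERE  g.company = '12'
AND    g.voucher_date BETWEEN to_date('2022-12-10', 'YYYY-MM-DD')
                          AND to_date('2023-12-10', 'YYYY-MM-DD')
AND    g.account BETWEEN '100' AND '200'
AND    g.company = v.company
AND    g.voucher_type = v.voucher_type
AND    v.simulation_voucher != 'TRUE'
AND    'TRUE' = (SELECT TEST_API.CHECK(g.company, g.pc_id) FROM DUAL);

-- Display the plan that was just explained, including the
-- optimizer's estimated Rows and Cost for each step.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

Large differences in the estimated rows for the same step on A and B usually point to where the optimizer's view of the data diverges.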

Answer 2

Score: 1

Oracle's cost-based optimizer (CBO) performs a complex computation based on statistics of various kinds (segment, system, (near) real-time, dynamic sampling, etc.), considering row counts, distinct value counts, distribution, block counts, high and low values, histograms, and so on. It attempts to predict the time cost of potentially thousands of different plan permutations, then selects the one it believes will have the lowest cost. This means that the plan is not stable: as the data changes and the environment shifts in various ways, plans change without warning. You can lock them down, but normally they are free to shift with the changing reality of the data and Oracle's knowledge of that data (statistics). This is the side effect of a program trying to be intelligent and adaptive.
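
To see some of the statistics the optimizer works from, the data dictionary views can be queried on each database. This is only a sketch: it assumes the tables belong to the current schema (otherwise the `ALL_` or `DBA_` views with an `owner` filter would be needed), and the column list in the last query is just a guess at the columns relevant to the question's query.

```sql
-- Table-level statistics: row and block counts, and when they were gathered.
SELECT table_name, num_rows, blocks, last_analyzed
FROM   user_tables
WHERE  table_name IN ('VOUCHER_ROW_TAB', 'VOUCHER_TYPE_TAB');

-- Index-level statistics for the table whose index choice differs.
SELECT index_name, distinct_keys, leaf_blocks, clustering_factor, last_analyzed
FROM   user_indexes
WHERE  table_name = 'VOUCHER_ROW_TAB';

-- Column-level statistics: distinct values, value range, histogram type.
SELECT column_name, num_distinct, low_value, high_value, histogram
FROM   user_tab_col_statistics
WHERE  table_name = 'VOUCHER_ROW_TAB'
AND    column_name IN ('COMPANY', 'PC_ID', 'VOUCHER_DATE', 'ACCOUNT');
```

Comparing these results between A and B often explains why one index looks cheaper than the other on each system.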

Therefore, you should never assume or expect the same plan in two different databases, or even in the same database tomorrow vs. today. However, because of the complexity involved, Oracle (or any other RDBMS) quite often makes mistakes: its calculated costs are wrong, in which case the plan will be wrong. This is a very common occurrence. For most SQL statements the difference is undetectable and we don't bother worrying about it. But sometimes the mistake produces really bad execution plans that cause us to miss our SLAs, and those require our attention.

If you wish to force certain plan steps, use hints to limit the optimizer's set of choices. While hints are non-portable and often discouraged in the development community, they do a very good job of stabilizing plans when plan variability threatens our SLAs. But they should be used only by a knowledgeable person who has a good understanding of what they're doing, or they can work against you. This is a complex subject that unfortunately cannot be covered here. For situations where hints cannot be used or there isn't enough knowledge to use them correctly, the best thing you can do is ensure that each object (table or index) involved in a query has good, non-stale statistics. That will resolve at least a good portion of issues, though not all.
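
Both options can be sketched roughly as follows. The index name `voucher_row_ix1` is hypothetical (the question does not name its indexes), the hinted query is a simplified stand-in for the real one, and the statistics calls assume the tables live in the current schema.

```sql
-- Option 1: ask the optimizer to use one specific index on voucher_row_tab.
-- voucher_row_ix1 is a hypothetical index name; substitute the real one.
SELECT /*+ INDEX(g voucher_row_ix1) */ g.*
FROM   voucher_row_tab g
WHERE  g.company = '12'
AND    g.account BETWEEN '100' AND '200';

-- Option 2: check whether Oracle considers the current statistics stale...
SELECT table_name, stale_stats, last_analyzed
FROM   user_tab_statistics
WHERE  table_name IN ('VOUCHER_ROW_TAB', 'VOUCHER_TYPE_TAB');

-- ...and refresh them for the table and, via cascade, its indexes.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname => USER,
    tabname => 'VOUCHER_ROW_TAB',
    cascade => TRUE);
END;
/
```

Gathering statistics is usually the less intrusive first step; a hint ties the statement to one index even if the data later changes.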

Answer 3

Score: 0

First let me rewrite it a bit.

SELECT  g.*, v.*
    FROM  voucher_row_tab g      -- Oracle does not accept AS before a table alias
    JOIN  voucher_type_tab v
       ON  g.company = v.company
      AND  g.voucher_type = v.voucher_type
    WHERE  g.company = '12'
      -- ANSI date literals avoid relying on the session's NLS_DATE_FORMAT
      AND  g.voucher_date BETWEEN DATE '2022-12-10' AND DATE '2023-12-10'
      AND  g.account BETWEEN '100' AND '200'
      AND  ( SELECT  TEST_API.CHECK (g.company, g.pc_id) FROM  DUAL ) = 'TRUE'
      AND  v.simulation_voucher != 'TRUE'
    ;

Question: Is 'TRUE' a 4-character string? Or a Boolean value?

Question: Do you want the 10th of December from both years?

These might help with efficiency:

g:  INDEX(company, voucher_date)
v:  INDEX(simulation_voucher, company,  voucher_type)
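
Written out as Oracle DDL, those two suggestions might look like the sketch below. The index names are hypothetical; only the table names and column lists come from the suggestion above.

```sql
-- Index names are hypothetical; choose names that fit local conventions.
CREATE INDEX voucher_row_comp_date_ix
  ON voucher_row_tab (company, voucher_date);

CREATE INDEX voucher_type_sim_comp_ix
  ON voucher_type_tab (simulation_voucher, company, voucher_type);
```

Whether the optimizer actually picks these indexes still depends on the statistics in each database, so it is worth re-checking the explain plan after creating them.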

huangapple
  • Posted on June 5, 2023 at 18:55:57
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76405746.html