2023年4月27日 09:18:43go评论120阅读模式

英文:

PREPARE statement decides to use no index after a few calls even thought it's slower

问题

我已经为产品的Title创建了一个索引。

CREATE INDEX products_title_trgm_idx 
ON products USING gin(title gin_trgm_ops)

我尝试创建一个准备语句，并在多次调用中使用相同的搜索值，它在前4个查询中使用了我的索引，速度非常快，大约20毫秒。但是在第五次调用时，它随机决定不使用索引，导致执行时间为3-4秒。为什么会这样呢？

PREPARE my_test AS SELECT "products"."id","products"."active","products"."title","products"."subtitle","products"."description","products"."isbn","products"."ean","products"."bznr","products"."cover_picture","products"."publication_date","products"."edition","products"."publisher","products"."stock","products"."delivery_time","products"."selling_price","products"."width","products"."height","products"."length","products"."weight" FROM "products" WHERE products.title ILIKE $1 LIMIT 10;

我创建了这个准备语句，然后执行了以下操作：

EXECUTE my_test('%Warum Frauen alles besser wissen - und trotzdem alles falsch machen%');

连续执行了5次，前4次它使用了索引并且非常快，但是在第五次和之后的次数中，它使用了顺序扫描，速度非常慢。为什么会这样呢？我尝试禁用顺序扫描，它会使用索引并且始终在20毫秒内执行，但我知道禁用顺序扫描不是一个好主意。

英文:

I have created an index for Title for products.

CREATE INDEX products_title_trgm_idx 
ON products USING gin(title gin_trgm_ops)

I try to create a prepare statement and when i execute multiple calls on it with the same search value it uses my index for the first 4 queries and it's super fast around 20 ms. But on the fifth time it just randomly decides not to use the index which makes it execute 3-4s. Why is that happening?

PREPARE my_test AS SELECT &quot;products&quot;.&quot;id&quot;,&quot;products&quot;.&quot;active&quot;,&quot;products&quot;.&quot;title&quot;,&quot;products&quot;.&quot;subtitle&quot;,&quot;products&quot;.&quot;description&quot;,&quot;products&quot;.&quot;isbn&quot;,&quot;products&quot;.&quot;ean&quot;,&quot;products&quot;.&quot;bznr&quot;,&quot;products&quot;.&quot;cover_p
icture&quot;,&quot;products&quot;.&quot;publication_date&quot;,&quot;products&quot;.&quot;edition&quot;,&quot;products&quot;.&quot;publisher&quot;,&quot;products&quot;.&quot;stock&quot;,&quot;products&quot;.&quot;delivery_time&quot;,&quot;products&quot;.&quot;selling_price&quot;,&quot;products&quot;.&quot;width&quot;,&quot;products&quot;.&quot;height&quot;,&quot;products&quot;.&quot;lengt
h&quot;,&quot;products&quot;.&quot;weight&quot; FROM &quot;products&quot; WHERE products.title ILIKE $1 LIMIT 10;

I created this prepare statement then did

EXECUTE my_test(&#39;%Warum Frauen alles besser wissen - und trotzdem alles falsch machen%&#39;);

5 times in a row, first 4 times it was using an index and was super fast, but on the fifth time and the times after that it uses seq scan which is super slow? Why is it doing that? I tried disabling seq scan and it would go with the index and execute it in 20 ms always, but I know that's not a good idea to disable.

答案1

得分: 2

在你的情况下，PostgreSQL在执行语句的第五次后切换到通用计划。通常情况下，这是一个不错的选择，因为它可以节省每次执行语句时的计划开销。但在你的情况下，这个选择似乎出了问题。

在执行了前五次后，PostgreSQL会测试通用计划是否能够像之前执行的自定义计划一样高效。在你的情况下，这个测试似乎失败了。所以要么PostgreSQL估计自定义计划的代价太高，要么估计通用计划的代价太低。

在没有看到两个计划的EXPLAIN输出之前，我只能猜测（从PostgreSQL v16开始，你可以使用EXPLAIN (GENERIC_PLAN)来获取通用计划）。我猜测慢的计划使用了并行顺序扫描。默认情况下，PostgreSQL估计索引扫描的代价相对较高，因为random_page_cost是seq_page_cost的四倍。如果你没有使用磁盘，或者你的数据已经被缓存，你应该降低random_page_cost的值来告诉PostgreSQL。这可能足以让PostgreSQL更倾向于使用索引扫描。其他可能的措施包括通过将max_parallel_workers_per_gather设置为0来禁用并行查询。

最后，还有一个开关可以完全禁用通用计划的生成：你可以将plan_cache_mode设置为force_custom_plan。如果其他方法都失败了，可以尝试这个方法。

英文:

In your case, PostgreSQL switches to a generic plan after the fifth execution of the statement. Usually that is a good move, because it saves the overhead of planning the statement every time it is executed. It just happens to go wrong in your case.

After the first five executions, PostgreSQL tests if the generic plan promises to be as efficient as the custom plans executed so far. That seems to go wrong in your case. So either PostgreSQL estimates the custom plans too expensive, or it estimates the generic plan too cheap.

Without seeing the EXPLAIN output for both plans, I have to guess (from PostgreSQL v16 on, you can get the generic plan with EXPLAIN (GENERIC_PLAN)). My guess is that the slow plan uses a parallel sequential scan. By default, PostgreSQL estimates index scans as fairly expensive, because random_page_cost is four times as high as seq_page_cost. If you don't have spinning disks, or if your data are cached, you should reduce random_page_cost to tell PostgreSQL. That may be enough to make PostgreSQL prefer the index scan. Other possible measures could be to disable parallel query by setting max_parallel_workers_per_gather to 0.

Finally, there is a switch to disable the generation of generic plans altogether: you can set plan_cache_mode to force_custom_plan. Try that if all else fails.

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

PREPARE语句在几次调用后决定不使用索引，尽管它的速度较慢。

问题

答案1

Go GRPC刷新令牌以进行双向流。

如何使用AWS Go SDK仅更新DynamoDB中的单个字段

在Golang中模拟函数以测试我的HTTP路由。

在Go语言中，是否可以迭代自定义类型？

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。