左连接执行方式类似于右连接【如何找到原因?】

huangapple go评论57阅读模式
英文:

Left join execute like right join [how to find the cause?]

问题

PostgreSQL might optimize and rewrite your query plans based on several factors, including the available indexes, statistics, and query complexity. This optimization process is generally designed to improve query performance. Let me summarize the query plans you've provided and address your concerns:

  1. The initial query with a RIGHT JOIN and WHERE clause seems to have performance issues, taking 20-30 seconds to execute. The problem might be related to the large number of records involved in the RIGHT JOIN and the WHERE clause condition. This query could benefit from proper indexing and optimization.

  2. You've mentioned that another query with a LEFT JOIN works perfectly, and the query that combines JOIN and WHERE clauses is causing issues. This could indicate that the specific combination of joins and conditions in the problem query is causing the performance problem.

  3. You've also provided examples of queries with LEFT JOIN and WHERE clauses that perform well. The difference in performance could be due to the specific query plan chosen by the PostgreSQL query optimizer.

  4. PostgreSQL's query planner can sometimes choose different execution plans based on statistics and the distribution of data in your tables. This can lead to different query execution times for seemingly similar queries.

To address your concerns and improve the performance of the problematic query, consider the following steps:

  1. Check Indexing: Ensure that there are appropriate indexes on the columns used in your JOIN and WHERE conditions. Indexes can significantly improve query performance.

  2. Analyze and Vacuum: Run the ANALYZE and VACUUM commands on your tables to update statistics and optimize the table layout. This can help the query planner make better decisions.

  3. Query Rewriting: You can try rewriting the query in different ways to see if the query planner chooses a more efficient execution plan. Experiment with different query structures and check their performance.

  4. Monitor Query Execution: Use tools like EXPLAIN ANALYZE to analyze the query execution plans and identify bottlenecks or areas where optimization is needed. You can also monitor the PostgreSQL logs for any error messages or warnings related to your query.

  5. Adjust PostgreSQL Configuration: Ensure that your PostgreSQL configuration settings are optimized for your workload. This includes settings related to memory, parallelism, and query optimization.

  6. Consider Partitioning: If your tables have a large number of records, consider partitioning them to improve query performance by reducing the amount of data that needs to be scanned.

  7. Update Statistics Regularly: Keep your statistics up to date, as outdated statistics can lead to suboptimal query plans.

If you continue to experience performance issues, you may want to consider consulting with a database administrator or PostgreSQL expert who can provide specific guidance based on your database schema and workload. Additionally, PostgreSQL updates and patches may include improvements in query optimization, so keeping your PostgreSQL version up to date is recommended.

英文:

I have simple query

explain  SELECT * FROM markets LEFT JOIN outcomes ON markets.uuid = outcomes.market_uuid WHERE markets.event_uuid = 'f286f78b-9e9c-4052-9785-7c908a8ab2c0';

and explain show me

Merge Right Join  (cost=63274.17..905696.60 rows=78271 width=8201)
  Merge Cond: (outcomes.market_uuid = markets.uuid)
  ->  Index Scan using outcomes_market_uuid_index on outcomes  (cost=0.43..802379.27 rows=15654269 width=2776)
  ->  Sort  (cost=63273.73..63336.34 rows=25043 width=5425)
        Sort Key: markets.uuid
        ->  Bitmap Heap Scan on markets  (cost=220.92..27250.08 rows=25043 width=5425)
              Recheck Cond: (event_uuid = 'f286f78b-9e9c-4052-9785-7c908a8ab2c0'::uuid)
              ->  Bitmap Index Scan on markets_event_uuid_index  (cost=0.00..214.66 rows=25043 width=0)
                    Index Cond: (event_uuid = 'f286f78b-9e9c-4052-9785-7c908a8ab2c0'::uuid)
JIT:
  Functions: 9
  Options: Inlining true, Optimization true, Expressions true, Deforming true
EXPLAIN (ANALYSE, BUFFERS)  SELECT * FROM markets 
LEFT JOIN outcomes ON markets.uuid = outcomes.market_uuid  
WHERE markets.event_uuid = 'f286f78b-9e9c-4052-9785-7c908a8ab2c0';
Merge Right Join  (cost=63274.17..905696.60 rows=78271 width=8201) (actual time=21215.656..21215.665 rows=2 loops=1)
  Merge Cond: (outcomes.market_uuid = markets.uuid)
  Buffers: shared hit=7635492 read=600026 written=2031
  ->  Index Scan using outcomes_market_uuid_index on outcomes  (cost=0.43..802379.27 rows=15654269 width=2776) (actual time=0.111..19456.586 rows=8421452 loops=1)
        Buffers: shared hit=7635488 read=600026 written=2031
  ->  Sort  (cost=63273.73..63336.34 rows=25043 width=5425) (actual time=0.047..0.049 rows=1 loops=1)
        Sort Key: markets.uuid
        Sort Method: quicksort  Memory: 26kB
        Buffers: shared hit=4
        ->  Bitmap Heap Scan on markets  (cost=220.92..27250.08 rows=25043 width=5425) (actual time=0.025..0.027 rows=1 loops=1)
              Recheck Cond: (event_uuid = 'f286f78b-9e9c-4052-9785-7c908a8ab2c0'::uuid)
              Heap Blocks: exact=1
              Buffers: shared hit=4
              ->  Bitmap Index Scan on markets_event_uuid_index  (cost=0.00..214.66 rows=25043 width=0) (actual time=0.018..0.018 rows=1 loops=1)
                    Index Cond: (event_uuid = 'f286f78b-9e9c-4052-9785-7c908a8ab2c0'::uuid)
                    Buffers: shared hit=3
Planning Time: 0.416 ms
JIT:
  Functions: 9
  Options: Inlining true, Optimization true, Expressions true, Deforming true
  Timing: Generation 2.989 ms, Inlining 4.691 ms, Optimization 200.586 ms, Emission 117.924 ms, Total 326.192 ms
Execution Time: 21218.828 ms
EXPLAIN ANALYZE SELECT * FROM markets 
LEFT JOIN outcomes ON markets.uuid = outcomes.market_uuid 
WHERE markets.event_uuid = 'f286f78b-9e9c-4052-9785-7c908a8ab2c0';
Hash Right Join  (cost=98.69..681948.87 rows=269 width=4521) (actual time=31871.835..31871.846 rows=2 loops=1)
  Hash Cond: (outcomes.market_uuid = markets.uuid)
  ->  Seq Scan on outcomes  (cost=0.00..640757.69 rows=15654269 width=2776) (actual time=0.034..29402.201 rows=15654279 loops=1)
  ->  Hash  (cost=97.62..97.62 rows=86 width=1745) (actual time=364.287..364.290 rows=1 loops=1)
        Buckets: 1024  Batches: 1  Memory Usage: 10kB
        ->  Index Scan using markets_event_uuid_index on markets  (cost=0.43..97.62 rows=86 width=1745) (actual time=0.027..0.032 rows=1 loops=1)
              Index Cond: (event_uuid = 'f286f78b-9e9c-4052-9785-7c908a8ab2c0'::uuid)
Planning Time: 0.326 ms
JIT:
  Functions: 12
  Options: Inlining true, Optimization true, Expressions true, Deforming true
  Timing: Generation 3.773 ms, Inlining 3.798 ms, Optimization 226.614 ms, Emission 133.456 ms, Total 367.641 ms
Execution Time: 31875.826 ms

Some information:

  • The database has been restored from a backup
  • Postgres has been restarted on the server
  • Another query with Left Join working perfect
  • Just this query above, join + where causes the problem
  • Query time 20-30 s, on right join table with 15 million records
  • Table has primary_key and index
  • Version Postgres 15.2
  • On local base (Version Postgres 14.3) works perfect

Query mixed where/join with time 1-5 ms (works perfect)

EXPLAIN (ANALYSE, BUFFERS)  SELECT * FROM markets 
LEFT JOIN events ON markets.event_uuid = events.uuid 
WHERE markets.event_uuid = 'f286f78b-9e9c-4052-9785-7c908a8ab2c0';
Buffers: shared hit=8
  ->  Bitmap Heap Scan on markets  (cost=220.92..27250.08 rows=25043 width=5425) (actual time=0.030..0.031 rows=1 loops=1)
        Recheck Cond: (event_uuid = 'f286f78b-9e9c-4052-9785-7c908a8ab2c0'::uuid)
        Heap Blocks: exact=1
        Buffers: shared hit=4
        ->  Bitmap Index Scan on markets_event_uuid_index  (cost=0.00..214.66 rows=25043 width=0) (actual time=0.019..0.019 rows=1 loops=1)
              Index Cond: (event_uuid = 'f286f78b-9e9c-4052-9785-7c908a8ab2c0'::uuid)
              Buffers: shared hit=3
  ->  Materialize  (cost=0.42..2.65 rows=1 width=5491) (actual time=0.045..0.045 rows=1 loops=1)
        Buffers: shared hit=4
        ->  Index Scan using events_pkey on events  (cost=0.42..2.64 rows=1 width=5491) (actual time=0.024..0.024 rows=1 loops=1)
              Index Cond: (uuid = 'f286f78b-9e9c-4052-9785-7c908a8ab2c0'::uuid)
              Buffers: shared hit=4
Planning Time: 0.384 ms
Execution Time: 0.253 ms
EXPLAIN (ANALYSE, BUFFERS)  SELECT * FROM markets 
LEFT OUTER JOIN outcomes ON markets.uuid = outcomes.market_uuid 
WHERE markets.uuid = '6dc00c42-9b80-43a5-a263-5010648a38fd';
 Buffers: shared hit=3 read=6
  ->  Index Scan using markets_pkey on markets  (cost=0.43..2.65 rows=1 width=5425) (actual time=3.183..3.185 rows=1 loops=1)
        Index Cond: (uuid = '6dc00c42-9b80-43a5-a263-5010648a38fd'::uuid)
        Buffers: shared hit=1 read=3
  ->  Bitmap Heap Scan on outcomes  (cost=781.94..78619.53 rows=78271 width=2776) (actual time=1.713..1.723 rows=2 loops=1)
        Recheck Cond: (market_uuid = '6dc00c42-9b80-43a5-a263-5010648a38fd'::uuid)
        Heap Blocks: exact=2
        Buffers: shared hit=2 read=3
        ->  Bitmap Index Scan on outcomes_market_uuid_index  (cost=0.00..762.37 rows=78271 width=0) (actual time=1.682..1.682 rows=2 loops=1)
              Index Cond: (market_uuid = '6dc00c42-9b80-43a5-a263-5010648a38fd'::uuid)
              Buffers: shared hit=1 read=2
Planning Time: 0.360 ms
Execution Time: 4.975 ms

Why engine Postgres change my query? Cache? Bug? Can you help me?

答案1

得分: 0

SQL是一种查询语言,而不是一种编程语言。不同之处在于,在编程语言中,您编写计算机必须执行的确切代码,而在查询语言中,您只需指定要在结果集中获得什么,而不是如何获得!

因此,像玛丽·沃森女士(福尔摩斯的房东)一样的优化器必须以最聪明的方式完成回答您的请求的任务,这就是为什么它重新编写了您的请求,这是数学上的等效严格,以便更快地回答您的原因...

英文:

SQL is a query language not a programming language. The difference is that in programming language you write the exact code that the computer must execute, and in a query langage yous just specify what you want to have in the result set, not how !

So the optimizer, like Madame Mary Watson (Sherlock Holms landlady) must accomplish the task of answering your request in the smartest way, and that's why he rewrote your request, which is a mathematical equivalent sctict in order to answer you faster...

答案2

得分: 0

将PostgreSQL降级到14.8版本可以解决这个问题。感谢您的帮助。多亏了您,我更好地理解了数据库的工作方式,并学到了分析查询的新方法。

SELECT * FROM markets
LEFT JOIN outcomes ON markets.uuid = outcomes.market_uuid
WHERE markets.event_uuid = 'f286f78b-9e9c-4052-9785-7c908a8ab2c0';
查询计划
-----------------------------------------------------------------------------------------------------------------------------------------------
 嵌套循环左连接  (cost=0.87..2121.97 rows=441 width=2467) (实际时间=0.050..0.053 =2 循环=1)
   缓冲区: 共享命中=8
   ->   markets 上使用 markets_event_uuid_index 索引扫描  (cost=0.43..159.01 rows=141 width=2245) (实际时间=0.025..0.025 =1 循环=1)
         索引条件: (event_uuid = 'f286f78b-9e9c-4052-9785-7c908a8ab2c0'::uuid)
         缓冲区: 共享命中=4
   ->   outcomes 上使用 outcomes_market_uuid_index 索引扫描  (cost=0.43..13.81 rows=11 width=222) (实际时间=0.018..0.019 =2 循环=1)
         索引条件: (market_uuid = markets.uuid)
         缓冲区: 共享命中=4
 计划:
   缓冲区: 共享命中=16
 计划时间: 0.440 毫秒
 执行时间: 0.124 毫秒
(12 )
英文:

Downgrade postgres to 14.8 fix the problem. Thank you for your help. Thanks to you I understood a bit better how the database works and I learned new ways to analyze queries.

SELECT * FROM markets
LEFT JOIN outcomes ON markets.uuid = outcomes.market_uuid
WHERE markets.event_uuid = 'f286f78b-9e9c-4052-9785-7c908a8ab2c0';
                                                                  QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=0.87..2121.97 rows=441 width=2467) (actual time=0.050..0.053 rows=2 loops=1)
   Buffers: shared hit=8
   ->  Index Scan using markets_event_uuid_index on markets  (cost=0.43..159.01 rows=141 width=2245) (actual time=0.025..0.025 rows=1 loops=1)
         Index Cond: (event_uuid = 'f286f78b-9e9c-4052-9785-7c908a8ab2c0'::uuid)
         Buffers: shared hit=4
   ->  Index Scan using outcomes_market_uuid_index on outcomes  (cost=0.43..13.81 rows=11 width=222) (actual time=0.018..0.019 rows=2 loops=1)
         Index Cond: (market_uuid = markets.uuid)
         Buffers: shared hit=4
 Planning:
   Buffers: shared hit=16
 Planning Time: 0.440 ms
 Execution Time: 0.124 ms
(12 rows)

huangapple
  • 本文由 发表于 2023年6月1日 18:13:18
  • 转载请务必保留本文链接:https://go.coder-hub.com/76380854.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定