我怎样提取成对的父子列的层次链接

huangapple go评论73阅读模式
英文:

How can I extract hierarchy linkage for paired parent-children columns

问题

以下是修复后的查询代码:

WITH RECURSIVE node_tree AS (
  SELECT aid, bid, CAST(aid AS TEXT) || ',' || CAST(bid AS TEXT) AS path
  FROM tmp
  WHERE aid NOT IN (SELECT bid FROM tmp)
  UNION ALL
  SELECT nt.aid, h.bid, nt.path || ',' || h.bid
  FROM node_tree nt
  JOIN tmp h ON nt.bid = h.aid
)
SELECT DISTINCT path
FROM node_tree
WHERE aid NOT IN (SELECT DISTINCT bid FROM tmp)
ORDER BY path;

这个修复后的代码应该产生你期望的输出,其中每个id只会显示在一行中,而不会显示不必要的路径。

英文:
aid bid
1 2
1 3
2 3
3 4
5 6
7 8
8 10
8 9

I would like to extarct the hierarchy linkage for aid and bid.
No need to show each path, any id should show in one row only.

Expected Output:

path
1,2,3,4
5,6
7,8,9,10

I tried to use recursive function as below.

WITH RECURSIVE node_tree AS (
  SELECT aid, bid, CAST(aid AS TEXT) || ',' || CAST(bid AS TEXT) AS path
  FROM tmp
  WHERE aid NOT IN (SELECT bid FROM tmp)
  UNION ALL
  SELECT nt.aid, h.bid, nt.path || ',' || h.bid
  FROM node_tree nt
  JOIN tmp h ON nt.bid = h.aid
)
SELECT DISTINCT path
FROM node_tree
ORDER BY path;

It returns below.

path
1,2
1,2,3
1,2,3,4
1,3
1,3,4
5,6
7,8
7,8,10
7,8,9

Seems it cannot handle one Id only show in one row:

  • only need to show 1,2,3,4 and don't want to show below.
path
1,2
1,2,3
1,3
1,3,4
...
  • should show 7,8,9,10 instead of below:
path
...
7,8
7,8,10
7,8,9

答案1

得分: 1

你的尝试很不错,尽管有一些不足之处。我会避免在递归中生成字符串,最好是提取每个祖先的每个子元素,记录每个子元素的索引,然后按索引排序并连接到它们自己的祖先上。

WITH RECURSIVE ordered_tab AS (
    SELECT * FROM tab ORDER BY aid, bid
), cte AS (
    SELECT aid, MIN(bid) AS bid, 0 AS idx 
    FROM tab t1
    WHERE NOT EXISTS(SELECT 1 FROM tab t2 WHERE t1.aid = t2.bid)
    GROUP BY aid
  
    UNION ALL 
  
    SELECT cte.aid, tab.bid, idx + 1
    FROM cte
    INNER JOIN ordered_tab tab
            ON cte.bid = tab.aid
)
SELECT CONCAT(aid, ',', STRING_AGG(bid::VARCHAR, ',' ORDER BY idx)) AS path
FROM cte
GROUP BY aid

输出

path
1,2,3,4
5,6
7,8,9,10

此处查看演示。

英文:

Your attempt is very good, although it's missing something. I'd avoid to generate the string already in the recursion: probably better to extract each child for all anchestors, by recording the index of each children. And then apply aggregation of the children, ordered by index, and concatenated to their own anchestor.

WITH RECURSIVE ordered_tab AS (
    SELECT * FROM tab ORDER BY aid, bid
), cte AS (
    SELECT aid, MIN(bid) AS bid, 0 AS idx 
    FROM tab t1
    WHERE NOT EXISTS(SELECT 1 FROM tab t2 WHERE t1.aid = t2.bid)
	GROUP BY aid
  
	UNION ALL 
  
    SELECT cte.aid, tab.bid, idx + 1
    FROM       cte
    INNER JOIN ordered_tab tab
            ON cte.bid = tab.aid
)
SELECT CONCAT(aid, ',', STRING_AGG(bid::VARCHAR, ',' ORDER BY idx)) AS path
FROM cte
GROUP BY aid

Output:

path
1,2,3,4
5,6
7,8,9,10

Check the demo here.

答案2

得分: 0

以下是翻译好的代码部分:

使用递归查询来实现所需结果有几种方法。以下使用数组来构建路径,并跟踪每次迭代中发现的新节点。

WITH RECURSIVE
  t(aid, bid) AS (VALUES (1, 2),
                         (1, 3),
                         (2, 3),
                         (3, 4),
                         (5, 6),
                         (7, 8),
                         (8, 10),
                         (8, 9)),
  cte AS (SELECT DISTINCT 0 AS depth, ARRAY [t.aid] AS path, ARRAY [t.aid] AS new_nodes
            FROM t
            WHERE t.aid NOT IN (SELECT children.bid FROM t children)
          UNION ALL
          SELECT depth + 1 AS depth, cte.path || c.new_nodes AS path, c.new_nodes
            FROM cte
              CROSS JOIN LATERAL (SELECT ARRAY_AGG(DISTINCT t.bid ORDER BY t.bid) AS new_nodes
                                    FROM t
                                    WHERE t.aid = ANY (cte.new_nodes)
                                      AND NOT t.bid = ANY (cte.path)) c
            WHERE c.new_nodes IS NOT NULL)
SELECT DISTINCT ON (cte.path[1]) (REGEXP_MATCH(cte.path::text, '[^{}]+'))[1] AS path
  FROM cte
  ORDER BY cte.path[1], depth DESC;

递归查询从识别没有父节点的节点开始。这些节点建立路径根,并且还被记忆为新访问的节点。在每次迭代中,前一次迭代中访问的节点的子节点,但尚未被访问的节点,都会被添加到路径中并被记忆为新访问的节点。该过程持续进行,直到所有节点都被访问。

查询的最后部分使用DISTINCT ON,按路径根升序和深度降序对行进行排序,以仅返回每个根节点的完整路径。路径数组被转换为文本,并使用REGEXP_MATCH来删除括号。

英文:

There are several approaches using recursive queries to achieve the desired results. The following uses arrays to build the paths and keep track of the new nodes discovered with each iteration.

WITH RECURSIVE
  t(aid, bid) AS (VALUES (1, 2),
                         (1, 3),
                         (2, 3),
                         (3, 4),
                         (5, 6),
                         (7, 8),
                         (8, 10),
                         (8, 9)),
  cte AS (SELECT DISTINCT 0 AS depth, ARRAY [t.aid] AS path, ARRAY [t.aid] AS new_nodes
            FROM t
            WHERE t.aid NOT IN (SELECT children.bid FROM t children)
          UNION ALL
          SELECT depth + 1 AS depth, cte.path || c.new_nodes AS path, c.new_nodes
            FROM cte
              CROSS JOIN LATERAL (SELECT ARRAY_AGG(DISTINCT t.bid ORDER BY t.bid) AS new_nodes
                                    FROM t
                                    WHERE t.aid = ANY (cte.new_nodes)
                                      AND NOT t.bid = ANY (cte.path)) c
            WHERE c.new_nodes IS NOT NULL)
SELECT DISTINCT ON (cte.path[1]) (REGEXP_MATCH(cte.path::text, '[^{}]+'))[1] AS path
  FROM cte
  ORDER BY cte.path[1], depth DESC;

The recursive query begins by identifying the nodes without parents. These nodes establish the path roots and are also memoized as newly visited nodes. At each iteration, nodes that are children of the nodes visited in the prior iteration, but that have not yet been visited, are added to the path and memoized as newly visited nodes. This process continues until all nodes have been visited.

The final part of the query uses DISTINCT ON with rows sorted in ascending order by the path root and descending by depth to return only the full path from each root node. The path array is converted to text and REGEXP_MATCH is used to remove the enclosing braces.

答案3

得分: 0

下面是Lemon的原始解决方案:

EXPLAIN (ANALYZE, VERBOSE)
  WITH RECURSIVE
    ordered_tab AS (SELECT *
                      FROM test_table
                      ORDER BY aid, bid),
    cte AS (SELECT aid, MIN(bid) AS bid, 0 AS idx
              FROM test_table t1
              WHERE NOT EXISTS(SELECT 1 FROM test_table t2 WHERE t1.aid = t2.bid)
              GROUP BY aid
            UNION ALL
            SELECT cte.aid, tab.bid, idx + 1
              FROM cte
                INNER JOIN ordered_tab tab
                           ON cte.bid = tab.aid)
  SELECT CONCAT(aid, ',', STRING_AGG(bid::VARCHAR, ',' ORDER BY idx)) AS path
    FROM cte
    GROUP BY aid;

以下是运行此查询时产生的输出:

GroupAggregate  (cost=52804.85..52807.63 rows=101 width=36) (actual time=14445.549..14494.047 rows=973 loops=1)
  Output: concat(cte.aid, ',', string_agg(((cte.bid)::character varying)::text, ','::text ORDER BY cte.idx)), cte.aid
  Group Key: cte.aid
  CTE cte
    ->  Recursive Union  (cost=4467.74..52799.47 rows=101 width=12) (actual time=49.735..14381.476 rows=99027 loops=1)
          ->  GroupAggregate  (cost=4467.74..4467.76 rows=1 width=12) (actual time=49.735..50.086 rows=973 loops=1)
                Output: t1.aid, min(t1.bid), 0
                Group Key: t1.aid
                ->  Sort  (cost=4467.74..4467.74 rows=1 width=8) (actual time=49.729..49.799 rows=973 loops=1)
                      Output: t1.aid, t1.bid
                      Sort Key: t1.aid
                      Sort Method: quicksort  Memory: 70kB
                      ->  Hash Anti Join  (cost=2667.11..4467.73 rows=1 width=8) (actual time=24.974..49.532 rows=973 loops=1)
                            Output: t1.aid, t1.bid
                            Hash Cond: (t1.aid = t2.bid)
                            ->  Seq Scan on public.test_table t1  (cost=0.00..1429.27 rows=99027 width=8) (actual time=0.007..8.128 rows=99027 loops=1)
                                  Output: t1.aid, t1.bid
                            ->  Hash  (cost=1429.27..1429.27 rows=99027 width=4) (actual time=24.844..24.845 rows=99027 loops=1)
                                  Output: t2.bid
                                  Buckets: 131072  Batches: 1  Memory Usage: 4506kB
                                  ->  Seq Scan on public.test_table t2  (cost=0.00..1429.27 rows=99027 width=4) (actual time=0.006..10.559 rows=99027 loops=1)
                                        Output: t2.bid
          ->  Merge Join  (cost=0.66..4832.97 rows=10 width=12) (actual time=1.617..22.406 rows=154 loops=638)
                Output: cte_1.aid, test_table.bid, (cte_1.idx + 1)
                Merge Cond: (test_table.aid = cte_1.bid)
                ->  Index Only Scan using test_table_by_aid on public.test_table  (cost=0.29..3594.59 rows=99027 width=8) (actual time=0.008..16.481 rows=87632 loops=638)
                      Output: test_table.aid, test_table.bid
                      Heap Fetches: 55909032
                ->  Sort  (cost=0.37..0.39 rows=10 width=12) (actual time=0.041..0.062 rows=155 loops=638)
                      Output: cte_1.aid, cte_1.idx, cte_1.bid
                      Sort Key: cte_1.bid
                      Sort Method: quicksort  Memory: 25kB
                      ->  WorkTable Scan on cte cte_1  (cost=0.00..0.20 rows=10 width=12) (actual time=0.001..0.019 rows=155 loops=638)
                            Output: c

<details>
<summary>英文:</summary>

After seeing Lemon&#39;s answer, I was curious about the relative performance of Lemon&#39;s approach vs. mine. Below are three queries, arranged from worst performing to best. **Spoiler alert: the final query is about 30 times faster than the first!**

Testing with the small sample from the original post showed no significant difference in performance, so I created a larger sample to test with. The following creates paths for 100,000 nodes, with an average path length of ~100 nodes. Each node occurs on exactly one path.

DROP TABLE IF EXISTS test_table CASCADE;

CREATE TABLE test_table
(
aid INTEGER,
bid INTEGER
);

WITH spans AS (SELECT t.id, LEAD(t.id, 1, 100001) OVER (ORDER BY t.id) AS next_id
FROM UNNEST(ARRAY [1, 23, 29, 32, 48, 63, 71, 78, 220, 250, 382, 492, 601, 624, 693, 761, 840, 947,
1146, 1194, 1309, 1348, 1628, 1989, 2166, 2576, 2609, 2851, 2981, 3134, 3168, 3254,
3296, 3322, 3378, 3430, 3442, 3588, 3625, 3714, 3824, 3917, 3996, 4003, 4072, 4581,
4613, 4683, 4699, 5080, 5088, 5136, 5174, 5195, 5500, 5526, 5618, 5671, 5848, 5958,
6106, 6207, 6255, 6792, 6836, 6887, 7057, 7180, 7303, 7510, 7619, 7622, 7737, 7778,
7832, 8055, 8062, 8091, 8206, 8245, 8275, 8428, 8458, 8525, 8611, 8711, 8730, 8810,
9150, 9163, 9428, 9644, 10015, 10022, 10026, 10102, 10137, 10496, 10575, 10610,
10648, 10815, 10909, 11140, 11284, 11348, 11531, 11580, 11599, 11617, 11819, 11845,
11890, 11893, 11924, 11949, 11957, 12011, 12251, 12311, 12350, 12372, 12391, 12443,
12492, 12660, 12714, 12813, 12903, 13007, 13025, 13214, 13286, 13316, 13362, 13403,
13425, 13606, 13633, 13670, 13732, 13780, 13871, 13894, 14011, 14446, 14468, 14500,
14535, 14821, 14907, 15102, 15192, 15222, 15279, 15291, 15379, 15412, 15421, 15515,
15758, 15831, 16225, 16301, 16315, 16669, 16911, 16936, 16997, 17056, 17082, 17127,
17267, 17301, 17762, 17858, 17880, 17908, 17985, 18178, 18239, 18288, 18333, 18341,
18452, 18518, 18631, 18746, 18809, 18873, 18915, 19026, 19030, 19125, 19147, 19278,
19422, 19641, 19675, 19737, 19757, 19800, 20051, 20248, 20391, 20411, 20484, 20773,
20794, 20904, 20964, 21047, 21319, 21375, 21559, 21658, 21826, 21837, 22178, 22223,
22280, 22361, 22378, 22622, 22831, 22839, 22877, 22903, 22942, 23084, 23201, 23230,
23232, 23247, 23268, 23289, 23438, 23440, 23587, 23717, 23815, 23820, 23854, 23864,
23961, 24027, 24087, 24115, 24240, 24278, 24396, 24527, 24658, 24987, 25017, 25073,
25100, 25139, 25246, 25250, 25281, 25356, 25588, 25597, 25846, 25853, 25861, 25885,
25892, 25941, 26138, 26415, 26436, 26577, 26726, 26823, 26831, 26861, 26992, 27101,
27112, 27184, 27308, 27409, 27502, 27511, 27625, 27754, 27878, 27959, 28077, 28162,
28801, 28875, 29175, 29364, 29380, 29469, 29703, 29714, 29732, 29809, 29925, 29942,
29995, 30056, 30227, 30284, 30383, 30385, 30429, 30516, 30547, 30646, 30716, 30754,
30904, 31190, 31226, 31341, 31374, 31418, 31493, 31643, 31943, 31978, 32448, 32642,
32763, 33081, 33470, 33612, 33640, 33657, 33756, 34234, 34303, 34523, 34600, 34637,
34790, 34992, 35110, 35148, 35229, 35322, 35332, 35543, 35657, 35919, 35952, 36082,
36114, 36227, 36252, 36365, 36377, 36406, 36519, 36525, 36836, 36848, 36933, 37067,
37366, 37436, 37519, 37748, 37840, 37995, 38078, 38102, 38213, 38271, 38298, 38395,
38438, 38559, 38596, 38686, 38728, 39061, 39106, 39129, 39170, 39319, 39374, 39397,
39420, 39464, 39809, 40020, 40056, 40332, 40375, 40412, 40446, 40478, 40547, 40726,
40764, 40904, 41029, 41090, 41313, 41343, 41478, 41515, 41940, 42069, 42238, 42345,
42399, 42401, 42480, 42537, 42569, 42600, 42646, 42770, 43044, 43054, 43298, 43340,
43496, 43570, 43623, 43685, 43730, 43900, 44076, 44279, 44343, 44443, 44681, 44955,
44970, 45262, 45277, 45342, 45367, 45450, 45617, 45626, 45832, 45971, 46126, 46135,
46142, 46164, 46733, 46808, 47050, 47139, 47312, 47410, 47439, 47462, 47497, 47701,
47727, 47736, 47771, 47883, 47953, 47987, 48011, 48117, 48293, 48559, 48608, 48735,
48757, 48822, 48836, 48959, 48998, 49100, 49150, 49155, 49468, 49596, 49739, 49849,
49920, 49970, 50204, 50312, 50372, 50508, 50558, 50593, 50956, 51022, 51070, 51182,
51291, 51926, 52156, 52186, 52267, 52288, 52322, 52535, 52546, 52656, 52669, 52788,
52945, 53005, 53160, 53260, 53266, 53320, 53480, 53517, 53749, 53788, 53929, 54153,
54167, 54246, 54477, 54618, 54656, 54721, 54750, 54821, 55034, 55184, 55238, 55382,
55463, 55532, 55563, 55614, 55660, 55675, 55848, 55924, 55973, 56017, 56080, 56095,
56403, 56517, 56594, 56613, 56748, 56823, 56898, 56920, 56935, 57046, 57153, 57177,
57198, 57202, 57629, 57640, 57720, 57725, 57768, 57920, 57963, 58334, 58422, 58455,
58469, 58494, 58519, 58534, 58536, 58558, 58568, 58900, 58965, 59186, 59326, 59422,
59494, 59659, 59682, 59747, 59864, 59982, 60003, 60210, 60576, 60645, 60827, 60917,
60960, 61088, 61132, 61136, 61345, 61502, 61791, 62184, 62202, 62270, 62284, 62439,
62448, 62456, 62536, 62599, 62756, 62830, 62926, 63050, 63274, 63288, 63315, 63347,
63761, 63794, 63820, 64067, 64146, 64372, 64409, 64475, 64479, 64532, 64945, 65104,
65474, 65495, 65528, 65754, 65989, 66021, 66085, 66179, 66309, 66347, 66378, 66491,
66503, 66534, 66576, 66579, 66582, 66744, 66910, 66965, 67059, 67083, 67091, 67105,
67203, 67323, 67366, 67394, 67397, 67569, 67906, 68033, 68035, 68096, 68423, 68471,
68484, 68544, 68736, 68744, 68976, 69068, 69083, 69138, 69149, 69219, 69305, 69308,
69444, 69565, 69590, 69638, 70119, 70196, 70391, 70447, 70589, 70670, 70733, 70959,
70981, 71022, 71175, 71180, 71264, 71312, 71314, 71399, 71752, 71794, 71867, 71880,
71917, 71972, 72061, 72070, 72091, 72281, 72327, 72637, 72645, 72647, 72699, 72764,
73149, 73389, 73412, 73568, 73613, 73678, 73686, 73732, 73738, 73874, 73912, 74075,
74140, 74217, 74441, 74480, 74502, 74507, 74635, 74667, 74679, 74868, 74873, 74897,
75006, 75116, 75562, 75718, 75848, 75960, 76085, 76171, 76295, 76392, 76400, 76735,
76864, 76906, 77040, 77047, 77235, 77379, 77627, 77888, 78011, 78175, 78372, 78379,
78457, 78695, 78697, 78734, 78759, 78867, 78883, 78947, 78962, 78982, 79053, 79331,
79449, 79592, 79618, 79725, 79797, 80010, 80050, 80052, 80062, 80129, 80147, 80268,
80317, 80471, 80502, 80529, 80603, 80834, 81081, 81109, 81162, 81169, 81256, 81538,
81559, 81737, 81743, 81800, 81812, 81919, 81935, 81976, 82036, 82102, 82162, 82317,
82454, 82668, 82723, 82773, 82776, 83177, 83199, 83368, 83403, 83501, 83594, 83688,
84193, 84353, 84386, 84525, 84658, 84930, 85004, 85202, 85287, 85478, 85751, 85904,
85973, 86186, 86237, 86791, 86820, 86882, 86893, 86971, 87109, 87153, 87382, 87398,
87478, 87657, 87832, 88000, 88005, 88377, 88444, 88456, 88649, 88666, 88790, 88792,
88818, 88851, 89199, 89343, 89346, 89395, 89435, 89480, 89498, 89552, 89588, 89665,
90013, 90183, 90201, 90449, 90457, 90470, 90570, 90727, 90849, 90944, 91149, 91287,
91293, 91363, 91542, 91550, 91581, 91614, 91743, 91840, 92000, 92063, 92303, 92313,
92508, 92601, 92694, 92790, 92831, 92970, 93155, 93181, 93258, 93275, 93294, 93339,
93355, 93610, 93659, 93708, 93971, 94208, 94291, 94458, 94502, 94516, 94624, 94632,
94729, 94901, 94921, 94963, 94974, 94995, 95079, 95173, 95429, 95554, 95618, 95833,
95916, 96037, 96137, 96240, 96293, 96527, 96533, 96708, 96900, 96925, 96940, 97143,
97265, 97295, 97539, 97617, 97659, 97714, 97716, 97731, 97785, 97796, 97944, 98002,
98146, 98279, 98319, 98421, 98436, 98445, 98524, 98597, 98909, 99028, 99046, 99201,
99384, 99511, 99542, 99559, 99659, 99671, 99710, 99741, 99988]::integer[]) t(id))
INSERT INTO test_table(aid, bid)
SELECT g.id - 1 AS aid, g.id AS bid
FROM spans
CROSS JOIN LATERAL GENERATE_SERIES(spans.next_id - 1, spans.id, -1) g(id)
WHERE g.id <> spans.id;

CREATE INDEX test_table_by_aid ON test_table (aid, bid);

ANALYZE test_table;

The following is Lemon&#39;s original solution:

EXPLAIN (ANALYZE, VERBOSE)
WITH RECURSIVE
ordered_tab AS (SELECT *
FROM test_table
ORDER BY aid, bid),
cte AS (SELECT aid, MIN(bid) AS bid, 0 AS idx
FROM test_table t1
WHERE NOT EXISTS(SELECT 1 FROM test_table t2 WHERE t1.aid = t2.bid)
GROUP BY aid
UNION ALL
SELECT cte.aid, tab.bid, idx + 1
FROM cte
INNER JOIN ordered_tab tab
ON cte.bid = tab.aid)
SELECT CONCAT(aid, ',', STRING_AGG(bid::VARCHAR, ',' ORDER BY idx)) AS path
FROM cte
GROUP BY aid;

Running the above produced the following output:

GroupAggregate (cost=52804.85..52807.63 rows=101 width=36) (actual time=14445.549..14494.047 rows=973 loops=1)
Output: concat(cte.aid, ',', string_agg(((cte.bid)::character varying)::text, ','::text ORDER BY cte.idx)), cte.aid
Group Key: cte.aid
CTE cte
-> Recursive Union (cost=4467.74..52799.47 rows=101 width=12) (actual time=49.735..14381.476 rows=99027 loops=1)
-> GroupAggregate (cost=4467.74..4467.76 rows=1 width=12) (actual time=49.735..50.086 rows=973 loops=1)
Output: t1.aid, min(t1.bid), 0
Group Key: t1.aid
-> Sort (cost=4467.74..4467.74 rows=1 width=8) (actual time=49.729..49.799 rows=973 loops=1)
Output: t1.aid, t1.bid
Sort Key: t1.aid
Sort Method: quicksort Memory: 70kB
-> Hash Anti Join (cost=2667.11..4467.73 rows=1 width=8) (actual time=24.974..49.532 rows=973 loops=1)
Output: t1.aid, t1.bid
Hash Cond: (t1.aid = t2.bid)
-> Seq Scan on public.test_table t1 (cost=0.00..1429.27 rows=99027 width=8) (actual time=0.007..8.128 rows=99027 loops=1)
Output: t1.aid, t1.bid
-> Hash (cost=1429.27..1429.27 rows=99027 width=4) (actual time=24.844..24.845 rows=99027 loops=1)
Output: t2.bid
Buckets: 131072 Batches: 1 Memory Usage: 4506kB
-> Seq Scan on public.test_table t2 (cost=0.00..1429.27 rows=99027 width=4) (actual time=0.006..10.559 rows=99027 loops=1)
Output: t2.bid
-> Merge Join (cost=0.66..4832.97 rows=10 width=12) (actual time=1.617..22.406 rows=154 loops=638)
Output: cte_1.aid, test_table.bid, (cte_1.idx + 1)
Merge Cond: (test_table.aid = cte_1.bid)
-> Index Only Scan using test_table_by_aid on public.test_table (cost=0.29..3594.59 rows=99027 width=8) (actual time=0.008..16.481 rows=87632 loops=638)
Output: test_table.aid, test_table.bid
Heap Fetches: 55909032
-> Sort (cost=0.37..0.39 rows=10 width=12) (actual time=0.041..0.062 rows=155 loops=638)
Output: cte_1.aid, cte_1.idx, cte_1.bid
Sort Key: cte_1.bid
Sort Method: quicksort Memory: 25kB
-> WorkTable Scan on cte cte_1 (cost=0.00..0.20 rows=10 width=12) (actual time=0.001..0.019 rows=155 loops=638)
Output: cte_1.aid, cte_1.idx, cte_1.bid
-> Sort (cost=5.38..5.63 rows=101 width=12) (actual time=14445.519..14452.830 rows=99027 loops=1)
Output: cte.aid, cte.bid, cte.idx
Sort Key: cte.aid
Sort Method: quicksort Memory: 7714kB
-> CTE Scan on cte (cost=0.00..2.02 rows=101 width=12) (actual time=49.737..14421.283 rows=99027 loops=1)
Output: cte.aid, cte.bid, cte.idx
Planning Time: 0.187 ms
Execution Time: 14501.446 ms

When run without `EXPLAIN ANALYZE`, the query completes in about 9.1 seconds.

Below is a revised version of my original solution that uses `SUBSTR` instead of `REGEXP_MATCH` to remove the braces from `PATH`. This version is about 40% faster than my original version.

EXPLAIN ANALYZE
WITH RECURSIVE
cte AS (SELECT DISTINCT 0 AS DEPTH, ARRAY [t.aid] AS PATH, ARRAY [t.aid] AS new_nodes
FROM test_table t
WHERE t.aid NOT IN (SELECT children.bid FROM test_table children)
UNION ALL
SELECT DEPTH + 1 AS DEPTH, cte.path || c.new_nodes AS PATH, c.new_nodes
FROM cte
CROSS JOIN LATERAL (SELECT ARRAY_AGG(DISTINCT t.bid ORDER BY t.bid) AS new_nodes
FROM test_table t
WHERE t.aid = ANY (cte.new_nodes)
AND NOT t.bid = ANY (cte.path)) c
WHERE c.new_nodes IS NOT NULL),
paths AS (SELECT DISTINCT ON (cte.path1) cte.path::text AS path
FROM cte
ORDER BY cte.path1, depth DESC)
SELECT SUBSTR(path, 2, LENGTH(path) - 2) AS path
FROM paths;

Running the above produced the following output:

Subquery Scan on paths (cost=313283983.23..313308991.30 rows=200 width=32) (actual time=1531.656..1581.667 rows=973 loops=1)
Output: substr(paths.path, 2, (length(paths.path) - 2))
CTE cte
-> Recursive Union (cost=3725.03..312465769.67 rows=5000914 width=68) (actual time=56.203..647.711 rows=100000 loops=1)
-> Subquery Scan on "SELECT 1" (cost=3725.03..4715.31 rows=49514 width=68) (actual time=56.202..56.655 rows=973 loops=1)
Output: 0, "SELECT 1".path, "SELECT 1".new_nodes
-> HashAggregate (cost=3725.03..4220.17 rows=49514 width=68) (actual time=56.201..56.539 rows=973 loops=1)
Output: (0), (ARRAY[t.aid]), (ARRAY[t.aid])
Group Key: 0, ARRAY[t.aid], ARRAY[t.aid]
Batches: 1 Memory Usage: 1809kB
-> Seq Scan on public.test_table t (cost=1676.84..3353.68 rows=49514 width=68) (actual time=33.802..55.664 rows=973 loops=1)
Output: 0, ARRAY[t.aid], ARRAY[t.aid]
Filter: (NOT (hashed SubPlan 1))
Rows Removed by Filter: 98054
SubPlan 1
-> Seq Scan on public.test_table children (cost=0.00..1429.27 rows=99027 width=4) (actual time=0.008..11.400 rows=99027 loops=1)
Output: children.bid
-> Nested Loop (cost=63.03..31236153.12 rows=495140 width=68) (actual time=0.007..0.879 rows=155 loops=639)
Output: (cte_1.depth + 1), (cte_1.path || (array_agg(DISTINCT t_1.bid ORDER BY t_1.bid))), (array_agg(DISTINCT t_1.bid ORDER BY t_1.bid))
-> WorkTable Scan on cte cte_1 (cost=0.00..9902.80 rows=495140 width=68) (actual time=0.000..0.022 rows=156 loops=639)
Output: cte_1.depth, cte_1.path, cte_1.new_nodes
-> Aggregate (cost=63.03..63.04 rows=1 width=32) (actual time=0.005..0.005 rows=1 loops=100000)
Output: array_agg(DISTINCT t_1.bid ORDER BY t_1.bid)
Filter: (array_agg(DISTINCT t_1.bid ORDER BY t_1.bid) IS NOT NULL)
Rows Removed by Filter: 0
-> Index Only Scan using test_table_by_aid on public.test_table t_1 (cost=0.29..62.98 rows=10 width=4) (actual time=0.003..0.003 rows=1 loops=100000)
Output: t_1.aid, t_1.bid
Index Cond: (t_1.aid = ANY (cte_1.new_nodes))
Filter: (t_1.bid <> ALL (cte_1.path))
Heap Fetches: 99027
-> Unique (cost=818213.56..843218.13 rows=200 width=40) (actual time=1531.651..1577.155 rows=973 loops=1)
Output: ((cte.path)::text), (cte.path1), cte.depth
-> Sort (cost=818213.56..830715.84 rows=5000914 width=40) (actual time=1531.650..1569.094 rows=100000 loops=1)
Output: ((cte.path)::text), (cte.path1), cte.depth
Sort Key: (cte.path1), cte.depth DESC
Sort Method: external merge Disk: 60456kB
-> CTE Scan on cte (cost=0.00..125022.85 rows=5000914 width=40) (actual time=56.210..1318.172 rows=100000 loops=1)
Output: (cte.path)::text, cte.path1, cte.depth
Planning Time: 0.163 ms
Execution Time: 1596.886 ms

This query is much faster, completing in about 1.5 seconds when run without `EXPLAIN ANALYZE`.

This final query is a modified version of Lemon&#39;s original. There are two changes; the subquery for `ordered_tab` has been removed, and the `ORDER BY` in the string aggregation uses `bid` instead of `idx`. The latter change has no measurable impact on performance, but removing the subquery for `ordered_tab` results in a significant improvement.

EXPLAIN ANALYZE
WITH RECURSIVE cte AS (SELECT aid, MIN(bid) AS bid, 0 AS idx
FROM test_table t1
WHERE NOT EXISTS(SELECT 1 FROM test_table t2 WHERE t1.aid = t2.bid)
GROUP BY aid
UNION ALL
SELECT cte.aid, tab.bid, idx + 1
FROM cte
INNER JOIN test_table tab
ON cte.bid = tab.aid)
SELECT CONCAT(aid, ',', STRING_AGG(bid::VARCHAR, ',' ORDER BY bid)) AS path
FROM cte
GROUP BY aid;

Running the above produced the following output:

GroupAggregate (cost=5309.41..5312.19 rows=101 width=36) (actual time=333.629..440.384 rows=973 loops=1)
Output: concat(cte.aid, ',', string_agg(((cte.bid)::character varying)::text, ','::text ORDER BY cte.bid)), cte.aid
Group Key: cte.aid
CTE cte
-> Recursive Union (cost=4467.74..5304.03 rows=101 width=12) (actual time=50.804..272.663 rows=99027 loops=1)
-> GroupAggregate (cost=4467.74..4467.76 rows=1 width=12) (actual time=50.803..51.153 rows=973 loops=1)
Output: t1.aid, min(t1.bid), 0
Group Key: t1.aid
-> Sort (cost=4467.74..4467.74 rows=1 width=8) (actual time=50.797..50.867 rows=973 loops=1)
Output: t1.aid, t1.bid
Sort Key: t1.aid
Sort Method: quicksort Memory: 70kB
-> Hash Anti Join (cost=2667.11..4467.73 rows=1 width=8) (actual time=24.178..50.525 rows=973 loops=1)
Output: t1.aid, t1.bid
Hash Cond: (t1.aid = t2.bid)
-> Seq Scan on public.test_table t1 (cost=0.00..1429.27 rows=99027 width=8) (actual time=0.007..8.416 rows=99027 loops=1)
Output: t1.aid, t1.bid
-> Hash (cost=1429.27..1429.27 rows=99027 width=4) (actual time=24.117..24.118 rows=99027 loops=1)
Output: t2.bid
Buckets: 131072 Batches: 1 Memory Usage: 4506kB
-> Seq Scan on public.test_table t2 (cost=0.00..1429.27 rows=99027 width=4) (actual time=0.005..10.645 rows=99027 loops=1)
Output: t2.bid
-> Nested Loop (cost=0.29..83.42 rows=10 width=12) (actual time=0.002..0.319 rows=154 loops=638)
Output: cte_1.aid, tab.bid, (cte_1.idx + 1)
-> WorkTable Scan on cte cte_1 (cost=0.00..0.20 rows=10 width=12) (actual time=0.000..0.015 rows=155 loops=638)
Output: cte_1.aid, cte_1.bid, cte_1.idx
-> Index Only Scan using test_table_by_aid on public.test_table tab (cost=0.29..8.31 rows=1 width=8) (actual time=0.001..0.002 rows=1 loops=99027)
Output: tab.aid, tab.bid
Index Cond: (tab.aid = cte_1.bid)
Heap Fetches: 98054
-> Sort (cost=5.38..5.63 rows=101 width=8) (actual time=333.566..343.512 rows=99027 loops=1)
Output: cte.aid, cte.bid
Sort Key: cte.aid
Sort Method: quicksort Memory: 7714kB
-> CTE Scan on cte (cost=0.00..2.02 rows=101 width=8) (actual time=50.806..308.782 rows=99027 loops=1)
Output: cte.aid, cte.bid
Planning Time: 0.189 ms
Execution Time: 448.352 ms

This query is the fastest of the three, completing in about 0.3 seconds when run without `EXPLAIN ANALYZE`. Removing the `ordered_tab` subquery and instead joining directly to `test_table` allows PostgreSQL to use an index only scan instead of a worktable scan.

</details>



huangapple
  • 本文由 发表于 2023年5月25日 03:38:50
  • 转载请务必保留本文链接:https://go.coder-hub.com/76326896.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定