英文:
How can I extract hierarchy linkage for paired parent-children columns
问题
以下是修复后的查询代码:
WITH RECURSIVE node_tree AS (
SELECT aid, bid, CAST(aid AS TEXT) || ',' || CAST(bid AS TEXT) AS path
FROM tmp
WHERE aid NOT IN (SELECT bid FROM tmp)
UNION ALL
SELECT nt.aid, h.bid, nt.path || ',' || h.bid
FROM node_tree nt
JOIN tmp h ON nt.bid = h.aid
)
SELECT DISTINCT path
FROM node_tree
WHERE aid NOT IN (SELECT DISTINCT bid FROM tmp)
ORDER BY path;
这个修复后的代码应该产生你期望的输出,其中每个id只会显示在一行中,而不会显示不必要的路径。
英文:
aid | bid |
---|---|
1 | 2 |
1 | 3 |
2 | 3 |
3 | 4 |
5 | 6 |
7 | 8 |
8 | 10 |
8 | 9 |
I would like to extarct the hierarchy linkage for aid and bid.
No need to show each path, any id should show in one row only.
Expected Output:
path |
---|
1,2,3,4 |
5,6 |
7,8,9,10 |
I tried to use recursive function as below.
WITH RECURSIVE node_tree AS (
SELECT aid, bid, CAST(aid AS TEXT) || ',' || CAST(bid AS TEXT) AS path
FROM tmp
WHERE aid NOT IN (SELECT bid FROM tmp)
UNION ALL
SELECT nt.aid, h.bid, nt.path || ',' || h.bid
FROM node_tree nt
JOIN tmp h ON nt.bid = h.aid
)
SELECT DISTINCT path
FROM node_tree
ORDER BY path;
It returns below.
path |
---|
1,2 |
1,2,3 |
1,2,3,4 |
1,3 |
1,3,4 |
5,6 |
7,8 |
7,8,10 |
7,8,9 |
Seems it cannot handle one Id only show in one row:
- only need to show 1,2,3,4 and don't want to show below.
path |
---|
1,2 |
1,2,3 |
1,3 |
1,3,4 |
... |
- should show 7,8,9,10 instead of below:
path |
---|
... |
7,8 |
7,8,10 |
7,8,9 |
答案1
得分: 1
你的尝试很不错,尽管有一些不足之处。我会避免在递归中生成字符串,最好是提取每个祖先的每个子元素,记录每个子元素的索引,然后按索引排序并连接到它们自己的祖先上。
WITH RECURSIVE ordered_tab AS (
SELECT * FROM tab ORDER BY aid, bid
), cte AS (
SELECT aid, MIN(bid) AS bid, 0 AS idx
FROM tab t1
WHERE NOT EXISTS(SELECT 1 FROM tab t2 WHERE t1.aid = t2.bid)
GROUP BY aid
UNION ALL
SELECT cte.aid, tab.bid, idx + 1
FROM cte
INNER JOIN ordered_tab tab
ON cte.bid = tab.aid
)
SELECT CONCAT(aid, ',', STRING_AGG(bid::VARCHAR, ',' ORDER BY idx)) AS path
FROM cte
GROUP BY aid
输出:
path |
---|
1,2,3,4 |
5,6 |
7,8,9,10 |
在此处查看演示。
英文:
Your attempt is very good, although it's missing something. I'd avoid to generate the string already in the recursion: probably better to extract each child for all anchestors, by recording the index of each children. And then apply aggregation of the children, ordered by index, and concatenated to their own anchestor.
WITH RECURSIVE ordered_tab AS (
SELECT * FROM tab ORDER BY aid, bid
), cte AS (
SELECT aid, MIN(bid) AS bid, 0 AS idx
FROM tab t1
WHERE NOT EXISTS(SELECT 1 FROM tab t2 WHERE t1.aid = t2.bid)
GROUP BY aid
UNION ALL
SELECT cte.aid, tab.bid, idx + 1
FROM cte
INNER JOIN ordered_tab tab
ON cte.bid = tab.aid
)
SELECT CONCAT(aid, ',', STRING_AGG(bid::VARCHAR, ',' ORDER BY idx)) AS path
FROM cte
GROUP BY aid
Output:
path |
---|
1,2,3,4 |
5,6 |
7,8,9,10 |
Check the demo here.
答案2
得分: 0
以下是翻译好的代码部分:
使用递归查询来实现所需结果有几种方法。以下使用数组来构建路径,并跟踪每次迭代中发现的新节点。
WITH RECURSIVE
t(aid, bid) AS (VALUES (1, 2),
(1, 3),
(2, 3),
(3, 4),
(5, 6),
(7, 8),
(8, 10),
(8, 9)),
cte AS (SELECT DISTINCT 0 AS depth, ARRAY [t.aid] AS path, ARRAY [t.aid] AS new_nodes
FROM t
WHERE t.aid NOT IN (SELECT children.bid FROM t children)
UNION ALL
SELECT depth + 1 AS depth, cte.path || c.new_nodes AS path, c.new_nodes
FROM cte
CROSS JOIN LATERAL (SELECT ARRAY_AGG(DISTINCT t.bid ORDER BY t.bid) AS new_nodes
FROM t
WHERE t.aid = ANY (cte.new_nodes)
AND NOT t.bid = ANY (cte.path)) c
WHERE c.new_nodes IS NOT NULL)
SELECT DISTINCT ON (cte.path[1]) (REGEXP_MATCH(cte.path::text, '[^{}]+'))[1] AS path
FROM cte
ORDER BY cte.path[1], depth DESC;
递归查询从识别没有父节点的节点开始。这些节点建立路径根,并且还被记忆为新访问的节点。在每次迭代中,前一次迭代中访问的节点的子节点,但尚未被访问的节点,都会被添加到路径中并被记忆为新访问的节点。该过程持续进行,直到所有节点都被访问。
查询的最后部分使用DISTINCT ON
,按路径根升序和深度降序对行进行排序,以仅返回每个根节点的完整路径。路径数组被转换为文本,并使用REGEXP_MATCH
来删除括号。
英文:
There are several approaches using recursive queries to achieve the desired results. The following uses arrays to build the paths and keep track of the new nodes discovered with each iteration.
WITH RECURSIVE
t(aid, bid) AS (VALUES (1, 2),
(1, 3),
(2, 3),
(3, 4),
(5, 6),
(7, 8),
(8, 10),
(8, 9)),
cte AS (SELECT DISTINCT 0 AS depth, ARRAY [t.aid] AS path, ARRAY [t.aid] AS new_nodes
FROM t
WHERE t.aid NOT IN (SELECT children.bid FROM t children)
UNION ALL
SELECT depth + 1 AS depth, cte.path || c.new_nodes AS path, c.new_nodes
FROM cte
CROSS JOIN LATERAL (SELECT ARRAY_AGG(DISTINCT t.bid ORDER BY t.bid) AS new_nodes
FROM t
WHERE t.aid = ANY (cte.new_nodes)
AND NOT t.bid = ANY (cte.path)) c
WHERE c.new_nodes IS NOT NULL)
SELECT DISTINCT ON (cte.path[1]) (REGEXP_MATCH(cte.path::text, '[^{}]+'))[1] AS path
FROM cte
ORDER BY cte.path[1], depth DESC;
The recursive query begins by identifying the nodes without parents. These nodes establish the path roots and are also memoized as newly visited nodes. At each iteration, nodes that are children of the nodes visited in the prior iteration, but that have not yet been visited, are added to the path and memoized as newly visited nodes. This process continues until all nodes have been visited.
The final part of the query uses DISTINCT ON
with rows sorted in ascending order by the path root and descending by depth to return only the full path from each root node. The path array is converted to text and REGEXP_MATCH
is used to remove the enclosing braces.
答案3
得分: 0
下面是Lemon的原始解决方案:
EXPLAIN (ANALYZE, VERBOSE)
WITH RECURSIVE
ordered_tab AS (SELECT *
FROM test_table
ORDER BY aid, bid),
cte AS (SELECT aid, MIN(bid) AS bid, 0 AS idx
FROM test_table t1
WHERE NOT EXISTS(SELECT 1 FROM test_table t2 WHERE t1.aid = t2.bid)
GROUP BY aid
UNION ALL
SELECT cte.aid, tab.bid, idx + 1
FROM cte
INNER JOIN ordered_tab tab
ON cte.bid = tab.aid)
SELECT CONCAT(aid, ',', STRING_AGG(bid::VARCHAR, ',' ORDER BY idx)) AS path
FROM cte
GROUP BY aid;
以下是运行此查询时产生的输出:
GroupAggregate (cost=52804.85..52807.63 rows=101 width=36) (actual time=14445.549..14494.047 rows=973 loops=1)
Output: concat(cte.aid, ',', string_agg(((cte.bid)::character varying)::text, ','::text ORDER BY cte.idx)), cte.aid
Group Key: cte.aid
CTE cte
-> Recursive Union (cost=4467.74..52799.47 rows=101 width=12) (actual time=49.735..14381.476 rows=99027 loops=1)
-> GroupAggregate (cost=4467.74..4467.76 rows=1 width=12) (actual time=49.735..50.086 rows=973 loops=1)
Output: t1.aid, min(t1.bid), 0
Group Key: t1.aid
-> Sort (cost=4467.74..4467.74 rows=1 width=8) (actual time=49.729..49.799 rows=973 loops=1)
Output: t1.aid, t1.bid
Sort Key: t1.aid
Sort Method: quicksort Memory: 70kB
-> Hash Anti Join (cost=2667.11..4467.73 rows=1 width=8) (actual time=24.974..49.532 rows=973 loops=1)
Output: t1.aid, t1.bid
Hash Cond: (t1.aid = t2.bid)
-> Seq Scan on public.test_table t1 (cost=0.00..1429.27 rows=99027 width=8) (actual time=0.007..8.128 rows=99027 loops=1)
Output: t1.aid, t1.bid
-> Hash (cost=1429.27..1429.27 rows=99027 width=4) (actual time=24.844..24.845 rows=99027 loops=1)
Output: t2.bid
Buckets: 131072 Batches: 1 Memory Usage: 4506kB
-> Seq Scan on public.test_table t2 (cost=0.00..1429.27 rows=99027 width=4) (actual time=0.006..10.559 rows=99027 loops=1)
Output: t2.bid
-> Merge Join (cost=0.66..4832.97 rows=10 width=12) (actual time=1.617..22.406 rows=154 loops=638)
Output: cte_1.aid, test_table.bid, (cte_1.idx + 1)
Merge Cond: (test_table.aid = cte_1.bid)
-> Index Only Scan using test_table_by_aid on public.test_table (cost=0.29..3594.59 rows=99027 width=8) (actual time=0.008..16.481 rows=87632 loops=638)
Output: test_table.aid, test_table.bid
Heap Fetches: 55909032
-> Sort (cost=0.37..0.39 rows=10 width=12) (actual time=0.041..0.062 rows=155 loops=638)
Output: cte_1.aid, cte_1.idx, cte_1.bid
Sort Key: cte_1.bid
Sort Method: quicksort Memory: 25kB
-> WorkTable Scan on cte cte_1 (cost=0.00..0.20 rows=10 width=12) (actual time=0.001..0.019 rows=155 loops=638)
Output: c
<details>
<summary>英文:</summary>
After seeing Lemon's answer, I was curious about the relative performance of Lemon's approach vs. mine. Below are three queries, arranged from worst performing to best. **Spoiler alert: the final query is about 30 times faster than the first!**
Testing with the small sample from the original post showed no significant difference in performance, so I created a larger sample to test with. The following creates paths for 100,000 nodes, with an average path length of ~100 nodes. Each node occurs on exactly one path.
DROP TABLE IF EXISTS test_table CASCADE;
CREATE TABLE test_table
(
aid INTEGER,
bid INTEGER
);
WITH spans AS (SELECT t.id, LEAD(t.id, 1, 100001) OVER (ORDER BY t.id) AS next_id
FROM UNNEST(ARRAY [1, 23, 29, 32, 48, 63, 71, 78, 220, 250, 382, 492, 601, 624, 693, 761, 840, 947,
1146, 1194, 1309, 1348, 1628, 1989, 2166, 2576, 2609, 2851, 2981, 3134, 3168, 3254,
3296, 3322, 3378, 3430, 3442, 3588, 3625, 3714, 3824, 3917, 3996, 4003, 4072, 4581,
4613, 4683, 4699, 5080, 5088, 5136, 5174, 5195, 5500, 5526, 5618, 5671, 5848, 5958,
6106, 6207, 6255, 6792, 6836, 6887, 7057, 7180, 7303, 7510, 7619, 7622, 7737, 7778,
7832, 8055, 8062, 8091, 8206, 8245, 8275, 8428, 8458, 8525, 8611, 8711, 8730, 8810,
9150, 9163, 9428, 9644, 10015, 10022, 10026, 10102, 10137, 10496, 10575, 10610,
10648, 10815, 10909, 11140, 11284, 11348, 11531, 11580, 11599, 11617, 11819, 11845,
11890, 11893, 11924, 11949, 11957, 12011, 12251, 12311, 12350, 12372, 12391, 12443,
12492, 12660, 12714, 12813, 12903, 13007, 13025, 13214, 13286, 13316, 13362, 13403,
13425, 13606, 13633, 13670, 13732, 13780, 13871, 13894, 14011, 14446, 14468, 14500,
14535, 14821, 14907, 15102, 15192, 15222, 15279, 15291, 15379, 15412, 15421, 15515,
15758, 15831, 16225, 16301, 16315, 16669, 16911, 16936, 16997, 17056, 17082, 17127,
17267, 17301, 17762, 17858, 17880, 17908, 17985, 18178, 18239, 18288, 18333, 18341,
18452, 18518, 18631, 18746, 18809, 18873, 18915, 19026, 19030, 19125, 19147, 19278,
19422, 19641, 19675, 19737, 19757, 19800, 20051, 20248, 20391, 20411, 20484, 20773,
20794, 20904, 20964, 21047, 21319, 21375, 21559, 21658, 21826, 21837, 22178, 22223,
22280, 22361, 22378, 22622, 22831, 22839, 22877, 22903, 22942, 23084, 23201, 23230,
23232, 23247, 23268, 23289, 23438, 23440, 23587, 23717, 23815, 23820, 23854, 23864,
23961, 24027, 24087, 24115, 24240, 24278, 24396, 24527, 24658, 24987, 25017, 25073,
25100, 25139, 25246, 25250, 25281, 25356, 25588, 25597, 25846, 25853, 25861, 25885,
25892, 25941, 26138, 26415, 26436, 26577, 26726, 26823, 26831, 26861, 26992, 27101,
27112, 27184, 27308, 27409, 27502, 27511, 27625, 27754, 27878, 27959, 28077, 28162,
28801, 28875, 29175, 29364, 29380, 29469, 29703, 29714, 29732, 29809, 29925, 29942,
29995, 30056, 30227, 30284, 30383, 30385, 30429, 30516, 30547, 30646, 30716, 30754,
30904, 31190, 31226, 31341, 31374, 31418, 31493, 31643, 31943, 31978, 32448, 32642,
32763, 33081, 33470, 33612, 33640, 33657, 33756, 34234, 34303, 34523, 34600, 34637,
34790, 34992, 35110, 35148, 35229, 35322, 35332, 35543, 35657, 35919, 35952, 36082,
36114, 36227, 36252, 36365, 36377, 36406, 36519, 36525, 36836, 36848, 36933, 37067,
37366, 37436, 37519, 37748, 37840, 37995, 38078, 38102, 38213, 38271, 38298, 38395,
38438, 38559, 38596, 38686, 38728, 39061, 39106, 39129, 39170, 39319, 39374, 39397,
39420, 39464, 39809, 40020, 40056, 40332, 40375, 40412, 40446, 40478, 40547, 40726,
40764, 40904, 41029, 41090, 41313, 41343, 41478, 41515, 41940, 42069, 42238, 42345,
42399, 42401, 42480, 42537, 42569, 42600, 42646, 42770, 43044, 43054, 43298, 43340,
43496, 43570, 43623, 43685, 43730, 43900, 44076, 44279, 44343, 44443, 44681, 44955,
44970, 45262, 45277, 45342, 45367, 45450, 45617, 45626, 45832, 45971, 46126, 46135,
46142, 46164, 46733, 46808, 47050, 47139, 47312, 47410, 47439, 47462, 47497, 47701,
47727, 47736, 47771, 47883, 47953, 47987, 48011, 48117, 48293, 48559, 48608, 48735,
48757, 48822, 48836, 48959, 48998, 49100, 49150, 49155, 49468, 49596, 49739, 49849,
49920, 49970, 50204, 50312, 50372, 50508, 50558, 50593, 50956, 51022, 51070, 51182,
51291, 51926, 52156, 52186, 52267, 52288, 52322, 52535, 52546, 52656, 52669, 52788,
52945, 53005, 53160, 53260, 53266, 53320, 53480, 53517, 53749, 53788, 53929, 54153,
54167, 54246, 54477, 54618, 54656, 54721, 54750, 54821, 55034, 55184, 55238, 55382,
55463, 55532, 55563, 55614, 55660, 55675, 55848, 55924, 55973, 56017, 56080, 56095,
56403, 56517, 56594, 56613, 56748, 56823, 56898, 56920, 56935, 57046, 57153, 57177,
57198, 57202, 57629, 57640, 57720, 57725, 57768, 57920, 57963, 58334, 58422, 58455,
58469, 58494, 58519, 58534, 58536, 58558, 58568, 58900, 58965, 59186, 59326, 59422,
59494, 59659, 59682, 59747, 59864, 59982, 60003, 60210, 60576, 60645, 60827, 60917,
60960, 61088, 61132, 61136, 61345, 61502, 61791, 62184, 62202, 62270, 62284, 62439,
62448, 62456, 62536, 62599, 62756, 62830, 62926, 63050, 63274, 63288, 63315, 63347,
63761, 63794, 63820, 64067, 64146, 64372, 64409, 64475, 64479, 64532, 64945, 65104,
65474, 65495, 65528, 65754, 65989, 66021, 66085, 66179, 66309, 66347, 66378, 66491,
66503, 66534, 66576, 66579, 66582, 66744, 66910, 66965, 67059, 67083, 67091, 67105,
67203, 67323, 67366, 67394, 67397, 67569, 67906, 68033, 68035, 68096, 68423, 68471,
68484, 68544, 68736, 68744, 68976, 69068, 69083, 69138, 69149, 69219, 69305, 69308,
69444, 69565, 69590, 69638, 70119, 70196, 70391, 70447, 70589, 70670, 70733, 70959,
70981, 71022, 71175, 71180, 71264, 71312, 71314, 71399, 71752, 71794, 71867, 71880,
71917, 71972, 72061, 72070, 72091, 72281, 72327, 72637, 72645, 72647, 72699, 72764,
73149, 73389, 73412, 73568, 73613, 73678, 73686, 73732, 73738, 73874, 73912, 74075,
74140, 74217, 74441, 74480, 74502, 74507, 74635, 74667, 74679, 74868, 74873, 74897,
75006, 75116, 75562, 75718, 75848, 75960, 76085, 76171, 76295, 76392, 76400, 76735,
76864, 76906, 77040, 77047, 77235, 77379, 77627, 77888, 78011, 78175, 78372, 78379,
78457, 78695, 78697, 78734, 78759, 78867, 78883, 78947, 78962, 78982, 79053, 79331,
79449, 79592, 79618, 79725, 79797, 80010, 80050, 80052, 80062, 80129, 80147, 80268,
80317, 80471, 80502, 80529, 80603, 80834, 81081, 81109, 81162, 81169, 81256, 81538,
81559, 81737, 81743, 81800, 81812, 81919, 81935, 81976, 82036, 82102, 82162, 82317,
82454, 82668, 82723, 82773, 82776, 83177, 83199, 83368, 83403, 83501, 83594, 83688,
84193, 84353, 84386, 84525, 84658, 84930, 85004, 85202, 85287, 85478, 85751, 85904,
85973, 86186, 86237, 86791, 86820, 86882, 86893, 86971, 87109, 87153, 87382, 87398,
87478, 87657, 87832, 88000, 88005, 88377, 88444, 88456, 88649, 88666, 88790, 88792,
88818, 88851, 89199, 89343, 89346, 89395, 89435, 89480, 89498, 89552, 89588, 89665,
90013, 90183, 90201, 90449, 90457, 90470, 90570, 90727, 90849, 90944, 91149, 91287,
91293, 91363, 91542, 91550, 91581, 91614, 91743, 91840, 92000, 92063, 92303, 92313,
92508, 92601, 92694, 92790, 92831, 92970, 93155, 93181, 93258, 93275, 93294, 93339,
93355, 93610, 93659, 93708, 93971, 94208, 94291, 94458, 94502, 94516, 94624, 94632,
94729, 94901, 94921, 94963, 94974, 94995, 95079, 95173, 95429, 95554, 95618, 95833,
95916, 96037, 96137, 96240, 96293, 96527, 96533, 96708, 96900, 96925, 96940, 97143,
97265, 97295, 97539, 97617, 97659, 97714, 97716, 97731, 97785, 97796, 97944, 98002,
98146, 98279, 98319, 98421, 98436, 98445, 98524, 98597, 98909, 99028, 99046, 99201,
99384, 99511, 99542, 99559, 99659, 99671, 99710, 99741, 99988]::integer[]) t(id))
INSERT INTO test_table(aid, bid)
SELECT g.id - 1 AS aid, g.id AS bid
FROM spans
CROSS JOIN LATERAL GENERATE_SERIES(spans.next_id - 1, spans.id, -1) g(id)
WHERE g.id <> spans.id;
CREATE INDEX test_table_by_aid ON test_table (aid, bid);
ANALYZE test_table;
The following is Lemon's original solution:
EXPLAIN (ANALYZE, VERBOSE)
WITH RECURSIVE
ordered_tab AS (SELECT *
FROM test_table
ORDER BY aid, bid),
cte AS (SELECT aid, MIN(bid) AS bid, 0 AS idx
FROM test_table t1
WHERE NOT EXISTS(SELECT 1 FROM test_table t2 WHERE t1.aid = t2.bid)
GROUP BY aid
UNION ALL
SELECT cte.aid, tab.bid, idx + 1
FROM cte
INNER JOIN ordered_tab tab
ON cte.bid = tab.aid)
SELECT CONCAT(aid, ',', STRING_AGG(bid::VARCHAR, ',' ORDER BY idx)) AS path
FROM cte
GROUP BY aid;
Running the above produced the following output:
GroupAggregate (cost=52804.85..52807.63 rows=101 width=36) (actual time=14445.549..14494.047 rows=973 loops=1)
Output: concat(cte.aid, ',', string_agg(((cte.bid)::character varying)::text, ','::text ORDER BY cte.idx)), cte.aid
Group Key: cte.aid
CTE cte
-> Recursive Union (cost=4467.74..52799.47 rows=101 width=12) (actual time=49.735..14381.476 rows=99027 loops=1)
-> GroupAggregate (cost=4467.74..4467.76 rows=1 width=12) (actual time=49.735..50.086 rows=973 loops=1)
Output: t1.aid, min(t1.bid), 0
Group Key: t1.aid
-> Sort (cost=4467.74..4467.74 rows=1 width=8) (actual time=49.729..49.799 rows=973 loops=1)
Output: t1.aid, t1.bid
Sort Key: t1.aid
Sort Method: quicksort Memory: 70kB
-> Hash Anti Join (cost=2667.11..4467.73 rows=1 width=8) (actual time=24.974..49.532 rows=973 loops=1)
Output: t1.aid, t1.bid
Hash Cond: (t1.aid = t2.bid)
-> Seq Scan on public.test_table t1 (cost=0.00..1429.27 rows=99027 width=8) (actual time=0.007..8.128 rows=99027 loops=1)
Output: t1.aid, t1.bid
-> Hash (cost=1429.27..1429.27 rows=99027 width=4) (actual time=24.844..24.845 rows=99027 loops=1)
Output: t2.bid
Buckets: 131072 Batches: 1 Memory Usage: 4506kB
-> Seq Scan on public.test_table t2 (cost=0.00..1429.27 rows=99027 width=4) (actual time=0.006..10.559 rows=99027 loops=1)
Output: t2.bid
-> Merge Join (cost=0.66..4832.97 rows=10 width=12) (actual time=1.617..22.406 rows=154 loops=638)
Output: cte_1.aid, test_table.bid, (cte_1.idx + 1)
Merge Cond: (test_table.aid = cte_1.bid)
-> Index Only Scan using test_table_by_aid on public.test_table (cost=0.29..3594.59 rows=99027 width=8) (actual time=0.008..16.481 rows=87632 loops=638)
Output: test_table.aid, test_table.bid
Heap Fetches: 55909032
-> Sort (cost=0.37..0.39 rows=10 width=12) (actual time=0.041..0.062 rows=155 loops=638)
Output: cte_1.aid, cte_1.idx, cte_1.bid
Sort Key: cte_1.bid
Sort Method: quicksort Memory: 25kB
-> WorkTable Scan on cte cte_1 (cost=0.00..0.20 rows=10 width=12) (actual time=0.001..0.019 rows=155 loops=638)
Output: cte_1.aid, cte_1.idx, cte_1.bid
-> Sort (cost=5.38..5.63 rows=101 width=12) (actual time=14445.519..14452.830 rows=99027 loops=1)
Output: cte.aid, cte.bid, cte.idx
Sort Key: cte.aid
Sort Method: quicksort Memory: 7714kB
-> CTE Scan on cte (cost=0.00..2.02 rows=101 width=12) (actual time=49.737..14421.283 rows=99027 loops=1)
Output: cte.aid, cte.bid, cte.idx
Planning Time: 0.187 ms
Execution Time: 14501.446 ms
When run without `EXPLAIN ANALYZE`, the query completes in about 9.1 seconds.
Below is a revised version of my original solution that uses `SUBSTR` instead of `REGEXP_MATCH` to remove the braces from `PATH`. This version is about 40% faster than my original version.
EXPLAIN ANALYZE
WITH RECURSIVE
cte AS (SELECT DISTINCT 0 AS DEPTH, ARRAY [t.aid] AS PATH, ARRAY [t.aid] AS new_nodes
FROM test_table t
WHERE t.aid NOT IN (SELECT children.bid FROM test_table children)
UNION ALL
SELECT DEPTH + 1 AS DEPTH, cte.path || c.new_nodes AS PATH, c.new_nodes
FROM cte
CROSS JOIN LATERAL (SELECT ARRAY_AGG(DISTINCT t.bid ORDER BY t.bid) AS new_nodes
FROM test_table t
WHERE t.aid = ANY (cte.new_nodes)
AND NOT t.bid = ANY (cte.path)) c
WHERE c.new_nodes IS NOT NULL),
paths AS (SELECT DISTINCT ON (cte.path1) cte.path::text AS path
FROM cte
ORDER BY cte.path1, depth DESC)
SELECT SUBSTR(path, 2, LENGTH(path) - 2) AS path
FROM paths;
Running the above produced the following output:
Subquery Scan on paths (cost=313283983.23..313308991.30 rows=200 width=32) (actual time=1531.656..1581.667 rows=973 loops=1)
Output: substr(paths.path, 2, (length(paths.path) - 2))
CTE cte
-> Recursive Union (cost=3725.03..312465769.67 rows=5000914 width=68) (actual time=56.203..647.711 rows=100000 loops=1)
-> Subquery Scan on "SELECT 1" (cost=3725.03..4715.31 rows=49514 width=68) (actual time=56.202..56.655 rows=973 loops=1)
Output: 0, "SELECT 1".path, "SELECT 1".new_nodes
-> HashAggregate (cost=3725.03..4220.17 rows=49514 width=68) (actual time=56.201..56.539 rows=973 loops=1)
Output: (0), (ARRAY[t.aid]), (ARRAY[t.aid])
Group Key: 0, ARRAY[t.aid], ARRAY[t.aid]
Batches: 1 Memory Usage: 1809kB
-> Seq Scan on public.test_table t (cost=1676.84..3353.68 rows=49514 width=68) (actual time=33.802..55.664 rows=973 loops=1)
Output: 0, ARRAY[t.aid], ARRAY[t.aid]
Filter: (NOT (hashed SubPlan 1))
Rows Removed by Filter: 98054
SubPlan 1
-> Seq Scan on public.test_table children (cost=0.00..1429.27 rows=99027 width=4) (actual time=0.008..11.400 rows=99027 loops=1)
Output: children.bid
-> Nested Loop (cost=63.03..31236153.12 rows=495140 width=68) (actual time=0.007..0.879 rows=155 loops=639)
Output: (cte_1.depth + 1), (cte_1.path || (array_agg(DISTINCT t_1.bid ORDER BY t_1.bid))), (array_agg(DISTINCT t_1.bid ORDER BY t_1.bid))
-> WorkTable Scan on cte cte_1 (cost=0.00..9902.80 rows=495140 width=68) (actual time=0.000..0.022 rows=156 loops=639)
Output: cte_1.depth, cte_1.path, cte_1.new_nodes
-> Aggregate (cost=63.03..63.04 rows=1 width=32) (actual time=0.005..0.005 rows=1 loops=100000)
Output: array_agg(DISTINCT t_1.bid ORDER BY t_1.bid)
Filter: (array_agg(DISTINCT t_1.bid ORDER BY t_1.bid) IS NOT NULL)
Rows Removed by Filter: 0
-> Index Only Scan using test_table_by_aid on public.test_table t_1 (cost=0.29..62.98 rows=10 width=4) (actual time=0.003..0.003 rows=1 loops=100000)
Output: t_1.aid, t_1.bid
Index Cond: (t_1.aid = ANY (cte_1.new_nodes))
Filter: (t_1.bid <> ALL (cte_1.path))
Heap Fetches: 99027
-> Unique (cost=818213.56..843218.13 rows=200 width=40) (actual time=1531.651..1577.155 rows=973 loops=1)
Output: ((cte.path)::text), (cte.path1), cte.depth
-> Sort (cost=818213.56..830715.84 rows=5000914 width=40) (actual time=1531.650..1569.094 rows=100000 loops=1)
Output: ((cte.path)::text), (cte.path1), cte.depth
Sort Key: (cte.path1), cte.depth DESC
Sort Method: external merge Disk: 60456kB
-> CTE Scan on cte (cost=0.00..125022.85 rows=5000914 width=40) (actual time=56.210..1318.172 rows=100000 loops=1)
Output: (cte.path)::text, cte.path1, cte.depth
Planning Time: 0.163 ms
Execution Time: 1596.886 ms
This query is much faster, completing in about 1.5 seconds when run without `EXPLAIN ANALYZE`.
This final query is a modified version of Lemon's original. There are two changes; the subquery for `ordered_tab` has been removed, and the `ORDER BY` in the string aggregation uses `bid` instead of `idx`. The latter change has no measurable impact on performance, but removing the subquery for `ordered_tab` results in a significant improvement.
EXPLAIN ANALYZE
WITH RECURSIVE cte AS (SELECT aid, MIN(bid) AS bid, 0 AS idx
FROM test_table t1
WHERE NOT EXISTS(SELECT 1 FROM test_table t2 WHERE t1.aid = t2.bid)
GROUP BY aid
UNION ALL
SELECT cte.aid, tab.bid, idx + 1
FROM cte
INNER JOIN test_table tab
ON cte.bid = tab.aid)
SELECT CONCAT(aid, ',', STRING_AGG(bid::VARCHAR, ',' ORDER BY bid)) AS path
FROM cte
GROUP BY aid;
Running the above produced the following output:
GroupAggregate (cost=5309.41..5312.19 rows=101 width=36) (actual time=333.629..440.384 rows=973 loops=1)
Output: concat(cte.aid, ',', string_agg(((cte.bid)::character varying)::text, ','::text ORDER BY cte.bid)), cte.aid
Group Key: cte.aid
CTE cte
-> Recursive Union (cost=4467.74..5304.03 rows=101 width=12) (actual time=50.804..272.663 rows=99027 loops=1)
-> GroupAggregate (cost=4467.74..4467.76 rows=1 width=12) (actual time=50.803..51.153 rows=973 loops=1)
Output: t1.aid, min(t1.bid), 0
Group Key: t1.aid
-> Sort (cost=4467.74..4467.74 rows=1 width=8) (actual time=50.797..50.867 rows=973 loops=1)
Output: t1.aid, t1.bid
Sort Key: t1.aid
Sort Method: quicksort Memory: 70kB
-> Hash Anti Join (cost=2667.11..4467.73 rows=1 width=8) (actual time=24.178..50.525 rows=973 loops=1)
Output: t1.aid, t1.bid
Hash Cond: (t1.aid = t2.bid)
-> Seq Scan on public.test_table t1 (cost=0.00..1429.27 rows=99027 width=8) (actual time=0.007..8.416 rows=99027 loops=1)
Output: t1.aid, t1.bid
-> Hash (cost=1429.27..1429.27 rows=99027 width=4) (actual time=24.117..24.118 rows=99027 loops=1)
Output: t2.bid
Buckets: 131072 Batches: 1 Memory Usage: 4506kB
-> Seq Scan on public.test_table t2 (cost=0.00..1429.27 rows=99027 width=4) (actual time=0.005..10.645 rows=99027 loops=1)
Output: t2.bid
-> Nested Loop (cost=0.29..83.42 rows=10 width=12) (actual time=0.002..0.319 rows=154 loops=638)
Output: cte_1.aid, tab.bid, (cte_1.idx + 1)
-> WorkTable Scan on cte cte_1 (cost=0.00..0.20 rows=10 width=12) (actual time=0.000..0.015 rows=155 loops=638)
Output: cte_1.aid, cte_1.bid, cte_1.idx
-> Index Only Scan using test_table_by_aid on public.test_table tab (cost=0.29..8.31 rows=1 width=8) (actual time=0.001..0.002 rows=1 loops=99027)
Output: tab.aid, tab.bid
Index Cond: (tab.aid = cte_1.bid)
Heap Fetches: 98054
-> Sort (cost=5.38..5.63 rows=101 width=8) (actual time=333.566..343.512 rows=99027 loops=1)
Output: cte.aid, cte.bid
Sort Key: cte.aid
Sort Method: quicksort Memory: 7714kB
-> CTE Scan on cte (cost=0.00..2.02 rows=101 width=8) (actual time=50.806..308.782 rows=99027 loops=1)
Output: cte.aid, cte.bid
Planning Time: 0.189 ms
Execution Time: 448.352 ms
This query is the fastest of the three, completing in about 0.3 seconds when run without `EXPLAIN ANALYZE`. Removing the `ordered_tab` subquery and instead joining directly to `test_table` allows PostgreSQL to use an index only scan instead of a worktable scan.
</details>
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论