Merging users with multiple refs and counting their collective assets

Question

There is a set of users. A person can have multiple user rows, but their ref1 and ref2 values may match and can therefore link users together. ref1 and ref2 do not overlap: a value in ref1 never appears in ref2.

A user can own multiple assets. I want to "merge" users that have one or more refs alike and then count how many assets they own together. There may be assets whose owner has no entry in the users table; in that case I just want to propagate the owner into ref2 and set asset_count and asset_ids.

Here is an example schema to illustrate:

Example assets

SELECT * FROM assets;

 id | name | owner
----+------+-------
  1 | #1   | a
  2 | #2   | b
  3 | #3   | c
  4 | #4   | a
  5 | #5   | c
  6 | #6   | d
  7 | #7   | e
  8 | #8   | d
  9 | #9   | a
 10 | #10  | a
 11 | #11  | z

Example users

SELECT * FROM users;

 id | username | ref1 | ref2
----+----------+------+------
  1 | bobo     | a    | d
  2 | toto     | b    | e
  3 | momo     | c    | d
  4 | lolo     | a    | f
  5 | popo     | c    | f

What I want to get in the end

SELECT * FROM results;

 ids     | usernames           | refs1 | refs2 | asset_ids        | asset_count
---------+---------------------+-------+-------+------------------+-------------
 1,3,4,5 | bobo,momo,lolo,popo | a,c   | d,f   | 1,3,4,5,6,8,9,10 | 8
 2       | toto                | b     | e     | 2,7              | 2
         |                     |       | z     | 11               | 1

I've tried different approaches, but this is what I currently have:

Closest I have got

SELECT
    ARRAY_AGG(DISTINCT u.id)       AS ids,
    ARRAY_AGG(DISTINCT u.username) AS usernames,
    ARRAY_AGG(DISTINCT u.ref1)     AS refs1,
    ARRAY_AGG(DISTINCT u.ref2)     AS refs2,
    COUNT(DISTINCT a.id)           AS asset_count
FROM assets a
JOIN users u ON a.owner = u.ref1 OR a.owner = u.ref2
GROUP BY a.owner
ORDER BY MIN(a.id);

 ids | usernames | refs1 | refs2 | asset_count
-----+-----------+-------+-------+-------------
 1,4 | bobo,lolo | a     | d,f   | 4
 2   | toto      | b     | e     | 1
 3,5 | momo,popo | c     | d,f   | 2
 1,3 | bobo,momo | a,c   | d     | 2
 2   | toto      | b     | e     | 1

If I merge the above rows on overlapping ids, I almost get the result I want (minus the assets whose owner is missing from the user table). The merging can easily be done in application code, but then I cannot paginate etc. I want to do this in the DB layer if possible.

I want either a solution to the problem or a good explanation of why it is not possible to do (with examples).

Please check out my DB Fiddle.


Answer 1 (score: 5)

There are two distinct parts to the question:

  • the first one is how to generate groups of users that have common references
  • the second one is how to distribute the assets among the groups, while taking orphan assets into account

Part 1 : a graph-walking problem

Identifying clusters of users based on common references reads like a graph-walking problem. That's a complex task in SQL, and requires a recursive query. The pattern is to unpivot the users' references to generate nodes, then identify edges (nodes that have a ref in common), and finally walk through the graph (without looping) to generate groups.

In Postgres, arrays come handy to aggregate nodes:

with recursive
nodes as (
    select u.id, r.ref
    from users u
    cross join lateral ( values (u.ref1), (u.ref2) ) r(ref)
),
edges as (
    select distinct n1.id as id1, n2.id as id2
    from nodes n1
    inner join nodes n2 on n1.ref = n2.ref
),
rcte as (
    select id1, id2, array[id1] as visited from edges where id1 = id2
    union all
    select r.id1, e.id2, r.visited || e.id2
    from rcte r
    inner join edges e on e.id1 = r.id2
    where e.id2 <> all(r.visited)
),
groups as (
    select id1 as id, array_agg(distinct id2 order by id2) as ids
    from rcte
    group by id1
)
select * from groups order by id;

 id | ids
----+-----------
  1 | {1,3,4,5}
  2 | {2}
  3 | {1,3,4,5}
  4 | {1,3,4,5}
  5 | {1,3,4,5}
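For intuition, what this recursive CTE computes is a connected-components pass over user ids. A minimal Python sketch over the hardcoded sample data (illustrative only, not part of the answer's SQL):

```python
from collections import defaultdict

# Sample users: (id, ref1, ref2)
users = [(1, "a", "d"), (2, "b", "e"), (3, "c", "d"), (4, "a", "f"), (5, "c", "f")]

# Nodes: each user contributes both of its refs; users sharing a ref are linked.
by_ref = defaultdict(list)
for uid, r1, r2 in users:
    by_ref[r1].append(uid)
    by_ref[r2].append(uid)

# Adjacency between user ids, then walk the graph (like the rcte does) per user.
adj = defaultdict(set)
for ids in by_ref.values():
    for a in ids:
        adj[a].update(ids)

def cluster(start):
    """All user ids reachable from `start` via shared refs."""
    seen, todo = set(), [start]
    while todo:
        cur = todo.pop()
        if cur not in seen:
            seen.add(cur)
            todo.extend(adj[cur])
    return sorted(seen)

print(cluster(1))  # [1, 3, 4, 5]
print(cluster(2))  # [2]
```

Each `cluster(id)` call corresponds to one row of the `groups` CTE above.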

Part 2 : left join and aggregation

Now that we have identified the groups, we can check for assets. Since you want all assets in the result, we start from the assets table, then bring in the users and the groups with left joins. We can still group by the user groups, which puts all orphan assets in the same group (where the group is null) - that's exactly what we want.

The last step is array aggregation; the "propagation" of the owners of orphan assets to ref2 can be handled with a case expression.

with recursive
nodes  as (...),
edges  as (...),
rcte   as (...),
groups as (...)
select g.ids,
    array_agg(distinct u.username) as usernames,
    array_agg(distinct u.ref1) as refs1,
    case when g.ids is null then array_agg(distinct a.owner)
         else array_agg(distinct u.ref2) end as refs2,
    array_agg(distinct a.id) as asset_ids,
    count(distinct a.id) as asset_count
from assets a
left join users u on a.owner in (u.ref1, u.ref2)
left join groups g on g.id = u.id
group by g.ids;

 ids       | usernames             | refs1  | refs2 | asset_ids          | asset_count
-----------+-----------------------+--------+-------+--------------------+-------------
 {1,3,4,5} | {bobo,lolo,momo,popo} | {a,c}  | {d,f} | {1,3,4,5,6,8,9,10} | 8
 {2}       | {toto}                | {b}    | {e}   | {2,7}              | 2
 null      | {NULL}                | {NULL} | {z}   | {11}               | 1
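To sanity-check the expected output, the left join plus case-expression logic can be mirrored in Python (a sketch over the hardcoded sample data, with the part-1 clusters assumed as given):

```python
from collections import defaultdict

assets = [(1, "a"), (2, "b"), (3, "c"), (4, "a"), (5, "c"), (6, "d"),
          (7, "e"), (8, "d"), (9, "a"), (10, "a"), (11, "z")]
users = {1: ("bobo", "a", "d"), 2: ("toto", "b", "e"), 3: ("momo", "c", "d"),
         4: ("lolo", "a", "f"), 5: ("popo", "c", "f")}
# Pre-computed clusters from part 1 (user id -> frozenset of merged ids).
groups = {1: frozenset({1, 3, 4, 5}), 2: frozenset({2}),
          3: frozenset({1, 3, 4, 5}), 4: frozenset({1, 3, 4, 5}),
          5: frozenset({1, 3, 4, 5})}

# Left-join each asset to its owning user (if any), then group by cluster.
per_group = defaultdict(list)
for aid, owner in assets:
    uid = next((i for i, (_, r1, r2) in users.items() if owner in (r1, r2)), None)
    key = groups.get(uid)  # None for orphan assets -> they all land together
    per_group[key].append((aid, owner))

for key, rows in per_group.items():
    asset_ids = sorted(a for a, _ in rows)
    if key is None:
        # "Propagate" orphan owners into refs2, like the case expression does.
        print(None, sorted({o for _, o in rows}), asset_ids, len(asset_ids))
    else:
        print(sorted(key), asset_ids, len(asset_ids))
```

The orphan asset #11 (owner z) ends up in the `None` group with its owner propagated, matching the third result row.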

Demo on DB Fiddle


Add-on: Graph-walking performance

Performance will suffer on networks that have a lot of edges, especially if there are just a few clusters in the graph, each containing many users. We can try and optimize the query for such a situation; the idea is to limit the number of paths that need to be walked, by aggregating all paths of each user at each iteration.

This query passes your test case with 200 users that all belong to the same cluster (whereas the first query exhausts the DB Fiddle resources):

with recursive
nodes as (
    select u.id, r.ref
    from users u
    cross join lateral ( values (u.ref1), (u.ref2) ) r(ref)
),
edges as (
    select distinct n1.id as id1, n2.id as id2
    from nodes n1
    inner join nodes n2 on n1.ref = n2.ref
),
rcte as (
    select id1 as id, array[id1] as visited from edges where id1 = id2
    union all
    select id, array_agg(distinct visited) as visited
    from (
        select r.id, v.visited
        from rcte r
        inner join edges e on e.id1 = any(r.visited)
        cross join lateral unnest(r.visited || e.id2) v(visited)
        where e.id2 <> all(r.visited)
    ) as x
    group by id
),
groups as (
    select distinct on (id) id, visited as ids
    from rcte
    order by id, array_length(visited, 1) desc
)
select * from groups g order by id

Answer 2 (score: 2)

This is hard (and typically very slow) to solve with only set-based operations in SQL. So I loop in a PL/pgSQL function:

CREATE OR REPLACE FUNCTION f_merged_users()
  RETURNS TABLE (
    ids         int[]   -- adapt to your actual data types !!!
  , usernames   text[]
  , refs1       text[]
  , refs2       text[]
  , asset_ids   int[]
  , asset_count int
  )
  LANGUAGE plpgsql AS
$func$
BEGIN
/*
   -- set enough temp buffers, so temp tables don't spill to disk (optional)
   -- NOTE: this can't be used after temporary tables have been accessed in the session
   -- using 2x size of users table. adapt to your needs!
   -- only fires when the new size is bigger, and it has not been set manually, yet
   PERFORM set_config('temp_buffers', (pg_table_size('users') * 2 / 8096)::text, true)
   FROM   pg_settings s
   WHERE  s.name = 'temp_buffers'
   AND    pg_table_size('users') * 2 / 8096 > s.setting::bigint
   AND    s.source = 'default';
*/
   -- create two temp tables: one for ref1, one for ref2
   -- based on your information that ref1 & ref2 don't overlap
   CREATE TEMP TABLE r1 ON COMMIT DROP AS
   SELECT ARRAY[ref1] AS r, array_agg(id) AS i
   FROM  (TABLE users ORDER BY id) t
   GROUP BY ref1
   ORDER BY ref1;

   CREATE TEMP TABLE r2 ON COMMIT DROP AS
   SELECT ARRAY[ref2] AS r, array_agg(id) AS i
   FROM  (TABLE users ORDER BY id) t
   GROUP BY ref2
   ORDER BY ref2;

   -- fold rows where a common link in the other table exists
   -- achieve that by deleting rows and inserting the merged results
   -- loop back and forth until no more rows can be folded
   LOOP
      WITH d AS (
         DELETE FROM r2
         USING  r1
         WHERE  EXISTS (
            SELECT FROM r2 r0
            WHERE  r1.i && r2.i
            AND    r1.i && r0.i
            AND    r2.ctid <> r0.ctid
            )
         RETURNING r1.ctid, r2.*
         )
      INSERT INTO r2
      SELECT r.r, i.i
      FROM  (
         SELECT ctid, array_agg(ref ORDER BY ref) AS r
         FROM   d, unnest(d.r) ref
         GROUP  BY 1
         ) r
      JOIN  (
         SELECT ctid, array_agg(id ORDER BY id) AS i
         FROM   d, unnest(d.i) id
         GROUP  BY 1
         ) i USING (ctid);

      EXIT WHEN NOT FOUND;  -- no folding happened, stop loop

      WITH d AS (
         DELETE FROM r1
         USING  r2
         WHERE  EXISTS (
            SELECT FROM r1 r0
            WHERE  r2.i && r1.i
            AND    r2.i && r0.i
            AND    r1.ctid <> r0.ctid
            )
         RETURNING r2.ctid, r1.*
         )
      INSERT INTO r1
      SELECT r.r, i.i
      FROM  (
         SELECT ctid, array_agg(ref ORDER BY ref) AS r
         FROM   d, unnest(d.r) ref
         GROUP  BY 1
         ) r
      JOIN  (
         SELECT ctid, array_agg(id ORDER BY id) AS i
         FROM   d, unnest(d.i) id
         GROUP  BY 1
         ) i USING (ctid);

      EXIT WHEN NOT FOUND;  -- no folding happened, stop loop
   END LOOP;

   -- output result
   RETURN QUERY
   (
   SELECT i  -- AS ids
        , u.usernames
        , r1.r  -- AS refs1
        , r2.r  -- AS refs2
        , a.asset_ids
        , cardinality(a.asset_ids)  -- AS asset_count
   FROM   r1
   JOIN   r2 USING (i)
   CROSS  JOIN LATERAL (
      SELECT ARRAY (
         SELECT username
         FROM   users u
         WHERE  u.id = ANY (i)
         ORDER  BY u.id
         ) AS usernames
      ) u
   CROSS  JOIN LATERAL (
      SELECT ARRAY (
         SELECT a.id
         FROM   assets a
         WHERE  a.owner = ANY (r1.r || r2.r)
         ORDER  BY a.id
         ) AS asset_ids
      ) a
   ORDER  BY i
   )
   UNION ALL  -- add "missing entries in the user table"
   (
   SELECT null  -- AS ids
        , null  -- AS usernames
        , null  -- AS refs1
        , ARRAY[a.owner]  -- AS refs2
        , ARRAY[a.id]     -- AS asset_ids
        , 1               -- AS asset_count
   FROM   assets a
   WHERE  NOT EXISTS (
      SELECT FROM users u
      WHERE  a.owner IN (u.ref1, u.ref2)
      )
   ORDER  BY a.id
   );
END
$func$;

fiddle

Call:

SELECT * FROM f_merged_users();

See inline comments with explanation.

The general idea is to avoid walking potentially lengthy graphs one step at a time, and to do as much set-based work as possible. The best solution heavily depends on the actual data distribution.

Based on your information that ref1 and ref2 don't overlap, I create two temp tables: one for ref1, one for ref2. The initial queries aggregate all users on the same resource. That takes care of many users on the same resource (same person) right away. You commented that you have a lot of those.

In every iteration of the loop, I merge all rows in the other temp table that are linked together by the same row in this temp table, and vice versa. The two queries are completely symmetrical. (It may be more efficient to start with one or the other.) We are done as soon as one query does not find any more rows to merge.

Then return the final set with all details, and append "missing entries in the user table" like you added in the question.
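The back-and-forth folding can be sketched in Python to illustrate how it converges (hypothetical helper `fold()`, hardcoded sample data; the real function does this with set-based DELETE ... RETURNING / INSERT statements):

```python
from collections import defaultdict

def fold(target, other):
    """Merge rows of `target` whose id arrays are linked by a row of `other`.
    Each table is a list of (refs, ids) pairs of frozensets.
    Returns (new_target, changed)."""
    merged, changed = [], False
    remaining = list(target)
    while remaining:
        refs, ids = remaining.pop()
        # ids reachable from this row through overlapping rows of `other`
        linked = set(ids)
        for _orefs, oids in other:
            if oids & ids:
                linked |= oids
        # fold in any other target rows touched by the linked ids
        keep = []
        for trefs, tids in remaining:
            if tids & linked:
                refs, ids = refs | trefs, ids | tids
                linked |= tids
                changed = True
            else:
                keep.append((trefs, tids))
        remaining = keep
        merged.append((frozenset(refs), frozenset(ids)))
    return merged, changed

# Seed r1/r2 from the sample users (id, ref1, ref2), grouping ids per ref.
users = [(1, "a", "d"), (2, "b", "e"), (3, "c", "d"), (4, "a", "f"), (5, "c", "f")]
g1, g2 = defaultdict(set), defaultdict(set)
for uid, ref1, ref2 in users:
    g1[ref1].add(uid)
    g2[ref2].add(uid)
r1 = [(frozenset({ref}), frozenset(ids)) for ref, ids in g1.items()]
r2 = [(frozenset({ref}), frozenset(ids)) for ref, ids in g2.items()]

# Loop back and forth until neither side folds anything, like the LOOP above.
while True:
    r2, ch2 = fold(r2, r1)
    r1, ch1 = fold(r1, r2)
    if not (ch1 or ch2):
        break

print(sorted(sorted(ids) for _, ids in r1))  # [[1, 3, 4, 5], [2]]
```

A single pass of `fold` may leave mergeable rows behind; the outer loop repeats until a full round makes no change, which mirrors the `EXIT WHEN NOT FOUND` logic.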

I have streamlined the workflow, but DELETE and INSERT are comparatively expensive operations. Make sure you allow enough temporary buffers so that the two temp tables can operate in RAM exclusively. I added a bit of clever automatic setting at the top. But you may not need that if your temp_buffers setting is high enough anyway. And the automatic setting can't be used after you have accessed any temp tables in the same session.

The two temp tables are dropped at the end of the transaction automatically since created with ON COMMIT DROP.

Performance might be optimized some more with indexes on the temp tables and some more fine-tuning. In this case, it may also help to run ANALYZE (one or more times) on the temp tables.

About the ctid used above: it is a Postgres system column identifying the physical location of a row version, which serves here as a cheap unique row identifier for the temp tables.

Answer 3 (score: 2)

What you're showing is already a valid graph. With pgRouting, you can directly query it as such:

select chr(component::int) as "the_group",
       array_agg(chr(node::int)) as "ids"
from pgr_connectedComponents(
    'SELECT id,
            ascii(ref1) as source,
            ascii(ref2) as target,
            1 as cost,
            1 as reverse_cost
     FROM users')
group by component;

--  the_group |    ids
-- -----------+-----------
--  a         | {a,c,d,f}
--  b         | {b,e}

ascii() and chr() are there to translate your 1-character refs to the integers expected by the extension, and back: ascii('a') = 97, chr(97) = 'a'. Tested on a combination of a few million users forming links of varying number and length, it takes seconds and uses a negligible amount of memory on a single CPU. Under the hood, it's a Boost C++ DFS.
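What pgr_connectedComponents returns here can be pictured with a tiny Python sketch over the refs themselves (hardcoded sample data; this is only an illustration of connected components, not the extension's Boost implementation):

```python
from collections import defaultdict

# Each user row is an edge between its ref1 and ref2 nodes.
edges = [("a", "d"), ("b", "e"), ("c", "d"), ("a", "f"), ("c", "f")]

adj = defaultdict(set)
for s, t in edges:
    adj[s].add(t)
    adj[t].add(s)

def components(adj):
    """Connected components over the ref graph, one sorted list per component."""
    seen, comps = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        todo, comp = [start], set()
        while todo:
            n = todo.pop()
            if n not in comp:
                comp.add(n)
                todo.extend(adj[n])
        seen |= comp
        comps.append(sorted(comp))
    return comps

print(components(adj))  # [['a', 'c', 'd', 'f'], ['b', 'e']]
```

The two components match the `the_group` rows in the query result above.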

db<>fiddle, db-fiddle and sqlfiddle don't offer pgrouting, which is why I couldn't attach an online demo, but here's the test data generator I tortured this with.


Once SQL:2023's SQL/PGQ makes its way into a PostgreSQL release, you can expect a built-in mechanism instead.

Posted by huangapple on 2023-05-26. Original link: https://go.coder-hub.com/76339018.html