PSQL / SQL:是否可以进一步优化此查询,而不需要对数据库进行写访问?

huangapple go评论90阅读模式
英文:

PSQL / SQL: Is it possible to further optimize this query with requiring write access to the database?

问题

以下是您的翻译:

我这里有一个查询,其中使用了四个子查询在单个CTE中,每个子查询都在自身的每一行中扫描另一个CTE的每一行。我认为这样非常低效。

在概念验证完成后,是否有任何SQL优化可以实施?由于我没有对数据库的写入访问权限,所以优化必须在select子句中进行。

查询分析结果:

查询计划
在一个CTE上进行子查询扫描 (cost=40875.45..7501783.95 rows=1000 width=68) (实际时间=1553.471..13613.116 行=231 循环=1)
输出: preprocessed.week_start, (preprocessed.actual_users + preprocessed.cumulative_churned_users), preprocessed.cumulative_churned_users, ((preprocessed.cumulative_churned_users)::double precision / NULLIF(((preprocessed.actual_users + preprocessed.cumulative_churned_users))::double precision, '0'::double precision)), preprocessed.actual_users, preprocessed.weekly_new_users, preprocessed.weekly_churned_users, preprocessed.weekly_active_users, ((preprocessed.weekly_churned_users)::double precision / NULLIF((preprocessed.actual_users)::double precision, '0'::double precision))
缓冲区: 共享命中=287734 读取=1964,临时读取=274840 写入=873
CTE all_users
-> 附加 (cost=0.00..30953.99 rows=70293 width=32) (实际时间=0.099..1313.372 行=71228 循环=1)
缓冲区: 共享命中=285995 读取=1964
-> 顺序扫描 org_accounts.users (cost=0.00..27912.65 rows=70009 width=32) (实际时间=0.099..1289.469 行=70007 循环=1)
输出: users.id, users.registered_at, NULL::timestamp with time zone
过滤条件: ((users.email !~~ '%@mailinator%'::text) AND (users.email !~~ '%@org%'::text) AND (users.email !~~ '%testaccnt%'::text) AND (users.status = 'active'::text) AND ((org_accounts.__user_is_qa(users.id))::text <> 'Y'::text))
过滤后的行数: 9933
缓冲区: 共享命中=285269 读取=1964
-> 顺序扫描 org_accounts.deleted_users (cost=0.00..1986.94 rows=284 width=32) (实际时间=0.014..14.267 行=1221 循环=1)
输出: deleted_users.id, deleted_users.created_at, deleted_users.deleted_at
过滤条件: ((deleted_users.email !~~ '%@mailinator%'::text) AND (deleted_users.email !~~ '%@org%'::text) AND (deleted_users.email !~~ '%testaccnt%'::text) AND (deleted_users.deleter_id = deleted_users.id))
过滤后的行数: 61826
缓冲区: 共享命中=726
-> 左连接合并 (cost=9921.47..7470794.97 rows=1000 width=44) (实际时间=1553.467..13612.496 行=231 循环=1)
输出: (((generate_series(date_trunc('week'::text, $5), date_trunc('week'::text, now()), '7 days'::interval)))::date), (子查询 2), (子查询 3), (子查询 4), (子查询 5), COALESCE(weekly_activity.weekly_active_users, '0'::bigint)
内部唯一: true
合并条件: (((generate_series(date_trunc('week'::text, $5), date_trunc('week'::text, now()), '7 days'::interval)))::date) = weekly_activity.week_start
缓冲区: 共享命中=287734 读取=1964,临时读取=274840 写入=873
-> 排序 (cost=1601.45..1603.95 rows=1000 width=4) (实际时间=10.108..10.250 行=231 循环=1)
输出: (((generate_series(date_trunc('week'::text, $5), date_trunc('week'::text, now()), '7 days'::interval)))::date)
排序键: (((generate_series(date_trunc('week'::text, $5), date_trunc('week'::text, now()), '7 days'::interval)))::date) DESC
排序方法: quicksort 内存: 35kB
缓冲区: 共享命中=726
-> 结果 (cost=1514.10..1541.62 rows=1000 width=4) (实际时间=9.986..10.069 行=231 循环=1)
输出: ((generate_series(date_trunc('week'::text, $5), date_trunc('week'::text, now()), '7 days'::interval)))::date
缓冲区: 共享命中=726
初始化计划 6 (返回 $5)
-> 聚合 (cost=1514.09..1514.10 rows=1 width=8) (实际时间=9.974..9.975 行=1 循环=1)
输出: min(deleted_users_1.created_at)
缓冲区: 共享命中=726
-> 顺序扫描 org_accounts.deleted_users deleted_users_1 (cost=0.00..1356.47 rows=63047 width=8) (实际时间=0.006..4.332 行=63047 循环=1)
输出: deleted_users_1.id, deleted_users_1.email, deleted_users_1.created_at, deleted_users_1.deleter_id, deleted_users_1.deleted_at, deleted_users_1.registration_app
缓冲区: 共享命中=726
-> 项目集 (cost=0.00..5.03 rows=1000 width=8) (实际时间=9.984..10.030 行=231 循环=1)
输出:

英文:

I have a query here that uses four subqueries inside a single CTE, and each subquery is scanning every row of another CTE for each row in itself. I would think that this is very inefficient.

Are there any SQL optimizations that I can implement now that the proof of concept is finished?
I don't have write access to the database, so optimizations would be required within the select clause.

  1. WITH datetable AS (
  2. SELECT generate_series(
  3. DATE_TRUNC(&#39;week&#39;, (SELECT MIN(created_at) FROM org_accounts.deleted_users)),
  4. DATE_TRUNC(&#39;week&#39;, now()),
  5. &#39;1 week&#39;::INTERVAL
  6. )::DATE AS week_start
  7. ), all_users AS (
  8. SELECT
  9. id,
  10. registered_at,
  11. NULL AS deleted_at
  12. FROM org_accounts.users
  13. WHERE status = &#39;active&#39;
  14. AND org_accounts.__user_is_qa(id) &lt;&gt; &#39;Y&#39;
  15. AND email NOT LIKE &#39;%@org%&#39;
  16. UNION ALL
  17. SELECT
  18. id,
  19. created_at AS registered_at,
  20. deleted_at
  21. FROM org_accounts.deleted_users
  22. WHERE deleter_id = id
  23. AND email NOT LIKE &#39;%@org%&#39;
  24. ), weekly_activity AS (
  25. SELECT
  26. DATE_TRUNC(&#39;week&#39;, date)::DATE AS week_start,
  27. COUNT(DISTINCT user_id) AS weekly_active_users
  28. FROM (
  29. SELECT user_id, date
  30. FROM org_storage_extra.stats_user_daily_counters
  31. WHERE type in (&#39;created_file&#39;, &#39;created_folder&#39;, &#39;created_secure_fetch&#39;)
  32. UNION ALL
  33. SELECT user_id, date
  34. FROM ipfs_pinning_facility.stats_user_daily_counters
  35. WHERE type &lt;&gt; &#39;shares_viewed_by_others&#39;
  36. ) activity_ids_dates
  37. WHERE EXISTS(SELECT 1 from all_users WHERE id = user_id)
  38. GROUP BY week_start
  39. ), preprocessed AS (
  40. SELECT
  41. week_start,
  42. (
  43. SELECT COUNT(DISTINCT id)
  44. FROM all_users
  45. WHERE registered_at &lt; week_start
  46. AND (deleted_at IS NULL OR deleted_at &gt; week_start)
  47. ) AS actual_users,
  48. (
  49. SELECT COUNT(DISTINCT id)
  50. FROM all_users
  51. WHERE deleted_at &lt; week_start + &#39;1 week&#39;::INTERVAL
  52. ) AS cumulative_churned_users,
  53. (
  54. SELECT COUNT(DISTINCT id)
  55. FROM all_users
  56. WHERE registered_at &gt;= week_start
  57. AND registered_at &lt; week_start + &#39;1 week&#39;::INTERVAL
  58. ) AS weekly_new_users,
  59. (
  60. SELECT COUNT(DISTINCT id)
  61. FROM all_users
  62. WHERE deleted_at &gt;= week_start
  63. AND deleted_at &lt; week_start + &#39;1 week&#39;::INTERVAL
  64. ) AS weekly_churned_users,
  65. COALESCE(weekly_active_users, 0) AS weekly_active_users
  66. FROM datetable dt
  67. LEFT JOIN weekly_activity USING (week_start)
  68. ORDER BY week_start DESC
  69. )
  70. SELECT
  71. week_start AS for_week_of,
  72. actual_users + cumulative_churned_users AS cumulative_users,
  73. cumulative_churned_users,
  74. cumulative_churned_users::FLOAT / NULLIF((actual_users + cumulative_churned_users)::FLOAT, 0) AS cumulated_churn_rate,
  75. actual_users,
  76. weekly_new_users,
  77. weekly_churned_users,
  78. weekly_active_users,
  79. weekly_churned_users::FLOAT / NULLIF(actual_users::FLOAT, 0) AS weekly_churn_rate
  80. FROM preprocessed;

Results of query analysis:

  1. QUERY PLAN
  2. Subquery Scan on preprocessed (cost=40875.45..7501783.95 rows=1000 width=68) (actual time=1553.471..13613.116 rows=231 loops=1)
  3. Output: preprocessed.week_start, (preprocessed.actual_users + preprocessed.cumulative_churned_users), preprocessed.cumulative_churned_users, ((preprocessed.cumulative_churned_users)::double precision / NULLIF(((preprocessed.actual_users + preprocessed.cumulative_churned_users))::double precision, &#39;0&#39;::double precision)), preprocessed.actual_users, preprocessed.weekly_new_users, preprocessed.weekly_churned_users, preprocessed.weekly_active_users, ((preprocessed.weekly_churned_users)::double precision / NULLIF((preprocessed.actual_users)::double precision, &#39;0&#39;::double precision))
  4. Buffers: shared hit=287734 read=1964, temp read=274840 written=873
  5. CTE all_users
  6. -&gt; Append (cost=0.00..30953.99 rows=70293 width=32) (actual time=0.099..1313.372 rows=71228 loops=1)
  7. Buffers: shared hit=285995 read=1964
  8. -&gt; Seq Scan on org_accounts.users (cost=0.00..27912.65 rows=70009 width=32) (actual time=0.099..1289.469 rows=70007 loops=1)
  9. Output: users.id, users.registered_at, NULL::timestamp with time zone
  10. Filter: ((users.email !~~ &#39;%@mailinator%&#39;::text) AND (users.email !~~ &#39;%@org%&#39;::text) AND (users.email !~~ &#39;%testaccnt%&#39;::text) AND (users.status = &#39;active&#39;::text) AND ((org_accounts.__user_is_qa(users.id))::text &lt;&gt; &#39;Y&#39;::text))
  11. Rows Removed by Filter: 9933
  12. Buffers: shared hit=285269 read=1964
  13. -&gt; Seq Scan on org_accounts.deleted_users (cost=0.00..1986.94 rows=284 width=32) (actual time=0.014..14.267 rows=1221 loops=1)
  14. Output: deleted_users.id, deleted_users.created_at, deleted_users.deleted_at
  15. Filter: ((deleted_users.email !~~ &#39;%@mailinator%&#39;::text) AND (deleted_users.email !~~ &#39;%@org%&#39;::text) AND (deleted_users.email !~~ &#39;%testaccnt%&#39;::text) AND (deleted_users.deleter_id = deleted_users.id))
  16. Rows Removed by Filter: 61826
  17. Buffers: shared hit=726
  18. -&gt; Merge Left Join (cost=9921.47..7470794.97 rows=1000 width=44) (actual time=1553.467..13612.496 rows=231 loops=1)
  19. Output: (((generate_series(date_trunc(&#39;week&#39;::text, $5), date_trunc(&#39;week&#39;::text, now()), &#39;7 days&#39;::interval)))::date), (SubPlan 2), (SubPlan 3), (SubPlan 4), (SubPlan 5), COALESCE(weekly_activity.weekly_active_users, &#39;0&#39;::bigint)
  20. Inner Unique: true
  21. Merge Cond: ((((generate_series(date_trunc(&#39;week&#39;::text, $5), date_trunc(&#39;week&#39;::text, now()), &#39;7 days&#39;::interval)))::date) = weekly_activity.week_start)
  22. Buffers: shared hit=287734 read=1964, temp read=274840 written=873
  23. -&gt; Sort (cost=1601.45..1603.95 rows=1000 width=4) (actual time=10.108..10.250 rows=231 loops=1)
  24. Output: (((generate_series(date_trunc(&#39;week&#39;::text, $5), date_trunc(&#39;week&#39;::text, now()), &#39;7 days&#39;::interval)))::date)
  25. Sort Key: (((generate_series(date_trunc(&#39;week&#39;::text, $5), date_trunc(&#39;week&#39;::text, now()), &#39;7 days&#39;::interval)))::date) DESC
  26. Sort Method: quicksort Memory: 35kB
  27. Buffers: shared hit=726
  28. -&gt; Result (cost=1514.10..1541.62 rows=1000 width=4) (actual time=9.986..10.069 rows=231 loops=1)
  29. Output: ((generate_series(date_trunc(&#39;week&#39;::text, $5), date_trunc(&#39;week&#39;::text, now()), &#39;7 days&#39;::interval)))::date
  30. Buffers: shared hit=726
  31. InitPlan 6 (returns $5)
  32. -&gt; Aggregate (cost=1514.09..1514.10 rows=1 width=8) (actual time=9.974..9.975 rows=1 loops=1)
  33. Output: min(deleted_users_1.created_at)
  34. Buffers: shared hit=726
  35. -&gt; Seq Scan on org_accounts.deleted_users deleted_users_1 (cost=0.00..1356.47 rows=63047 width=8) (actual time=0.006..4.332 rows=63047 loops=1)
  36. Output: deleted_users_1.id, deleted_users_1.email, deleted_users_1.created_at, deleted_users_1.deleter_id, deleted_users_1.deleted_at, deleted_users_1.registration_app
  37. Buffers: shared hit=726
  38. -&gt; ProjectSet (cost=0.00..5.03 rows=1000 width=8) (actual time=9.984..10.030 rows=231 loops=1)
  39. Output: generate_series(date_trunc(&#39;week&#39;::text, $5), date_trunc(&#39;week&#39;::text, now()), &#39;7 days&#39;::interval)
  40. Buffers: shared hit=726
  41. -&gt; Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.000..0.001 rows=1 loops=1)
  42. -&gt; Sort (cost=8320.02..8320.52 rows=200 width=12) (actual time=1475.315..1475.418 rows=159 loops=1)
  43. Output: weekly_activity.weekly_active_users, weekly_activity.week_start
  44. Sort Key: weekly_activity.week_start DESC
  45. Sort Method: quicksort Memory: 32kB
  46. Buffers: shared hit=287008 read=1964, temp read=412 written=872
  47. -&gt; Subquery Scan on weekly_activity (cost=8050.90..8312.37 rows=200 width=12) (actual time=1466.686..1475.279 rows=159 loops=1)
  48. Output: weekly_activity.weekly_active_users, weekly_activity.week_start
  49. Buffers: shared hit=287008 read=1964, temp read=412 written=872
  50. -&gt; GroupAggregate (cost=8050.90..8310.37 rows=200 width=12) (actual time=1466.685..1475.254 rows=159 loops=1)
  51. Output: ((date_trunc(&#39;week&#39;::text, (&quot;*SELECT* 1&quot;.date)::timestamp with time zone))::date), count(DISTINCT &quot;*SELECT* 1&quot;.user_id)
  52. Group Key: ((date_trunc(&#39;week&#39;::text, (&quot;*SELECT* 1&quot;.date)::timestamp with time zone))::date)
  53. Buffers: shared hit=287008 read=1964, temp read=412 written=872
  54. -&gt; Sort (cost=8050.90..8136.22 rows=34130 width=20) (actual time=1466.668..1468.872 rows=23005 loops=1)
  55. Output: ((date_trunc(&#39;week&#39;::text, (&quot;*SELECT* 1&quot;.date)::timestamp with time zone))::date), &quot;*SELECT* 1&quot;.user_id
  56. Sort Key: ((date_trunc(&#39;week&#39;::text, (&quot;*SELECT* 1&quot;.date)::timestamp with time zone))::date)
  57. Sort Method: quicksort Memory: 2566kB
  58. Buffers: shared hit=287008 read=1964, temp read=412 written=872
  59. -&gt; Hash Join (cost=1586.09..5481.12 rows=34130 width=20) (actual time=1411.350..1462.022 rows=23005 loops=1)
  60. Output: (date_trunc(&#39;week&#39;::text, (&quot;*SELECT* 1&quot;.date)::timestamp with time zone))::date, &quot;*SELECT* 1&quot;.user_id
  61. Inner Unique: true
  62. Hash Cond: (&quot;*SELECT* 1&quot;.user_id = all_users.id)
  63. Buffers: shared hit=287008 read=1964, temp read=412 written=872
  64. -&gt; Append (cost=0.00..3080.17 rows=68261 width=20) (actual time=0.010..25.441 rows=68179 loops=1)
  65. Buffers: shared hit=1013
  66. -&gt; Subquery Scan on &quot;*SELECT* 1&quot; (cost=0.00..1018.43 rows=21568 width=20) (actual time=0.008..7.895 rows=21532 loops=1)
  67. Output: &quot;*SELECT* 1&quot;.date, &quot;*SELECT* 1&quot;.user_id
  68. Buffers: shared hit=372
  69. -&gt; Seq Scan on org_storage_extra.stats_user_daily_counters (cost=0.00..802.75 rows=21568 width=20) (actual time=0.008..5.910 rows=21532 loops=1)
  70. Output: stats_user_daily_counters.user_id, stats_user_daily_counters.date
  71. Filter: (stats_user_daily_counters.type = ANY (&#39;{created_file,created_folder,created_secure_fetch}&#39;::text[]))
  72. Rows Removed by Filter: 9795
  73. Buffers: shared hit=372
  74. -&gt; Subquery Scan on &quot;*SELECT* 2&quot; (cost=0.00..1720.44 rows=46693 width=20) (actual time=0.009..12.460 rows=46647 loops=1)
  75. Output: &quot;*SELECT* 2&quot;.date, &quot;*SELECT* 2&quot;.user_id
  76. Buffers: shared hit=641
  77. -&gt; Seq Scan on ipfs_pinning_facility.stats_user_daily_counters stats_user_daily_counters_1 (cost=0.00..1253.51 rows=46693 width=20) (actual time=0.009..8.209 rows=46647 loops=1)
  78. Output: stats_user_daily_counters_1.user_id, stats_user_daily_counters_1.date
  79. Filter: (stats_user_daily_counters_1.type &lt;&gt; &#39;shares_viewed_by_others&#39;::text)
  80. Rows Removed by Filter: 2354
  81. Buffers: shared hit=641
  82. -&gt; Hash (cost=1583.59..1583.59 rows=200 width=16) (actual time=1411.250..1411.251 rows=71228 loops=1)
  83. Output: all_users.id
  84. Buckets: 131072 (originally 1024) Batches: 2 (originally 1) Memory Usage: 3073kB
  85. Buffers: shared hit=285995 read=1964, temp read=100 written=717
  86. -&gt; HashAggregate (cost=1581.59..1583.59 rows=200 width=16) (actual time=1383.986..1398.270 rows=71228 loops=1)
  87. Output: all_users.id
  88. Group Key: all_users.id
  89. Batches: 5 Memory Usage: 4161kB Disk Usage: 1544kB
  90. Buffers: shared hit=285995 read=1964, temp read=100 written=560
  91. -&gt; CTE Scan on all_users (cost=0.00..1405.86 rows=70293 width=16) (actual time=0.102..1351.241 rows=71228 loops=1)
  92. Output: all_users.id
  93. Buffers: shared hit=285995 read=1964, temp written=296
  94. SubPlan 2
  95. -&gt; Aggregate (cost=1777.05..1777.06 rows=1 width=8) (actual time=20.197..20.197 rows=1 loops=231)
  96. Output: count(DISTINCT all_users_1.id)
  97. Buffers: temp read=68607 written=1
  98. -&gt; CTE Scan on all_users all_users_1 (cost=0.00..1757.33 rows=7888 width=16) (actual time=0.883..10.874 rows=27239 loops=231)
  99. Output: all_users_1.id, all_users_1.registered_at, all_users_1.deleted_at
  100. Filter: ((all_users_1.registered_at &lt; (((generate_series(date_trunc(&#39;week&#39;::text, $5), date_trunc(&#39;week&#39;::text, now()), &#39;7 days&#39;::interval)))::date)) AND ((all_users_1.deleted_at IS NULL) OR (all_users_1.deleted_at &gt; (((generate_series(date_trunc(&#39;week&#39;::text, $5), date_trunc(&#39;week&#39;::text, now()), &#39;7 days&#39;::interval)))::date))))
  101. Rows Removed by Filter: 43989
  102. Buffers: temp read=68607 written=1
  103. SubPlan 3
  104. -&gt; Aggregate (cost=1815.90..1815.91 rows=1 width=8) (actual time=11.215..11.215 rows=1 loops=231)
  105. Output: count(DISTINCT all_users_2.id)
  106. Buffers: temp read=68607
  107. -&gt; CTE Scan on all_users all_users_2 (cost=0.00..1757.33 rows=23431 width=16) (actual time=11.009..11.150 rows=231 loops=231)
  108. Output: all_users_2.id, all_users_2.registered_at, all_users_2.deleted_at
  109. Filter: (all_users_2.deleted_at &lt; ((((generate_series(date_trunc(&#39;week&#39;::text, $5), date_trunc(&#39;week&#39;::text, now()), &#39;7 days&#39;::interval)))::date) + &#39;7 days&#39;::interval))
  110. Rows Removed by Filter: 70997
  111. Buffers: temp read=68607
  112. SubPlan 4
  113. -&gt; Aggregate (cost=1933.94..1933.95 rows=1 width=8) (actual time=14.515..14.515 rows=1 loops=231)
  114. Output: count(DISTINCT all_users_3.id)
  115. Buffers: temp read=68607
  116. -&gt; CTE Scan on all_users all_users_3 (cost=0.00..1933.06 rows=351 width=16) (actual time=2.264..14.424 rows=308 loops=231)
  117. Output: all_users_3.id, all_users_3.registered_at, all_users_3.deleted_at
  118. Filter: ((all_users_3.registered_at &gt;= (((generate_series(date_trunc(&#39;week&#39;::text, $5), date_trunc(&#39;week&#39;::text, now()), &#39;7 days&#39;::interval)))::date)) AND (all_users_3.registered_at &lt; ((((generate_series(date_trunc(&#39;week&#39;::text, $5), date_trunc(&#39;week&#39;::text, now()), &#39;7 days&#39;::interval)))::date) + &#39;7 days&#39;::interval)))
  119. Rows Removed by Filter: 70920
  120. Buffers: temp read=68607
  121. SubPlan 5
  122. -&gt; Aggregate (cost=1933.94..1933.95 rows=1 width=8) (actual time=6.556..6.556 rows=1 loops=231)
  123. Output: count(DISTINCT all_users_4.id)
  124. Buffers: temp read=68607
  125. -&gt; CTE Scan on all_users all_users_4 (cost=0.00..1933.06 rows=351 width=16) (actual time=6.441..6.547 rows=5 loops=231)
  126. Output: all_users_4.id, all_users_4.registered_at, all_users_4.deleted_at
  127. Filter: ((all_users_4.deleted_at &gt;= (((generate_series(date_trunc(&#39;week&#39;::text, $5), date_trunc(&#39;week&#39;::text, now()), &#39;7 days&#39;::interval)))::date)) AND (all_users_4.deleted_at &lt; ((((generate_series(date_trunc(&#39;week&#39;::text, $5), date_trunc(&#39;week&#39;::text, now()), &#39;7 days&#39;::interval)))::date) + &#39;7 days&#39;::interval)))
  128. Rows Removed by Filter: 71223
  129. Buffers: temp read=68607
  130. Planning Time: 0.612 ms
  131. Execution Time: 13615.054 ms

答案1

得分: 2

显而易见的优化是消除多余的表扫描。在preprocessed中,没有必要多次从all_users查询。以下查询使用FILTERCOUNT来收集相同的统计信息:

  1. WITH datetable AS (SELECT GENERATE_SERIES(
  2. DATE_TRUNC('week', (SELECT MIN(created_at) FROM org_accounts.deleted_users)),
  3. DATE_TRUNC('week', NOW()),
  4. '1 week'::INTERVAL
  5. )::DATE AS week_start),
  6. all_users AS (SELECT id,
  7. registered_at,
  8. NULL AS deleted_at
  9. FROM org_accounts.users
  10. WHERE status = 'active'
  11. AND org_accounts.__user_is_qa(id) <> 'Y'
  12. AND email NOT LIKE '%@org%'
  13. UNION ALL
  14. SELECT id,
  15. created_at AS registered_at,
  16. deleted_at
  17. FROM org_accounts.deleted_users
  18. WHERE deleter_id = id
  19. AND email NOT LIKE '%@org%'),
  20. weekly_activity AS (SELECT DATE_TRUNC('week', date)::DATE AS week_start,
  21. COUNT(DISTINCT user_id) AS weekly_active_users
  22. FROM (SELECT user_id, date
  23. FROM org_storage_extra.stats_user_daily_counters
  24. WHERE type IN ('created_file', 'created_folder', 'created_secure_fetch')
  25. UNION ALL
  26. SELECT user_id, date
  27. FROM ipfs_pinning_facility.stats_user_daily_counters
  28. WHERE type <> 'shares_viewed_by_others') activity_ids_dates
  29. WHERE EXISTS(SELECT 1 FROM all_users WHERE id = user_id)
  30. GROUP BY week_start),
  31. preprocessed AS (SELECT week_start,
  32. us.actual_users,
  33. us.cumulative_churned_users,
  34. us.weekly_new_users,
  35. us.weekly_churned_users,
  36. COALESCE(weekly_active_users, 0) AS weekly_active_users
  37. FROM datetable dt
  38. CROSS JOIN LATERAL (SELECT
  39. COUNT(DISTINCT u.id) FILTER (WHERE u.registered_at < dt.week_start AND
  40. (u.deleted_at IS NULL OR u.deleted_at > dt.week_start)) AS actual_users,
  41. COUNT(DISTINCT u.id)
  42. FILTER (WHERE u.deleted_at < dt.week_start + '1 week'::INTERVAL) AS cumulative_churned_users,
  43. COUNT(DISTINCT u.id)
  44. FILTER (WHERE u.registered_at >= dt.week_start AND u.registered_at <
  45. dt.week_start +
  46. '1 week'::INTERVAL) AS weekly_new_users,
  47. COUNT(DISTINCT u.id)
  48. FILTER (WHERE u.deleted_at >= dt.week_start AND u.deleted_at <
  49. dt.week_start +
  50. '1 week'::INTERVAL) AS weekly_churned_users
  51. FROM all_users u
  52. WHERE u.registered_at < dt.week_start + '1 week'::INTERVAL
  53. OR (u.deleted_at >= dt.week_start AND
  54. u.deleted_at < dt.week_start + '1 week'::INTERVAL)) us
  55. LEFT JOIN weekly_activity
  56. USING (week_start)
  57. ORDER BY week_start DESC)
  58. SELECT week_start AS for_week_of,
  59. actual_users + cumulative_churned_users AS cumulative_users,
  60. cumulative_churned_users,
  61. cumulative_churned_users::FLOAT /
  62. NULLIF((actual_users + cumulative_churned_users)::FLOAT, 0) AS cumulated_churn_rate,
  63. actual_users,
  64. weekly_new_users,
  65. weekly_churned_users,
  66. weekly_active_users,
  67. weekly_churned_users::FLOAT / NULLIF(actual_users::FLOAT, 0) AS weekly_churn_rate
  68. FROM preprocessed;

可能还有其他优化可能,但这个立即显而易见。

英文:

An obvious optimization is to eliminate redundant table scans. There isn't any need in preprocessed to query from all_users more than once. The following query uses COUNT with FILTER to gather the same statistics:

  1. WITH datetable AS (SELECT GENERATE_SERIES(
  2. DATE_TRUNC(&#39;week&#39;, (SELECT MIN(created_at) FROM org_accounts.deleted_users)),
  3. DATE_TRUNC(&#39;week&#39;, NOW()),
  4. &#39;1 week&#39;::INTERVAL
  5. )::DATE AS week_start),
  6. all_users AS (SELECT id,
  7. registered_at,
  8. NULL AS deleted_at
  9. FROM org_accounts.users
  10. WHERE status = &#39;active&#39;
  11. AND org_accounts.__user_is_qa(id) &lt;&gt; &#39;Y&#39;
  12. AND email NOT LIKE &#39;%@org%&#39;
  13. UNION ALL
  14. SELECT id,
  15. created_at AS registered_at,
  16. deleted_at
  17. FROM org_accounts.deleted_users
  18. WHERE deleter_id = id
  19. AND email NOT LIKE &#39;%@org%&#39;),
  20. weekly_activity AS (SELECT DATE_TRUNC(&#39;week&#39;, date)::DATE AS week_start,
  21. COUNT(DISTINCT user_id) AS weekly_active_users
  22. FROM (SELECT user_id, date
  23. FROM org_storage_extra.stats_user_daily_counters
  24. WHERE type IN (&#39;created_file&#39;, &#39;created_folder&#39;, &#39;created_secure_fetch&#39;)
  25. UNION ALL
  26. SELECT user_id, date
  27. FROM ipfs_pinning_facility.stats_user_daily_counters
  28. WHERE type &lt;&gt; &#39;shares_viewed_by_others&#39;) activity_ids_dates
  29. WHERE EXISTS(SELECT 1 FROM all_users WHERE id = user_id)
  30. GROUP BY week_start),
  31. preprocessed AS (SELECT week_start,
  32. us.actual_users,
  33. us.cumulative_churned_users,
  34. us.weekly_new_users,
  35. us.weekly_churned_users,
  36. COALESCE(weekly_active_users, 0) AS weekly_active_users
  37. FROM datetable dt
  38. CROSS JOIN LATERAL (SELECT
  39. COUNT(DISTINCT u.id) FILTER (WHERE u.registered_at &lt; dt.week_start AND
  40. (u.deleted_at IS NULL OR u.deleted_at &gt; dt.week_start)) AS actual_users,
  41. COUNT(DISTINCT u.id)
  42. FILTER (WHERE u.deleted_at &lt; dt.week_start + &#39;1 week&#39;::INTERVAL) AS cumulative_churned_users,
  43. COUNT(DISTINCT u.id)
  44. FILTER (WHERE u.registered_at &gt;= dt.week_start AND u.registered_at &lt;
  45. dt.week_start +
  46. &#39;1 week&#39;::INTERVAL) AS weekly_new_users,
  47. COUNT(DISTINCT u.id)
  48. FILTER (WHERE u.deleted_at &gt;= dt.week_start AND u.deleted_at &lt;
  49. dt.week_start +
  50. &#39;1 week&#39;::INTERVAL) AS weekly_churned_users
  51. FROM all_users u
  52. WHERE u.registered_at &lt; dt.week_start + &#39;1 week&#39;::INTERVAL
  53. OR (u.deleted_at &gt;= dt.week_start AND
  54. u.deleted_at &lt; dt.week_start + &#39;1 week&#39;::INTERVAL)) us
  55. LEFT JOIN weekly_activity
  56. USING (week_start)
  57. ORDER BY week_start DESC)
  58. SELECT week_start AS for_week_of,
  59. actual_users + cumulative_churned_users AS cumulative_users,
  60. cumulative_churned_users,
  61. cumulative_churned_users::FLOAT /
  62. NULLIF((actual_users + cumulative_churned_users)::FLOAT, 0) AS cumulated_churn_rate,
  63. actual_users,
  64. weekly_new_users,
  65. weekly_churned_users,
  66. weekly_active_users,
  67. weekly_churned_users::FLOAT / NULLIF(actual_users::FLOAT, 0) AS weekly_churn_rate
  68. FROM preprocessed;

There are probably other optimizations possible, but this one was immediately evident.

huangapple
  • 本文由 发表于 2023年6月2日 00:49:14
  • 转载请务必保留本文链接:https://go.coder-hub.com/76384091.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定