PSQL / SQL: Is it possible to further optimize this query without requiring write access to the database?

Question

I have a query here that uses four subqueries inside a single CTE, and each subquery scans every row of another CTE once for each row of the CTE it belongs to. I would think that this is very inefficient.

Are there any SQL optimizations I can implement now that the proof of concept is finished?
I don't have write access to the database, so any optimization has to be made within the SELECT statement itself.

WITH datetable AS (
    SELECT generate_series(
        DATE_TRUNC('week', (SELECT MIN(created_at) FROM org_accounts.deleted_users)),
        DATE_TRUNC('week', now()),
        '1 week'::INTERVAL
    )::DATE AS week_start
), all_users AS (
    SELECT
        id,
        registered_at,
        NULL AS deleted_at
    FROM org_accounts.users
    WHERE status = 'active'
        AND org_accounts.__user_is_qa(id) <> 'Y'
        AND email NOT LIKE '%@org%'

    UNION ALL

    SELECT
        id,
        created_at AS registered_at,
        deleted_at
    FROM org_accounts.deleted_users
    WHERE deleter_id = id
        AND email NOT LIKE '%@org%'
), weekly_activity AS (
    SELECT
        DATE_TRUNC('week', date)::DATE AS week_start,
        COUNT(DISTINCT user_id) AS weekly_active_users
    FROM (
      SELECT user_id, date
      FROM org_storage_extra.stats_user_daily_counters
      WHERE type in ('created_file', 'created_folder', 'created_secure_fetch')

      UNION ALL

      SELECT user_id, date
      FROM ipfs_pinning_facility.stats_user_daily_counters
      WHERE type <> 'shares_viewed_by_others'
      ) activity_ids_dates
    WHERE EXISTS(SELECT 1 from all_users WHERE id = user_id)
    GROUP BY week_start
), preprocessed AS (
    SELECT
        week_start,
        (
            SELECT COUNT(DISTINCT id)
            FROM all_users
            WHERE registered_at < week_start
                AND (deleted_at IS NULL OR deleted_at > week_start)
        ) AS actual_users,
        (
            SELECT COUNT(DISTINCT id)
            FROM all_users
            WHERE deleted_at < week_start + '1 week'::INTERVAL
        ) AS cumulative_churned_users,
        (
            SELECT COUNT(DISTINCT id)
            FROM all_users
            WHERE registered_at >= week_start
                AND registered_at < week_start + '1 week'::INTERVAL
        ) AS weekly_new_users,
        (
            SELECT COUNT(DISTINCT id)
            FROM all_users
            WHERE deleted_at >= week_start
                AND deleted_at < week_start + '1 week'::INTERVAL
        ) AS weekly_churned_users,
        COALESCE(weekly_active_users, 0) AS weekly_active_users
    FROM datetable dt
    LEFT JOIN weekly_activity USING (week_start)
    ORDER BY week_start DESC
)
SELECT
    week_start AS for_week_of,
    actual_users + cumulative_churned_users AS cumulative_users,
    cumulative_churned_users,
    cumulative_churned_users::FLOAT / NULLIF((actual_users + cumulative_churned_users)::FLOAT, 0) AS cumulated_churn_rate,
    actual_users,
    weekly_new_users,
    weekly_churned_users,
    weekly_active_users,
    weekly_churned_users::FLOAT / NULLIF(actual_users::FLOAT, 0) AS weekly_churn_rate
FROM preprocessed;

Results of query analysis:

QUERY PLAN
Subquery Scan on preprocessed  (cost=40875.45..7501783.95 rows=1000 width=68) (actual time=1553.471..13613.116 rows=231 loops=1)
Output: preprocessed.week_start, (preprocessed.actual_users + preprocessed.cumulative_churned_users), preprocessed.cumulative_churned_users, ((preprocessed.cumulative_churned_users)::double precision / NULLIF(((preprocessed.actual_users + preprocessed.cumulative_churned_users))::double precision, '0'::double precision)), preprocessed.actual_users, preprocessed.weekly_new_users, preprocessed.weekly_churned_users, preprocessed.weekly_active_users, ((preprocessed.weekly_churned_users)::double precision / NULLIF((preprocessed.actual_users)::double precision, '0'::double precision))
Buffers: shared hit=287734 read=1964, temp read=274840 written=873
CTE all_users
->  Append  (cost=0.00..30953.99 rows=70293 width=32) (actual time=0.099..1313.372 rows=71228 loops=1)
Buffers: shared hit=285995 read=1964
->  Seq Scan on org_accounts.users  (cost=0.00..27912.65 rows=70009 width=32) (actual time=0.099..1289.469 rows=70007 loops=1)
Output: users.id, users.registered_at, NULL::timestamp with time zone
Filter: ((users.email !~~ '%@mailinator%'::text) AND (users.email !~~ '%@org%'::text) AND (users.email !~~ '%testaccnt%'::text) AND (users.status = 'active'::text) AND ((org_accounts.__user_is_qa(users.id))::text <> 'Y'::text))
Rows Removed by Filter: 9933
Buffers: shared hit=285269 read=1964
->  Seq Scan on org_accounts.deleted_users  (cost=0.00..1986.94 rows=284 width=32) (actual time=0.014..14.267 rows=1221 loops=1)
Output: deleted_users.id, deleted_users.created_at, deleted_users.deleted_at
Filter: ((deleted_users.email !~~ '%@mailinator%'::text) AND (deleted_users.email !~~ '%@org%'::text) AND (deleted_users.email !~~ '%testaccnt%'::text) AND (deleted_users.deleter_id = deleted_users.id))
Rows Removed by Filter: 61826
Buffers: shared hit=726
->  Merge Left Join  (cost=9921.47..7470794.97 rows=1000 width=44) (actual time=1553.467..13612.496 rows=231 loops=1)
Output: (((generate_series(date_trunc('week'::text, $5), date_trunc('week'::text, now()), '7 days'::interval)))::date), (SubPlan 2), (SubPlan 3), (SubPlan 4), (SubPlan 5), COALESCE(weekly_activity.weekly_active_users, '0'::bigint)
Inner Unique: true
Merge Cond: ((((generate_series(date_trunc('week'::text, $5), date_trunc('week'::text, now()), '7 days'::interval)))::date) = weekly_activity.week_start)
Buffers: shared hit=287734 read=1964, temp read=274840 written=873
->  Sort  (cost=1601.45..1603.95 rows=1000 width=4) (actual time=10.108..10.250 rows=231 loops=1)
Output: (((generate_series(date_trunc('week'::text, $5), date_trunc('week'::text, now()), '7 days'::interval)))::date)
Sort Key: (((generate_series(date_trunc('week'::text, $5), date_trunc('week'::text, now()), '7 days'::interval)))::date) DESC
Sort Method: quicksort  Memory: 35kB
Buffers: shared hit=726
->  Result  (cost=1514.10..1541.62 rows=1000 width=4) (actual time=9.986..10.069 rows=231 loops=1)
Output: ((generate_series(date_trunc('week'::text, $5), date_trunc('week'::text, now()), '7 days'::interval)))::date
Buffers: shared hit=726
InitPlan 6 (returns $5)
->  Aggregate  (cost=1514.09..1514.10 rows=1 width=8) (actual time=9.974..9.975 rows=1 loops=1)
Output: min(deleted_users_1.created_at)
Buffers: shared hit=726
->  Seq Scan on org_accounts.deleted_users deleted_users_1  (cost=0.00..1356.47 rows=63047 width=8) (actual time=0.006..4.332 rows=63047 loops=1)
Output: deleted_users_1.id, deleted_users_1.email, deleted_users_1.created_at, deleted_users_1.deleter_id, deleted_users_1.deleted_at, deleted_users_1.registration_app
Buffers: shared hit=726
->  ProjectSet  (cost=0.00..5.03 rows=1000 width=8) (actual time=9.984..10.030 rows=231 loops=1)
Output: generate_series(date_trunc('week'::text, $5), date_trunc('week'::text, now()), '7 days'::interval)
Buffers: shared hit=726
->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.000..0.001 rows=1 loops=1)
->  Sort  (cost=8320.02..8320.52 rows=200 width=12) (actual time=1475.315..1475.418 rows=159 loops=1)
Output: weekly_activity.weekly_active_users, weekly_activity.week_start
Sort Key: weekly_activity.week_start DESC
Sort Method: quicksort  Memory: 32kB
Buffers: shared hit=287008 read=1964, temp read=412 written=872
->  Subquery Scan on weekly_activity  (cost=8050.90..8312.37 rows=200 width=12) (actual time=1466.686..1475.279 rows=159 loops=1)
Output: weekly_activity.weekly_active_users, weekly_activity.week_start
Buffers: shared hit=287008 read=1964, temp read=412 written=872
->  GroupAggregate  (cost=8050.90..8310.37 rows=200 width=12) (actual time=1466.685..1475.254 rows=159 loops=1)
Output: ((date_trunc('week'::text, ("*SELECT* 1".date)::timestamp with time zone))::date), count(DISTINCT "*SELECT* 1".user_id)
Group Key: ((date_trunc('week'::text, ("*SELECT* 1".date)::timestamp with time zone))::date)
Buffers: shared hit=287008 read=1964, temp read=412 written=872
->  Sort  (cost=8050.90..8136.22 rows=34130 width=20) (actual time=1466.668..1468.872 rows=23005 loops=1)
Output: ((date_trunc('week'::text, ("*SELECT* 1".date)::timestamp with time zone))::date), "*SELECT* 1".user_id
Sort Key: ((date_trunc('week'::text, ("*SELECT* 1".date)::timestamp with time zone))::date)
Sort Method: quicksort  Memory: 2566kB
Buffers: shared hit=287008 read=1964, temp read=412 written=872
->  Hash Join  (cost=1586.09..5481.12 rows=34130 width=20) (actual time=1411.350..1462.022 rows=23005 loops=1)
Output: (date_trunc('week'::text, ("*SELECT* 1".date)::timestamp with time zone))::date, "*SELECT* 1".user_id
Inner Unique: true
Hash Cond: ("*SELECT* 1".user_id = all_users.id)
Buffers: shared hit=287008 read=1964, temp read=412 written=872
->  Append  (cost=0.00..3080.17 rows=68261 width=20) (actual time=0.010..25.441 rows=68179 loops=1)
Buffers: shared hit=1013
->  Subquery Scan on "*SELECT* 1"  (cost=0.00..1018.43 rows=21568 width=20) (actual time=0.008..7.895 rows=21532 loops=1)
Output: "*SELECT* 1".date, "*SELECT* 1".user_id
Buffers: shared hit=372
->  Seq Scan on org_storage_extra.stats_user_daily_counters  (cost=0.00..802.75 rows=21568 width=20) (actual time=0.008..5.910 rows=21532 loops=1)
Output: stats_user_daily_counters.user_id, stats_user_daily_counters.date
Filter: (stats_user_daily_counters.type = ANY ('{created_file,created_folder,created_secure_fetch}'::text[]))
Rows Removed by Filter: 9795
Buffers: shared hit=372
->  Subquery Scan on "*SELECT* 2"  (cost=0.00..1720.44 rows=46693 width=20) (actual time=0.009..12.460 rows=46647 loops=1)
Output: "*SELECT* 2".date, "*SELECT* 2".user_id
Buffers: shared hit=641
->  Seq Scan on ipfs_pinning_facility.stats_user_daily_counters stats_user_daily_counters_1  (cost=0.00..1253.51 rows=46693 width=20) (actual time=0.009..8.209 rows=46647 loops=1)
Output: stats_user_daily_counters_1.user_id, stats_user_daily_counters_1.date
Filter: (stats_user_daily_counters_1.type <> 'shares_viewed_by_others'::text)
Rows Removed by Filter: 2354
Buffers: shared hit=641
->  Hash  (cost=1583.59..1583.59 rows=200 width=16) (actual time=1411.250..1411.251 rows=71228 loops=1)
Output: all_users.id
Buckets: 131072 (originally 1024)  Batches: 2 (originally 1)  Memory Usage: 3073kB
Buffers: shared hit=285995 read=1964, temp read=100 written=717
->  HashAggregate  (cost=1581.59..1583.59 rows=200 width=16) (actual time=1383.986..1398.270 rows=71228 loops=1)
Output: all_users.id
Group Key: all_users.id
Batches: 5  Memory Usage: 4161kB  Disk Usage: 1544kB
Buffers: shared hit=285995 read=1964, temp read=100 written=560
->  CTE Scan on all_users  (cost=0.00..1405.86 rows=70293 width=16) (actual time=0.102..1351.241 rows=71228 loops=1)
Output: all_users.id
Buffers: shared hit=285995 read=1964, temp written=296
SubPlan 2
->  Aggregate  (cost=1777.05..1777.06 rows=1 width=8) (actual time=20.197..20.197 rows=1 loops=231)
Output: count(DISTINCT all_users_1.id)
Buffers: temp read=68607 written=1
->  CTE Scan on all_users all_users_1  (cost=0.00..1757.33 rows=7888 width=16) (actual time=0.883..10.874 rows=27239 loops=231)
Output: all_users_1.id, all_users_1.registered_at, all_users_1.deleted_at
Filter: ((all_users_1.registered_at < (((generate_series(date_trunc('week'::text, $5), date_trunc('week'::text, now()), '7 days'::interval)))::date)) AND ((all_users_1.deleted_at IS NULL) OR (all_users_1.deleted_at > (((generate_series(date_trunc('week'::text, $5), date_trunc('week'::text, now()), '7 days'::interval)))::date))))
Rows Removed by Filter: 43989
Buffers: temp read=68607 written=1
SubPlan 3
->  Aggregate  (cost=1815.90..1815.91 rows=1 width=8) (actual time=11.215..11.215 rows=1 loops=231)
Output: count(DISTINCT all_users_2.id)
Buffers: temp read=68607
->  CTE Scan on all_users all_users_2  (cost=0.00..1757.33 rows=23431 width=16) (actual time=11.009..11.150 rows=231 loops=231)
Output: all_users_2.id, all_users_2.registered_at, all_users_2.deleted_at
Filter: (all_users_2.deleted_at < ((((generate_series(date_trunc('week'::text, $5), date_trunc('week'::text, now()), '7 days'::interval)))::date) + '7 days'::interval))
Rows Removed by Filter: 70997
Buffers: temp read=68607
SubPlan 4
->  Aggregate  (cost=1933.94..1933.95 rows=1 width=8) (actual time=14.515..14.515 rows=1 loops=231)
Output: count(DISTINCT all_users_3.id)
Buffers: temp read=68607
->  CTE Scan on all_users all_users_3  (cost=0.00..1933.06 rows=351 width=16) (actual time=2.264..14.424 rows=308 loops=231)
Output: all_users_3.id, all_users_3.registered_at, all_users_3.deleted_at
Filter: ((all_users_3.registered_at >= (((generate_series(date_trunc('week'::text, $5), date_trunc('week'::text, now()), '7 days'::interval)))::date)) AND (all_users_3.registered_at < ((((generate_series(date_trunc('week'::text, $5), date_trunc('week'::text, now()), '7 days'::interval)))::date) + '7 days'::interval)))
Rows Removed by Filter: 70920
Buffers: temp read=68607
SubPlan 5
->  Aggregate  (cost=1933.94..1933.95 rows=1 width=8) (actual time=6.556..6.556 rows=1 loops=231)
Output: count(DISTINCT all_users_4.id)
Buffers: temp read=68607
->  CTE Scan on all_users all_users_4  (cost=0.00..1933.06 rows=351 width=16) (actual time=6.441..6.547 rows=5 loops=231)
Output: all_users_4.id, all_users_4.registered_at, all_users_4.deleted_at
Filter: ((all_users_4.deleted_at >= (((generate_series(date_trunc('week'::text, $5), date_trunc('week'::text, now()), '7 days'::interval)))::date)) AND (all_users_4.deleted_at < ((((generate_series(date_trunc('week'::text, $5), date_trunc('week'::text, now()), '7 days'::interval)))::date) + '7 days'::interval)))
Rows Removed by Filter: 71223
Buffers: temp read=68607
Planning Time: 0.612 ms
Execution Time: 13615.054 ms
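
A note on reading the plan: SubPlans 2-5 are the four correlated subqueries, and each of them runs once per generated week, i.e. 231 times, over the ~71k rows of the materialized all_users CTE. The other large cost is building all_users itself: ~285k shared buffer hits for a scan of roughly 80k rows suggests most of that 1.3 s goes into the per-row org_accounts.__user_is_qa(id) call. A minimal sketch to check the latter, assuming you can run EXPLAIN directly against the database, is to time that branch on its own:

-- Sketch: time only the org_accounts.users branch of all_users to see how much
-- of its cost comes from the per-row org_accounts.__user_is_qa(id) function call.
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, registered_at
FROM org_accounts.users
WHERE status = 'active'
  AND org_accounts.__user_is_qa(id) <> 'Y'
  AND email NOT LIKE '%@org%';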

Answer 1

Score: 2

An obvious optimization is to eliminate redundant table scans. There isn't any need in preprocessed to query from all_users more than once. The following query uses COUNT with FILTER to gather the same statistics:

WITH datetable AS (SELECT GENERATE_SERIES(
                            DATE_TRUNC('week', (SELECT MIN(created_at) FROM org_accounts.deleted_users)),
                            DATE_TRUNC('week', NOW()),
                            '1 week'::INTERVAL
                            )::DATE AS week_start),
     all_users AS (SELECT id,
                          registered_at,
                          NULL AS deleted_at
                     FROM org_accounts.users
                     WHERE status = 'active'
                       AND org_accounts.__user_is_qa(id) <> 'Y'
                       AND email NOT LIKE '%@org%'
                   UNION ALL
                   SELECT id,
                          created_at AS registered_at,
                          deleted_at
                     FROM org_accounts.deleted_users
                     WHERE deleter_id = id
                       AND email NOT LIKE '%@org%'),
     weekly_activity AS (SELECT DATE_TRUNC('week', date)::DATE AS week_start,
                                COUNT(DISTINCT user_id)        AS weekly_active_users
                           FROM (SELECT user_id, date
                                   FROM org_storage_extra.stats_user_daily_counters
                                   WHERE type IN ('created_file', 'created_folder', 'created_secure_fetch')
                                 UNION ALL
                                 SELECT user_id, date
                                   FROM ipfs_pinning_facility.stats_user_daily_counters
                                   WHERE type <> 'shares_viewed_by_others') activity_ids_dates
                           WHERE EXISTS(SELECT 1 FROM all_users WHERE id = user_id)
                           GROUP BY week_start),
     preprocessed AS (SELECT week_start,
                             us.actual_users,
                             us.cumulative_churned_users,
                             us.weekly_new_users,
                             us.weekly_churned_users,
                             COALESCE(weekly_active_users, 0) AS weekly_active_users
                        FROM datetable dt
                          CROSS JOIN LATERAL (SELECT
                                                 COUNT(DISTINCT u.id) FILTER (WHERE u.registered_at < dt.week_start AND
                                                                                    (u.deleted_at IS NULL OR u.deleted_at > dt.week_start)) AS actual_users,
                                                 COUNT(DISTINCT u.id)
                                                 FILTER (WHERE u.deleted_at < dt.week_start + '1 week'::INTERVAL)                           AS cumulative_churned_users,
                                                 COUNT(DISTINCT u.id)
                                                 FILTER (WHERE u.registered_at >= dt.week_start AND u.registered_at <
                                                                                                    dt.week_start +
                                                                                                    '1 week'::INTERVAL)                     AS weekly_new_users,
                                                 COUNT(DISTINCT u.id)
                                                 FILTER (WHERE u.deleted_at >= dt.week_start AND u.deleted_at <
                                                                                                 dt.week_start +
                                                                                                 '1 week'::INTERVAL)                        AS weekly_churned_users
                                               FROM all_users u
                                               WHERE u.registered_at < dt.week_start + '1 week'::INTERVAL
                                                  OR (u.deleted_at >= dt.week_start AND
                                                      u.deleted_at < dt.week_start + '1 week'::INTERVAL)) us
                          LEFT JOIN weekly_activity
                                    USING (week_start)
  ORDER BY week_start DESC)
SELECT week_start                                                   AS for_week_of,
       actual_users + cumulative_churned_users                      AS cumulative_users,
       cumulative_churned_users,
       cumulative_churned_users::FLOAT /
       NULLIF((actual_users + cumulative_churned_users)::FLOAT, 0)  AS cumulated_churn_rate,
       actual_users,
       weekly_new_users,
       weekly_churned_users,
       weekly_active_users,
       weekly_churned_users::FLOAT / NULLIF(actual_users::FLOAT, 0) AS weekly_churn_rate
  FROM preprocessed;

There are probably other optimizations possible, but this one was immediately evident.
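
For readers unfamiliar with the aggregate FILTER clause used above, here is a minimal self-contained sketch of the same pattern; the events table and its columns are hypothetical and purely for illustration. The idea is one pass over the detail rows per generated week, with FILTER splitting that single pass into several conditional counts:

-- Hypothetical example table: events(user_id, kind, occurred_at).
-- One LATERAL scan per generated week; each FILTER condition becomes its own count.
SELECT w.week_start::date,
       c.signups,
       c.cancellations
FROM generate_series(DATE '2023-01-02', DATE '2023-03-27', '1 week'::INTERVAL) AS w(week_start)
CROSS JOIN LATERAL (
    SELECT COUNT(DISTINCT e.user_id) FILTER (WHERE e.kind = 'signup'
                                               AND e.occurred_at >= w.week_start
                                               AND e.occurred_at <  w.week_start + '1 week'::INTERVAL) AS signups,
           COUNT(DISTINCT e.user_id) FILTER (WHERE e.kind = 'cancel'
                                               AND e.occurred_at >= w.week_start
                                               AND e.occurred_at <  w.week_start + '1 week'::INTERVAL) AS cancellations
    FROM events e
) AS c;

As in the answer above, the per-week loop remains, but each week reads the detail rows once instead of once per statistic.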

