两个不同表中事件之间的时间差

huangapple go评论64阅读模式
英文:

time difference between events in two different tables

问题

我有一个 paid_activity 和 free_activity 表格:

paid_activity 模型:

字段 类型 描述
pk 数字 哈希值(activity_date, user_id, product_id)
date 日期 活动的日期
uid 数字 生成活动的用户
pid 数字 与活动相关的产品

free_activity 模型:

字段 类型 描述
pk 数字 哈希值(activity_date, user_id, product_id)
date 日期 活动的日期
uid 数字 生成活动的用户
pid 数字 与活动相关的产品

我需要创建一个 dormancy 表格,其模型如下:

字段 类型 描述
pk 数字 哈希值(activity_date, user_id, product_id)
date 日期 消费活动的日期
uid 数字 生成活动的用户
pid 数字 与活动相关的产品
paid_dormancy 整数 距离用户上次付费活动的天数
paid_product_dormancy 整数 距离用户上次购买相同产品的天数
free_dormancy 整数 距离用户上次免费活动的天数
free_product_dormancy 整数 距离用户上次购买相同产品的天数

dormancy 表格应该与 paid_activity 表格具有一对一的行对应关系。

我首先创建了一个没有 free_<X> 字段的中间 paid_dormancy 表格:

字段 类型 描述
pk 数字 哈希值(activity_date, user_id, product_id)
date 日期 消费活动的日期
uid 数字 生成活动的用户
pid 数字 用户花费的产品
paid_dormancy 整数 距离用户上次付费活动的天数
paid_product_dormancy 整数 距离用户上次购买相同产品的天数

代码:

select
    pk
    , date
    , uid
    , pid
    , date - lag(date) ignore nulls over(
        partition by uid
        order by date) paid_dormancy
    , date - lag(date) ignore nulls over(
        partition by uid, pid
        order by date) paid_product_dormancy
from paid_activity

我有困难确定如何合并 free_activity 来构建 dormancy 表格。

我拼凑了以下查询,但它根本不会产生正确的结果:

with dormancy_union as (
    select
        ppk pk
        , date
        , uid
        , pid
        , paid_dormancy
        , paid_product_dormancy
        , iff(ppk is null, null,
            date - lag(date) ignore nulls over(
                partition by uid
                order by fpk, date
            )) free_dormancy
        , iff(ppk is null, null,
            date - lag(date) ignore nulls over(
                partition by uid, pid
                order by fpk, date
            )) free_product_dormancy
    from (
        select pk ppk, null fpk, * from paid_dormancy
        union all
        select null ppk, pk fpk, *, null, null from free_activity
)
select * 
from dormancy_union
where pk is not null
order by date;
英文:

I have a paid_activity and free_activity table:

paid_activity model:

Field Type Description
pk number hash(activity_date, user_id, product_id)
date date date of the activity
uid number user who generated the activity
pid number product associated with the activity

free_activity model:

Field Type Description
pk number hash(activity_date, user_id, product_id)
date date date of the activity
uid number user who generated the activity
pid number product associated with the activity

I need to produce a dormancy table with the model:

Field Type Description
pk number hash(activity_date, user_id, product_id)
date date date of the spend activity
uid number user who generated the activity
pid number product associated with the activity
paid_dormancy int days since the user's last paid activity
paid_product_dormancy int days since the user's last paid activity with the same product
free_dormancy int days since the user's last free activity
free_product_dormancy int days since the user's last free activity with the same product

The dormancy table should have a 1:1 row correspondence to the paid_activity table.

I started by making an intermediate paid_dormancy table without the free_<X> fields:

Field Type Description
pk number hash(activity_date, user_id, product_id)
date date date of the spend activity
uid number user who generated the activity
pid number product the user spent on
paid_dormancy int days since the user's last paid activity
paid_product_dormancy int days since the user's last free activity with the same product

Code:

select
    pk
    , date
    , uid
    , pid
    , date - lag(date) ignore nulls over(
        partition by uid
        order by date) paid_dormancy
    , date - lag(date) ignore nulls over(
        partition by uid, pid
        order by date) paid_product_dormancy
from paid_activity

I'm having trouble figuring out how to merge in free_activity to build the dormancy table.

I cobbled together the following query which doesn't produce the correct results at all:

with dormancy_union as (
    select
        ppk pk
        , date
        , uid
        , pid
        , paid_dormancy
        , paid_product_dormancy
        , iff(ppk is null, null,
            date - lag(date) ignore nulls over(
                partition by uid
                order by fpk, date
            )) free_dormancy
        , iff(ppk is null, null,
            date - lag(date) ignore nulls over(
                partition by uid, pid
                order by fpk, date
            )) free_product_dormancy
    from (
        select pk ppk, null fpk, * from paid_dormancy
        union all
        select null ppk, pk fpk, *, null, null from free_activity
)
select * 
from dormancy_union
where pk is not null
order by date;

答案1

得分: 1

以下是翻译好的内容:

使用一些示例数据:

with paid_activity(pk, date, uid, pid) as (
    select 
        hash(column1, column2, column3), 
        to_date(column1, 'yyyy-mm-dd'), 
        column2, 
        column3
    from values
    ('2023-02-14', 1, 10),
    ('2023-02-01', 1, 11),
    ('2023-01-13', 1, 10)
), free_activity(pk, date, uid, pid) as (
    select 
        hash(column1, column2, column3), 
        to_date(column1, 'yyyy-mm-dd'), 
        column2, 
        column3
    from values
    ('2023-02-15', 1, 10),
    ('2023-02-11', 1, 10),
    ('2023-01-20', 1, 10),
    ('2023-01-01', 1, 10)
)

我们可以通过日期连接来构建免费产品的先前数据,然后保留每个输入行的“最近”数据。

, with_priors as (
    select z.*
        ,fp.date as fp_date
    from (
        select p.*
            ,f.date as f_date
        from paid_activity as p
        left join free_activity as f 
            on f.date <= p.date 
            and p.uid = f.uid
        qualify row_number() over (partition by p.date, p.uid, p.pid order by f.date desc) = 1
    ) as z
    left join free_activity as fp 
        on fp.date <= z.date 
        and fp.uid = z.uid 
        and fp.pid = z.pid
    qualify row_number() over (partition by z.date, z.uid, z.pid order by fp.date desc) = 1
)
select 
    p.*,
    lag(p.date) ignore nulls over(partition by p.uid order by p.date) as l_p_act,
    lag(p.date) ignore nulls over(partition by p.uid, p.pid order by p.date) as l_pp_act,
    datediff('days', lag(p.date) ignore nulls over(partition by p.uid order by p.date), p.date) as paid_dormancy,
    datediff('days', lag(p.date) ignore nulls over(partition by p.uid, p.pid order by p.date), p.date) as paid_product_dormancy,
    datediff('days', p.f_date, p.date) as free_dormancy,
    datediff('days', p.fp_date, p.date) as free_product_dormancy
from with_priors as p
order by 2,3,4;
PK DATE UID PID F_DATE FP_DATE L_P_ACT L_PP_ACT PAID_DORMANCY PAID_PRODUCT_DORMANCY FREE_DORMANCY FREE_PRODUCT_DORMANCY
9,052,164,364,143,044,634 2023-01-13 1 10 2023-01-01 2023-01-01 12 12
3,292,356,339,691,413,099 2023-02-01 1 11 2023-01-20 2023-01-13 19 12
-3,195,136,054,197,415,933 2023-02-14 1 10 2023-02-11 2023-02-11 2023-02-01 2023-01-13 13 32 3 3

因此,最后的代码块可以更简洁:

select 
    p.pk, p.date, p.uid, p.pid,
    datediff('days', lag(p.date) ignore nulls over(partition by p.uid order by p.date), p.date) as paid_dormancy,
    datediff('days', lag(p.date) ignore nulls over(partition by p.uid, p.pid order by p.date), p.date) as paid_product_dormancy,
    datediff('days', p.f_date, p.date) as free_dormancy,
    datediff('days', p.fp_date, p.date) as free_product_dormancy
from with_priors as p
order by 2,3,4;

给出以下结果:

PK DATE UID PID PAID_DORMANCY PAID_PRODUCT_DORMANCY FREE_DORMANCY FREE_PRODUCT_DORMANCY
9,052,164,364,143,044,634 2023-01-13 1 10 12 12
3,292,356,339,691,413,099 2023-02-01 1 11 19 12
-3,195,136,054,197,415,933 2023-02-14 1 10 13 32 3 3
英文:

So with some example data:

with paid_activity(pk, date, uid, pid) as (
    select 
        hash(column1, column2, column3), 
        to_date(column1, &#39;yyyy-mm-dd&#39;), 
        column2, 
        column3
    from values
    (&#39;2023-02-14&#39;, 1, 10),
    (&#39;2023-02-01&#39;, 1, 11),
    (&#39;2023-01-13&#39;, 1, 10)
), free_activity(pk, date, uid, pid) as (
    select 
        hash(column1, column2, column3), 
        to_date(column1, &#39;yyyy-mm-dd&#39;), 
        column2, 
        column3
    from values
    (&#39;2023-02-15&#39;, 1, 10),
    (&#39;2023-02-11&#39;, 1, 10),
    (&#39;2023-01-20&#39;, 1, 10),
    (&#39;2023-01-01&#39;, 1, 10)
)

we can build the priors for the free products by do a date join, and then keep the "most recent" per input rows..

), with_priors as (
    select z.*
        ,fp.date as fp_date
    from (
        select p.*
            ,f.date as f_date
        from paid_activity as p
        left join free_activity as f 
            on f.date &lt;= p.date 
            and p.uid = f.uid
        qualify row_number() over (partition by p.date, p.uid, p.pid order by f.date desc) = 1
    ) as z
    left join free_activity as fp 
        on fp.date &lt;= z.date 
        and fp.uid = z.uid 
        and fp.pid = z.pid
    qualify row_number() over (partition by z.date, z.uid, z.pid order by fp.date desc) = 1
)
select 
    p.*
    ,lag(p.date)ignore nulls over(partition by p.uid order by p.date) as l_p_act
    ,lag(p.date)ignore nulls over(partition by p.uid, p.pid order by p.date) as l_pp_act
    ,datediff(&#39;days&#39;, lag(p.date)ignore nulls over(partition by p.uid order by p.date), p.date) as paid_dormancy
    ,datediff(&#39;days&#39;, lag(p.date)ignore nulls over(partition by p.uid, p.pid order by p.date), p.date) as paid_product_dormancy
    ,datediff(&#39;days&#39;, p.f_date, p.date) as free_dormancy
    ,datediff(&#39;days&#39;, p.fp_date, p.date) as free_product_dormancy
from with_priors as p
order by 2,3,4;
PK DATE UID PID F_DATE FP_DATE L_P_ACT L_PP_ACT PAID_DORMANCY PAID_PRODUCT_DORMANCY FREE_DORMANCY FREE_PRODUCT_DORMANCY
9,052,164,364,143,044,634 2023-01-13 1 10 2023-01-01 2023-01-01 12 12
3,292,356,339,691,413,099 2023-02-01 1 11 2023-01-20 2023-01-13 19 12
-3,195,136,054,197,415,933 2023-02-14 1 10 2023-02-11 2023-02-11 2023-02-01 2023-01-13 13 32 3 3

so that last block can be cleaner:

select 
    p.pk, p.date, p.uid, p.pid
    ,datediff(&#39;days&#39;, lag(p.date)ignore nulls over(partition by p.uid order by p.date), p.date) as paid_dormancy
    ,datediff(&#39;days&#39;, lag(p.date)ignore nulls over(partition by p.uid, p.pid order by p.date), p.date) as paid_product_dormancy
    ,datediff(&#39;days&#39;, p.f_date, p.date) as free_dormancy
    ,datediff(&#39;days&#39;, p.fp_date, p.date) as free_product_dormancy
from with_priors as p
order by 2,3,4;

giving:

PK DATE UID PID PAID_DORMANCY PAID_PRODUCT_DORMANCY FREE_DORMANCY FREE_PRODUCT_DORMANCY
9,052,164,364,143,044,634 2023-01-13 1 10 12 12
3,292,356,339,691,413,099 2023-02-01 1 11 19 12
-3,195,136,054,197,415,933 2023-02-14 1 10 13 32 3 3

huangapple
  • 本文由 发表于 2023年2月14日 04:43:07
  • 转载请务必保留本文链接:https://go.coder-hub.com/75440978.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定