英文:
time difference between events in two different tables
问题
我有一个 paid_activity 和 free_activity 表格:
paid_activity 模型:
字段 | 类型 | 描述 |
---|---|---|
pk | 数字 | 哈希值(activity_date, user_id, product_id) |
date | 日期 | 活动的日期 |
uid | 数字 | 生成活动的用户 |
pid | 数字 | 与活动相关的产品 |
free_activity 模型:
字段 | 类型 | 描述 |
---|---|---|
pk | 数字 | 哈希值(activity_date, user_id, product_id) |
date | 日期 | 活动的日期 |
uid | 数字 | 生成活动的用户 |
pid | 数字 | 与活动相关的产品 |
我需要创建一个 dormancy 表格,其模型如下:
字段 | 类型 | 描述 |
---|---|---|
pk | 数字 | 哈希值(activity_date, user_id, product_id) |
date | 日期 | 消费活动的日期 |
uid | 数字 | 生成活动的用户 |
pid | 数字 | 与活动相关的产品 |
paid_dormancy | 整数 | 距离用户上次付费活动的天数 |
paid_product_dormancy | 整数 | 距离用户上次购买相同产品的天数 |
free_dormancy | 整数 | 距离用户上次免费活动的天数 |
free_product_dormancy | 整数 | 距离用户上次购买相同产品的天数 |
dormancy 表格应该与 paid_activity 表格具有一对一的行对应关系。
我首先创建了一个没有 free_<X> 字段的中间 paid_dormancy 表格:
字段 | 类型 | 描述 |
---|---|---|
pk | 数字 | 哈希值(activity_date, user_id, product_id) |
date | 日期 | 消费活动的日期 |
uid | 数字 | 生成活动的用户 |
pid | 数字 | 用户花费的产品 |
paid_dormancy | 整数 | 距离用户上次付费活动的天数 |
paid_product_dormancy | 整数 | 距离用户上次购买相同产品的天数 |
代码:
select
pk
, date
, uid
, pid
, date - lag(date) ignore nulls over(
partition by uid
order by date) paid_dormancy
, date - lag(date) ignore nulls over(
partition by uid, pid
order by date) paid_product_dormancy
from paid_activity
我有困难确定如何合并 free_activity 来构建 dormancy 表格。
我拼凑了以下查询,但它根本不会产生正确的结果:
with dormancy_union as (
select
ppk pk
, date
, uid
, pid
, paid_dormancy
, paid_product_dormancy
, iff(ppk is null, null,
date - lag(date) ignore nulls over(
partition by uid
order by fpk, date
)) free_dormancy
, iff(ppk is null, null,
date - lag(date) ignore nulls over(
partition by uid, pid
order by fpk, date
)) free_product_dormancy
from (
select pk ppk, null fpk, * from paid_dormancy
union all
select null ppk, pk fpk, *, null, null from free_activity
)
select *
from dormancy_union
where pk is not null
order by date;
英文:
I have a paid_activity and free_activity table:
paid_activity model:
Field | Type | Description |
---|---|---|
pk | number | hash(activity_date, user_id, product_id) |
date | date | date of the activity |
uid | number | user who generated the activity |
pid | number | product associated with the activity |
free_activity model:
Field | Type | Description |
---|---|---|
pk | number | hash(activity_date, user_id, product_id) |
date | date | date of the activity |
uid | number | user who generated the activity |
pid | number | product associated with the activity |
I need to produce a dormancy table with the model:
Field | Type | Description |
---|---|---|
pk | number | hash(activity_date, user_id, product_id) |
date | date | date of the spend activity |
uid | number | user who generated the activity |
pid | number | product associated with the activity |
paid_dormancy | int | days since the user's last paid activity |
paid_product_dormancy | int | days since the user's last paid activity with the same product |
free_dormancy | int | days since the user's last free activity |
free_product_dormancy | int | days since the user's last free activity with the same product |
The dormancy table should have a 1:1 row correspondence to the paid_activity table.
I started by making an intermediate paid_dormancy table without the free_<X> fields:
Field | Type | Description |
---|---|---|
pk | number | hash(activity_date, user_id, product_id) |
date | date | date of the spend activity |
uid | number | user who generated the activity |
pid | number | product the user spent on |
paid_dormancy | int | days since the user's last paid activity |
paid_product_dormancy | int | days since the user's last free activity with the same product |
Code:
select
pk
, date
, uid
, pid
, date - lag(date) ignore nulls over(
partition by uid
order by date) paid_dormancy
, date - lag(date) ignore nulls over(
partition by uid, pid
order by date) paid_product_dormancy
from paid_activity
I'm having trouble figuring out how to merge in free_activity to build the dormancy table.
I cobbled together the following query which doesn't produce the correct results at all:
with dormancy_union as (
select
ppk pk
, date
, uid
, pid
, paid_dormancy
, paid_product_dormancy
, iff(ppk is null, null,
date - lag(date) ignore nulls over(
partition by uid
order by fpk, date
)) free_dormancy
, iff(ppk is null, null,
date - lag(date) ignore nulls over(
partition by uid, pid
order by fpk, date
)) free_product_dormancy
from (
select pk ppk, null fpk, * from paid_dormancy
union all
select null ppk, pk fpk, *, null, null from free_activity
)
select *
from dormancy_union
where pk is not null
order by date;
答案1
得分: 1
以下是翻译好的内容:
使用一些示例数据:
with paid_activity(pk, date, uid, pid) as (
select
hash(column1, column2, column3),
to_date(column1, 'yyyy-mm-dd'),
column2,
column3
from values
('2023-02-14', 1, 10),
('2023-02-01', 1, 11),
('2023-01-13', 1, 10)
), free_activity(pk, date, uid, pid) as (
select
hash(column1, column2, column3),
to_date(column1, 'yyyy-mm-dd'),
column2,
column3
from values
('2023-02-15', 1, 10),
('2023-02-11', 1, 10),
('2023-01-20', 1, 10),
('2023-01-01', 1, 10)
)
我们可以通过日期连接来构建免费产品的先前数据,然后保留每个输入行的“最近”数据。
, with_priors as (
select z.*
,fp.date as fp_date
from (
select p.*
,f.date as f_date
from paid_activity as p
left join free_activity as f
on f.date <= p.date
and p.uid = f.uid
qualify row_number() over (partition by p.date, p.uid, p.pid order by f.date desc) = 1
) as z
left join free_activity as fp
on fp.date <= z.date
and fp.uid = z.uid
and fp.pid = z.pid
qualify row_number() over (partition by z.date, z.uid, z.pid order by fp.date desc) = 1
)
select
p.*,
lag(p.date) ignore nulls over(partition by p.uid order by p.date) as l_p_act,
lag(p.date) ignore nulls over(partition by p.uid, p.pid order by p.date) as l_pp_act,
datediff('days', lag(p.date) ignore nulls over(partition by p.uid order by p.date), p.date) as paid_dormancy,
datediff('days', lag(p.date) ignore nulls over(partition by p.uid, p.pid order by p.date), p.date) as paid_product_dormancy,
datediff('days', p.f_date, p.date) as free_dormancy,
datediff('days', p.fp_date, p.date) as free_product_dormancy
from with_priors as p
order by 2,3,4;
PK | DATE | UID | PID | F_DATE | FP_DATE | L_P_ACT | L_PP_ACT | PAID_DORMANCY | PAID_PRODUCT_DORMANCY | FREE_DORMANCY | FREE_PRODUCT_DORMANCY |
---|---|---|---|---|---|---|---|---|---|---|---|
9,052,164,364,143,044,634 | 2023-01-13 | 1 | 10 | 2023-01-01 | 2023-01-01 | 12 | 12 | ||||
3,292,356,339,691,413,099 | 2023-02-01 | 1 | 11 | 2023-01-20 | 2023-01-13 | 19 | 12 | ||||
-3,195,136,054,197,415,933 | 2023-02-14 | 1 | 10 | 2023-02-11 | 2023-02-11 | 2023-02-01 | 2023-01-13 | 13 | 32 | 3 | 3 |
因此,最后的代码块可以更简洁:
select
p.pk, p.date, p.uid, p.pid,
datediff('days', lag(p.date) ignore nulls over(partition by p.uid order by p.date), p.date) as paid_dormancy,
datediff('days', lag(p.date) ignore nulls over(partition by p.uid, p.pid order by p.date), p.date) as paid_product_dormancy,
datediff('days', p.f_date, p.date) as free_dormancy,
datediff('days', p.fp_date, p.date) as free_product_dormancy
from with_priors as p
order by 2,3,4;
给出以下结果:
PK | DATE | UID | PID | PAID_DORMANCY | PAID_PRODUCT_DORMANCY | FREE_DORMANCY | FREE_PRODUCT_DORMANCY |
---|---|---|---|---|---|---|---|
9,052,164,364,143,044,634 | 2023-01-13 | 1 | 10 | 12 | 12 | ||
3,292,356,339,691,413,099 | 2023-02-01 | 1 | 11 | 19 | 12 | ||
-3,195,136,054,197,415,933 | 2023-02-14 | 1 | 10 | 13 | 32 | 3 | 3 |
英文:
So with some example data:
with paid_activity(pk, date, uid, pid) as (
select
hash(column1, column2, column3),
to_date(column1, 'yyyy-mm-dd'),
column2,
column3
from values
('2023-02-14', 1, 10),
('2023-02-01', 1, 11),
('2023-01-13', 1, 10)
), free_activity(pk, date, uid, pid) as (
select
hash(column1, column2, column3),
to_date(column1, 'yyyy-mm-dd'),
column2,
column3
from values
('2023-02-15', 1, 10),
('2023-02-11', 1, 10),
('2023-01-20', 1, 10),
('2023-01-01', 1, 10)
)
we can build the priors for the free products by do a date join, and then keep the "most recent" per input rows..
), with_priors as (
select z.*
,fp.date as fp_date
from (
select p.*
,f.date as f_date
from paid_activity as p
left join free_activity as f
on f.date <= p.date
and p.uid = f.uid
qualify row_number() over (partition by p.date, p.uid, p.pid order by f.date desc) = 1
) as z
left join free_activity as fp
on fp.date <= z.date
and fp.uid = z.uid
and fp.pid = z.pid
qualify row_number() over (partition by z.date, z.uid, z.pid order by fp.date desc) = 1
)
select
p.*
,lag(p.date)ignore nulls over(partition by p.uid order by p.date) as l_p_act
,lag(p.date)ignore nulls over(partition by p.uid, p.pid order by p.date) as l_pp_act
,datediff('days', lag(p.date)ignore nulls over(partition by p.uid order by p.date), p.date) as paid_dormancy
,datediff('days', lag(p.date)ignore nulls over(partition by p.uid, p.pid order by p.date), p.date) as paid_product_dormancy
,datediff('days', p.f_date, p.date) as free_dormancy
,datediff('days', p.fp_date, p.date) as free_product_dormancy
from with_priors as p
order by 2,3,4;
PK | DATE | UID | PID | F_DATE | FP_DATE | L_P_ACT | L_PP_ACT | PAID_DORMANCY | PAID_PRODUCT_DORMANCY | FREE_DORMANCY | FREE_PRODUCT_DORMANCY |
---|---|---|---|---|---|---|---|---|---|---|---|
9,052,164,364,143,044,634 | 2023-01-13 | 1 | 10 | 2023-01-01 | 2023-01-01 | 12 | 12 | ||||
3,292,356,339,691,413,099 | 2023-02-01 | 1 | 11 | 2023-01-20 | 2023-01-13 | 19 | 12 | ||||
-3,195,136,054,197,415,933 | 2023-02-14 | 1 | 10 | 2023-02-11 | 2023-02-11 | 2023-02-01 | 2023-01-13 | 13 | 32 | 3 | 3 |
so that last block can be cleaner:
select
p.pk, p.date, p.uid, p.pid
,datediff('days', lag(p.date)ignore nulls over(partition by p.uid order by p.date), p.date) as paid_dormancy
,datediff('days', lag(p.date)ignore nulls over(partition by p.uid, p.pid order by p.date), p.date) as paid_product_dormancy
,datediff('days', p.f_date, p.date) as free_dormancy
,datediff('days', p.fp_date, p.date) as free_product_dormancy
from with_priors as p
order by 2,3,4;
giving:
PK | DATE | UID | PID | PAID_DORMANCY | PAID_PRODUCT_DORMANCY | FREE_DORMANCY | FREE_PRODUCT_DORMANCY |
---|---|---|---|---|---|---|---|
9,052,164,364,143,044,634 | 2023-01-13 | 1 | 10 | 12 | 12 | ||
3,292,356,339,691,413,099 | 2023-02-01 | 1 | 11 | 19 | 12 | ||
-3,195,136,054,197,415,933 | 2023-02-14 | 1 | 10 | 13 | 32 | 3 | 3 |
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论