英文:
Athena - Add a Row with 0 for All Columns if a Value is Not Present
问题
Sure, here's the translated code portion you requested:
我正在尝试编写一个 Athena 查询,该查询将创建可用于在营销电子邮件发送后进行分析的预先和后续数据。
在运行代码后,我得到了以下结果,因为我不知道如何告诉 Athena,如果某个会员在特定时间段内没有交易,请无论如何将该行添加到他们的年龄中,并将 "net_amount" 和 "total_transactions" 设置为零:
| customer_ID | age | net_amount | total_transactions | Period |
|:---- |:------| -----|----|-|
| ABCDRF | 25 | 85.0 |2|Post |
| HDWSAZ | 37 | 23.98 |1|Pre|
| HDWSAZ | 37 | 56.34 |2|Post|
| OAYDI | 42 | 8.2 |1|Pre|
| OAYDI | 42 | 98.29 |4|Post|
| IUSTRA | 31 | 88.62 |1|Pre|
| MAHRYS | 56 | 34.16 |2|Pre|
正如您所看到的,上面的表格中没有 "ABCDRF" 的预期期间行,也没有 "IUSTRA" 和 "MAHRYS" 的后期行,但我想要添加零 "net_amount" 和 "total_transactions" 以及插入正确值的年龄到缺少的期间(pre 或 post)。
我使用的代码是:
select distinct(customer_ID) as customer_ID, age, sum(net_amount) as net_amount, sum(lineitem) as total_visits, (case
when trans_date >= (current_timestamp - interval '18' month) AND trans_date < (current_timestamp - interval '6' month) then 'Post'
when trans_date >= (current_timestamp - interval '30' month) AND trans_date < (current_timestamp - interval '18' month) then 'Pre'
else 'nope'
end) as Period
from trans_history
where trans_date >= (current_timestamp - interval '30' month) AND trans_date < (current_timestamp - interval '6' month)
group by 1, 5
order by 1 desc;
我想要的是:
| customer_ID | age | net_amount | total_transactions | Period |
|:---- |:------| -----|----|-|
| **ABCDRF** | **25** | **0** |**0**|**Pre** |
| ABCDRF | 25 | 85.0 |2|Post |
| HDWSAZ | 37 | 23.98 |1|Pre|
| HDWSAZ | 37 | 56.34 |2|Post|
| OAYDI | 42 | 8.2 |1|Pre|
| OAYDI | 42 | 98.29 |4|Post|
| IUSTRA | 31 | 88.62 |1|Pre|
| **IUSTRA** | **31** | **0** |**0**|**Post** |
| MAHRYS | 56 | 34.16 |2|Pre|
| **MAHRYS** | **56** | **0** |**0**|**Post** |
您有没有办法在 Athena 中使用此代码添加这些行?我觉得这只是我需要在 "case when" 语句中添加的内容,但我无法弄清楚。
谢谢!
Mark
英文:
I am trying to write an Athena query that will create pre and post data that can be used for analysis after a marketing e-mail went out.
After the code is ran, I am getting the following results because I have no idea how to tell Athena that if a member doesn't have a transaction during a particular time period, add that row in anyway with their age and just put zero for "net_amount" and "total_transactions":
customer_ID | age | net_amount | total_transactions | Period |
---|---|---|---|---|
ABCDRF | 25 | 85.0 | 2 | Post |
HDWSAZ | 37 | 23.98 | 1 | Pre |
HDWSAZ | 37 | 56.34 | 2 | Post |
OAYDI | 42 | 8.2 | 1 | Pre |
OAYDI | 42 | 98.29 | 4 | Post |
IUSTRA | 31 | 88.62 | 1 | Pre |
MAHRYS | 56 | 34.16 | 2 | Pre |
As you can see, this table above does not have a pre period row for "ABCDRF" and does not have a post period row for either "IUSTRA" and "MAHRYS" but I want to add that in with zeros for the "net_amount" and "total_transactions" and the age inserted with the right value for the Period that is missing (pre or post).
The code I am using is:
select distinct(customer_ID) as customer_ID, age, sum(net_amount) as net_amount, sum(lineitem) as total_visits, (case
when trans_date >= (current_timestamp - interval '18' month) AND trans_date < (current_timestamp - interval '6' month) then 'Post'
when trans_date >= (current_timestamp - interval '30' month) AND trans_date < (current_timestamp - interval '18' month) then 'Pre'
else 'nope'
end) as Period
from trans_history
where trans_date >= (current_timestamp - interval '30' month) AND trans_date < (current_timestamp - interval '6' month)
group by 1, 5
order by 1 desc;
What I want:
customer_ID | age | net_amount | total_transactions | Period |
---|---|---|---|---|
ABCDRF | 25 | 0 | 0 | Pre |
ABCDRF | 25 | 85.0 | 2 | Post |
HDWSAZ | 37 | 23.98 | 1 | Pre |
HDWSAZ | 37 | 56.34 | 2 | Post |
OAYDI | 42 | 8.2 | 1 | Pre |
OAYDI | 42 | 98.29 | 4 | Post |
IUSTRA | 31 | 88.62 | 1 | Pre |
IUSTRA | 31 | 0 | 0 | Post |
MAHRYS | 56 | 34.16 | 2 | Pre |
MAHRYS | 56 | 0 | 0 | Post |
Any idea how I can get these rows added in with this code in Athena? I feel like it is just something I need to add in the "case when" statement but I can't figure it out.
Thanks!
Mark
答案1
得分: 1
case
表达式在这里不够用,因为一些客户在两个时期都没有数据。
相反,一种方法是将客户列表与固定的、预定义的时期进行cross join
,以生成所有可能的组合。 然后,我们可以通过left join
将表与之关联,然后进行汇总:
select c.customer_id, c.age, p.period,
coalesce(sum(net_amount), 0) as net_amount,
coalesce(sum(lineitem), 0) as total_visits
from (select distinct customer_id, age from trans_history) c
cross join (
select 'Post' as period,
current_timestamp - interval '18' month as start_date,
current_timestamp - interval '6' month as end_date
union all
select 'Pre',
current_timestamp - interval '30' month,
current_timestamp - interval '18' month
) p
left join trans_history t
on t.customer_id = c.customer_id
and t.trans_date >= p.start_date
and t.trans_date < p.end_date
group by c.customer_id, c.age, p.period
英文:
The case
expression is not sufficient here, since some customers do not have data for both periods.
Instead, one approach is to cross join
the list of customers with the fixed, predefined periods in order to generate all possible combinations. Then, we can bring the table with a left join
, then aggregate:
select c.customer_id, c.age, p.period
coalesce(sum(net_amount), 0) as net_amount,
coalesce(sum(lineitem) , 0) as total_visits
from (select distinct customer_id, age from trans_history) c
cross join (
select 'Post' as period
current_timestamp - interval '18' month as start_date,
current_timestamp - interval '6' month as end_date
union all
select 'Pre',
current_timestamp - interval '30' month,
current_timestamp - interval '18' month
) p
left join trans_history t
on t.customer_id = c.customer_id
and t.trans_date >= p.start_date
and t.trans_date < p.end_date
group by c.customer_id, c.age, p.period
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论