Athena – 如果数值不存在,则在所有列中添加一个值为0的行。

huangapple go评论92阅读模式
英文:

Athena - Add a Row with 0 for All Columns if a Value is Not Present

问题

Sure, here's the translated code portion you requested:

我正在尝试编写一个 Athena 查询,该查询将创建可用于在营销电子邮件发送后进行分析的预先和后续数据。

在运行代码后,我得到了以下结果,因为我不知道如何告诉 Athena,如果某个会员在特定时间段内没有交易,请无论如何将该行添加到他们的年龄中,并将 "net_amount" 和 "total_transactions" 设置为零:

| customer_ID | age | net_amount | total_transactions | Period |
|:---- |:------| -----|----|-|
|  ABCDRF | 25   | 85.0 |2|Post |
| HDWSAZ  | 37    | 23.98 |1|Pre|
| HDWSAZ  | 37    | 56.34 |2|Post|
| OAYDI  | 42    | 8.2 |1|Pre|
| OAYDI   | 42    | 98.29 |4|Post|
| IUSTRA  | 31    | 88.62 |1|Pre|
| MAHRYS  | 56    | 34.16 |2|Pre|

正如您所看到的,上面的表格中没有 "ABCDRF" 的预期期间行,也没有 "IUSTRA" 和 "MAHRYS" 的后期行,但我想要添加零 "net_amount" 和 "total_transactions" 以及插入正确值的年龄到缺少的期间(pre 或 post)。

我使用的代码是:

    select distinct(customer_ID) as customer_ID, age, sum(net_amount) as net_amount, sum(lineitem) as total_visits, (case 
            when trans_date >= (current_timestamp - interval '18' month) AND trans_date < (current_timestamp - interval '6' month) then 'Post'
            when trans_date >= (current_timestamp - interval '30' month) AND trans_date < (current_timestamp - interval '18' month) then 'Pre'
            else 'nope'
            end) as Period
    from trans_history
    where trans_date >= (current_timestamp - interval '30' month) AND trans_date < (current_timestamp - interval '6' month)
    group by 1, 5
    order by 1 desc;

我想要的是:

| customer_ID | age | net_amount | total_transactions | Period |
|:---- |:------| -----|----|-|
|  **ABCDRF** | **25**   | **0** |**0**|**Pre** |
|  ABCDRF | 25   | 85.0 |2|Post |
| HDWSAZ  | 37    | 23.98 |1|Pre|
| HDWSAZ  | 37    | 56.34 |2|Post|
| OAYDI  | 42    | 8.2 |1|Pre|
| OAYDI   | 42    | 98.29 |4|Post|
| IUSTRA  | 31    | 88.62 |1|Pre|
|  **IUSTRA** | **31**   | **0** |**0**|**Post** |
| MAHRYS  | 56    | 34.16 |2|Pre|
|  **MAHRYS** | **56**   | **0** |**0**|**Post** |

您有没有办法在 Athena 中使用此代码添加这些行?我觉得这只是我需要在 "case when" 语句中添加的内容,但我无法弄清楚。

谢谢!
Mark
英文:

I am trying to write an Athena query that will create pre and post data that can be used for analysis after a marketing e-mail went out.

After the code is ran, I am getting the following results because I have no idea how to tell Athena that if a member doesn't have a transaction during a particular time period, add that row in anyway with their age and just put zero for "net_amount" and "total_transactions":

customer_ID age net_amount total_transactions Period
ABCDRF 25 85.0 2 Post
HDWSAZ 37 23.98 1 Pre
HDWSAZ 37 56.34 2 Post
OAYDI 42 8.2 1 Pre
OAYDI 42 98.29 4 Post
IUSTRA 31 88.62 1 Pre
MAHRYS 56 34.16 2 Pre

As you can see, this table above does not have a pre period row for "ABCDRF" and does not have a post period row for either "IUSTRA" and "MAHRYS" but I want to add that in with zeros for the "net_amount" and "total_transactions" and the age inserted with the right value for the Period that is missing (pre or post).

The code I am using is:

select distinct(customer_ID) as customer_ID, age, sum(net_amount) as net_amount, sum(lineitem) as total_visits, (case 
        when trans_date >= (current_timestamp - interval '18' month) AND trans_date < (current_timestamp - interval '6' month) then 'Post'
        when trans_date >= (current_timestamp - interval '30' month) AND trans_date < (current_timestamp - interval '18' month) then 'Pre'
        else 'nope'
        end) as Period
from trans_history
where trans_date >= (current_timestamp - interval '30' month) AND trans_date < (current_timestamp - interval '6' month)
group by 1, 5
order by 1 desc;

What I want:

customer_ID age net_amount total_transactions Period
ABCDRF 25 0 0 Pre
ABCDRF 25 85.0 2 Post
HDWSAZ 37 23.98 1 Pre
HDWSAZ 37 56.34 2 Post
OAYDI 42 8.2 1 Pre
OAYDI 42 98.29 4 Post
IUSTRA 31 88.62 1 Pre
IUSTRA 31 0 0 Post
MAHRYS 56 34.16 2 Pre
MAHRYS 56 0 0 Post

Any idea how I can get these rows added in with this code in Athena? I feel like it is just something I need to add in the "case when" statement but I can't figure it out.

Thanks!
Mark

答案1

得分: 1

case表达式在这里不够用,因为一些客户在两个时期都没有数据。

相反,一种方法是将客户列表与固定的、预定义的时期进行cross join,以生成所有可能的组合。 然后,我们可以通过left join将表与之关联,然后进行汇总:

select c.customer_id, c.age, p.period,
    coalesce(sum(net_amount), 0) as net_amount, 
    coalesce(sum(lineitem), 0) as total_visits
from (select distinct customer_id, age from trans_history) c
cross join (
    select 'Post' as period,
        current_timestamp - interval '18' month as start_date, 
        current_timestamp - interval '6' month as end_date
    union all
    select 'Pre', 
        current_timestamp - interval '30' month, 
        current_timestamp - interval '18' month
) p
left join trans_history t
    on  t.customer_id = c.customer_id
    and t.trans_date >= p.start_date
    and t.trans_date < p.end_date
group by c.customer_id, c.age, p.period
英文:

The case expression is not sufficient here, since some customers do not have data for both periods.

Instead, one approach is to cross join the list of customers with the fixed, predefined periods in order to generate all possible combinations. Then, we can bring the table with a left join, then aggregate:

select c.customer_id, c.age, p.period
    coalesce(sum(net_amount), 0) as net_amount, 
    coalesce(sum(lineitem)  , 0) as total_visits
from (select distinct customer_id, age from trans_history) c
cross join (
    select &#39;Post&#39; as period
        current_timestamp - interval &#39;18&#39; month as start_date, 
        current_timestamp - interval &#39;6&#39;  month as end_date
    union all
    select &#39;Pre&#39;, 
        current_timestamp - interval &#39;30&#39; month, 
        current_timestamp - interval &#39;18&#39; month
) p
left join trans_history t
    on  t.customer_id = c.customer_id
    and t.trans_date &gt;= p.start_date
    and t.trans_date &lt;  p.end_date
group by c.customer_id, c.age, p.period

huangapple
  • 本文由 发表于 2023年4月17日 22:55:11
  • 转载请务必保留本文链接:https://go.coder-hub.com/76036482.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定