Calculate 7-day sales from each customer's own start date (their first purchase day) to find each customer's average units purchased per 7 days


Question

I have the first table below, and I am trying to get the sales for every 7-day period, counted from the day each customer made their first purchase. Table 2 shows an example.

Table 1 (current data):

purchase date | customer id | sales unit
2018-01-01    | 1           | 10
2018-01-02    | 1           | 5
2018-01-05    | 2           | 3
2018-01-15    | 1           | 10
2018-01-20    | 2           | 4
2018-01-21    | 2           | 5

Table 2 (example):

purchase date | customer id | sales unit | 7-day cumulative sales
2018-01-01    | 1           | 10         | 10
2018-01-02    | 1           | 5          | 15
2018-01-15    | 1           | 10         | 10
2018-01-05    | 2           | 3          | 3
2018-01-20    | 2           | 4          | 9
2018-01-21    | 2           | 5          | 9

The final table should look like this:

purchase week | customer id | 7-day sales unit
2018-01-01    | 1           | 15
2018-01-05    | 2           | 3
2018-01-15    | 1           | 10
2018-01-20    | 2           | 4

Then I can calculate the average sales per customer:

customer id | average 7-day sales unit | calculation
1           | 12.5                     | (15 + 10) / 2
2           | 3.5                      | (3 + 4) / 2

The hard parts are:

  1. Every customer has a different first purchase date.

  2. The purchase dates are not consecutive, so I can't use row-based frames such as UNBOUNDED PRECEDING or 6 FOLLOWING.

  3. The whole dataset spans 5 years, so I can't manually subtract 7, 14, etc. days.

  4. I tried to use date_trunc('week', date, min(date) over (partition by customerid)).

  5. I also tried a ROWS BETWEEN 6 PRECEDING AND CURRENT ROW frame, but because the dates are not consecutive, it doesn't work.


Answer 1

Score: 0


You can use a SQL window function in two steps to get the result you want:

Step 1. Partition by customer with a window function to get each customer's first purchase date. Then use Presto's date_diff() function to calculate the number of days from the first purchase date to the current purchase date, and divide it by 7 (integer division) to get the 7-day bucket counted from the first purchase date.

Step 2. Group by (customer, customer_sale_week_bucket) and take sum(sales_unit) and min(purchase_date) for each group.
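The bucket arithmetic from steps 1 and 2 can be written as a formula. A worked check on customer 2 from the sample data follows the formula. Here $d_0$ is the customer's first purchase date and $d$ is the date of a given purchase. Both symbols are notation introduced only for illustration. Integer `/` truncates in Presto, which equals the floor for the non-negative day counts used here.

```latex
\begin{aligned}
b(d) &= \left\lfloor \frac{d - d_0}{7} \right\rfloor, \qquad d_0 = \text{2018-01-05 (customer 2)} \\
b(\text{2018-01-05}) &= \lfloor 0/7 \rfloor = 0 \\
b(\text{2018-01-20}) &= \lfloor 15/7 \rfloor = 2 \\
b(\text{2018-01-21}) &= \lfloor 16/7 \rfloor = 2
\end{aligned}
```

So customer 2's purchases on 2018-01-20 and 2018-01-21 land in the same bucket and are summed together (4 + 5 = 9).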

Here is the query:

with orders_with_customer_week_bucket AS
(
    select
        purchase_date,
        customer_id,
        sales_unit,
        -- integer division: bucket 0 = days 0-6 after the first purchase, 1 = days 7-13, ...
        date_diff('day', min(purchase_date) over (partition by customer_id), purchase_date) / 7 as customer_sale_week_bucket
    from
        orders
)
select
    purchase_week,
    customer_id,
    seven_day_sales_unit
from
    (select
        customer_id,
        customer_sale_week_bucket,
        min(purchase_date) as purchase_week,
        sum(sales_unit) as seven_day_sales_unit
    from
        orders_with_customer_week_bucket
    GROUP BY
        customer_id,
        customer_sale_week_bucket
    ) r
ORDER BY
    purchase_week,
    customer_id
purchase_week | customer_id | seven_day_sales_unit
2018-01-01    | 1           | 15
2018-01-05    | 2           | 3
2018-01-15    | 1           | 10
2018-01-20    | 2           | 9
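The sketch below runs the same two steps against the question's sample data in SQLite, using Python's built-in `sqlite3` module, so the query can be checked without a Presto cluster. It is an illustration, not the answerer's code. It makes these assumptions:

* The table is named `orders`, and `purchase_date` is stored as ISO `YYYY-MM-DD` text.
* A `julianday()` difference cast to an integer stands in for Presto's `date_diff('day', ...)`.
* The outer wrapper query is folded into a single `select`.
* SQLite is version 3.25 or newer, which is required for window functions.

```python
import sqlite3

# Sample data from the question: (purchase_date, customer_id, sales_unit)
ROWS = [
    ("2018-01-01", 1, 10),
    ("2018-01-02", 1, 5),
    ("2018-01-05", 2, 3),
    ("2018-01-15", 1, 10),
    ("2018-01-20", 2, 4),
    ("2018-01-21", 2, 5),
]

# Step 1: day offset from the customer's first purchase, integer-divided by 7.
# Step 2: group by (customer, bucket); min date labels the bucket, sum totals it.
QUERY = """
with orders_with_customer_week_bucket as (
    select
        purchase_date,
        customer_id,
        sales_unit,
        cast(julianday(purchase_date)
             - julianday(min(purchase_date) over (partition by customer_id))
             as integer) / 7 as customer_sale_week_bucket
    from orders
)
select
    min(purchase_date) as purchase_week,
    customer_id,
    sum(sales_unit) as seven_day_sales_unit
from orders_with_customer_week_bucket
group by customer_id, customer_sale_week_bucket
order by purchase_week, customer_id
"""


def seven_day_buckets(rows):
    """Return (purchase_week, customer_id, seven_day_sales_unit) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table orders (purchase_date text, customer_id integer, sales_unit integer)"
    )
    conn.executemany("insert into orders values (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in seven_day_buckets(ROWS):
    print(row)
```

On the sample data it should return the same four rows as the output table above. That includes 9 for customer 2's bucket starting on 2018-01-20.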

Answer 2

Score: 0


You can do it with a CASE expression that looks at the dates. I did this in SQL Server, but I believe it works in Presto as well. In Presto, DATEADD(DAY, -6, x) would be written as date_add('day', -6, x), with the unit as a quoted string.

You also mentioned you may need 14 days, so I added a column for that. As you can see, it's just a matter of changing the number of days in the DATEADD call.


   SELECT t1.purchaseDate,
          t1.CustomerID,
          t1.SalesUnit,
          SUM(CASE 
                 WHEN t2.purchaseDate BETWEEN DATEADD(DAY, -6, t1.purchaseDate) AND t1.purchaseDate THEN t2.salesUnit 
               END) AS SalesLast7,
          SUM(CASE 
                 WHEN t2.purchaseDate BETWEEN DATEADD(DAY, -13, t1.purchaseDate) AND t1.purchaseDate THEN t2.salesUnit 
               END) AS SalesLast14  
     FROM temp t1
LEFT JOIN temp t2 ON t1.customerID = t2.customerID AND t2.purchaseDate IS NOT NULL
 GROUP BY t1.purchaseDate, t1.customerID, t1.salesUnit
purchaseDate | CustomerID | SalesUnit | SalesLast7 | SalesLast14
2018-01-01   | 1          | 10        | 10         | 10
2018-01-02   | 1          | 5         | 15         | 15
2018-01-05   | 2          | 3         | 3          | 3
2018-01-15   | 1          | 10        | 10         | 15
2018-01-20   | 2          | 4         | 4          | 4
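The self-join can be checked the same way. The sketch below ports it to SQLite through Python's `sqlite3` module; it is not the answerer's SQL Server code. The port makes three assumptions:

* SQLite's `date(x, '-6 days')` stands in for `DATEADD(DAY, -6, x)`.
* The table is named `sales`, which is a hypothetical name.
* The `IS NOT NULL` join condition is dropped because the sample dates are never null.

Note that this computes a trailing window ending at each purchase, not fixed buckets anchored at the first purchase.

```python
import sqlite3

# Sample data from the question: (purchase_date, customer_id, sales_unit)
ROWS = [
    ("2018-01-01", 1, 10),
    ("2018-01-02", 1, 5),
    ("2018-01-05", 2, 3),
    ("2018-01-15", 1, 10),
    ("2018-01-20", 2, 4),
    ("2018-01-21", 2, 5),
]

# For each purchase t1, sum the same customer's t2 rows dated within the
# trailing window ending at t1's date. ISO 'YYYY-MM-DD' text compares
# correctly as strings, so BETWEEN works on it directly.
QUERY = """
select t1.purchase_date,
       t1.customer_id,
       t1.sales_unit,
       sum(case when t2.purchase_date between date(t1.purchase_date, '-6 days')
                                          and t1.purchase_date
                then t2.sales_unit end) as sales_last_7,
       sum(case when t2.purchase_date between date(t1.purchase_date, '-13 days')
                                          and t1.purchase_date
                then t2.sales_unit end) as sales_last_14
from sales t1
left join sales t2 on t1.customer_id = t2.customer_id
group by t1.purchase_date, t1.customer_id, t1.sales_unit
order by t1.customer_id, t1.purchase_date
"""


def trailing_sales(rows):
    """Return one row per purchase with trailing 7- and 14-day unit totals."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table sales (purchase_date text, customer_id integer, sales_unit integer)"
    )
    conn.executemany("insert into sales values (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in trailing_sales(ROWS):
    print(row)
```

The values for the five dates shown in the output table above should match. The 2018-01-21 row, which that table omits, comes out as 9 for both windows.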

Answer 3

Score: 0


I don't believe your sample data is tallied correctly. It's not clear to me whether you truly need the cumulative values per week or just a straight total. My guess is that you want one of these two options.

with salesweek as (
    select *,
        -- weeks based on repeating 7-day counter starting at first purchase
        date_diff('day',
            min(purchasedate) over (partition by customerid),
            purchasedate) / 7 + 1 as weeknum -- integer divide
    from T
)
select customerid, sum(salesunit) * 1.0 / count(distinct weeknum) as avg7daysales
from salesweek
group by customerid;
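Here is a plain-Python port of the first query's logic on the question's sample data, as a sketch for checking the numbers. It is not the answerer's code, and the helper names are hypothetical. It divides each customer's total units by the number of distinct 7-day buckets that contain at least one purchase, so buckets with no purchases are not counted.

```python
from collections import defaultdict
from datetime import date

# Sample data from the question: (purchase_date, customer_id, sales_unit)
ROWS = [
    (date(2018, 1, 1), 1, 10),
    (date(2018, 1, 2), 1, 5),
    (date(2018, 1, 5), 2, 3),
    (date(2018, 1, 15), 1, 10),
    (date(2018, 1, 20), 2, 4),
    (date(2018, 1, 21), 2, 5),
]


def first_purchase(rows):
    """Map each customer to the date of their first purchase."""
    first = {}
    for d, cust, _ in rows:
        first[cust] = min(d, first.get(cust, d))
    return first


def avg_7day_sales(rows):
    """Total units divided by the number of distinct 7-day buckets with a purchase."""
    first = first_purchase(rows)
    totals = defaultdict(int)
    weeks = defaultdict(set)
    for d, cust, units in rows:
        # Mirrors date_diff('day', first, d) / 7 + 1 with integer division.
        weeknum = (d - first[cust]).days // 7 + 1
        totals[cust] += units
        weeks[cust].add(weeknum)
    return {cust: totals[cust] / len(weeks[cust]) for cust in sorted(totals)}


print(avg_7day_sales(ROWS))  # {1: 12.5, 2: 6.0}
```

Customer 2 comes out at 6.0 rather than the question's 3.5. The 2018-01-20 and 2018-01-21 purchases fall in the same bucket (4 + 5 = 9), so the average is (3 + 9) / 2 = 6.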

Or possibly:

with firstpurchase as (
    select *,
        min(purchasedate) over (partition by customerid) as firstpurchasedate
    from T
), salesweek as (
    select *,
        -- weeks based on repeating 7-day counter starting at first purchase
        date_diff('day', firstpurchasedate, purchasedate) / 7 + 1 as weeknum
    from firstpurchase
), weighted as (
    select customerid, weeknum, purchasedate,
        -- each day's units, weighted by the number of purchase days from here to the end of its week
        sum(salesunit) * count(1) over (
            partition by customerid, weeknum
            order by purchasedate
            rows between current row and unbounded following
        ) as weightedsales
    from salesweek
    group by customerid, weeknum, purchasedate
)
select customerid, sum(weightedsales) * 1.0 / count(distinct weeknum) as avg7daysales
from weighted
group by customerid
order by customerid;
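In the second query, each day's units are multiplied by the number of purchase days from that day to the end of its bucket. This adds up to the same total as summing the running (cumulative) totals within each bucket. Below is a plain-Python sketch of that equivalence on the sample data. It is again a hypothetical port, not the answerer's code.

```python
from collections import defaultdict
from datetime import date

# Sample data from the question: (purchase_date, customer_id, sales_unit)
ROWS = [
    (date(2018, 1, 1), 1, 10),
    (date(2018, 1, 2), 1, 5),
    (date(2018, 1, 5), 2, 3),
    (date(2018, 1, 15), 1, 10),
    (date(2018, 1, 20), 2, 4),
    (date(2018, 1, 21), 2, 5),
]


def avg_cumulative_7day_sales(rows):
    """Sum the running totals inside each 7-day bucket, then divide by bucket count."""
    first = {}
    for d, cust, _ in rows:
        first[cust] = min(d, first.get(cust, d))
    # Units per (customer, weeknum, purchase date), like the GROUP BY in `weighted`.
    daily = defaultdict(int)
    for d, cust, units in rows:
        weeknum = (d - first[cust]).days // 7 + 1
        daily[(cust, weeknum, d)] += units
    running = defaultdict(int)      # running total per (customer, weeknum)
    cumulative = defaultdict(int)   # sum of those running totals per customer
    weeks = defaultdict(set)
    for cust, weeknum, d in sorted(daily):  # dates ascend within each bucket
        running[(cust, weeknum)] += daily[(cust, weeknum, d)]
        cumulative[cust] += running[(cust, weeknum)]
        weeks[cust].add(weeknum)
    return {cust: cumulative[cust] / len(weeks[cust]) for cust in sorted(cumulative)}


print(avg_cumulative_7day_sales(ROWS))  # {1: 17.5, 2: 8.0}
```

For customer 1, the first bucket's running totals are 10 and 15, and the second bucket's is 10. That gives (10 + 15 + 10) / 2 = 17.5.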

https://dbfiddle.uk/sgOOd07H

This is in SQL Server since I don't have access to Presto.

huangapple
  • Published on 2023-03-07 03:53:42
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75655250.html