Calculate 7-day sales from each customer's first purchase date, to find each customer's average units purchased per 7 days
Question
I have the first table below and am trying to get the sales for every 7-day period, counted from the day each customer made their first purchase. The second table shows an example.
purchase date | customer id | sales unit |
---|---|---|
2018-01-01 | 1 | 10 |
2018-01-02 | 1 | 5 |
2018-01-05 | 2 | 3 |
2018-01-15 | 1 | 10 |
2018-01-20 | 2 | 4 |
2018-01-21 | 2 | 5 |
purchase date | customer id | sales unit | every 7 day cumulative sales |
---|---|---|---|
2018-01-01 | 1 | 10 | 10 |
2018-01-02 | 1 | 5 | 15 |
2018-01-15 | 1 | 10 | 10 |
2018-01-05 | 2 | 3 | 3 |
2018-01-20 | 2 | 4 | 9 |
2018-01-21 | 2 | 5 | 9 |
The final table should be like this:
purchase week | customer id | 7 day sales unit |
---|---|---|
2018-01-01 | 1 | 15 |
2018-01-05 | 2 | 3 |
2018-01-15 | 1 | 10 |
2018-01-20 | 2 | 4 |
Then I can calculate the average sales per customer:
customer id | average of every 7 day sales unit | Calculation |
---|---|---|
1 | 12.5 | (15+10) /2 |
2 | 3.5 | (3+4) /2 |
The hard parts are:
- Every customer has a different first purchase date.
- The purchase dates are not consecutive, so I can't use a row-based frame such as unbounded or 6 following rows.
- The dataset spans 5 years, so I can't manually subtract 7, 14, and so on.
- I tried date_trunc('week', date, min(date) over (partition by customerid)).
- I also tried a window with rows between 6 preceding and current row, but because the dates are not consecutive it doesn't work.
Answer 1
Score: 0
You can get the result you want in two steps using a SQL window function:
Step 1. Partition the window by customer to get each customer's first purchase date. Then use Presto's date_diff() function to calculate the number of days from the first purchase date to the current purchase date, and divide it by 7 (integer division) to get the week bucket counted from the first purchase.
Step 2. Group by (customer_id, customer_sale_week_bucket) and take sum(sales_unit) and min(purchase_date) within each group.
Here is the query:
with orders_with_customer_week_bucket as (
    select
        purchase_date,
        customer_id,
        sales_unit,
        -- days since this customer's first purchase, integer-divided by 7;
        -- Presto's date_diff takes the unit as a quoted string
        date_diff('day',
                  min(purchase_date) over (partition by customer_id),
                  purchase_date) / 7 as customer_sale_week_bucket
    from
        orders
)
select
    purchase_week,
    customer_id,
    seven_day_sales_unit
from (
    select
        customer_id,
        customer_sale_week_bucket,
        min(purchase_date) as purchase_week,  -- earliest purchase in the bucket
        sum(sales_unit) as seven_day_sales_unit
    from
        orders_with_customer_week_bucket
    group by
        customer_id,
        customer_sale_week_bucket
) r
purchase_week | customer_id | seven_day_sales_unit |
---|---|---|
2018-01-01 | 1 | 15 |
2018-01-05 | 2 | 3 |
2018-01-15 | 1 | 10 |
2018-01-20 | 2 | 9 |
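The two steps can be cross-checked outside the database. The following is a minimal Python sketch, not part of the original answer, that applies the same bucketing to the six sample rows from the question. The function name `seven_day_sales` and the tuple layout are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date

# Sample rows from the question: (purchase_date, customer_id, sales_unit)
orders = [
    (date(2018, 1, 1), 1, 10),
    (date(2018, 1, 2), 1, 5),
    (date(2018, 1, 5), 2, 3),
    (date(2018, 1, 15), 1, 10),
    (date(2018, 1, 20), 2, 4),
    (date(2018, 1, 21), 2, 5),
]


def seven_day_sales(rows):
    """Sum units per (customer, 7-day bucket counted from the first purchase)."""
    # Step 1: first purchase date per customer,
    # the counterpart of min(purchase_date) over (partition by customer_id).
    first_purchase = {}
    for purchase_date, customer_id, _ in rows:
        current = first_purchase.get(customer_id)
        if current is None or purchase_date < current:
            first_purchase[customer_id] = purchase_date

    # Step 2: integer-divide the day offset by 7 and aggregate per bucket.
    buckets = defaultdict(lambda: {"purchase_week": None, "units": 0})
    for purchase_date, customer_id, units in rows:
        bucket = (purchase_date - first_purchase[customer_id]).days // 7
        entry = buckets[(customer_id, bucket)]
        entry["units"] += units
        if entry["purchase_week"] is None or purchase_date < entry["purchase_week"]:
            entry["purchase_week"] = purchase_date

    # One row per bucket, ordered by the bucket's earliest purchase date.
    return sorted(
        (entry["purchase_week"], customer_id, entry["units"])
        for (customer_id, _), entry in buckets.items()
    )


result = seven_day_sales(orders)
for purchase_week, customer_id, units in result:
    print(purchase_week, customer_id, units)
```

Customer 2's purchases on 2018-01-20 and 2018-01-21 are 15 and 16 days after the first purchase, so both land in bucket 2 and sum to 9.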
Answer 2
Score: 0
You can do it with a CASE expression that looks at the dates. I did this in SQL Server, but I believe the same approach works in Presto, where DATEADD(DAY, -6, x) would be written as date_add('day', -6, x), with the unit passed as a quoted string.
You also mentioned you may need 14 days, so I added a column for that. It's just a matter of changing the number of days in the DATEADD call.
SELECT t1.purchaseDate,
       t1.CustomerID,
       t1.SalesUnit,
       -- units bought by the same customer in the 7 days ending on this purchase date
       SUM(CASE
               WHEN t2.purchaseDate BETWEEN DATEADD(DAY, -6, t1.purchaseDate) AND t1.purchaseDate
               THEN t2.salesUnit
           END) AS SalesLast7,
       -- same idea with a 14-day window
       SUM(CASE
               WHEN t2.purchaseDate BETWEEN DATEADD(DAY, -13, t1.purchaseDate) AND t1.purchaseDate
               THEN t2.salesUnit
           END) AS SalesLast14
FROM temp t1
LEFT JOIN temp t2
       ON t1.customerID = t2.customerID
      AND t2.purchaseDate IS NOT NULL
GROUP BY t1.purchaseDate, t1.customerID, t1.salesUnit
purchaseDate | CustomerID | SalesUnit | SalesLast7 | SalesLast14 |
---|---|---|---|---|
2018-01-01 | 1 | 10 | 10 | 10 |
2018-01-02 | 1 | 5 | 15 | 15 |
2018-01-05 | 2 | 3 | 3 | 3 |
2018-01-15 | 1 | 10 | 10 | 15 |
2018-01-20 | 2 | 4 | 4 | 4 |
2018-01-21 | 2 | 5 | 9 | 9 |
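Note that this query computes a trailing window ending on each purchase date, which is different from fixed 7-day periods counted from the first purchase. As a rough cross-check, here is a minimal Python sketch of the same trailing-window logic on the question's six sample rows. The helper name `trailing_sales` is an assumption for illustration, and the sketch works row by row instead of reproducing the GROUP BY.

```python
from datetime import date, timedelta

# Sample rows from the question: (purchase_date, customer_id, sales_unit)
orders = [
    (date(2018, 1, 1), 1, 10),
    (date(2018, 1, 2), 1, 5),
    (date(2018, 1, 5), 2, 3),
    (date(2018, 1, 15), 1, 10),
    (date(2018, 1, 20), 2, 4),
    (date(2018, 1, 21), 2, 5),
]


def trailing_sales(rows, days):
    """For each row, sum the same customer's units over the trailing window.

    The window covers `days` calendar days ending on the row's purchase date,
    both ends inclusive, like BETWEEN DATEADD(DAY, -(days - 1), d) AND d.
    """
    totals = []
    for purchase_date, customer_id, _ in rows:
        window_start = purchase_date - timedelta(days=days - 1)
        totals.append(
            sum(
                units
                for other_date, other_customer, units in rows
                if other_customer == customer_id
                and window_start <= other_date <= purchase_date
            )
        )
    return totals


sales_last_7 = trailing_sales(orders, 7)
sales_last_14 = trailing_sales(orders, 14)
for row, last_7, last_14 in zip(orders, sales_last_7, sales_last_14):
    print(row[0], row[1], row[2], last_7, last_14)
```

Because the window slides with every row, a single purchase can be counted in several rows, so these values are not directly usable for a per-period average.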
Answer 3
Score: 0
I don't believe your sample data is tallied correctly. It's not clear to me whether you truly need the cumulative values per week or just a straight total. My guess is that you want one of these two options.
with salesweek as (
    select *,
        -- weeks based on repeating 7-day counter starting at first purchase
        date_diff('day',
                  min(purchasedate) over (partition by customerid),
                  purchasedate) / 7 + 1 as weeknum -- integer divide
    from T
)
select customerid, sum(salesunit) * 1.0 / count(distinct weeknum) as avg7daysales
from salesweek
group by customerid;
Or possibly:
with firstpurchase as (
    select *,
        min(purchasedate) over (partition by customerid) as firstpurchasedate
    from T
), salesweek as (
    select *,
        -- weeks based on repeating 7-day counter starting at first purchase
        date_diff('day', firstpurchasedate, purchasedate) / 7 + 1 as weeknum
    from firstpurchase
), weighted as (
    select customerid, weeknum, purchasedate,
        sum(salesunit) * count(1) over (
            partition by customerid, weeknum
            order by purchasedate
            rows between current row and unbounded following
        ) as weightedsales
    from salesweek
    group by customerid, weeknum, purchasedate
)
select customerid, sum(weightedsales) * 1.0 / count(distinct weeknum)
from weighted
group by customerid
order by customerid;
This is in SQL Server since I don't have access to Presto.
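To see what the first option returns on the question's sample rows, here is a minimal Python sketch of the same arithmetic: total units per customer divided by the number of distinct 7-day periods that contain a purchase. The function name `avg_seven_day_sales` is an assumption for illustration, not something from the answer.

```python
from collections import defaultdict
from datetime import date

# Sample rows from the question: (purchase_date, customer_id, sales_unit)
orders = [
    (date(2018, 1, 1), 1, 10),
    (date(2018, 1, 2), 1, 5),
    (date(2018, 1, 5), 2, 3),
    (date(2018, 1, 15), 1, 10),
    (date(2018, 1, 20), 2, 4),
    (date(2018, 1, 21), 2, 5),
]


def avg_seven_day_sales(rows):
    """Average units per 7-day period, counting only periods with a purchase."""
    # First purchase date per customer.
    first_purchase = {}
    for purchase_date, customer_id, _ in rows:
        first_purchase[customer_id] = min(
            purchase_date, first_purchase.get(customer_id, purchase_date)
        )

    totals = defaultdict(int)  # sum(salesunit) per customer
    weeks = defaultdict(set)   # distinct weeknum values per customer
    for purchase_date, customer_id, units in rows:
        weeknum = (purchase_date - first_purchase[customer_id]).days // 7 + 1
        totals[customer_id] += units
        weeks[customer_id].add(weeknum)

    return {c: totals[c] / len(weeks[c]) for c in totals}


averages = avg_seven_day_sales(orders)
print(averages)
```

On the sample rows this gives 12.5 for customer 1 and 6.0 for customer 2, because customer 2's purchases on 2018-01-20 and 2018-01-21 fall in the same 7-day period. Periods with no purchases are not counted in the denominator; counting them would need the full span of weeks per customer instead of the distinct ones.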