Trying to optimize my query, which returns the number of orders with one column for each filter I apply
Question
I'm trying to get the number of orders for different businesses.
So, I have two tables:
Table1:
Order_Number | Order_Date | Store_ID | Customer_ID | Is_Valid |
---|---|---|---|---|
1 | 01-01-2020 | 55 | 25 | Valid |
2 | 24-05-2020 | 90 | 34 | Invalid |
3 | 03-09-2021 | 56 | 25 | Valid |
4 | 04-10-2021 | 210 | 100 | Invalid |
Table2:
Order_Number | Order_Date | Store_ID | Customer_ID | Is_Valid | Product | IIM |
---|---|---|---|---|---|---|
1 | 01-01-2020 | 55 | 25 | Valid | Coca-Cola | False |
1 | 01-01-2020 | 55 | 25 | Valid | some drink | False |
1 | 01-01-2020 | 55 | 25 | Valid | some drink | True |
1 | 01-01-2020 | 55 | 25 | Valid | some drink | True |
2 | 24-05-2020 | 90 | 34 | Invalid | some drink | False |
3 | 03-09-2021 | 56 | 25 | Valid | some drink | True |
3 | 03-09-2021 | 56 | 25 | Valid | some drink | True |
4 | 04-10-2021 | 210 | 100 | Invalid | some drink | True |
4 | 04-10-2021 | 210 | 100 | Invalid | some drink | False |
So Table2 is very similar to Table1; the difference is that Table2 has one row for each product in an order.
Also, there is an IIM column: if at least one product in an order has IIM = True, the entire order is considered IIM = True.
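To make the IIM rule concrete, here is a minimal sketch that derives one flag per order from the Table2 rows shown above. It uses Python's built-in sqlite3 module rather than Snowflake, and stores IIM as 1/0 instead of a boolean; both are assumptions made only so the example runs anywhere. The lowercase table and column names are illustrative.

```python
import sqlite3

# (order_number, iim) pairs copied from the Table2 example; 1 = True, 0 = False.
rows = [(1, 0), (1, 0), (1, 1), (1, 1), (2, 0), (3, 1), (3, 1), (4, 1), (4, 0)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table2 (order_number INTEGER, iim INTEGER)")
conn.executemany("INSERT INTO table2 VALUES (?, ?)", rows)

# An order counts as IIM if any of its product rows is IIM,
# which is the same as taking the maximum of the 0/1 flags per order.
order_iim = dict(conn.execute(
    "SELECT order_number, MAX(iim) FROM table2 GROUP BY order_number"
))
print(order_iim)
```

With the sample rows, orders 1, 3 and 4 come out as IIM and order 2 does not.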
Here comes my question/problem:
I want, in a single query, to get the number of orders where:
- store_id = 25
- store_id = 25 and IIM = True
- store_id not in (1, 2, 3)
- store_id not in (1, 2, 3) and IIM = True
(Only taking into account valid orders)
To achieve this, I did this:
SELECT
    yearmonth,
    date_,
    SUM(cant_store_25) AS cant_store_25,
    SUM(cant_store_25_iim) AS cant_store_25_iim,
    SUM(cant_store_plus) AS cant_store_plus,
    SUM(cant_store_plus_iim) AS cant_store_plus_iim
FROM
    (SELECT
        yearmonth,
        date_,
        order_,
        MAX(cant_store_25) AS cant_store_25,
        MAX(cant_store_25_iim) AS cant_store_25_iim,
        MAX(cant_store_plus) AS cant_store_plus,
        MAX(cant_store_plus_iim) AS cant_store_plus_iim
    FROM
        (SELECT DISTINCT
            t1.Order_Number AS order_,
            t1.Order_Date::DATE AS date_,
            YEAR(t1.Order_Date::DATE) * 100 + MONTH(t1.Order_Date::DATE) AS yearmonth,
            CASE WHEN t1.Store_ID = 25 THEN 1 ELSE 0 END AS cant_store_25,
            CASE WHEN t1.Store_ID = 25 AND IIM = TRUE THEN 1 ELSE 0 END AS cant_store_25_iim,
            CASE WHEN t1.Store_ID NOT IN (1, 2, 3) THEN 1 ELSE 0 END AS cant_store_plus,
            CASE WHEN t1.Store_ID NOT IN (1, 2, 3) AND IIM THEN 1 ELSE 0 END AS cant_store_plus_iim,
            MAX(IIM) AS IIM_
        FROM
            Table1 t1
        LEFT JOIN
            Table2 t2 ON t1.Customer_ID = t2.Customer_ID AND t1.Order_Number = t2.Order_Number
        WHERE
            t1.Is_Valid = 'Valid'
        GROUP BY
            t1.Order_Number, date_, yearmonth, t1.Store_ID, IIM)
    GROUP BY
        1, 2, 3)
WHERE
    yearmonth = 202208
GROUP BY
    1, 2;
So is there a better way to achieve this? My database is small, so this query works for me, but maybe I can improve several things to make it more efficient. Honestly, I'm new and I want to learn, in case I have to work with larger databases in the future.
I don't know if it's relevant, but I use Snowflake.
Thanks!
Answer 1
Score: 0
It's difficult to check, but the first query and subquery can be replaced with something simpler, like the following:
SELECT t1.Order_Number AS order_,
       t1.Order_Date::DATE AS date_,
       MAX(CASE WHEN t1.Store_ID = 25 THEN 1 ELSE 0 END) AS cant_store_25,
       MAX(CASE WHEN t1.Store_ID = 25 AND IIM = TRUE THEN 1 ELSE 0 END) AS cant_store_25_iim,
       MAX(CASE WHEN t1.Store_ID NOT IN (1, 2, 3) THEN 1 ELSE 0 END) AS cant_store_plus,
       MAX(CASE WHEN t1.Store_ID NOT IN (1, 2, 3) AND IIM THEN 1 ELSE 0 END) AS cant_store_plus_iim
FROM Table1 t1
LEFT JOIN Table2 t2
    ON t1.Customer_ID = t2.Customer_ID
    AND t1.Order_Number = t2.Order_Number
WHERE t1.Is_Valid = 'Valid'
    AND YEAR(t1.Order_Date::DATE) = 2022
    AND MONTH(t1.Order_Date::DATE) = 8
GROUP BY t1.Order_Number, date_
and then the outer one will just SUM the data. The main idea is that you can calculate the conditions for each order earlier, applying the MAX function to them directly. The good part is that we do not need a DISTINCT, as we are grouping.
Also, I have removed MAX(IIM) AS IIM_, as it is not used, and added the year-month filtering earlier. As you are filtering for a particular year-month, there is no point in calculating all the data and then keeping only that month.
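For reference, the whole thing (per-order flags inside, SUM outside) might look roughly like the sketch below. It runs against SQLite through Python's sqlite3 module so it can be tried without a Snowflake account, which means a few things are assumptions rather than the original setup: the sample rows are made up, dates are stored as ISO text, IIM is stored as 1/0, and the month filter is written as a date range because SQLite has no YEAR()/MONTH() functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (order_number INTEGER, order_date TEXT,
                         store_id INTEGER, customer_id INTEGER, is_valid TEXT);
    CREATE TABLE table2 (order_number INTEGER, customer_id INTEGER, iim INTEGER);
""")

# Hypothetical orders (not from the question), chosen to hit every filter.
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?, ?)", [
    (10, "2022-08-01", 25, 7, "Valid"),
    (11, "2022-08-01", 25, 8, "Valid"),
    (12, "2022-08-01", 2, 9, "Valid"),     # store excluded by NOT IN (1, 2, 3)
    (13, "2022-08-01", 40, 7, "Invalid"),  # dropped: not a valid order
    (14, "2022-08-02", 40, 9, "Valid"),
    (15, "2022-07-31", 25, 7, "Valid"),    # dropped: outside August 2022
])
# One row per product in each order; iim is 1 (True) or 0 (False).
conn.executemany("INSERT INTO table2 VALUES (?, ?, ?)", [
    (10, 7, 0), (10, 7, 1), (11, 8, 0), (12, 9, 1),
    (13, 7, 1), (14, 9, 1), (14, 9, 1), (15, 7, 1),
])

query = """
SELECT date_,
       SUM(cant_store_25)       AS cant_store_25,
       SUM(cant_store_25_iim)   AS cant_store_25_iim,
       SUM(cant_store_plus)     AS cant_store_plus,
       SUM(cant_store_plus_iim) AS cant_store_plus_iim
FROM (
    -- Inner query: one row per order, each flag collapsed to 0/1 with MAX.
    SELECT t1.order_number AS order_,
           t1.order_date   AS date_,
           MAX(CASE WHEN t1.store_id = 25 THEN 1 ELSE 0 END) AS cant_store_25,
           MAX(CASE WHEN t1.store_id = 25 AND t2.iim = 1
                    THEN 1 ELSE 0 END) AS cant_store_25_iim,
           MAX(CASE WHEN t1.store_id NOT IN (1, 2, 3)
                    THEN 1 ELSE 0 END) AS cant_store_plus,
           MAX(CASE WHEN t1.store_id NOT IN (1, 2, 3) AND t2.iim = 1
                    THEN 1 ELSE 0 END) AS cant_store_plus_iim
    FROM table1 t1
    LEFT JOIN table2 t2
           ON t1.customer_id = t2.customer_id
          AND t1.order_number = t2.order_number
    WHERE t1.is_valid = 'Valid'
      AND t1.order_date >= '2022-08-01'
      AND t1.order_date <  '2022-09-01'
    GROUP BY t1.order_number, t1.order_date
) AS per_order
GROUP BY date_
ORDER BY date_
"""

result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Each output row is (date, store 25 orders, store 25 IIM orders, orders outside stores 1-3, IIM orders outside stores 1-3). In Snowflake you would keep the ::DATE casts and the YEAR()/MONTH() filter from the query above; only the sample data and the date handling here are specific to this sketch.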