How can I write an SQL query that (1) groups by average and (2) adds a new column with the average of the entire dataset

Question

I've been racking my brain over this for the past day and I can't seem to figure it out. Maybe it's not possible? (Please let me know if that's the case.)

Basically, I'm trying to get the average sales of each location and then add an extra column that shows the average sales of the entire dataset. Each requirement is easy on its own; the problem is combining them.

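Current data set:

| Location | Sales |
|----------|-------|
| USA | 5 |
| France | 10 |
| India | 15 |
| USA | 3 |
| France | 4 |
| India | 5 |

What I would like to produce:

| Location | Avg.Sales | Dataset_Avg. |
|----------|-----------|--------------|
| USA | 4 | 7 |
| France | 7 | 7 |
| India | 10 | 7 |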

    -- Step 1 trying to get the data without combining (works perfectly)

    -- Gives you the avg sales by location
    SELECT location_id, AVG(sales)
    FROM table
    WHERE (date > '01/01/2023')
    GROUP BY location_id;

    -- Window function that gives you the total avg time to fill (for all
    SELECT location_id, AVG(sales) OVER () AS total_avg
    FROM table
    WHERE (date > '01/01/2023');

    -- Step 2 trying to combine - this is where I'm struggling

    -- Tried combining Aggregate and Window function (not possible)
    SELECT location_id, AVG(sales) AS location_avg, AVG(sales) OVER () AS total_avg
    FROM table
    WHERE (date > '01/01/2023');

    -- Create a CTE with time to fill by location and then join to window function?
    -- 2 issues here: (1) you can't call a window function in the CTE (a. doesn't work),
    -- and (2) you can't have a WHERE clause in the 2nd function
    WITH CTE AS (
        SELECT location_id AS CTE_location_id, AVG(sales) AS avg_sales_by_location
        FROM table
        WHERE (date > '01/01/2023')
        GROUP BY CTE_location_id
    )
    SELECT a.location_id, a.(AVG(sales) OVER ()) AS total_avg, b.avg_time_to_fill
    FROM table a
    WHERE (date > '01/01/2023')
    JOIN CTE b
    ON b.CTE_location_id = a.location_id;

Answer 1

Score: 2

You don't need anything more complicated than two window functions:

    select distinct Location,
      Avg(sales) over(partition by location) AvgSales,
      Avg(sales) over() DatasetAvg
    from t
    order by Location;
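To check this against the sample data from the question, here is a minimal sketch that runs the same two-window-function query through Python's built-in `sqlite3` module. The in-memory table `t`, its column types, and the snake_case aliases are assumptions made for illustration, and window functions need SQLite 3.25 or newer.

```python
import sqlite3

# In-memory database holding the six sample rows from the question.
# The table name `t` and the REAL column type are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (location TEXT, sales REAL)")
conn.executemany(
    "INSERT INTO t (location, sales) VALUES (?, ?)",
    [
        ("USA", 5), ("France", 10), ("India", 15),
        ("USA", 3), ("France", 4), ("India", 5),
    ],
)

query = """
SELECT DISTINCT
    location,
    AVG(sales) OVER (PARTITION BY location) AS avg_sales,   -- per-location average
    AVG(sales) OVER ()                      AS dataset_avg  -- average over all rows
FROM t
ORDER BY location
"""

# Both window functions are evaluated on every input row; DISTINCT then
# collapses the identical rows down to one row per location.
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

A `WHERE` clause is applied before the window functions are evaluated, so adding the date filter from the question to this query would restrict both averages to the filtered rows.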

Answer 2

Score: 1

You can achieve the desired result by using a subquery or a Common Table Expression (CTE) to calculate the overall average sales and then joining it with the grouped average sales by location. Here's how you can write the SQL query to get the expected output:

    -- Step 1: Get the average sales by location
    SELECT location AS Location, AVG(sales) AS AvgSales
    FROM table_name
    WHERE date > '01/01/2023'
    GROUP BY location;

    -- Step 2: Get the average sales for the entire dataset (overall average)
    SELECT AVG(sales) AS Dataset_Avg
    FROM table_name
    WHERE date > '01/01/2023';

    -- Step 3: Combine the results from Step 1 and Step 2 using a subquery or CTE
    WITH AvgSalesByLocation AS (
        SELECT location AS Location, AVG(sales) AS AvgSales
        FROM table_name
        WHERE date > '01/01/2023'
        GROUP BY location
    )
    SELECT
        A.Location,
        A.AvgSales,
        B.Dataset_Avg
    FROM AvgSalesByLocation A
    CROSS JOIN (
        SELECT AVG(sales) AS Dataset_Avg
        FROM table_name
        WHERE date > '01/01/2023'
    ) B;

Result:

| Location | AvgSales | Dataset_Avg |
|----------|----------|-------------|
| USA | 4 | 7 |
| France | 7 | 7 |
| India | 10 | 7 |
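The Step 3 query can be tried end to end with a small sketch using Python's built-in `sqlite3` module. The table name `sales_data`, the `sale_date` column with ISO-formatted dates, and the extra pre-2023 row are all assumptions added here so that the date filter has something to exclude; they are not part of the original data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: `sales_data` and `sale_date` are illustrative names.
conn.execute(
    "CREATE TABLE sales_data (location TEXT, sales REAL, sale_date TEXT)"
)
conn.executemany(
    "INSERT INTO sales_data (location, sales, sale_date) VALUES (?, ?, ?)",
    [
        ("USA", 5, "2023-02-01"), ("France", 10, "2023-02-01"),
        ("India", 15, "2023-02-01"), ("USA", 3, "2023-03-01"),
        ("France", 4, "2023-03-01"), ("India", 5, "2023-03-01"),
        ("USA", 100, "2022-12-31"),  # made-up row that the date filter excludes
    ],
)

query = """
WITH avg_by_location AS (
    SELECT location, AVG(sales) AS avg_sales
    FROM sales_data
    WHERE sale_date > :cutoff
    GROUP BY location
)
SELECT a.location, a.avg_sales, b.dataset_avg
FROM avg_by_location AS a
CROSS JOIN (
    -- One-row subquery: the average over every row that passes the filter.
    SELECT AVG(sales) AS dataset_avg
    FROM sales_data
    WHERE sale_date > :cutoff
) AS b
ORDER BY a.location
"""

# The same named parameter feeds both WHERE clauses, so the per-location
# and overall averages are computed over the same set of rows.
rows = conn.execute(query, {"cutoff": "2023-01-01"}).fetchall()
for row in rows:
    print(row)
```

Storing dates as ISO `YYYY-MM-DD` text is a choice made for this sketch because such strings sort chronologically; a literal like `'01/01/2023'` depends on how the database in use parses dates.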

In this query, we first calculate the average sales by location and the overall average sales as separate queries (Steps 1 and 2). Step 3 then uses a CTE (AvgSalesByLocation) to get the average sales by location and cross joins it with the overall average to combine the results in a single table. A CROSS JOIN works here because the overall-average subquery returns exactly one row, so that row is paired with every row of the AvgSalesByLocation CTE.

This will give you the desired output with the average sales for each location and the overall dataset average in the same table.
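As an alternative that is not taken from either answer, an aggregate and a window function can share one query as long as there is a `GROUP BY` and the window function is applied to the grouped aggregates. The sketch below assumes a table `t`, SQLite 3.25 or newer, and two made-up extra USA rows so that the locations have different row counts. It also shows why `AVG(AVG(sales)) OVER ()` is not the same thing as the dataset average: it averages the per-location averages without weighting them by row count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (location TEXT, sales REAL)")
conn.executemany(
    "INSERT INTO t (location, sales) VALUES (?, ?)",
    [
        ("USA", 5), ("France", 10), ("India", 15),
        ("USA", 3), ("France", 4), ("India", 5),
        ("USA", 1), ("USA", 1),  # made-up extra rows so group sizes differ
    ],
)

query = """
SELECT
    location,
    AVG(sales) AS avg_sales,
    -- Window functions run after GROUP BY, so they see one row per location.
    -- Total of the group sums divided by total of the group counts gives
    -- the row-weighted average of the whole dataset.
    1.0 * SUM(SUM(sales)) OVER () / SUM(COUNT(*)) OVER () AS dataset_avg,
    -- Unweighted average of the per-location averages, for comparison.
    AVG(AVG(sales)) OVER () AS avg_of_location_avgs
FROM t
GROUP BY location
ORDER BY location
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

With the question's original six rows every location has two rows, so both columns would come out as 7; the two values only diverge once the group sizes differ.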

huangapple
  • Posted on July 28, 2023, 05:04:25
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76783402.html