SQL 填充每组中的缺失值为平均值

huangapple go评论91阅读模式
英文:

SQL to fill missing value by AVG in each Group

问题

Step 2 填充缺失的日期和价格是通过以下方式完成的,只计算相邻的一个较小值和一个较大值的平均值,而不是计算所有行的平均值:

  1. SELECT d.CreateDate,
  2. COALESCE(Price,
  3. AVG(CASE WHEN m.Price IS NOT NULL THEN m.Price END) OVER (ORDER BY d.CreateDate ASC ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING)
  4. ) as New_Price,
  5. m.*
  6. From #DateRange d
  7. LEFT OUTER JOIN #mytable m on d.CreateDate = m.CreateDate

这将为每个产品组计算相邻的一个较小值和一个较大值的平均价格,而无需显式设置产品为'BB'。

英文:

I want to fill missing date and Price with Average for below table

SQL 填充每组中的缺失值为平均值

Sample data

  1. if OBJECT_ID('tempdb..#mytable') is not null DROP TABLE #mytable
  2. CREATE TABLE #mytable (CreateDate datetime, Product Varchar(4), Price money)
  3. INSERT INTO #mytable VALUES
  4. ('2023-06-09 17:01:00.000','Tree',1),
  5. ('2023-06-09 17:02:00.000','Tree',2),
  6. ('2023-06-09 17:03:00.000','Tree',3),
  7. ('2023-06-09 17:04:00.000','Tree',4),
  8. ('2023-06-09 17:01:00.000','BB',20),
  9. ('2023-06-09 17:02:00.000','BB',40),
  10. ('2023-06-09 17:04:00.000','BB',60),
  11. ('2023-06-09 17:01:00.000','Car',20),
  12. ('2023-06-09 17:03:00.000','Car',30),
  13. ('2023-06-09 17:04:00.000','Car',50)

Step 1 is to create a complete Datetime set

  1. if OBJECT_ID('tempdb..#DateRange') is not null DROP TABLE #DateRange
  2. Create Table #DateRange(CreateDate datetime Primary key Clustered)
  3. GO
  4. Declare @startdate datetime = '2023-06-09 17:01:00', @endtime datetime = '2023-06-09 17:04:00'
  5. While (@startdate <= @endtime)
  6. BEGIN
  7. Insert into #DateRange values (@startdate)
  8. set @startdate = DATEADD(MINUTE, 1, @startdate)
  9. END

Step 2 to fill NULL with average of Only ONE upper and Only ONE lower number , not AVG of all rows

  1. SELECT d.CreateDate,
  2. COALESCE(Price, AVG(Price) OVER ()) as New_Price,
  3. m.*
  4. From #DateRange d
  5. LEFT OUTER JOIN #mytable m on d.CreateDate = m.CreateDate
  6. and Product = 'BB'

My question is how loop through each Product Group so I don't need to set product = BB explicitly?

My expected result is

SQL 填充每组中的缺失值为平均值

答案1

得分: 2

尝试以下(如果您正在使用 SQL Server 2022):

  1. declare @start_time datetime = '2023-06-09 17:01:00';
  2. declare @end_time datetime = '2023-06-09 17:04:00';
  3. with Calendar as
  4. (-- 步骤1:根据起始时间和结束时间构建日历表。
  5. select top(datediff(minute, @start_time, @end_time) +1)
  6. dateadd(minute, row_number() over (order by number) -1, @start_time) dt
  7. from master..spt_values
  8. )
  9. select all_dts.dt as CreateDate,
  10. p.product,
  11. -- 使用 first_value last_value 函数以及 (ignore nulls) 选项获取下一个和前一个非空价格值。
  12. isnull(tbl.price,
  13. (isnull(first_value(tbl.price) ignore nulls over (partition by p.product order by all_dts.dt range between current row and unbounded following), 0) +
  14. isnull(last_value(tbl.price) ignore nulls over (partition by p.product order by all_dts.dt range between unbounded preceding and current row), 0)) /2.0
  15. ) as price
  16. from Calendar all_dts
  17. -- 步骤2:在步骤1生成的日期和产品的不同值之间执行交叉连接,以获取所有可能的组合(日期和产品)
  18. cross join (select distinct product from #mytable) p
  19. -- 步骤3:与表进行左连接,以获取缺失的日期
  20. left join #mytable tbl
  21. on all_dts.dt = tbl.CreateDate and
  22. p.product = tbl.product
  23. order by p.product, all_dts.dt

对于 SQL Server 的早期版本,我们可以使用子查询代替 first_valuelast_value 函数以及 ignore null

  1. declare @start_time datetime = '2023-06-09 17:01:00';
  2. declare @end_time datetime = '2023-06-09 17:04:00';
  3. with Calendar as
  4. (-- 步骤1:根据起始时间和结束时间构建日历表。
  5. select top(datediff(minute, @start_time, @end_time) +1)
  6. dateadd(minute, row_number() over (order by number) -1, @start_time) dt
  7. from master..spt_values
  8. )
  9. select all_dts.dt as CreateDate,
  10. p.product,
  11. -- 使用子查询获取下一个和前一个非空价格值。
  12. isnull(tbl.price,
  13. (isnull((select top 1 price from #mytable mtbl where mtbl.product = p.product and mtbl.CreateDate > all_dts.dt order by CreateDate), 0) +
  14. isnull((select top 1 price from #mytable mtbl where mtbl.product = p.product and mtbl.CreateDate < all_dts.dt order by CreateDate desc), 0)) /2.0
  15. ) as price
  16. from Calendar all_dts
  17. -- 步骤2:在步骤1生成的日期和产品的不同值之间执行交叉连接,以获取所有可能的组合(日期和产品)
  18. cross join (select distinct product from #mytable) p
  19. -- 步骤3:与表进行左连接,以获取缺失的日期
  20. left join #mytable tbl
  21. on all_dts.dt = tbl.CreateDate and
  22. p.product = tbl.product
  23. order by p.product, all_dts.dt
英文:

Try the following (if you are using SQL Server 2022):

  1. declare @start_time datetime = &#39;2023-06-09 17:01:00&#39;;
  2. declare @end_time datetime = &#39;2023-06-09 17:04:00&#39;;
  3. with Calendar as
  4. (-- step1: build a calender table based on your start and end time.
  5. select top(datediff(minute, @start_time, @end_time) +1)
  6. dateadd(minute, row_number() over (order by number) -1, @start_time) dt
  7. from master..spt_values
  8. )
  9. select all_dts.dt as CreateDate,
  10. p.product,
  11. -- use first_value and last_value functions with the (ignore nulls) option to get the next and previous not null price value.
  12. isnull(tbl.price,
  13. (isnull(first_value(tbl.price) ignore nulls over (partition by p.product order by all_dts.dt range between current row and unbounded following), 0) +
  14. isnull(last_value(tbl.price) ignore nulls over (partition by p.product order by all_dts.dt range between unbounded preceding and current row), 0)) /2.0
  15. ) as price
  16. from Calendar all_dts
  17. -- step2: do a cross join between the generated dates in step1 and the distinct values of products to get all possisble combinations of (dates and products)
  18. cross join (select distinct product from #mytable) p
  19. -- step3: do a left join with the table to get the missing dates
  20. left join #mytable tbl
  21. on all_dts.dt = tbl.CreateDate and
  22. p.product = tbl.product
  23. order by p.product, all_dts.dt

demo

For previous versions of the SQL server, we could implement a subquery instead of the first_value and last_value functions with the ignore null:

  1. declare @start_time datetime = &#39;2023-06-09 17:01:00&#39;;
  2. declare @end_time datetime = &#39;2023-06-09 17:04:00&#39;;
  3. with Calendar as
  4. (-- step1: build a calender table based on your start and end time.
  5. select top(datediff(minute, @start_time, @end_time) +1)
  6. dateadd(minute, row_number() over (order by number) -1, @start_time) dt
  7. from master..spt_values
  8. )
  9. select all_dts.dt as CreateDate,
  10. p.product,
  11. -- Use subquery to get the next and previous not null price value.
  12. isnull(tbl.price,
  13. (isnull((select top 1 price from #mytable mtbl where mtbl.product = p.product and mtbl.CreateDate &gt; all_dts.dt order by CreateDate), 0) +
  14. isnull((select top 1 price from #mytable mtbl where mtbl.product = p.product and mtbl.CreateDate &lt; all_dts.dt order by CreateDate desc), 0)) /2.0
  15. ) as price
  16. from Calendar all_dts
  17. -- step2: do a cross join between the generated dates in step1 and the distinct values of products to get all possisble combinations of (dates and products)
  18. cross join (select distinct product from #mytable) p
  19. -- step3: do a left join with the table to get the missing dates
  20. left join #mytable tbl
  21. on all_dts.dt = tbl.CreateDate and
  22. p.product = tbl.product
  23. order by p.product, all_dts.dt

demo

答案2

得分: 1

以下是您要翻译的内容:

你可以使用一个公用表表达式(CTE)。
首先创建日期列表和创建独特产品列表。

然后,将日期独特产品列表与名为mytable的表连接,以查找日期之间的间隙。

  1. 声明 @startdate datetime = (选择 #mytable 中的最小(CreateDate))
  2. , @endtime datetime = (选择 #mytable 中的最大(CreateDate))
  3. ;使用 Listunique 作为 (
  4. 选择 distinct Product
  5. #mytable
  6. ),
  7. _List (date_,Product) 作为 (
  8. 选择 @startdate 作为 date_,Product
  9. (选择 distinct Product Listunique)a
  10. union all
  11. 选择 DATEADD(minute,1, date_) 作为 date_,Product
  12. _List
  13. 其中 date_ < @endtime
  14. )
  15. 选择 a.*,f.price
  16. _List a
  17. full join #mytable b on a.date_=b.CreateDate a.Product=b.Product
  18. outer apply (
  19. 选择 sum(ISNULL(p,0) +ISNULL(n,0)) /2 作为 price
  20. (
  21. 选择 Price 作为 p,0 作为 n #mytable l
  22. 其中 l.Product=a.Product l.CreateDate=DATEADD(minute,-1, a.date_)
  23. union
  24. 选择 0 作为 p,Price 作为 n #mytable l
  25. 其中 l.Product=a.Product l.CreateDate=DATEADD(minute,1, a.date_)
  26. )d
  27. )f
  28. 其中 b.CreateDate is null

演示

英文:

You can use a CTE.
First create list date and create list unique Product

Then join List Date_uniqueProduct with table mytable for find gap with date

  1. Declare @startdate datetime = (select min(CreateDate) from #mytable)
  2. , @endtime datetime =(select max(CreateDate) from #mytable)
  3. ;with Listunique as (
  4. select distinct Product
  5. from #mytable
  6. ),
  7. _List (date_,Product) as (
  8. select @startdate as date_,Product
  9. from (select distinct Product from Listunique)a
  10. union all
  11. select DATEADD(minute,1, date_) as date_,Product
  12. from _List
  13. where date_ &lt;@endtime
  14. )
  15. select a.*,f.price
  16. from _List a
  17. full join #mytable b on a.date_=b.CreateDate and a.Product=b.Product
  18. outer apply (
  19. select sum(ISNULL(p,0) +ISNULL(n,0)) /2 as price
  20. from (
  21. select Price as p,0 as n from #mytable l
  22. where l.Product=a.Product and l.CreateDate=DATEADD(minute,-1, a.date_)
  23. union
  24. select 0 as p,Price as n from #mytable l
  25. where l.Product=a.Product and l.CreateDate=DATEADD(minute,1, a.date_)
  26. )d
  27. )f
  28. where b.CreateDate is null

Demo

答案3

得分: 1

使用GENERATE_SERIES创建一个日期范围,将这个日期列表与表连接以识别缺失的值,然后使用每个组的第一个和最后一个记录的平均值(使用FIRST_VALUELAST_VALUE)填充缺失的值:

  1. Declare @startdate datetime = (select min(CreateDate) from #mytable)
  2. ,@endtime datetime =(select max(CreateDate) from #mytable);
  3. with daterange as (
  4. SELECT DATEADD(minute,value,@startdate) as CreateDate, Product
  5. FROM GENERATE_SERIES(datepart(minute, @startdate) - 1,
  6. datepart(minute, @endtime) - 1,1)
  7. CROSS JOIN (SELECT DISTINCT Product FROM #mytable) p
  8. ),
  9. cte as (
  10. SELECT d.*, t.Price
  11. FROM daterange d
  12. LEFT JOIN #mytable t on d.CreateDate = t.CreateDate AND d.Product = t.Product
  13. ),
  14. avg_cte as (
  15. select *,(COALESCE(first_value(price) ignore nulls over (partition by product order by CreateDate range between current row and unbounded following), 0) +
  16. COALESCE(last_value(price) ignore nulls over (partition by product order by CreateDate range between unbounded preceding and current row), 0)) /2.0 as avg_price
  17. from cte
  18. )
  19. select CreateDate, Product, avg_price
  20. from avg_cte
  21. where price is null

结果:

  1. CreateDate Product avg_price
  2. 2023-06-09 17:03:00.000 BB 50.0000
  3. 2023-06-09 17:02:00.000 Car 25.0000

演示链接

英文:

Use GENERATE_SERIES to create a date range, join this list of dates to the table to identify the missing values, and then fill in the missing values using the average of only the first and last record of each group using (FIRST_VALUE and LAST_VALUE):

  1. Declare @startdate datetime = (select min(CreateDate) from #mytable)
  2. ,@endtime datetime =(select max(CreateDate) from #mytable);
  3. with daterange as (
  4. SELECT DATEADD(minute,value,@startdate) as CreateDate, Product
  5. FROM GENERATE_SERIES(datepart(minute, @startdate) - 1,
  6. datepart(minute, @endtime) - 1,1)
  7. CROSS JOIN (SELECT DISTINCT Product FROM #mytable) p
  8. ),
  9. cte as (
  10. SELECT d.*, t.Price
  11. FROM daterange d
  12. LEFT JOIN #mytable t on d.CreateDate = t.CreateDate AND d.Product = t.Product
  13. ),
  14. avg_cte as (
  15. select *,(COALESCE(first_value(price) ignore nulls over (partition by product order by CreateDate range between current row and unbounded following), 0) +
  16. COALESCE(last_value(price) ignore nulls over (partition by product order by CreateDate range between unbounded preceding and current row), 0)) /2.0 as avg_price
  17. from cte
  18. )
  19. select CreateDate, Product, avg_price
  20. from avg_cte
  21. where price is null

Result :

  1. CreateDate Product Price
  2. 2023-06-09 17:03:00.000 BB 50.0000
  3. 2023-06-09 17:02:00.000 Car 25.0000

Demo here

huangapple
  • 本文由 发表于 2023年6月12日 06:24:18
  • 转载请务必保留本文链接:https://go.coder-hub.com/76452746.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定