Generate series of dates and hours until next event

Question

SELECT
  Delivery_Area_ID,
  datetime,
  date,
  hour_of_day,
  current_Delivery_Radius_Meters,
  event_started_timestamp,
  event_ended_time,
  event_duration_hours,
  radius_life_of_that_hour,
  is_default
FROM
  temp_table;
    CREATE TABLE delivery_data (
      Delivery_Area_ID INT,
      current_Delivery_Radius_Meters INT,
      Event_Started_Timestamp TIMESTAMP,
      event_started_date DATE,
      event_started_hour INT,
      event_started_mins INT,
      event_ended_time TIMESTAMP,
      prev_delivery_radius INT
    );
    
    
    INSERT INTO delivery_data (
      Delivery_Area_ID,
      current_Delivery_Radius_Meters,
      Event_Started_Timestamp,
      event_started_date,
      event_started_hour,
      event_started_mins,
      event_ended_time,
      prev_delivery_radius
    )
    VALUES
      (1, 3500, '2022-01-15 19:46:37.995951 UTC', '2022-01-15', 19, 46, '2022-01-15 20:05:29.049375 UTC', NULL),
      (1, 6500, '2022-01-15 20:05:29.049375 UTC', '2022-01-15', 20, 5, '2022-01-16 12:31:22.778229 UTC', 3500),
      (1, 3500, '2022-01-16 12:31:22.778229 UTC', '2022-01-16', 12, 31, '2022-01-16 12:50:12.562042 UTC', 6500),
      (1, 6500, '2022-01-16 12:50:12.562042 UTC', '2022-01-16', 12, 50, '2022-01-18 20:46:41.937279 UTC', 3500),
      (1, 3500, '2022-01-18 20:46:41.937279 UTC', '2022-01-18', 20, 46, '2022-01-18 20:58:55.794286 UTC', 6500);

I have the above table.
I am expecting the below output.
[Image: expected output table]

I want to measure the lifespan of each radius for every hour of the day. In the image, the first radius, 3500, was set at '2022-01-15 19:46:37.995951 UTC' (19:46), so the life of 3500 in hour 19 is 14 minutes. At '2022-01-15 20:05:29.049375 UTC' the radius was changed to 6500. Until 20:05 the radius was still 3500, so in hour 20 the life of 3500 is 5 minutes and the life of 6500 is 55 minutes.
I have tried the query below, but I am not getting accurate results.
Fields in my dim_date table: (datetime, hour_of_day, date, weekday, week, month, quarter)


with  dim_date AS(
SELECT 
 *
FROM 
 `dim_date` 
WHERE 
 DATE BETWEEN '2022-01-01' AND '2022-12-31'
)
, delivery_radius_log_data AS (
 SELECT 
   Delivery_Area_ID,
   Delivery_Radius_Meters as current_Delivery_Radius_Meters,
   Event_Started_Timestamp,
   --extracting event started date and hour because i want to see how many times the events have happened in a particular hour of a day.
   EXTRACT(DATE FROM Event_Started_Timestamp) AS event_started_date,
   EXTRACT(HOUR FROM Event_Started_Timestamp) AS event_started_hour,
   EXTRACT(MINUTE FROM Event_Started_Timestamp) AS event_started_mins,
  -- getting the next Event_Started_Timestamp , this gives me the current event ended time. 
   LEAD(Event_Started_Timestamp) OVER (
     PARTITION BY Delivery_Area_ID ORDER BY Event_Started_Timestamp ASC
   ) AS event_ended_time,
   -- getting the previous delivery radius, taking it as a helper to check the correctness of execution. 
   LAG(Delivery_Radius_Meters) OVER (
     PARTITION BY Delivery_Area_ID ORDER BY Event_Started_Timestamp ASC
   ) AS prev_delivery_radius
 FROM 
   `radius_data` 
 WHERE 
   DATE(Event_Started_Timestamp) BETWEEN '2022-01-01' AND '2022-12-31'
   AND delivery_area_id ='1'
)
,temp_table as(
select
   dr.delivery_area_id AS delivery_area_id,
   dd.datetime,
   dd.date,
   dd.hour_of_day,
   LEAD(dd.hour_of_day) OVER (PARTITION BY dd.date ORDER BY dd.datetime ASC) AS next_hour_of_day,
   --dr.prev_delivery_radius,
   dr.current_delivery_radius_meters,
   dr.event_started_timestamp,
   event_started_date,
   event_started_hour,
   event_started_mins,
   dr.event_ended_time,
   EXTRACT(DATE FROM dr.event_ended_time) AS event_ended_date,
   EXTRACT(HOUR FROM dr.event_ended_time) AS event_ended_hour,
   EXTRACT(MINUTE FROM dr.event_ended_time) AS event_ended_mins,
   --finding the time difference between the event
   ROUND(TIMESTAMP_DIFF(dr.event_ended_time, dr.event_started_timestamp, second)/3600,2) AS event_duration_hours,
   CASE 
     WHEN 
       TIMESTAMP_DIFF(dr.event_ended_time, dr.event_started_timestamp, second)/3600 >= 24 THEN 'Default'
     ELSE 'Not Default'
   END AS is_default
FROM
  dim_date dd 
  LEFT JOIN delivery_radius_log_data dr 
   ON DATE(dd.date) = DATE(dr.event_started_date) 
  AND TRIM(CAST(dd.hour_of_day AS STRING))  = TRIM(CAST(dr.event_started_hour AS STRING)) 
ORDER BY
 2
)
SELECT
 *,
 CASE
   WHEN event_started_date = event_ended_date and event_started_hour = event_ended_hour
     THEN event_ended_mins - event_started_mins
   WHEN Event_started_date = event_ended_date and event_started_hour < event_ended_hour 
     THEN  60 - event_started_mins --TIMESTAMP_DIFF(event_ended_time, event_started_timestamp, MINUTE)
   WHEN Event_started_date = event_ended_date and event_started_hour < event_ended_hour 
     THEN  60 - event_started_mins
   WHEN event_started_date < event_ended_date
     THEN 60 - event_started_mins
   ELSE 60
 END AS radius_life_of_that_hour
FROM
 temp_table

I want the output table as described in the image, i.e. for every date I need hours 0-23 (at least 24 records per day), each with the lifespan of the radius during that hour.
Thank you :)

Answer 1

Score: 1

I think what you want to do first is build what is called a date spine. By choosing a date column, you expand the dataset to include a row for every time component you choose; in your case, every hour.

I hate BigQuery, and couldn't figure out how to make this HOUR instead of DAY, but it's definitely possible by messing around with generate_date_array to create your span of hours.

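-- Calendar: one row per day between the earliest and latest prediction_point.
-- generate_date_array expects DATE bounds, so both bounds are cast to DATE.
with calendar as (
  select 
    date_day 
  from 
    unnest(
      generate_date_array(
        (
          SELECT CAST(MIN(prediction_point) AS DATE) 
          FROM your_table
        ), 
        (
          SELECT CAST(MAX(prediction_point) AS DATE) 
          FROM your_table
        )
      )
    ) as date_day
), 
-- Spine: the distinct periods (days) that every output row hangs off.
spine as (
  select 
    distinct date_day as period 
  from 
    calendar
) 
select 
  -- Start of the day as a timestamp.
  cast(spine.period as timestamp) as prediction_point_SPINE_START, 
  -- Last second of the same day.
  timestamp_add(
    cast(
      date_add(spine.period, INTERVAL 1 DAY) as timestamp
    ), 
    INTERVAL -1 second
  ) as prediction_point_SPINE_END, 
  st.* 
from 
  spine 
  -- The JOIN keyword was missing here; days without data keep NULLs in st.*.
  left outer join your_table st 
    on cast(st.prediction_point as date) = spine.period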
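
For what it's worth, an hourly version of the spine seems achievable with GENERATE_TIMESTAMP_ARRAY, which accepts an INTERVAL 1 HOUR step, whereas generate_date_array only steps in date parts. A minimal sketch, assuming the question's sample table delivery_data; the CTE name bounds and the columns first_hour, last_hour, hour_start and hour_end are made up for illustration and are not from the original post:

```sql
-- Hourly spine: one row per hour, from 00:00 on the first event's day
-- to 23:00 on the last event's day (timestamps are treated as UTC).
-- Assumption: delivery_data is the sample table from the question.
WITH bounds AS (
  SELECT
    TIMESTAMP_TRUNC(MIN(Event_Started_Timestamp), DAY) AS first_hour,
    TIMESTAMP_ADD(
      TIMESTAMP_TRUNC(MAX(Event_Started_Timestamp), DAY),
      INTERVAL 23 HOUR
    ) AS last_hour
  FROM delivery_data
)
SELECT
  hour_start,
  TIMESTAMP_ADD(hour_start, INTERVAL 1 HOUR) AS hour_end,
  DATE(hour_start) AS date,
  EXTRACT(HOUR FROM hour_start) AS hour_of_day
FROM bounds,
  UNNEST(
    GENERATE_TIMESTAMP_ARRAY(first_hour, last_hour, INTERVAL 1 HOUR)
  ) AS hour_start
ORDER BY hour_start;
```

Truncating the bounds to DAY is a design choice so that every date gets all 24 hours (0-23), which is what the question asks for.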

If you can get this to work, you'll be adding a row for every hour if a row does not exist, and then from there you can work on doing your date math to see the relative time from each hour that you want to calculate.
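
The date math itself can be sketched as an interval overlap: for each hour an event touches, the radius "life" is the time between the later of (event start, hour start) and the earlier of (event end, hour end). The query below is only a sketch under assumptions: it reads the question's sample table delivery_data, and the names per_hour, hour_start and radius_life_minutes are invented here. It expands each event into the hours it spans with a correlated UNNEST, which avoids an outer join on inequality conditions alone.

```sql
WITH per_hour AS (
  -- One row per (event, hour) pair that the event overlaps.
  -- Events whose event_ended_time is NULL (no next event yet) produce
  -- no rows here and would need a fallback end time.
  SELECT
    e.Delivery_Area_ID,
    e.current_Delivery_Radius_Meters,
    e.Event_Started_Timestamp,
    e.event_ended_time,
    hour_start
  FROM delivery_data AS e,
    UNNEST(
      GENERATE_TIMESTAMP_ARRAY(
        TIMESTAMP_TRUNC(e.Event_Started_Timestamp, HOUR),
        e.event_ended_time,
        INTERVAL 1 HOUR
      )
    ) AS hour_start
  -- Drop the empty bucket when an event ends exactly on the hour.
  WHERE hour_start < e.event_ended_time
)
SELECT
  Delivery_Area_ID,
  hour_start,
  DATE(hour_start) AS date,
  EXTRACT(HOUR FROM hour_start) AS hour_of_day,
  current_Delivery_Radius_Meters,
  -- Overlap between [event start, event end) and [hour start, hour end).
  ROUND(
    TIMESTAMP_DIFF(
      LEAST(event_ended_time, TIMESTAMP_ADD(hour_start, INTERVAL 1 HOUR)),
      GREATEST(Event_Started_Timestamp, hour_start),
      SECOND
    ) / 60,
    2
  ) AS radius_life_minutes
FROM per_hour
ORDER BY Delivery_Area_ID, hour_start, Event_Started_Timestamp;
```

With the sample rows, the first 3500 event should contribute roughly 13 minutes to hour 19 and roughly 5 minutes to hour 20, with the following 6500 event taking up the rest of hour 20. Hours that no event covers at all (for example, before the first event of the range) do not appear in this result; they would still need a left join from an hourly calendar, such as the dim_date table in the question, on equality of the hour.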

Sorry for not a complete answer, but hopefully a step in the right direction.

huangapple
  • Published on April 13, 2023, 22:06:32
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76006409.html