Generate series of dates and hours until next event
Question
SELECT
Delivery_Area_ID,
datetime,
date,
hour_of_day,
current_Delivery_Radius_Meters,
event_started_timestamp,
event_ended_time,
event_duration_hours,
radius_life_of_that_hour,
is_default
FROM
temp_table;
CREATE TABLE delivery_data (
Delivery_Area_ID INT,
current_Delivery_Radius_Meters INT,
Event_Started_Timestamp TIMESTAMP,
event_started_date DATE,
event_started_hour INT,
event_started_mins INT,
event_ended_time TIMESTAMP,
prev_delivery_radius INT
);
INSERT INTO delivery_data (
Delivery_Area_ID,
current_Delivery_Radius_Meters,
Event_Started_Timestamp,
event_started_date,
event_started_hour,
event_started_mins,
event_ended_time,
prev_delivery_radius
)
VALUES
(1, 3500, '2022-01-15 19:46:37.995951 UTC', '2022-01-15', 19, 46, '2022-01-15 20:05:29.049375 UTC', NULL),
(1, 6500, '2022-01-15 20:05:29.049375 UTC', '2022-01-15', 20, 5, '2022-01-16 12:31:22.778229 UTC', 3500),
(1, 3500, '2022-01-16 12:31:22.778229 UTC', '2022-01-16', 12, 31, '2022-01-16 12:50:12.562042 UTC', 6500),
(1, 6500, '2022-01-16 12:50:12.562042 UTC', '2022-01-16', 12, 50, '2022-01-18 20:46:41.937279 UTC', 3500),
(1, 3500, '2022-01-18 20:46:41.937279 UTC', '2022-01-18', 20, 46, '2022-01-18 20:58:55.794286 UTC', 6500);
I have the above table.
I am expecting the below output.
I want to measure the lifespan of a radius for every hour of a day. If you look at the image, the first radius, 3500, was set on '2022-01-15 19:46:37.995951 UTC' (19:46), meaning the life of 3500 in the 19th hour is 14 minutes. At '2022-01-15 20:05:29.049375 UTC' the radius was changed to 6500; until 20:05 the radius was still 3500, meaning that in the 20th hour the life of 3500 is 5 minutes and the life of 6500 is 55 minutes.
I have tried the query below, but I am not getting accurate results.
Fields in my dim_date: (datetime, hour_of_day, date, weekday, week, month, quarter)
with dim_date AS(
SELECT
*
FROM
`dim_date`
WHERE
DATE BETWEEN '2022-01-01' AND '2022-12-31'
)
, delivery_radius_log_data AS (
SELECT
Delivery_Area_ID,
Delivery_Radius_Meters as current_Delivery_Radius_Meters,
Event_Started_Timestamp,
--extracting event started date and hour because i want to see how many times the events have happened in a particular hour of a day.
EXTRACT(DATE FROM Event_Started_Timestamp) AS event_started_date,
EXTRACT(HOUR FROM Event_Started_Timestamp) AS event_started_hour,
EXTRACT(MINUTE FROM Event_Started_Timestamp) AS event_started_mins,
-- getting the next Event_Started_Timestamp , this gives me the current event ended time.
LEAD(Event_Started_Timestamp) OVER (
PARTITION BY Delivery_Area_ID ORDER BY Event_Started_Timestamp ASC
) AS event_ended_time,
-- getting the previous delivery radius, taking it as a helper to check the correctness of execution.
LAG(Delivery_Radius_Meters) OVER (
PARTITION BY Delivery_Area_ID ORDER BY Event_Started_Timestamp ASC
) AS prev_delivery_radius
FROM
`radius_data`
WHERE
DATE(Event_Started_Timestamp) BETWEEN '2022-01-01' AND '2022-12-31'
AND delivery_area_id ='1'
)
,temp_table as(
select
dr.delivery_area_id AS delivery_area_id,
dd.datetime,
dd.date,
dd.hour_of_day,
LEAD(dd.hour_of_day) OVER (PARTITION BY dd.date ORDER BY dd.datetime ASC) AS next_hour_of_day,
--dr.prev_delivery_radius,
dr.current_delivery_radius_meters,
dr.event_started_timestamp,
event_started_date,
event_started_hour,
event_started_mins,
dr.event_ended_time,
EXTRACT(DATE FROM dr.event_ended_time) AS event_ended_date,
EXTRACT(HOUR FROM dr.event_ended_time) AS event_ended_hour,
EXTRACT(MINUTE FROM dr.event_ended_time) AS event_ended_mins,
--finding the time difference between the event
ROUND(TIMESTAMP_DIFF(dr.event_ended_time, dr.event_started_timestamp, second)/3600,2) AS event_duration_hours,
CASE
WHEN
TIMESTAMP_DIFF(dr.event_ended_time, dr.event_started_timestamp, second)/3600 >= 24 THEN 'Default'
ELSE 'Not Default'
END AS is_default
FROM
dim_date dd
LEFT JOIN delivery_radius_log_data dr
ON DATE(dd.date) = DATE(dr.event_started_date)
AND TRIM(CAST(dd.hour_of_day AS STRING)) = TRIM(CAST(dr.event_started_hour AS STRING))
ORDER BY
2
)
SELECT
*,
CASE
WHEN event_started_date = event_ended_date and event_started_hour = event_ended_hour
THEN event_ended_mins - event_started_mins
WHEN Event_started_date = event_ended_date and event_started_hour < event_ended_hour
THEN 60 - event_started_mins --TIMESTAMP_DIFF(event_ended_time, event_started_timestamp, MINUTE)
WHEN Event_started_date = event_ended_date and event_started_hour < event_ended_hour
THEN 60 - event_started_mins
WHEN event_started_date < event_ended_date
THEN 60 - event_started_mins
ELSE 60
END AS radius_life_of_that_hour
FROM
temp_table
I want the output table as described in the image, i.e. for every date I need hours 0-23, so at least 24 records per day, each with the lifespan of the radius during that hour.
Thank you:)
Answer 1
Score: 1
I think what you want to do first is build what is called a DateSpine. By choosing a date column, you basically will expand the dataset to include a row for every time component you choose, in your case, every hour.
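I hate BigQuery, and couldn't figure out how to make this HOUR instead of DAY, but it's definitely possible by adapting the generate_date_array call to create your span of hours (GENERATE_TIMESTAMP_ARRAY with an INTERVAL 1 HOUR step is the likely candidate, since generate_date_array only steps in whole days or larger).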
with calendar as (
select
date_day
from
unnest(
generate_date_array(
(
SELECT MIN(prediction_point)
FROM your_table
),
(
SELECT MAX(prediction_point)
FROM your_table
)
)
) as date_day
),
spine as (
select
distinct date_DAY as period
from
calendar
)
select
cast(spine.period as timestamp) as prediction_point_SPINE_START,
timestamp_add(
cast(
date_add(spine.period, INTERVAL 1 DAY) as timestamp
),
INTERVAL -1 second
) as prediction_point_SPINE_END,
st.*
from
spine
left outer join your_table st on cast(
date_trunc(
cast(st.prediction_point as date),
DAY
) as date
) = spine.period
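For the hourly case, a minimal sketch of the same spine idea might look like the query below. It is not the answer's original code: it assumes the question's delivery_data table, swaps generate_date_array for GENERATE_TIMESTAMP_ARRAY, and pads the range to whole days so that every date gets hours 0-23. The names bounds, first_hour, last_hour, and hour_start are made up for illustration.

```sql
-- Sketch only: one row per hour, from midnight of the first event's day
-- through 23:00 of the last event's day (all timestamps treated as UTC).
WITH bounds AS (
  SELECT
    TIMESTAMP_TRUNC(MIN(Event_Started_Timestamp), DAY) AS first_hour,
    TIMESTAMP_ADD(
      TIMESTAMP_TRUNC(MAX(event_ended_time), DAY),
      INTERVAL 23 HOUR
    ) AS last_hour
  FROM delivery_data
)
SELECT
  hour_start,
  TIMESTAMP_ADD(hour_start, INTERVAL 1 HOUR) AS hour_end,
  DATE(hour_start) AS date,
  EXTRACT(HOUR FROM hour_start) AS hour_of_day
FROM
  bounds,
  UNNEST(
    GENERATE_TIMESTAMP_ARRAY(bounds.first_hour, bounds.last_hour, INTERVAL 1 HOUR)
  ) AS hour_start
ORDER BY
  hour_start;
```

If a dim_date table with one row per hour already exists, as in the question, it can play the role of this spine directly.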
If you can get this to work, you'll be adding a row for every hour if a row does not exist, and then from there you can work on doing your date math to see the relative time from each hour that you want to calculate.
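One possible way to do that date math, sketched under the assumption that the question's delivery_data table is used and that each event is active from Event_Started_Timestamp up to event_ended_time: join each hour to every event that overlaps it, then measure the overlap with LEAST and GREATEST. The hours and areas CTEs, the hard-coded date range, and the radius_life_minutes alias are illustrative choices, not part of the original answer.

```sql
-- Sketch only: minutes each radius was active within each hour.
WITH hours AS (
  SELECT hour_start
  FROM UNNEST(
    GENERATE_TIMESTAMP_ARRAY(
      TIMESTAMP '2022-01-15 00:00:00+00',  -- assumed range covering the sample rows
      TIMESTAMP '2022-01-18 23:00:00+00',
      INTERVAL 1 HOUR
    )
  ) AS hour_start
),
areas AS (
  SELECT DISTINCT Delivery_Area_ID
  FROM delivery_data
)
SELECT
  a.Delivery_Area_ID,
  DATE(h.hour_start) AS date,
  EXTRACT(HOUR FROM h.hour_start) AS hour_of_day,
  e.current_Delivery_Radius_Meters,
  -- Overlap of [event start, event end) with [hour start, hour start + 1h).
  ROUND(
    TIMESTAMP_DIFF(
      LEAST(e.event_ended_time, TIMESTAMP_ADD(h.hour_start, INTERVAL 1 HOUR)),
      GREATEST(e.Event_Started_Timestamp, h.hour_start),
      SECOND
    ) / 60,
    2
  ) AS radius_life_minutes
FROM
  hours AS h
CROSS JOIN
  areas AS a
LEFT JOIN
  delivery_data AS e
  ON e.Delivery_Area_ID = a.Delivery_Area_ID
  AND e.Event_Started_Timestamp < TIMESTAMP_ADD(h.hour_start, INTERVAL 1 HOUR)
  AND e.event_ended_time > h.hour_start
ORDER BY
  a.Delivery_Area_ID,
  h.hour_start,
  e.Event_Started_Timestamp;
```

On the sample rows this should give about 13 minutes for radius 3500 in hour 19 of 2022-01-15, then about 5 minutes for 3500 and about 55 minutes for 6500 in hour 20, close to the figures in the question. An hour in which the radius changes produces one row per radius, and hours with no event keep a row with NULL values. If event_ended_time comes from LEAD and is NULL for the latest event, wrapping it in COALESCE with a cutoff timestamp is one way to keep that event in the result.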
Sorry for not a complete answer, but hopefully a step in the right direction.