英文:
SQL query to check the count
问题
我有一个在BigQuery中的列,其格式如下所示。时间(时间以UTC为准)以15分钟的间隔增加(从00:10开始),并持续一整天直到23:......
现在我想使用查询来检查数据是否及时更新。也就是说,在每个时间点,我们应该有多个条目(按时间分组时计数应该大于0)。如果在特定时间点没有计数,我的查询应该返回一个引发该情况的标志。
如果我们在01:05运行查询,我们知道应该有四个点(00:10、00:25、00:40、00:55)的计数不为零。
如何使用查询进行比较和确保这一点?
select timecolumn as time, count(*) as cnt
from 'project.dataset.table'
group by 1
order by 1
将会得到以下结果:
00:10 5
00:25 5
00:40 10
....
英文:
I have a column in bigquery which is time in the below format. The time(time is in utc) increases in 15 minute interval(starting from 00:10) and it continues for the whole day till 23:.....
00:10
00:25
00:40
...
now i want to check if we have data uptodate using a query. That is every time point we should have multiple entries(count should be >0 when grouped by time). If there is no count at a particular point my query should be returning a flag whih raises that.
if we are running the query at 01:05 we know that there should be four points(00:10 00:25 00:40 00:55) with non zero count
How do we compare and ensure this using a query?
select timecolumn as time, count(*) as cnt
from 'project.dataset.table'
group by 1
order by 1
will give the below
00:10 5
00:25 5
00:40 10
....
select timecolumn as time, count(*) as cnt
from 'project.dataset.table'
答案1
得分: 0
I believe you can skip group by
and do something like shown below.
required_times
返回的行数将在一天内从0增长到最多24*4条记录。
my_data_table
仅用于模拟您的数据表。如果它与 required_times
的值不完全匹配,您可能需要应用 timestamp_trunc(timecolumn, MINUTE)
。
with required_times as (
select * from unnest(GENERATE_TIMESTAMP_ARRAY(
date_add(timestamp_trunc(current_timestamp(), DAY), INTERVAL 10 MINUTE),
current_timestamp(),
INTERVAL 15 MINUTE)) as timecolumn
),
my_data_table as (
select * from unnest(GENERATE_TIMESTAMP_ARRAY(
date_add(timestamp_trunc(current_timestamp(), DAY), INTERVAL 10 MINUTE),
date_add(current_timestamp(), INTERVAL -1 HOUR),
INTERVAL 15 MINUTE)) as timecolumn
)
select
required_times.timecolumn
from
required_times
left join my_data_table on required_times.timecolumn = my_data_table.timecolumn
where
my_data_table.timecolumn is null
将返回类似以下的结果,因为缺少一天中最后一个小时的记录在 my_data_table
中。
英文:
I believe you can skip group by
and do something like shown below.
Number of rows returned by required_times
will grow from 0 to a max of 24*4 records during the day.
my_data_table
is just for emulating your data table. You may need to apply timestamp_trunc(timecolumn, MINUTE)
for your data if it does not match exactly with values from required_times
.
with required_times as (
select * from unnest(GENERATE_TIMESTAMP_ARRAY(
date_add(timestamp_trunc(current_timestamp(), DAY), INTERVAL 10 MINUTE),
current_timestamp(),
INTERVAL 15 MINUTE)) as timecolumn
),
my_data_table as (
select * from unnest(GENERATE_TIMESTAMP_ARRAY(
date_add(timestamp_trunc(current_timestamp(), DAy), INTERVAL 10 MINUTE),
date_add(current_timestamp(), INTERVAL -1 HOUR),
INTERVAL 15 MINUTE)) as timecolumn
)
select
required_times.timecolumn
from
required_times
left join my_data_table on required_times.timecolumn = my_data_table.timecolumn
where
my_data_table.timecolumn is null
will return e.g. like this because records for the last hour of day is missing from my_data_table
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论