英文:
Query across partitions in partitioned table in BigQuery SQL
问题
我有一个按天分区的BigQuery表格,我需要返回属于一个月的所有分区列的平均值。换句话说,我的期望输出将有一个年-月的列和一个同一列值在一个月内的平均值的列:`|年-月|平均值|`。如果你想要,可以使用分区的group_by。
我已经成功地获取了单个分区的这些信息,例如对于2023年2月:
SELECT *
FROM `name_of_the_table`
WHERE EXTRACT(YEAR FROM DATE(_PARTITIONTIME)) = 2023
AND EXTRACT(MONTH FROM DATE(_PARTITIONTIME)) = 02
)
通过循环运行这个查询,我可以得到期望的输出,但我希望学到一个更智能的解决方案。
我还成功地获取了分区信息:
SELECT table_name, partition_id, total_rows
FROM `dataset_name.INFORMATION_SCHEMA.PARTITIONS`
WHERE partition_id IS NOT NULL
AND table_name = 'table_name'
ORDER BY partition_id DESC
但是我不知道如何合并这些信息。提前谢谢。
英文:
I got a BigQuery table partitioned on a daily basis and i need to return the average of a columns for all the partition that belongs to a month. In other words, my desired output would have a column for year-month and a column with the average of the values of the same columns over a month: |year-month|avg_value|
. A group_by partition, if you want.
I managed to to access those informations for a single partition, e.g. for february 2023:
SELECT *
FROM `name_of_the_table`
WHERE EXTRACT(YEAR FROM DATE(_PARTITIONTIME))= 2023
AND EXTRACT(MONTH FROM DATE(_PARTITIONTIME)) = 02
)
and by looping over this query i can reach my desired output, but i look forward to learn a smarter solution.
I also have managed to access the partition infos:
SELECT table_name, partition_id, total_rows
FROM `dataset_name.INFORMATION_SCHEMA.PARTITIONS`
WHERE partition_id IS NOT NULL
and table_name = 'table_name'
order by partition_id desc
but have no clues on how to merge those infos. Thank you in advance.
答案1
得分: 1
以下是您要的代码部分的翻译:
SELECT
FORMAT_TIMESTAMP("%Y%m", _PARTITIONTIME) year_month,
COUNT(*)
FROM
`table`
GROUP BY
1
ORDER BY
1 DESC
英文:
Sounds like you're looking for a group by.
SELECT
FORMAT_TIMESTAMP("%Y%m", _PARTITIONTIME) year_month,
COUNT(*)
FROM
`table`
GROUP BY
1
ORDER BY
1 DESC
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论