英文:
Discrepancy between two similar presto SQL queries
问题
以下是您要翻译的内容:
Query 1:
SELECT visitor_id, array_agg(timestamp) as time, array_agg(offset) as offset_list
from
(SELECT * FROM
(
SELECT visitor_id, timestamp,
cast(json_extract(uri_args, '$.offset') AS int) as offset
FROM table_t
where year = 2023 and month = 1 and day = 27 and
request_uri = '/home_page')
order by visitor_id, timestamp)
group by visitor_id
order by cardinality(offset_list) desc
Query 2:
SELECT visitor_id ,array_agg(offset) as offset_list
from
(SELECT * FROM
(
SELECT visitor_id, timestamp,
cast(json_extract(uri_args, '$.offset') AS int) as offset
FROM table_t
where year = 2023 and month = 1 and day = 27 and
request_uri = '/home_page')
order by visitor_id, timestamp)
group by visitor_id
order by cardinality(offset_list) desc
Here uri_args is simply a json file which under the key 'offset' contains the value of the offset for the particular API response. This is from response log table of a server.
Although the two queries are similar and according to me ought to return the same values in the offset_list column i find the following discrepancy :
I will consider a particular visitor_id to convey it clearly, for a visitor_id ='12345' query i returns the following row in the offset_list col
[0, 0, 0, 10, 0, 10, 20, 32, 42, 0, 0, 20, 53, 77, 57, 0, 10, 20, 31, 10, 41, 0, 10, 41, 54, 77, 0, 10, 31, 41, 54, 10, 31, 54, 57, 77, 10, 20, 32, 0, 10, 21, 33, 44, 72, 52, 0, 10, 20, 31, 41]
and for query 2 the output is as follows :
[20, 32, 42, 0, 0, 20, 53, 77, 57, 0, 10, 20, 31, 10, 41, 0, 10, 41, 54, 77, 0, 10, 31, 41, 54, 10, 31, 54, 77, 57, 10, 20, 32, 0, 10, 21, 33, 44, 72, 52, 0, 10, 20, 31, 41, 0, 0, 0, 10, 0, 10]
I can observe the the two are circular permutations of each other but fail to see why this is happening. Please help me understand what the difference is the inner working of each query. The first reply suits the intent of my quest which is to capture the visitors journey on the homepage.
英文:
I have the following SQL queries that should written similar results wrt offset column, but dont.
Query 1:
SELECT visitor_id, array_agg(timestamp) as time, array_agg(offset) as offset_list
from
(SELECT * FROM
(
SELECT visitor_id, timestamp,
cast(json_extract(uri_args, '$.offset') AS int) as offset
FROM table_t
where year = 2023 and month = 1 and day = 27 and
request_uri = '/home_page')
order by visitor_id, timestamp)
group by visitor_id
order by cardinality(offset_list) desc
Query 2:
SELECT visitor_id ,array_agg(offset) as offset_list
from
(SELECT * FROM
(
SELECT visitor_id, timestamp,
cast(json_extract(uri_args, '$.offset') AS int) as offset
FROM table_t
where year = 2023 and month = 1 and day = 27 and
request_uri = '/home_page')
order by visitor_id, timestamp)
group by visitor_id
order by cardinality(offset_list) desc
Here uri_args is simply a json file which under the key 'offset' contains the value of the offset for the particular API response. This is from response log table of a server.
Although the two queries are similar and according to me ought to return the same values in the offset_list column i find the following discrepancy :
I will consider a particular visitor_id to convey it clearly, for a visitor_id ='12345' query i returns the following row in the offset_list col
[0, 0, 0, 10, 0, 10, 20, 32, 42, 0, 0, 20, 53, 77, 57, 0, 10, 20, 31, 10, 41, 0, 10, 41, 54, 77, 0, 10, 31, 41, 54, 10, 31, 54, 57, 77, 10, 20, 32, 0, 10, 21, 33, 44, 72, 52, 0, 10, 20, 31, 41]
and for query 2 the output is as follows :
[20, 32, 42, 0, 0, 20, 53, 77, 57, 0, 10, 20, 31, 10, 41, 0, 10, 41, 54, 77, 0, 10, 31, 41, 54, 10, 31, 54, 77, 57, 10, 20, 32, 0, 10, 21, 33, 44, 72, 52, 0, 10, 20, 31, 41, 0, 0, 0, 10, 0, 10]
I can observe the the two are circular permutations of each other but fail to see why this is happening. Please help me understand what the difference is the inner working of each query. The first reply suits the intent of my quest which is to capture the visitors journey on the homepage.
答案1
得分: 2
如果需要确定性排序的数组元素,请在聚合函数中指定 order by
子句,如文档中所述。
一些聚合函数,如
array_agg()
,根据输入值的顺序产生不同的结果。可以通过在聚合函数内部编写ORDER BY
子句来指定排序。
SELECT visitor_id, array_agg(offset ORDER BY timestamp) AS offset_list
-- ...
英文:
If you need your array elements deterministically ordered specify order by
clause for aggregate function as mentioned in the docs
> Some aggregate functions such as array_agg()
produce different results depending on the order of input values. This ordering can be specified by writing an ORDER BY
Clause within the aggregate function
SELECT visitor_id ,array_agg(offset order by timestamp) as offset_list
-- ...
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论