How to convert a CROSS JOIN LATERAL statement to BigQuery?
Question
My SQL statement is written in PostgreSQL using CROSS JOIN LATERAL. I want to execute it in Google BigQuery, but BigQuery doesn't support CROSS JOIN LATERAL. How can I rewrite this SQL for the BigQuery engine?
select
    left_table.*, x.*
from
    data left_table
cross join lateral
(
    select string_agg(right_table.session_id, ',')
    from
        data right_table
    where left_table.source_ip = right_table.source_ip
      and
      ((right_table.session_start_time >= left_table.session_start_time and right_table.session_start_time <= left_table.session_end_time)
      or
      (right_table.session_end_time >= left_table.session_start_time and right_table.session_end_time <= left_table.session_end_time))
    group by right_table.source_ip
) x
Answer 1
Score: 1
Without lateral joins, the closest "equivalent" is a correlated subquery in the SELECT clause. These are often a performance issue, because the correlated subquery is evaluated row by row through the result set:
SELECT
left_table.*
, (SELECT string_agg(right_table.session_id, ',')
FROM data right_table
WHERE left_table.source_ip = right_table.source_ip
AND (
(
right_table.session_start_time >= left_table.session_start_time
AND right_table.session_start_time <= left_table.session_end_time
)
OR (
right_table.session_end_time >= left_table.session_start_time
AND right_table.session_end_time <= left_table.session_end_time
)
)
GROUP BY right_table.source_ip) x
FROM data left_table
NB: In PostgreSQL, the lateral join is typically executed more efficiently, as part of the FROM clause.
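The work can also stay in the FROM clause in BigQuery, without LATERAL, by decorrelating the subquery by hand: self-join `data` on `source_ip` plus the overlap condition, aggregate once per left row, and join the aggregate back. This is a sketch under stated assumptions, not a tested drop-in replacement: it assumes `session_id` uniquely identifies a row of `data` (duplicates would multiply rows), and that `session_id` is a STRING, since BigQuery's `STRING_AGG` accepts only STRING or BYTES (cast it otherwise). The CTE name `overlaps` and the column `left_session_id` are hypothetical. The same rewrite is also the usual fallback if BigQuery rejects a correlated subquery as one it cannot de-correlate.

```sql
-- Sketch only: assumes session_id is a unique STRING key for rows of `data`.
WITH overlaps AS (
  -- One output row per left session: the comma-separated ids of every
  -- session on the same source_ip whose start or end falls inside it.
  SELECT
    l.session_id AS left_session_id,
    STRING_AGG(r.session_id, ',') AS session_ids
  FROM data AS l
  JOIN data AS r
    ON l.source_ip = r.source_ip
   AND (
        (r.session_start_time >= l.session_start_time AND r.session_start_time <= l.session_end_time)
     OR (r.session_end_time >= l.session_start_time AND r.session_end_time <= l.session_end_time)
   )
  GROUP BY l.session_id
)
SELECT
  left_table.*,
  o.session_ids
FROM data AS left_table
-- LEFT JOIN keeps rows with no overlapping sessions (session_ids is NULL),
-- matching the scalar-subquery behaviour above.
LEFT JOIN overlaps AS o
  ON o.left_session_id = left_table.session_id;
```

Replacing `LEFT JOIN` with `JOIN` drops left rows that have no matches, which is what CROSS JOIN LATERAL does when its subquery returns no rows.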
Answer 2
Score: 1
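You can try the following code. Instead of CROSS JOIN LATERAL, it uses a correlated subquery in the SELECT list to produce the same aggregated list of session ids for each row. One difference: where the subquery finds no matching rows, CROSS JOIN LATERAL drops the left row, while the scalar subquery returns NULL.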
WITH AggregatedSessions AS (
SELECT
left_table.*,
(
SELECT STRING_AGG(right_table.session_id, ',')
FROM data AS right_table
WHERE
left_table.source_ip = right_table.source_ip
AND (
(right_table.session_start_time >= left_table.session_start_time AND right_table.session_start_time <= left_table.session_end_time)
OR
(right_table.session_end_time >= left_table.session_start_time AND right_table.session_end_time <= left_table.session_end_time)
)
GROUP BY right_table.source_ip
) AS session_ids
FROM data AS left_table
)
SELECT * FROM AggregatedSessions;
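To try the rewrite before pointing it at the real table, the table can be replaced by an inline CTE of sample rows. A minimal sketch: the rows below are hypothetical, with column names taken from the question and TIMESTAMP chosen as an assumed type for the time columns; the query body is the one above.

```sql
-- Hypothetical sample rows standing in for the real `data` table.
WITH data AS (
  SELECT '10.0.0.1' AS source_ip, 's1' AS session_id,
         TIMESTAMP '2023-01-01 10:00:00' AS session_start_time,
         TIMESTAMP '2023-01-01 10:30:00' AS session_end_time
  UNION ALL
  SELECT '10.0.0.1', 's2', TIMESTAMP '2023-01-01 10:20:00', TIMESTAMP '2023-01-01 11:00:00'
  UNION ALL
  SELECT '10.0.0.2', 's3', TIMESTAMP '2023-01-01 10:00:00', TIMESTAMP '2023-01-01 10:10:00'
)
SELECT
  left_table.*,
  (
    -- Correlated scalar subquery: evaluated per left_table row.
    SELECT STRING_AGG(right_table.session_id, ',')
    FROM data AS right_table
    WHERE
      left_table.source_ip = right_table.source_ip
      AND (
        (right_table.session_start_time >= left_table.session_start_time AND right_table.session_start_time <= left_table.session_end_time)
        OR
        (right_table.session_end_time >= left_table.session_start_time AND right_table.session_end_time <= left_table.session_end_time)
      )
    GROUP BY right_table.source_ip
  ) AS session_ids
FROM data AS left_table
ORDER BY left_table.session_id;
-- With these rows, s1 and s2 each list both s1 and s2 (order within the
-- string is not guaranteed), and s3 lists only itself.
```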