如何将交叉连接横向语句转换为BigQuery?

huangapple go评论169阅读模式
英文:

How to convert cross join lateral statement to big query?

问题

我的SQL语句是在PostgreSQL中使用交叉连接lateral编写的。但我想在Google BigQuery中执行此SQL语句,而BigQuery不支持交叉连接lateral。我该如何重写这个SQL以在BigQuery引擎中运行?

  1. SELECT
  2. left_table.*,
  3. x.*
  4. FROM
  5. data left_table
  6. JOIN (
  7. SELECT
  8. right_table.source_ip,
  9. STRING_AGG(right_table.session_id, ',') AS session_ids
  10. FROM
  11. data right_table
  12. WHERE
  13. left_table.source_ip = right_table.source_ip
  14. AND (
  15. (right_table.session_start_time >= left_table.session_start_time AND right_table.session_start_time <= left_table.session_end_time)
  16. OR
  17. (right_table.session_end_time >= left_table.session_start_time AND right_table.session_end_time <= left_table.session_end_time)
  18. )
  19. GROUP BY
  20. right_table.source_ip
  21. ) AS x
  22. ON
  23. true;

请注意,我已经将交叉连接lateral替换为BigQuery支持的JOIN子查询。这应该可以在BigQuery中正常运行。

英文:

My sql statement is written using cross join lateral in postgres sql. But i want to execute this sql statement in google bigquery and bigquery doesn't support cross join lateral. How can i rewrite this sql for bigquery engine?

  1. select
  2. left_table.*,x.*
  3. from
  4. data left_table
  5. cross join lateral
  6. (
  7. select string_agg(right_table.session_id,&#39;,&#39;)
  8. from
  9. data right_table
  10. where left_table.source_ip = right_table.source_ip
  11. and
  12. ((right_table.session_start_time &gt;= left_table.session_start_time and right_table.session_start_time &lt;= left_table.session_end_time)
  13. or
  14. (right_table.session_end_time &gt;= left_table.session_start_time and right_table.session_end_time &lt;= left_table.session_end_time))
  15. group by right_table.source_ip
  16. ) x

答案1

得分: 1

没有侧连接,"等效"的内容在选择子句中是一个"相关子查询",通常这会影响性能,因为相关子查询会逐行执行结果集:

  1. SELECT
  2. left_table.*
  3. , (SELECT string_agg(right_table.session_id, ',')
  4. FROM data right_table
  5. WHERE left_table.source_ip = right_table.source_ip
  6. AND (
  7. (
  8. right_table.session_start_time >= left_table.session_start_time
  9. AND right_table.session_start_time <= left_table.session_end_time
  10. )
  11. OR (
  12. right_table.session_end_time >= left_table.session_start_time
  13. AND right_table.session_end_time <= left_table.session_end_time
  14. )
  15. )
  16. GROUP BY right_table.source_ip) x
  17. FROM data left_table

备注:侧连接在FROM子句中以更高效的方式执行。

英文:

Without lateral joins the "equivalent" is a "correlated subquery" in the select clause, and these are often a performance issue as the correlated subquery is executed row by row though the resultset:

  1. SELECT
  2. left_table.*
  3. , (SELECT string_agg(right_table.session_id, &#39;,&#39;)
  4. FROM data right_table
  5. WHERE left_table.source_ip = right_table.source_ip
  6. AND (
  7. (
  8. right_table.session_start_time &gt;= left_table.session_start_time
  9. AND right_table.session_start_time &lt;= left_table.session_end_time
  10. )
  11. OR (
  12. right_table.session_end_time &gt;= left_table.session_start_time
  13. AND right_table.session_end_time &lt;= left_table.session_end_time
  14. )
  15. )
  16. GROUP BY right_table.source_ip) x
  17. FROM data left_table

nb: The lateral join is executed in a more efficient manner (as part of the from clause).

答案2

得分: 1

以下是已翻译的内容:

你可以尝试这段代码。在这段代码中,没有使用CROSS JOIN LATERAL,而是使用相关子查询来实现相同的结果。

  1. WITH AggregatedSessions AS (
  2. SELECT
  3. left_table.*,
  4. (
  5. SELECT STRING_AGG(right_table.session_id, ',')
  6. FROM data AS right_table
  7. WHERE
  8. left_table.source_ip = right_table.source_ip
  9. AND (
  10. (right_table.session_start_time >= left_table.session_start_time AND right_table.session_start_time <= left_table.session_end_time)
  11. OR
  12. (right_table.session_end_time >= left_table.session_start_time AND right_table.session_end_time <= left_table.session_end_time)
  13. )
  14. GROUP BY right_table.source_ip
  15. ) AS session_ids
  16. FROM data AS left_table
  17. )
  18. SELECT * FROM AggregatedSessions;
英文:

You can try this code. In this code CROSS JOIN LATERAL is not used, instead correlated sub-queries are used to achieve the same result.

  1. WITH AggregatedSessions AS (
  2. SELECT
  3. left_table.*,
  4. (
  5. SELECT STRING_AGG(right_table.session_id, &#39;,&#39;)
  6. FROM data AS right_table
  7. WHERE
  8. left_table.source_ip = right_table.source_ip
  9. AND (
  10. (right_table.session_start_time &gt;= left_table.session_start_time AND right_table.session_start_time &lt;= left_table.session_end_time)
  11. OR
  12. (right_table.session_end_time &gt;= left_table.session_start_time AND right_table.session_end_time &lt;= left_table.session_end_time)
  13. )
  14. GROUP BY right_table.source_ip
  15. ) AS session_ids
  16. FROM data AS left_table
  17. )
  18. SELECT * FROM AggregatedSessions;

huangapple
  • 本文由 发表于 2023年8月10日 10:32:28
  • 转载请务必保留本文链接:https://go.coder-hub.com/76872310.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定