How to convert a CROSS JOIN LATERAL statement to BigQuery?

Question

My SQL statement is written for PostgreSQL and uses CROSS JOIN LATERAL. I want to run it in Google BigQuery, which does not support CROSS JOIN LATERAL. How can I rewrite this statement for the BigQuery engine?

SELECT
  left_table.*,
  x.*
FROM data left_table
CROSS JOIN LATERAL (
  SELECT string_agg(right_table.session_id, ',')
  FROM data right_table
  WHERE left_table.source_ip = right_table.source_ip
    AND (
      (right_table.session_start_time >= left_table.session_start_time
       AND right_table.session_start_time <= left_table.session_end_time)
      OR
      (right_table.session_end_time >= left_table.session_start_time
       AND right_table.session_end_time <= left_table.session_end_time)
    )
  GROUP BY right_table.source_ip
) x

Answer 1

Score: 1

Without lateral joins, the "equivalent" is a correlated subquery in the SELECT clause. These are often a performance issue, because the correlated subquery is evaluated row by row through the result set:

SELECT
  left_table.*,
  (
    SELECT string_agg(right_table.session_id, ',')
    FROM data right_table
    WHERE left_table.source_ip = right_table.source_ip
      AND (
        (right_table.session_start_time >= left_table.session_start_time
         AND right_table.session_start_time <= left_table.session_end_time)
        OR
        (right_table.session_end_time >= left_table.session_start_time
         AND right_table.session_end_time <= left_table.session_end_time)
      )
    GROUP BY right_table.source_ip
  ) x
FROM data left_table

Note: where it is supported, the lateral join is executed more efficiently, because it is part of the FROM clause.
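
If the correlated subquery turns out to be slow, one alternative worth trying is an ordinary self-join followed by aggregation. The sketch below is not from the original answer. It assumes that session_id uniquely identifies a row of data (the question does not say so); the names l, r, overlaps and left_session_id are made up for illustration. The inner join drops any left row without a matching session, which mirrors what CROSS JOIN LATERAL does with the grouped subquery in the question.

```sql
-- Sketch only: assumes session_id is unique per row of `data`.
WITH overlaps AS (
  SELECT
    l.session_id AS left_session_id,               -- hypothetical key column
    STRING_AGG(r.session_id, ',') AS session_ids
  FROM data AS l
  JOIN data AS r
    ON l.source_ip = r.source_ip
   AND (
        -- BETWEEN is inclusive, same as the >= / <= pairs in the question
        r.session_start_time BETWEEN l.session_start_time AND l.session_end_time
     OR r.session_end_time   BETWEEN l.session_start_time AND l.session_end_time
   )
  GROUP BY l.session_id
)
SELECT
  l.*,
  o.session_ids
FROM data AS l
JOIN overlaps AS o
  ON o.left_session_id = l.session_id;
```

If session_id is not unique, the GROUP BY and the final join would need a different key, so check that assumption against your data before relying on the result.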

Answer 2

Score: 1

You can try the following query. It does not use CROSS JOIN LATERAL; instead, a correlated subquery achieves the same result.

WITH AggregatedSessions AS (
  SELECT
    left_table.*,
    (
      SELECT STRING_AGG(right_table.session_id, ',')
      FROM data AS right_table
      WHERE
        left_table.source_ip = right_table.source_ip
        AND (
          (right_table.session_start_time >= left_table.session_start_time AND right_table.session_start_time <= left_table.session_end_time)
          OR
          (right_table.session_end_time >= left_table.session_start_time AND right_table.session_end_time <= left_table.session_end_time)
        )
      GROUP BY right_table.source_ip
    ) AS session_ids
  FROM data AS left_table
)
SELECT * FROM AggregatedSessions;
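
One difference from the PostgreSQL original may matter. With CROSS JOIN LATERAL and a grouped subquery, a left row with no matching sessions is dropped, whereas a scalar subquery in the SELECT list keeps the row and returns NULL. In this particular query every row normally matches itself (as long as session_start_time <= session_end_time), so the two forms should agree. If you want to enforce the lateral behaviour anyway, a filter on the outer query is one way to sketch it. This assumes session_id is never NULL, which the question does not state. The GROUP BY is omitted here because an aggregate without grouping already yields a single row.

```sql
-- Sketch only: assumes session_id is never NULL in `data`.
WITH AggregatedSessions AS (
  SELECT
    left_table.*,
    (
      SELECT STRING_AGG(right_table.session_id, ',')
      FROM data AS right_table
      WHERE
        left_table.source_ip = right_table.source_ip
        AND (
          (right_table.session_start_time >= left_table.session_start_time
           AND right_table.session_start_time <= left_table.session_end_time)
          OR
          (right_table.session_end_time >= left_table.session_start_time
           AND right_table.session_end_time <= left_table.session_end_time)
        )
    ) AS session_ids
  FROM data AS left_table
)
SELECT *
FROM AggregatedSessions
WHERE session_ids IS NOT NULL;  -- drop rows the lateral join would not have returned
```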

huangapple
  • Posted on August 10, 2023 at 10:32:28
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76872310.html