BigQuery cross join seems to run forever

Question

I want to ask about cross joins in BigQuery. I perform a cross join where both tables are huge (say 2 million and 1 million rows), and the query seems to run forever. Is there any way around this, or an alternative to a cross join?

Answer 1

Score: 1

If there are M rows in the first table and N rows in the second, the result of a cross join has M * N rows.
In your case, 2M * 1M = 2×10¹² rows would be huge, and the query would effectively run forever.
Please go through the link below, which lists cross joins among the anti-patterns to avoid:
https://cloud.google.com/bigquery/docs/best-practices-performance-patterns

Or, you can describe the specific problem you are working on, and people here might be able to help you.
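To make the M * N arithmetic concrete, here is a small Python sketch that computes the cross join row count for the table sizes in the question and a rough output size. The function names and the figure of 100 bytes per output row are assumptions made up for illustration; they are not taken from the question or from BigQuery.

```python
def cross_join_rows(m: int, n: int) -> int:
    """Number of rows produced by cross joining tables of m and n rows."""
    return m * n


def estimated_terabytes(rows: int, bytes_per_row: int = 100) -> float:
    """Rough output size in decimal terabytes.

    bytes_per_row is a hypothetical average, chosen only for illustration.
    """
    return rows * bytes_per_row / 1e12


rows = cross_join_rows(2_000_000, 1_000_000)
print(f"{rows:,} rows")                          # 2,000,000,000,000 rows
print(f"{estimated_terabytes(rows):.1f} TB")     # 200.0 TB at 100 bytes/row
```

Even with a modest row width, the output is far larger than either input, which is why the query appears never to finish.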

Answer 2

Score: 1

A cross join means the server has to pair each row of table A with each row of table B to produce all possible combinations. In your case that is 2 million x 1 million = 2×10¹² rows for the server to generate! If you really need something like a cross join, first sample your big tables with a random function (for example RAND() in BigQuery) to get reasonably small test sets, then perform the cross join on those samples to reduce the output set.
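The sample-then-join idea can be sketched in plain Python on in-memory lists. This is only an illustration of the technique, not BigQuery code: the helper names, the 1% sampling fraction, and the scaled-down table sizes (2,000 and 1,000 rows instead of 2 million and 1 million) are all assumptions chosen so the example runs instantly.

```python
import itertools
import random


def sample_rows(rows, fraction, rng):
    """Return a uniform random sample holding about `fraction` of `rows`."""
    k = max(1, round(len(rows) * fraction))
    return rng.sample(rows, k)


def sampled_cross_join(table_a, table_b, fraction, seed=0):
    """Sample both tables first, then cross join only the samples."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    sample_a = sample_rows(table_a, fraction, rng)
    sample_b = sample_rows(table_b, fraction, rng)
    return list(itertools.product(sample_a, sample_b))


# Scaled-down stand-ins for the two big tables (hypothetical data).
table_a = list(range(2000))
table_b = list(range(1000))

pairs = sampled_cross_join(table_a, table_b, fraction=0.01)
print(len(table_a) * len(table_b))  # 2000000 pairs in the full cross join
print(len(pairs))                   # 200 pairs after sampling 1% of each table
```

Sampling a fraction f of each table shrinks the cross join output by a factor of f squared, so 1% samples give an output 10,000 times smaller than the full join.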

Answer 3

Score: 1

You need a cross join only when every element of one table must be associated with every element of the other table. In this case, a cross join would give you a result of 2 trillion records.
On this page you can find all the kinds of query in BigQuery and their usage.

Try to describe your problem in more detail so I can help you find a feasible solution, since a cross join is not a good option for this case.

huangapple
  • Posted on January 3, 2020, 14:57:58