BigQuery cross join runs forever
Question
I want to ask about cross joins in BigQuery. I am running a cross join between two very large tables (say 2 million and 1 million rows), and the query seems to run forever. Is there a better way to do this, or an alternative to a cross join?
Answer 1
Score: 1
If the first table has M rows and the second has N rows, the result has M * N rows.
In your case, 2 million * 1 million = 2 × 10^12 rows, which is huge, so the query would effectively run forever.
Please go through the link below, which covers cross joins among the anti-patterns to avoid:
https://cloud.google.com/bigquery/docs/best-practices-performance-patterns
Alternatively, describe the specific problem you are working on, and people here may be able to help.
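To make the M * N arithmetic concrete, here is a rough back-of-the-envelope sketch in Python. The row counts come from the question; the 100 bytes per output row is purely an assumed figure for illustration, and the function names are made up for this example.

```python
# Back-of-the-envelope size of a cross join (illustrative numbers only).

def cross_join_rows(m: int, n: int) -> int:
    """A cross join emits one output row per (left row, right row) pair."""
    return m * n

def estimated_bytes(rows: int, bytes_per_row: int) -> int:
    """Rough output size; bytes_per_row is an assumed average, not a measurement."""
    return rows * bytes_per_row

# Row counts taken from the question: 2 million and 1 million.
rows = cross_join_rows(2_000_000, 1_000_000)

# Assumption: about 100 bytes per output row. Reported in decimal terabytes.
size_tb = estimated_bytes(rows, bytes_per_row=100) / 1e12

print(f"{rows:,} rows, roughly {size_tb:,.0f} TB at 100 bytes per row")
```

Even with a modest assumed row width, the output lands in the hundreds of terabytes, which is why the query never seems to finish.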
Answer 2
Score: 1
A cross join means the server has to pair every row of table A with every row of table B to produce all possible combinations. In your case that is 2 million x 1 million = 2×10¹² rows for the server to generate! If you really need something like a cross join, first sample your big tables with a random function (RAND() in BigQuery) to get reasonably small test sets, and then run the cross join on those samples to reduce the output size.
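The sample-first idea can be sketched in plain Python. This is only a local simulation under stated assumptions, not BigQuery code: the two `range` objects are small stand-ins for the real tables, `bernoulli_sample` is a hypothetical helper, and the 1% fraction is arbitrary. Keeping each row when a random value falls below a threshold mirrors what a per-row filter on RAND() would do in a query.

```python
import itertools
import random

def bernoulli_sample(rows, fraction, rng):
    """Keep each row independently with probability `fraction`,
    the same idea as filtering each row on a random value."""
    return [row for row in rows if rng.random() < fraction]

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Small stand-ins for the two large tables (hypothetical data).
table_a = range(20_000)
table_b = range(10_000)

# Sample about 1% of each table before joining.
sample_a = bernoulli_sample(table_a, 0.01, rng)
sample_b = bernoulli_sample(table_b, 0.01, rng)

# Cross join of the samples: every (a, b) pair.
pairs = list(itertools.product(sample_a, sample_b))

print(len(sample_a), len(sample_b), len(pairs))
```

Sampling both inputs at fraction p shrinks the expected output by a factor of p squared, so even a mild sample rate reduces the cross join dramatically. The result is only a test set, though, not the full set of combinations.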
Answer 3
Score: 1
A cross join is the right tool when you need to associate every element of one table with every element of the other. In this case, though, a cross join would produce a result of 2 trillion records.
On this page you can find all the kinds of queries in BigQuery and their usage.
Try to describe your problem in more detail so I can help you find a feasible solution, since a cross join is not a good option in this case.