英文:
Using Big Query and OSM to find road type closest to a coordinate
问题
假设我在BigQuery中有一个名为gps_points的表:
ID | lon | lat
并且使用OSM,对于gps_points表中的每个点,我想查看道路类型(例如高速公路)。
主要挑战在于gps_points中的点可能与OSM中的点不完全相同。
例如,运行此查询仅返回1个点:
with points as
(
select st_geogpoint(lon, lat) gpoint FROM `prj-test.gps_points` LIMIT 1000
)
select gpoint from points inner join `bigquery-public-data.geo_openstreetmap.planet_features` osm
ON ST_CONTAINS(geometry, gpoint)
WHERE 'highway' IN (SELECT key FROM UNNEST(all_tags)) limit 1000
我在以下部分遇到了困难:
ON ST_CONTAINS(geometry, gpoint)
。我可以添加WHERE ST_Distance(geometry, gpoint) < 5
,例如,但这仅适用于我进行交叉连接。我的表很大,我不确定交叉连接是否是最佳答案。
是否有一种有效的方法来执行这种查找?
英文:
Suppose I have a table called gps_points in Big query:
ID | lon | lat
and using OSM, for each point in gps_points table, I want to see what is the road type (motorway...)
The main challenge is that the points in gps_points may not be exactly the same as those in OSM.
For example, running this query returns only 1 point:
with points as
(
select st_geogpoint(lon, lat) gpoint FROM `prj-test.gps_points` LIMIT 1000
)
select gpoint from points inner join `bigquery-public-data.geo_openstreetmap.planet_features` osm
ON ST_CONTAINS(geometry, gpoint)
WHERE 'highway' IN (SELECT key FROM UNNEST(all_tags)) limit 1000
I am struggling in:
ON ST_CONTAINS(geometry, gpoint)
. I can add WHERE ST_Distance(geometry, gpoint) < 5
for example, but that only works if I do cross join. My tables are huge and I am not sure if cross join is the best answer.
Is there an efficient way to do this kind of lookup?
答案1
得分: 1
以下是翻译好的部分:
- BigQuery 中,
INNER JOIN ... ON ST_CONTAINS(geometry, gpoint)
,CROSS JOIN ... WHERE ST_Distance(geometry, gpoint) < 5
,以及(也是合法的)INNER JOIN ... ON ST_Distance(geometry, gpoint) < 5
之间没有太大区别。 - 查询规划器会以类似的方式对它们进行优化,因此您将获得在文档中描述的优化空间连接:
https://cloud.google.com/bigquery/docs/geospatial-data#using_joins_with_spatial_data - 此外,尽管该文档没有提到
ST_Distance(...) < ...
,但这个函数也能正常工作!查询规划器会自动将这样的条件重写为ST_DWithin(geometry, gpoint, 5)
。您会在执行详细信息中看到 CROSS JOIN ON,但连接仍然经过了优化。请查看以下链接,以了解它是否在特定情况下有效:
https://mentin.medium.com/execution-details-for-spatial-join-7274356e0115
英文:
For BigQuery there is not much difference between
INNER JOIN ... ON ST_CONTAINS(geometry, gpoint)
,
CROSS JOIN ... WHERE ST_Distance(geometry, gpoint) < 5
,
or (also legal)
INNER JOIN ... ON ST_Distance(geometry, gpoint) < 5
.
They are optimized similarly by the query planner, and you get optimized spatial join as described in the doc:
https://cloud.google.com/bigquery/docs/geospatial-data#using_joins_with_spatial_data
Also, even though that doc does not mention ST_Distance(...) < ...
, it works with this function too! Such condition is rewritten automatically as ST_DWithin(geometry, gpoint, 5)
by the query planner. You'll see CROSS JOIN ON in the Execution Details, but the join is still optimized. See https://mentin.medium.com/execution-details-for-spatial-join-7274356e0115 to check if it works in specific case.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论