Using Big Query and OSM to find road type closest to a coordinate

huangapple go评论54阅读模式
英文:

Using Big Query and OSM to find road type closest to a coordinate

问题

假设我在BigQuery中有一个名为gps_points的表:

ID | lon | lat

并且使用OSM,对于gps_points表中的每个点,我想查看道路类型(例如高速公路)。

主要挑战在于gps_points中的点可能与OSM中的点不完全相同。

例如,运行此查询仅返回1个点:

with points as
(
  select st_geogpoint(lon, lat) gpoint FROM `prj-test.gps_points` LIMIT 1000
)

select gpoint from points inner join `bigquery-public-data.geo_openstreetmap.planet_features` osm 

ON ST_CONTAINS(geometry, gpoint) 

WHERE 'highway' IN (SELECT key FROM UNNEST(all_tags)) limit 1000

我在以下部分遇到了困难:

ON ST_CONTAINS(geometry, gpoint)。我可以添加WHERE ST_Distance(geometry, gpoint) < 5,例如,但这仅适用于我进行交叉连接。我的表很大,我不确定交叉连接是否是最佳答案。

是否有一种有效的方法来执行这种查找?

英文:

Suppose I have a table called gps_points in Big query:

ID | lon | lat

and using OSM, for each point in gps_points table, I want to see what is the road type (motorway...)

The main challenge is that the points in gps_points may not be exactly the same as those in OSM.

For example, running this query returns only 1 point:

with points as 
(
  select  st_geogpoint(lon, lat) gpoint FROM `prj-test.gps_points` LIMIT 1000
)

select gpoint from points inner join `bigquery-public-data.geo_openstreetmap.planet_features` osm 

ON ST_CONTAINS(geometry, gpoint) 

WHERE 'highway' IN (SELECT key FROM UNNEST(all_tags)) limit 1000

I am struggling in:

ON ST_CONTAINS(geometry, gpoint). I can add WHERE ST_Distance(geometry, gpoint) < 5 for example, but that only works if I do cross join. My tables are huge and I am not sure if cross join is the best answer.

Is there an efficient way to do this kind of lookup?

答案1

得分: 1

以下是翻译好的部分:

  • BigQuery 中,INNER JOIN ... ON ST_CONTAINS(geometry, gpoint)CROSS JOIN ... WHERE ST_Distance(geometry, gpoint) < 5,以及(也是合法的)INNER JOIN ... ON ST_Distance(geometry, gpoint) < 5 之间没有太大区别。
  • 查询规划器会以类似的方式对它们进行优化,因此您将获得在文档中描述的优化空间连接:
    https://cloud.google.com/bigquery/docs/geospatial-data#using_joins_with_spatial_data
  • 此外,尽管该文档没有提到 ST_Distance(...) < ...,但这个函数也能正常工作!查询规划器会自动将这样的条件重写为 ST_DWithin(geometry, gpoint, 5)。您会在执行详细信息中看到 CROSS JOIN ON,但连接仍然经过了优化。请查看以下链接,以了解它是否在特定情况下有效:
    https://mentin.medium.com/execution-details-for-spatial-join-7274356e0115
英文:

For BigQuery there is not much difference between

INNER JOIN ... ON ST_CONTAINS(geometry, gpoint),

CROSS JOIN ... WHERE ST_Distance(geometry, gpoint) < 5,

or (also legal)

INNER JOIN ... ON ST_Distance(geometry, gpoint) < 5.

They are optimized similarly by the query planner, and you get optimized spatial join as described in the doc:
https://cloud.google.com/bigquery/docs/geospatial-data#using_joins_with_spatial_data

Also, even though that doc does not mention ST_Distance(...) < ..., it works with this function too! Such condition is rewritten automatically as ST_DWithin(geometry, gpoint, 5) by the query planner. You'll see CROSS JOIN ON in the Execution Details, but the join is still optimized. See https://mentin.medium.com/execution-details-for-spatial-join-7274356e0115 to check if it works in specific case.

huangapple
  • 本文由 发表于 2023年3月7日 04:52:03
  • 转载请务必保留本文链接:https://go.coder-hub.com/75655730.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定