这个PostgreSQL/PostGIS语句是否可以加速?

huangapple go评论67阅读模式
英文:

Is it possible to accelerate this postgress / postgis statement?

问题

以下是您的翻译:

我有一个带有Postgis扩展的Postgres数据库,其中填充有开放街道地图数据。

使用以下SQL语句:

SELECT                                                                
    l.osm_id,
    sum(
        st_area(st_intersection(ST_Buffer(l.way, 30), p.way))
        /
        st_area(ST_Buffer(l.way, 30))
    ) as green_fraction
FROM planet_osm_line AS l
INNER JOIN planet_osm_polygon AS p ON ST_Intersects(l.way, ST_Buffer(p.way,30))
WHERE p.natural in ('water') or p.landuse in ('forest') GROUP BY l.osm_id;

我计算了一个“绿色”分数。

我的目标是为每个osm_id创建一个“绿色”分数。

这意味着,道路附近有多少水域、森林或类似的地物。

为了做到这一点:

我创建了一个30米的缓冲区,围绕每个路线,并计算了该缓冲路线与附近的任何绿色地物的交集。

我使用“绿色地物”来指代OpenStreetMap数据库中的多边形,如公园。

是否可能加速这个计算?

我做的一件事是创建了两个索引,希望加速计算:

CREATE INDEX way_index_2 on planet_osm_polygon USING gist(way) WHERE "natural" IN ('water','wood','forest','hill','valley');
CREATE INDEX way_index_3 on planet_osm_polygon USING gist(way) WHERE "landuse" IN ('forest');

这是对此语句的一个"explain":

EXPLAIN (ANALYZE, BUFFERS) SELECT                                                                
    l.osm_id,
    sum(
        st_area(st_intersection(ST_Buffer(l.way, 30), p.way))
        /
        st_area(ST_Buffer(l.way, 30))
    ) as green_fraction
FROM planet_osm_line AS l
INNER JOIN planet_osm_polygon AS p ON ST_Intersects(l.way, ST_Buffer(p.way,30))
WHERE p.natural in ('water') or p.landuse in ('forest') GROUP BY l.osm_id limit 1;

更新:以下是两个表的简要模式:

planet_osm_line

Column       | Type            | Collation | Nullable | Default | Storage  | Compression | Stats target | Description 
--------------+-----------------+-----------+----------+---------+----------+-------------+--------------+-------------
osm_id        | bigint          |           |          |         | plain    |             |              | 
access        | text            |           |          |         | extended |             |              | 
addr:housename| text            |           |          |         | extended |             |              | 
addr:housenumber| text          |           |          |         | extended |             |              | 
addr:interpolation| text        |           |          |         | extended |             |              | 
admin_level   | text            |           |          |         | extended |             |              | 
...
natural       | text            |           |          |         | extended |             |              | 

planet_osm_polygon

Column       | Type            | Collation | Nullable | Default | Storage  | Compression | Stats target | Description 
--------------+-----------------+-----------+----------+---------+----------+-------------+--------------+-------------
osm_id        | bigint          |           |          |         | plain    |             |              | 
access        | text            |           |          |         | extended |             |              | 
addr:housename| text            |           |          |         | extended |             |              | 
addr:housenumber| text          |           |          |         | extended |             |              | 
addr:interpolation| text        |           |          |         | extended |             |              | 
admin_level   | text            |           |          |         | extended |             |              | 
aerialway     | text            |           |          |         | extended |             |              | 
aeroway       | text            |           |          |         | extended |             |              | 
amenity       | text            |           |          |         | extended |             |              | 
area          | text            |           |          |         | extended |             |              | 
barrier       | text            |           |          |         | extended |             |              | 
landuse       | text            |           |          |         | extended |             |              | 
...

wood          | text            |           |          |         | extended |             |              | 
z_order       | integer         |           |          |         | plain    |             |              | 
way_area      | real            |           |          |         | plain    |             |              | 
way           | geometry(Geometry,3857)|    |          |         | main     |             |              | 
way_buffer_30 | geometry(Polygon)|           |          |         | external |             |              | 
way_buffer_30_area| numeric     |           |          | generated always as (st_area(way_buffer_30)) stored | main | | |
英文:

I have a Postgres database with a postgis extention installed and filles with open street map data.

With the following SQL statement :

SELECT                                                                
    l.osm_id,
    sum(
        st_area(st_intersection(ST_Buffer(l.way, 30), p.way))
        /
        st_area(ST_Buffer(l.way, 30))
    ) as green_fraction
FROM planet_osm_line AS l
INNER JOIN planet_osm_polygon AS p ON ST_Intersects(l.way, ST_Buffer(p.way,30))
WHERE p.natural in ('water') or p.landuse in ('forest') GROUP BY l.osm_id;

I calculate a "green" score.

My goal is to create a "green" score for each osm_id.

Which means; how much of a road is near a water, forrest or something similar.

To do so:

I Create a 30 meter buffer around each way, and calculate the intersection between that buffered way and any green features nearby.

I am using 'green features' to refer to polygons in OpenStreetMap's database, such as a park.

Is it possbile to accelerate this calculation?

One thing I id is to create 2 indices in a hope to accelerate the calculation:

CREATE INDEX way_index_2 on planet_osm_polygon USING gist(way) WHERE "natural" IN ('water','wood','forest','hill','valley');
CREATE INDEX way_index_3 on planet_osm_polygon USING gist(way) WHERE "landuse" IN ('forest');

Here is an "explain" to this statement:

EXPLAIN (ANALYZE, BUFFERS) SELECT                                                                
    l.osm_id,
    sum(
        st_area(st_intersection(ST_Buffer(l.way, 30), p.way))
        /
        st_area(ST_Buffer(l.way, 30))
    ) as green_fraction
FROM planet_osm_line AS l
INNER JOIN planet_osm_polygon AS p ON ST_Intersects(l.way, ST_Buffer(p.way,30))
WHERE p.natural in ('water') or p.landuse in ('forest') GROUP BY l.osm_id limit 1;
                                                                                       QUERY PLAN                                                                                       
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=17816.83..133789235.22 rows=1 width=16) (actual time=1575643.737..1575651.862 rows=1 loops=1)
   Buffers: shared hit=816315 read=221878, temp read=313661 written=73938
   ->  GroupAggregate  (cost=17816.83..1435958589062981.00 rows=10734420 width=16) (actual time=1575643.723..1575651.847 rows=1 loops=1)
         Group Key: l.osm_id
         Buffers: shared hit=816315 read=221878, temp read=313661 written=73938
         ->  Nested Loop  (cost=17816.83..1433802261939271.50 rows=28652652777 width=448) (actual time=978502.788..1575648.857 rows=8 loops=1)
               Join Filter: st_intersects(l.way, st_buffer(p.way, '30'::double precision, ''::text))
               Rows Removed by Join Filter: 6528525
               Buffers: shared hit=816315 read=221878, temp read=313661 written=73938
               ->  Index Scan using osm_id_idx on planet_osm_line l  (cost=0.44..1242021.57 rows=22671610 width=247) (actual time=5.963..6.181 rows=6 loops=1)
                     Buffers: shared hit=5 read=3
               ->  Materialize  (cost=17816.39..1445364.98 rows=1263812 width=201) (actual time=85.181..4605.942 rows=1088089 loops=6)
                     Buffers: shared hit=380236 read=220348, temp read=313661 written=73938
                     ->  Gather  (cost=17816.39..1403253.92 rows=1263812 width=201) (actual time=510.609..1066.182 rows=1250378 loops=1)
                           Workers Planned: 4
                           Workers Launched: 4
                           Buffers: shared hit=380236 read=220348
                           ->  Parallel Bitmap Heap Scan on planet_osm_polygon p  (cost=16816.39..1275872.72 rows=315953 width=201) (actual time=447.168..9410.838 rows=250076 loops=5)
                                 Recheck Cond: (("natural" = ANY ('{water,wood,forest,hill,valley}'::text[])) OR (landuse = 'forest'::text))
                                 Rows Removed by Index Recheck: 2554266
                                 Filter: (("natural" = 'water'::text) OR (landuse = 'forest'::text))
                                 Rows Removed by Filter: 53217
                                 Heap Blocks: lossy=1
                                 Buffers: shared hit=380236 read=220348
                                 ->  BitmapOr  (cost=16816.39..16816.39 rows=1554297 width=0) (actual time=491.891..491.893 rows=0 loops=1)
                                       Buffers: shared hit=7797
                                       ->  Bitmap Index Scan on way_index_2  (cost=0.00..7822.79 rows=750359 width=0) (actual time=413.690..413.690 rows=737741 loops=1)
                                             Buffers: shared hit=3758
                                       ->  Bitmap Index Scan on way_index_3  (cost=0.00..8361.69 rows=803938 width=0) (actual time=78.198..78.198 rows=783702 loops=1)
                                             Buffers: shared hit=4039
 Planning Time: 0.315 ms
 Execution Time: 1575673.609 ms
(32 rows)

UPDATE:

Here is a short schema of the both tables;

                                                Table 

"public.planet_osm_line"
       Column       |           Type            | Collation | Nullable | Default | Storage  | Compression | Stats target | Description 
--------------------+---------------------------+-----------+----------+---------+----------+-------------+--------------+-------------
 osm_id             | bigint                    |           |          |         | plain    |             |              | 
 access             | text                      |           |          |         | extended |             |              | 
 addr:housename     | text                      |           |          |         | extended |             |              | 
 addr:housenumber   | text                      |           |          |         | extended |             |              | 
 addr:interpolation | text                      |           |          |         | extended |             |              | 
 admin_level        | text                      |           |          |         | extended |             |              | 
....
natural            | text                      |           |          |         | extended |             |              | 



Indexes:
    "highway_idx" btree (highway)
    "motorway_idx" gist (way) WHERE highway = 'motorway'::text
    "motorway_trunk_primary_secondary_tertiary_unclassified_idx" gist (way) WHERE highway = ANY (ARRAY['motorway'::text, 'trunk'::text, 'primary'::text, 'secondary'::text, 'tertiary'::text, 'unclassified'::text])
    "name_idx" btree (name)
    "osm_id_idx" btree (osm_id)
    "planet_osm_line_osm_id_idx" btree (osm_id)
    "planet_osm_line_way_idx" gist (way)
    "primary_idx" gist (way) WHERE highway = 'primary'::text
    "primary_secondary_idx" gist (way) WHERE highway = ANY (ARRAY['primary'::text, 'secondary'::text])
    "primary_secondary_tertiary_idx" gist (way) WHERE highway = ANY (ARRAY['primary'::text, 'secondary'::text, 'tertiary'::text])
    "primary_secondary_tertiary_unclassified_idx" gist (way) WHERE highway = ANY (ARRAY['primary'::text, 'secondary'::text, 'tertiary'::text, 'unclassified'::text])
    "secondary_idx" gist (way) WHERE highway = 'secondary'::text
    "secondary_tertiary_idx" gist (way) WHERE highway = ANY (ARRAY['secondary'::text, 'tertiary'::text])
    "tertiary_idx" gist (way) WHERE highway = 'tertiary'::text
    "tertiary_secondary_idx" gist (way) WHERE highway = ANY (ARRAY['tertiary'::text, 'unclassified'::text])
    "trunk_idx" gist (way) WHERE highway = 'trunk'::text
    "trunk_primary_secondary_tertiary_unclassified_idx" gist (way) WHERE highway = ANY (ARRAY['trunk'::text, 'primary'::text, 'secondary'::text, 'tertiary'::text, 'unclassified'::text])
    "unclassified_idx" gist (way) WHERE highway = 'unclassified'::text
    "way_idx" gist (way)
    "way_index_1" gist (way)
    "way_index_4" gist (way) WHERE "natural" = ANY (ARRAY['water'::text, 'wood'::text, 'forest'::text, 'hill'::text, 'valley'::text])
    "way_index_5" gist (way) WHERE landuse = 'forest'::text
Triggers:
    planet_osm_line_osm2pgsql_valid BEFORE INSERT OR UPDATE ON planet_osm_line FOR EACH ROW EXECUTE FUNCTION planet_osm_line_osm2pgsql_valid()
Access method: heap

And:

                                                 Table "public.planet_osm_polygon"
       Column       |          Type           | Collation | Nullable | Default | Storage  | Compression | Stats target | Description 
--------------------+-------------------------+-----------+----------+---------+----------+-------------+--------------+-------------
 osm_id             | bigint                  |           |          |         | plain    |             |              | 
 access             | text                    |           |          |         | extended |             |              | 
 addr:housename     | text                    |           |          |         | extended |             |              | 
 addr:housenumber   | text                    |           |          |         | extended |             |              | 
 addr:interpolation | text                    |           |          |         | extended |             |              | 
 admin_level        | text                    |           |          |         | extended |             |              | 
 aerialway          | text                    |           |          |         | extended |             |              | 
 aeroway            | text                    |           |          |         | extended |             |              | 
 amenity            | text                    |           |          |         | extended |             |              | 
 area               | text                    |           |          |         | extended |             |              | 
 barrier            | text                    |           |          |         | extended |             |              | 
landuse            | text                    |           |          |         | extended |             |              | 


Indexes:
    "fuel_toilet_parking_restaurant_idex" gist (way) WHERE amenity = ANY (ARRAY['fuel'::text, 'toilets'::text, 'parking'::text, 'restaurant'::text, 'cafe'::text, 'pub'::text, 'ice_cream'::text, 'biergarten'::text])
    "planet_osm_polygon_osm_id_idx" btree (osm_id)
    "planet_osm_polygon_way_idx" gist (way)
    "viewpoint_attraction_guest_house_idex" gist (way) WHERE tourism = ANY (ARRAY['viewpoint '::text, 'attraction'::text, 'guest_house'::text])
    "way_index_2" gist (way) WHERE "natural" = ANY (ARRAY['water'::text, 'wood'::text, 'forest'::text, 'hill'::text, 'valley'::text])
    "way_index_3" gist (way) WHERE landuse = 'forest'::text
Triggers:
    planet_osm_polygon_osm2pgsql_valid BEFORE INSERT OR UPDATE ON planet_osm_polygon FOR EACH ROW EXECUTE FUNCTION planet_osm_polygon_osm2pgsql_valid()
Access method: heap

UPDATE:

planet_osm_polygon

wood               | text                    |           |          |                                                     | extended |             |              | 
 z_order            | integer                 |           |          |                                                     | plain    |             |              | 
 way_area           | real                    |           |          |                                                     | plain    |             |              | 
 way                | geometry(Geometry,3857) |           |          |                                                     | main     |             |              | 
 way_buffer_30      | geometry(Polygon)       |           |          |                                                     | external |             |              | 
 way_buffer_30_area | numeric                 |           |          | generated always as (st_area(way_buffer_30)) stored | main     |             |              | 

AND

planet_osm_line

way_area           | real                      |           |          |         | plain    |             |              | 
 way                | geometry(LineString,3857) |           |          |         | main     |             |              | 

答案1

得分: 3

以下是翻译好的部分:

  1. default_toast_compression=lz4 你正在处理多边形数据,这些数据很可能会被压缩和 TOASTed。默认的 default_toast_compression=pglz 通常比 lz4 。请注意,在更改此设置后,您需要强制重新写入那些表/列,否则它将仅从那一刻起作为默认值应用,不会影响以前的任何内容。

  2. storage external 在检索大型/复杂形状数据时,可以跳过一个步骤。根据您的 PostGIS 版本,这可能是值得的。

  3. st_buffer(way,30,'quad_segs=1') 您可以调整第三个参数以获得更简单的形状,这样更容易进行比较。默认值 quad_segs=8 可能导致缓冲区的顶点数量比输入多 8 倍。

  4. with (fillfactor=100) 我假设这是一个静态的源表,当发布新版本时,您会替换它,而不是自己维护当前表。因此,索引也可以是静态的(默认值 fillfactor=90 是为了考虑插入表的新行)。

  5. cluster 将表页面与索引对齐,使堆获取更快。

  6. 在您要使用的最终缓冲区形状上创建索引比在使用 ST_Buffer() 创建它的列上创建索引更好。对于 ST_Transform() 也是如此 - 在索引之前对列应用它,而不是在查询中应用。

  7. ST_Subdivide all the Things

  8. 您可以测试 ST_DWithin() 是否加速了与您当前的代码和上面建议的代码相比的连接,并确保使用合适的索引。请记住,距离单位取决于列的 SRID - 如果您使用的是公制单位,那么会从使用英制/海里/度的系统中获得异常结果。

    INNER JOIN planet_osm_polygon AS p ON ST_DWithin(l.way,p.way,30)
    
  9. 您可以检查您的部分索引是否值得,如果值得,看看如果将其缩小到此查询的目的是否性能更好。

    create index way_buffer_30_gix on planet_osm_polygon 
    using gist(way_buffer_30) with (fillfactor=100)
    where natural='water' or landuse='forest';
    
  10. 大多数上述建议只在这是一次性操作时或者如果您可以克隆目标表以便不必处理其他经常在该表上执行的操作的索引和设置时才有意义。您还可以使用 where 子句在这个阶段过滤掉您不感兴趣的内容。

  11. 如果速度真的很慢,可以从两个表中提取一部分行,并在这些行上运行测试。为了获得更可靠的结果,使用 pgbench 运行您的基准测试。

英文:

Not really an answer as much as a bunch of comments too long for a comment format:

alter system set default_toast_compression=lz4;

alter table planet_osm_polygon 
    add column way_buffer_30 geometry, 
    alter column way_buffer_30 set storage external,
    add column way_buffer_30_area numeric 
        generated always as (st_area(way_buffer_30)) stored;
update planet_osm_polygon 
    set way_buffer_30=st_buffer(way,30,'quad_segs=1');
create index osmp_way_buffer_30_gix on planet_osm_polygon 
    using gist(way_buffer_30) with (fillfactor=100);
cluster verbose planet_osm_polygon using osmp_way_buffer_30_gix;

alter table planet_osm_line
    add column way_buffer_30 geometry, 
    alter column way_buffer_30 set storage external,
    add column way_buffer_30_area numeric 
        generated always as (st_area(way_buffer_30)) stored; 
update planet_osm_line
    set way_buffer_30=st_buffer(way,30,'quad_segs=1');
create index osml_way_buffer_30_gix on planet_osm_line
    using gist(way_buffer_30) with (fillfactor=100);
cluster verbose planet_osm_polygon using osml_way_buffer_30_gix;

SELECT                                                                
    l.osm_id,
    sum(st_area(st_intersection(l.way_buffer_30, p.way))
        / l.way_buffer_30_area
    ) as green_fraction
FROM planet_osm_line AS l
INNER JOIN planet_osm_polygon AS p ON ST_Intersects(l.way, p.way_buffer_30)
WHERE p.natural in ('water') or p.landuse in ('forest') GROUP BY l.osm_id;
  1. default_toast_compression=lz4 You're working with polygons, which are likely to be compressed and TOASTed. Default default_toast_compression=pglz is typically slower than lz4. Note that you need to force a re-write of those tables/columns after altering the setting, otherwise it'll apply as a default only from that point onwards, not affecting anything retroactively.
  2. storage external skips one step in retrieving the data if the shapes are big/complex enough. Might be worth it depending on your PostGIS version.
  3. st_buffer(way,30,'quad_segs=1') you can tweak the third parameter to get a simpler shape, that's easier to compare against. Default quad_segs=8 can cause your buffer to have 8x more vertices than the input.
  4. with (fillfactor=100) I assume it's a static source table that you replace when a new version is published, rather than maintain the current one yourself. Therefore, indexes can be made static, too (default fillfactor=90 is meant to account for new rows inserted into the table).
  5. cluster aligns table pages with the index, making heap fetches faster.
  6. An index on the final buffer shape you want to use will be better than an index on a column you plan to use to create it using ST_Buffer(). Same goes for ST_Transform() - apply it on the column before indexing, not in the query.
  7. ST_Subdivide all the Things.
  8. You can test if ST_DWithin() speeds up the join compared to your current code and the one suggested above. Be sure to test with a suitable index. Keep in mind the unit of distance depends on the column SRID - if you're using metric you'll get anomalies from systems in imperial/nautical/degrees.
    INNER JOIN planet_osm_polygon AS p ON ST_DWithin(l.way,p.way,30)
    
  9. You can check if your partial index is at all worth it, and see if it performs better if you narrow it down for the purpose of this query
    create index way_buffer_30_gix on planet_osm_polygon 
    using gist(way_buffer_30) with (fillfactor=100)
    where natural='water' or landuse='forest';
    
  10. Most of the above only makes sense if that's a one-off operation or if you can clone the target tables so that you don't have to deal with other indexes and settings that are supposed to improve performance of operations that you do on that table more often. You can also filter out whatever you're not interested in at this stage, using that where clause.
  11. If it's really slow, extract a subset of rows from both tables and run your tests on those. To get more reliable results, use pgbench to run your benchmarks.

huangapple
  • 本文由 发表于 2023年1月9日 00:09:21
  • 转载请务必保留本文链接:https://go.coder-hub.com/75049318.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定