在R中查找与其他多边形的质心一定距离内的多边形质心的子集。

huangapple go评论67阅读模式
英文:

Finding Subset of Polygon centroids Which Are Within A Certain Distance of the Centroids of Other Polygons in R

问题

我正在尝试计算牙买加行政单位形状文件中每个行政单位周围5公里内的摘要统计信息。我使用dnearneigh()函数来查找这些行政单位,但然后我不确定如何最好地处理输出的列表以计算摘要统计信息。我尝试使用空间子集,但这没有起作用,所以关于如何最好地执行这个操作的建议会很有用。

shp_centroid <- st_point_on_surface(sf_communities)

shp_centroid_coord <- st_coordinates(shp_centroid)

shp_dist <- dnearneigh(shp_centroid_coord, 0, 5000)
subset <- sf_communities[unlist(shp_dist),]

注意:如果您需要更详细的帮助或代码示例,请提供更多上下文。

英文:

I am attempting to calculate summary statistics for the the administrative units within 5km of every administrative in a shapefile of Jamaica. I use dnearneigh() to find which administrative units these are, but then I am not sure how best to work with the list I get as an output to calculate summary statistics. I tried using spatial subsetting but that has not worked, so advice on how best to carry out this operation would be useful.

shp_centroid &lt;- st_point_on_surface(sf_communities)

shp_centroid_coord &lt;- st_coordinates(shp_centroid)

shp_dist &lt;- dnearneigh(shp_centroid_coord,0,5000)
subset &lt;- sf_communities[unlist(shp_dist),]

答案1

得分: 1

以下是翻译好的部分:

"One option is to go with nested tibbles / data.frames where every administrative unit has its own tibble of neighbors + itself. Thanks to rowwise operations in dplyr, collecting summary statistics from such structures is quite convenient. There's always an option to use unnest() to lengthen that nested dataset and go with group_by()/summarise() instead."

"一种选择是使用嵌套的 tibbles / data.frames,其中每个行政单位都有自己的邻居 tibble + 自己。由于 dplyr 中的 rowwise 操作,从这种结构中收集摘要统计信息非常方便。始终可以选择使用 unnest() 来展开这个嵌套数据集,然后使用 group_by()/summarise()。"

"The following example uses nc dataset from sf package, distance threshold is set to 50km to better match those shape sizes. Instead of spdep package, it just uses sf functions, meaning that the list we'll work with (sgbp - sparse geometry binary predicate lists, standard sf stuff) might have a slightly different structure. Here the neighborhood is defined as an intersection of 50km buffer and county polygons, not county centroids as described in the Question title; so this might be another detail to change / review."

"以下示例使用 sf 包中的 nc 数据集,距离阈值设置为50公里,以更好地匹配这些形状大小。与 spdep 包不同,它只使用 sf 函数,这意味着我们将使用的列表(sgbp - 稀疏几何二进制谓词列表,标准 sf 东西)可能具有稍微不同的结构。这里的邻居被定义为50公里缓冲区和县多边形的交集,而不是问题标题中描述的县质心;因此,这可能是要更改/审查的另一个详细信息。"

"Prepare example, test and visualize subsetting for a single county."

"准备示例,测试并可视化单个县的子集。"

"Apply that same logic on a dataset"

"在数据集上应用相同的逻辑"

"Build a nested tibble"

"构建一个嵌套的 tibble"

"Working with nested tibbles"

"使用嵌套的 tibbles"

英文:

One option is to go with nested tibbles / data.frames where every administrative unit has its own tibble of neighbours + itself. Thanks to rowwise operations in dplyr, collecting summary statistics from such structures is quite convenient. There's always an option to use unnest() to lengthen that nested dataset and go with group_by()/summarise() instead.

The following example uses nc dataset from sf package, distance threshold is set to 50km to better match those shape sizes. Instead of spdep package, it just uses sf functions, meaning that the list we'll work with (sgbp - sparse geometry binary predicate lists, standard sf stuff) might have a slightly different structure. Here the neighbourhood is defined as an intersection of 50km buffer and county polygons, not county centroids as described in the Question title; so this might be another detail to change / review.

Prepare example, test and visualize subsetting for a single county.
library(sf)
#&gt; Linking to GEOS 3.9.3, GDAL 3.5.2, PROJ 8.2.1; sf_use_s2() is TRUE
library(dplyr, warn.conflicts = FALSE)
library(purrr)
library(tidyr)
library(ggplot2)

# example data from sf package examples
nc = read_sf(system.file(&quot;shape/nc.shp&quot;, package=&quot;sf&quot;)) %&gt;% 
  select(NAME, AREA, starts_with(&quot;BIR&quot;))
nc
#&gt; Simple feature collection with 100 features and 4 fields
#&gt; Geometry type: MULTIPOLYGON
#&gt; Dimension:     XY
#&gt; Bounding box:  xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
#&gt; Geodetic CRS:  NAD27
#&gt; # A tibble: 100 &#215; 5
#&gt;    NAME         AREA BIR74 BIR79                                        geometry
#&gt;    &lt;chr&gt;       &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt;                              &lt;MULTIPOLYGON [&#176;]&gt;
#&gt;  1 Ashe        0.114  1091  1364 (((-81.47276 36.23436, -81.54084 36.27251, -81…
#&gt;  2 Alleghany   0.061   487   542 (((-81.23989 36.36536, -81.24069 36.37942, -81…
#&gt;  3 Surry       0.143  3188  3616 (((-80.45634 36.24256, -80.47639 36.25473, -80…
#&gt;  4 Currituck   0.07    508   830 (((-76.00897 36.3196, -76.01735 36.33773, -76.…
#&gt;  5 Northampton 0.153  1421  1606 (((-77.21767 36.24098, -77.23461 36.2146, -77.…
#&gt;  6 Hertford    0.097  1452  1838 (((-76.74506 36.23392, -76.98069 36.23024, -76…
#&gt;  7 Camden      0.062   286   350 (((-76.00897 36.3196, -75.95718 36.19377, -75.…
#&gt;  8 Gates       0.091   420   594 (((-76.56251 36.34057, -76.60424 36.31498, -76…
#&gt;  9 Warren      0.118   968  1190 (((-78.30876 36.26004, -78.28293 36.29188, -78…
#&gt; 10 Stokes      0.124  1612  2038 (((-80.02567 36.25023, -80.45301 36.25709, -80…
#&gt; # ℹ 90 more rows

# test and visualize neighbourhood subsetting for a single county (Lee)
lee = tibble::lst(
  polygon = nc[nc$NAME == &quot;Lee&quot;,&quot;geometry&quot;],
  centroid = st_centroid(polygon),
  buffer = st_buffer(centroid, 50000),
  within = st_filter(nc, buffer)
)

ggplot() +
  geom_sf(data = nc) +
  geom_sf(data = lee$within, fill = &quot;green&quot;) +
  geom_sf(data = lee$polygon, fill = &quot;red&quot;) +
  geom_sf(data = lee$buffer, alpha = .6, fill = &quot;gold&quot;) +
  geom_sf(data = lee$centroid, size = 1) 

在R中查找与其他多边形的质心一定距离内的多边形质心的子集。<!-- -->

Apply that same logic on a dataset
# resulting within_dist type is sgbp, &quot;sparse geometry binary predicate lists&quot;,
# a list of sf object row indeces that intersect with the buffer  
nc &lt;- nc %&gt;% 
  mutate(within_dist = st_centroid(geometry) %&gt;% 
           st_buffer(50000) %&gt;% 
           st_intersects(geometry)
         , .before = &quot;geometry&quot;)
nc
#&gt; Simple feature collection with 100 features and 5 fields
#&gt; Geometry type: MULTIPOLYGON
#&gt; Dimension:     XY
#&gt; Bounding box:  xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
#&gt; Geodetic CRS:  NAD27
#&gt; # A tibble: 100 &#215; 6
#&gt;    NAME         AREA BIR74 BIR79 within_dist                            geometry
#&gt;  * &lt;chr&gt;       &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;sgbp[,100]&gt;                 &lt;MULTIPOLYGON [&#176;]&gt;
#&gt;  1 Ashe        0.114  1091  1364 &lt;int [7]&gt;    (((-81.47276 36.23436, -81.54084 …
#&gt;  2 Alleghany   0.061   487   542 &lt;int [7]&gt;    (((-81.23989 36.36536, -81.24069 …
#&gt;  3 Surry       0.143  3188  3616 &lt;int [9]&gt;    (((-80.45634 36.24256, -80.47639 …
#&gt;  4 Currituck   0.07    508   830 &lt;int [8]&gt;    (((-76.00897 36.3196, -76.01735 3…
#&gt;  5 Northampton 0.153  1421  1606 &lt;int [9]&gt;    (((-77.21767 36.24098, -77.23461 …
#&gt;  6 Hertford    0.097  1452  1838 &lt;int [10]&gt;   (((-76.74506 36.23392, -76.98069 …
#&gt;  7 Camden      0.062   286   350 &lt;int [11]&gt;   (((-76.00897 36.3196, -75.95718 3…
#&gt;  8 Gates       0.091   420   594 &lt;int [9]&gt;    (((-76.56251 36.34057, -76.60424 …
#&gt;  9 Warren      0.118   968  1190 &lt;int [8]&gt;    (((-78.30876 36.26004, -78.28293 …
#&gt; 10 Stokes      0.124  1612  2038 &lt;int [8]&gt;    (((-80.02567 36.25023, -80.45301 …
#&gt; # ℹ 90 more rows

# within_dist for Lee:
nc$within_dist[nc$NAME == &quot;Lee&quot;]
#&gt; [[1]]
#&gt;  [1] 27 29 30 37 47 48 54 60 63 67 82 86

# resolved to NAMEs :
nc$NAME[nc$within_dist[nc$NAME == &quot;Lee&quot;][[1]]]
#&gt;  [1] &quot;Alamance&quot;   &quot;Orange&quot;     &quot;Durham&quot;     &quot;Wake&quot;       &quot;Randolph&quot;  
#&gt;  [6] &quot;Chatham&quot;    &quot;Johnston&quot;   &quot;Lee&quot;        &quot;Harnett&quot;    &quot;Moore&quot;     
#&gt; [11] &quot;Cumberland&quot; &quot;Hoke&quot;

Build a nested tibble

# ignore geometries for now for more compact output
nc_df &lt;- st_drop_geometry(nc)

# build a nested tibble, each county row gets a tibble of neighbours (nb),
# use rowwise grouping to subset nc_df with within_dist of current row
nc_nested &lt;- nc_df %&gt;% 
  rowwise() %&gt;%
  mutate(
    nb_idx = paste(within_dist, collapse = &quot;,&quot;),
    nb_names = paste(nc_df[[&quot;NAME&quot;]][within_dist], collapse = &quot;,&quot;),
    nb = (nc_df[within_dist, ]) %&gt;% select(-within_dist) %&gt;% list()) %&gt;% 
  ungroup()
nc_nested
#&gt; # A tibble: 100 &#215; 8
#&gt;    NAME         AREA BIR74 BIR79 within_dist  nb_idx           nb_names nb      
#&gt;    &lt;chr&gt;       &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;sgbp[,100]&gt; &lt;chr&gt;            &lt;chr&gt;    &lt;list&gt;  
#&gt;  1 Ashe        0.114  1091  1364 &lt;int [7]&gt;    1,2,3,18,19,22,… Ashe,Al… &lt;tibble&gt;
#&gt;  2 Alleghany   0.061   487   542 &lt;int [7]&gt;    1,2,3,18,19,23,… Ashe,Al… &lt;tibble&gt;
#&gt;  3 Surry       0.143  3188  3616 &lt;int [9]&gt;    1,2,3,10,18,23,… Ashe,Al… &lt;tibble&gt;
#&gt;  4 Currituck   0.07    508   830 &lt;int [8]&gt;    4,7,8,17,20,21,… Curritu… &lt;tibble&gt;
#&gt;  5 Northampton 0.153  1421  1606 &lt;int [9]&gt;    5,6,8,9,16,28,3… Northam… &lt;tibble&gt;
#&gt;  6 Hertford    0.097  1452  1838 &lt;int [10]&gt;   5,6,7,8,16,17,2… Northam… &lt;tibble&gt;
#&gt;  7 Camden      0.062   286   350 &lt;int [11]&gt;   4,6,7,8,17,20,2… Curritu… &lt;tibble&gt;
#&gt;  8 Gates       0.091   420   594 &lt;int [9]&gt;    4,5,6,7,8,17,20… Curritu… &lt;tibble&gt;
#&gt;  9 Warren      0.118   968  1190 &lt;int [8]&gt;    5,9,13,15,16,24… Northam… &lt;tibble&gt;
#&gt; 10 Stokes      0.124  1612  2038 &lt;int [8]&gt;    3,10,12,23,25,2… Surry,S… &lt;tibble&gt;
#&gt; # ℹ 90 more rows
Working with nested tibbles
# we can now calculate summary statistics by accessing nested tibbles in 
# nb column with purrr::map*() or lapply();
# or though rowwise grouping:

nc_nested %&gt;% 
  select(NAME, nb_names, nb) %&gt;% 
  rowwise() %&gt;% 
  mutate(nb_bir74_mean = mean(nb$BIR74),
         nb_bir74_sum  =  sum(nb$BIR74),
         nb_area_sum   =  sum(nb$AREA))
#&gt; # A tibble: 100 &#215; 6
#&gt; # Rowwise: 
#&gt;    NAME        nb_names          nb       nb_bir74_mean nb_bir74_sum nb_area_sum
#&gt;    &lt;chr&gt;       &lt;chr&gt;             &lt;list&gt;           &lt;dbl&gt;        &lt;dbl&gt;       &lt;dbl&gt;
#&gt;  1 Ashe        Ashe,Alleghany,S… &lt;tibble&gt;         1946.        13625       0.784
#&gt;  2 Alleghany   Ashe,Alleghany,S… &lt;tibble&gt;         2092.        14643       0.839
#&gt;  3 Surry       Ashe,Alleghany,S… &lt;tibble&gt;         3111.        27997       1.06 
#&gt;  4 Currituck   Currituck,Camden… &lt;tibble&gt;          607          4856       0.576
#&gt;  5 Northampton Northampton,Hert… &lt;tibble&gt;         2047.        18420       1.22 
#&gt;  6 Hertford    Northampton,Hert… &lt;tibble&gt;         1293.        12933       1.05 
#&gt;  7 Camden      Currituck,Hertfo… &lt;tibble&gt;          784.         8622       0.953
#&gt;  8 Gates       Currituck,Northa… &lt;tibble&gt;          920.         8284       0.813
#&gt;  9 Warren      Northampton,Warr… &lt;tibble&gt;         2366.        18925       1.08 
#&gt; 10 Stokes      Surry,Stokes,Roc… &lt;tibble&gt;         5660.        45276       0.998
#&gt; # ℹ 90 more rows


# or we could unnest nb column to lenghten our dataset ... 
nc_unnested &lt;- nc_nested %&gt;% 
  unnest(nb, names_sep = &quot;.&quot;) %&gt;% 
  select(-(within_dist:nb_names)) 
print(nc_unnested, n = 14)
#&gt; # A tibble: 1,004 &#215; 8
#&gt;    NAME       AREA BIR74 BIR79 nb.NAME   nb.AREA nb.BIR74 nb.BIR79
#&gt;    &lt;chr&gt;     &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;chr&gt;       &lt;dbl&gt;    &lt;dbl&gt;    &lt;dbl&gt;
#&gt;  1 Ashe      0.114  1091  1364 Ashe        0.114     1091     1364
#&gt;  2 Ashe      0.114  1091  1364 Alleghany   0.061      487      542
#&gt;  3 Ashe      0.114  1091  1364 Surry       0.143     3188     3616
#&gt;  4 Ashe      0.114  1091  1364 Wilkes      0.199     3146     3725
#&gt;  5 Ashe      0.114  1091  1364 Watauga     0.081     1323     1775
#&gt;  6 Ashe      0.114  1091  1364 Avery       0.064      781      977
#&gt;  7 Ashe      0.114  1091  1364 Caldwell    0.122     3609     4249
#&gt;  8 Alleghany 0.061   487   542 Ashe        0.114     1091     1364
#&gt;  9 Alleghany 0.061   487   542 Alleghany   0.061      487      542
#&gt; 10 Alleghany 0.061   487   542 Surry       0.143     3188     3616
#&gt; 11 Alleghany 0.061   487   542 Wilkes      0.199     3146     3725
#&gt; 12 Alleghany 0.061   487   542 Watauga     0.081     1323     1775
#&gt; 13 Alleghany 0.061   487   542 Yadkin      0.086     1269     1568
#&gt; 14 Alleghany 0.061   487   542 Iredell     0.155     4139     5400
#&gt; # ℹ 990 more rows

# ... and use group_by() / summarise() or just summarise(..., .by):
nc_unnested %&gt;% 
  summarise(nb_bir74_mean = mean(nb.BIR74),
            nb_bir74_sum  =  sum(nb.BIR74),
            nb_area_sum   =  sum(nb.AREA), .by = NAME)
#&gt; # A tibble: 100 &#215; 4
#&gt;    NAME        nb_bir74_mean nb_bir74_sum nb_area_sum
#&gt;    &lt;chr&gt;               &lt;dbl&gt;        &lt;dbl&gt;       &lt;dbl&gt;
#&gt;  1 Ashe                1946.        13625       0.784
#&gt;  2 Alleghany           2092.        14643       0.839
#&gt;  3 Surry               3111.        27997       1.06 
#&gt;  4 Currituck            607          4856       0.576
#&gt;  5 Northampton         2047.        18420       1.22 
#&gt;  6 Hertford            1293.        12933       1.05 
#&gt;  7 Camden               784.         8622       0.953
#&gt;  8 Gates                920.         8284       0.813
#&gt;  9 Warren              2366.        18925       1.08 
#&gt; 10 Stokes              5660.        45276       0.998
#&gt; # ℹ 90 more rows

<sup>Created on 2023-08-01 with reprex v2.0.2</sup>

huangapple
  • 本文由 发表于 2023年7月31日 23:43:45
  • 转载请务必保留本文链接:https://go.coder-hub.com/76805181.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定