Finding the Subset of Polygon Centroids Which Are Within a Certain Distance of the Centroids of Other Polygons in R

Question


I am attempting to calculate summary statistics for the administrative units within 5 km of every administrative unit in a shapefile of Jamaica. I use dnearneigh() to find which administrative units these are, but I am not sure how best to work with the list I get as output in order to calculate summary statistics. I tried spatial subsetting, but that did not work, so advice on how best to carry out this operation would be useful.

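    library(sf)
    library(spdep)
    shp_centroid <- st_point_on_surface(sf_communities)
    shp_centroid_coord <- st_coordinates(shp_centroid)
    # neighbour list: for each unit, indices of units within 5000 coordinate units
    shp_dist <- dnearneigh(shp_centroid_coord, 0, 5000)
    # attempted subset: unlist() flattens the list, losing which unit each index belongs to
    subset <- sf_communities[unlist(shp_dist), ]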

Answer 1

Score: 1


One option is to use nested tibbles / data.frames, where every administrative unit has its own tibble of neighbours plus itself. Thanks to rowwise operations in dplyr, collecting summary statistics from such structures is quite convenient. Alternatively, unnest() can lengthen the nested dataset so that group_by()/summarise() can be used instead.

The following example uses the nc dataset from the sf package, with the distance threshold set to 50 km to better match the sizes of those shapes. Instead of the spdep package, it uses only sf functions, so the list we'll work with (sgbp, a sparse geometry binary predicate list, which is standard in sf) might have a slightly different structure. Here the neighbourhood is defined by intersecting a 50 km buffer around each county centroid with the county polygons, not by distances between county centroids as described in the question title, so this is another detail to review and possibly change.
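For readers who would rather stay with spdep, as in the question, the neighbour list returned by dnearneigh() can also be summarised directly, without unlist(). The following is a minimal sketch and not part of the original answer. It assumes the nc data are first projected to EPSG:32119 (chosen here only so that coordinate distances are in metres), and nb_bir74_mean is an illustrative name.

```r
library(sf)
library(spdep)

# example data shipped with sf; projecting to EPSG:32119 is an assumption
# made here so that the 50000 threshold below is in metres
nc <- read_sf(system.file("shape/nc.shp", package = "sf"))
nc_proj <- st_transform(nc, 32119)

# centroid coordinates as a plain matrix
coords <- st_coordinates(st_centroid(st_geometry(nc_proj)))

# nb object: one integer vector of neighbour indices per county;
# units without neighbours are stored as a single 0
nb <- dnearneigh(coords, 0, 50000)

# keep the list structure: compute one summary value per county
nb_bir74_mean <- vapply(seq_along(nb), function(i) {
  idx <- nb[[i]]
  # add the county itself and drop the 0 placeholder
  idx <- unique(c(i, idx[idx > 0L]))
  mean(nc$BIR74[idx])
}, numeric(1))

head(data.frame(NAME = nc$NAME, nb_bir74_mean))
```

The main difference from the sgbp lists used in the rest of this answer is that an nb list does not include the unit itself and marks "no neighbours" with 0, so both cases are handled explicitly before indexing.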

Prepare example, test and visualize subsetting for a single county.
    library(sf)
    #> Linking to GEOS 3.9.3, GDAL 3.5.2, PROJ 8.2.1; sf_use_s2() is TRUE
    library(dplyr, warn.conflicts = FALSE)
    library(purrr)
    library(tidyr)
    library(ggplot2)

    # example data from sf package examples
    nc = read_sf(system.file("shape/nc.shp", package="sf")) %>%
      select(NAME, AREA, starts_with("BIR"))
    nc
    #> Simple feature collection with 100 features and 4 fields
    #> Geometry type: MULTIPOLYGON
    #> Dimension:     XY
    #> Bounding box:  xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
    #> Geodetic CRS:  NAD27
    #> # A tibble: 100 × 5
    #>    NAME         AREA BIR74 BIR79 geometry
    #>    <chr>       <dbl> <dbl> <dbl> <MULTIPOLYGON [°]>
    #>  1 Ashe        0.114  1091  1364 (((-81.47276 36.23436, -81.54084 36.27251, -81…
    #>  2 Alleghany   0.061   487   542 (((-81.23989 36.36536, -81.24069 36.37942, -81…
    #>  3 Surry       0.143  3188  3616 (((-80.45634 36.24256, -80.47639 36.25473, -80…
    #>  4 Currituck   0.07    508   830 (((-76.00897 36.3196, -76.01735 36.33773, -76.…
    #>  5 Northampton 0.153  1421  1606 (((-77.21767 36.24098, -77.23461 36.2146, -77.…
    #>  6 Hertford    0.097  1452  1838 (((-76.74506 36.23392, -76.98069 36.23024, -76…
    #>  7 Camden      0.062   286   350 (((-76.00897 36.3196, -75.95718 36.19377, -75.…
    #>  8 Gates       0.091   420   594 (((-76.56251 36.34057, -76.60424 36.31498, -76…
    #>  9 Warren      0.118   968  1190 (((-78.30876 36.26004, -78.28293 36.29188, -78…
    #> 10 Stokes      0.124  1612  2038 (((-80.02567 36.25023, -80.45301 36.25709, -80…
    #> # ℹ 90 more rows

    # test and visualize neighbourhood subsetting for a single county (Lee)
    lee = tibble::lst(
      polygon = nc[nc$NAME == "Lee","geometry"],
      centroid = st_centroid(polygon),
      buffer = st_buffer(centroid, 50000),
      within = st_filter(nc, buffer)
    )
    ggplot() +
      geom_sf(data = nc) +
      geom_sf(data = lee$within, fill = "green") +
      geom_sf(data = lee$polygon, fill = "red") +
      geom_sf(data = lee$buffer, alpha = .6, fill = "gold") +
      geom_sf(data = lee$centroid, size = 1)

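[Figure: map of North Carolina counties showing Lee County (red), its centroid with the 50 km buffer (gold), and the counties intersecting the buffer (green).]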

Apply the same logic to the whole dataset

    # resulting within_dist type is sgbp, "sparse geometry binary predicate lists",
    # a list of sf object row indices that intersect with the buffer
    nc <- nc %>%
      mutate(within_dist = st_centroid(geometry) %>%
               st_buffer(50000) %>%
               st_intersects(geometry),
             .before = "geometry")
    nc
    #> Simple feature collection with 100 features and 5 fields
    #> Geometry type: MULTIPOLYGON
    #> Dimension:     XY
    #> Bounding box:  xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
    #> Geodetic CRS:  NAD27
    #> # A tibble: 100 × 6
    #>    NAME         AREA BIR74 BIR79 within_dist  geometry
    #>  * <chr>       <dbl> <dbl> <dbl> <sgbp[,100]> <MULTIPOLYGON [°]>
    #>  1 Ashe        0.114  1091  1364 <int [7]>    (((-81.47276 36.23436, -81.54084 …
    #>  2 Alleghany   0.061   487   542 <int [7]>    (((-81.23989 36.36536, -81.24069 …
    #>  3 Surry       0.143  3188  3616 <int [9]>    (((-80.45634 36.24256, -80.47639 …
    #>  4 Currituck   0.07    508   830 <int [8]>    (((-76.00897 36.3196, -76.01735 3…
    #>  5 Northampton 0.153  1421  1606 <int [9]>    (((-77.21767 36.24098, -77.23461 …
    #>  6 Hertford    0.097  1452  1838 <int [10]>   (((-76.74506 36.23392, -76.98069 …
    #>  7 Camden      0.062   286   350 <int [11]>   (((-76.00897 36.3196, -75.95718 3…
    #>  8 Gates       0.091   420   594 <int [9]>    (((-76.56251 36.34057, -76.60424 …
    #>  9 Warren      0.118   968  1190 <int [8]>    (((-78.30876 36.26004, -78.28293 …
    #> 10 Stokes      0.124  1612  2038 <int [8]>    (((-80.02567 36.25023, -80.45301 …
    #> # ℹ 90 more rows

    # within_dist for Lee:
    nc$within_dist[nc$NAME == "Lee"]
    #> [[1]]
    #> [1] 27 29 30 37 47 48 54 60 63 67 82 86

    # resolved to NAMEs:
    nc$NAME[nc$within_dist[nc$NAME == "Lee"][[1]]]
    #>  [1] "Alamance"   "Orange"     "Durham"     "Wake"       "Randolph"
    #>  [6] "Chatham"    "Johnston"   "Lee"        "Harnett"    "Moore"
    #> [11] "Cumberland" "Hoke"
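If the neighbourhood should instead be based on centroid-to-centroid distance, as the question title describes, one possible variation is to compare the centroids with each other using st_is_within_distance(). This is only a sketch under that assumption. The names cent, within_cent and lee_nb are illustrative and do not appear in the original answer.

```r
library(sf)

nc <- read_sf(system.file("shape/nc.shp", package = "sf"))

# county centroids as a geometry column only (no attributes)
cent <- st_centroid(st_geometry(nc))

# sparse list (sgbp): for every centroid, the row indices of all centroids
# within 50 km of it; each centroid matches itself because its distance is 0
within_cent <- st_is_within_distance(cent, cent, dist = 50000)

# neighbour names for a single county, resolved the same way as above
lee_nb <- nc$NAME[within_cent[[which(nc$NAME == "Lee")]]]
lee_nb
```

Because only centroids are compared, this list is typically shorter than the buffer-and-polygon version: a large county can touch the buffer while its centroid lies more than 50 km away.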

Build a nested tibble

    # ignore geometries for now for more compact output
    nc_df <- st_drop_geometry(nc)

    # build a nested tibble, each county row gets a tibble of neighbours (nb),
    # use rowwise grouping to subset nc_df with within_dist of current row
    nc_nested <- nc_df %>%
      rowwise() %>%
      mutate(
        nb_idx = paste(within_dist, collapse = ","),
        nb_names = paste(nc_df[["NAME"]][within_dist], collapse = ","),
        nb = (nc_df[within_dist, ]) %>% select(-within_dist) %>% list()) %>%
      ungroup()
    nc_nested
    #> # A tibble: 100 × 8
    #>    NAME         AREA BIR74 BIR79 within_dist  nb_idx           nb_names nb
    #>    <chr>       <dbl> <dbl> <dbl> <sgbp[,100]> <chr>            <chr>    <list>
    #>  1 Ashe        0.114  1091  1364 <int [7]>    1,2,3,18,19,22,… Ashe,Al… <tibble>
    #>  2 Alleghany   0.061   487   542 <int [7]>    1,2,3,18,19,23,… Ashe,Al… <tibble>
    #>  3 Surry       0.143  3188  3616 <int [9]>    1,2,3,10,18,23,… Ashe,Al… <tibble>
    #>  4 Currituck   0.07    508   830 <int [8]>    4,7,8,17,20,21,… Curritu… <tibble>
    #>  5 Northampton 0.153  1421  1606 <int [9]>    5,6,8,9,16,28,3… Northam… <tibble>
    #>  6 Hertford    0.097  1452  1838 <int [10]>   5,6,7,8,16,17,2… Northam… <tibble>
    #>  7 Camden      0.062   286   350 <int [11]>   4,6,7,8,17,20,2… Curritu… <tibble>
    #>  8 Gates       0.091   420   594 <int [9]>    4,5,6,7,8,17,20… Curritu… <tibble>
    #>  9 Warren      0.118   968  1190 <int [8]>    5,9,13,15,16,24… Northam… <tibble>
    #> 10 Stokes      0.124  1612  2038 <int [8]>    3,10,12,23,25,2… Surry,S… <tibble>
    #> # ℹ 90 more rows

Working with nested tibbles

    # we can now calculate summary statistics by accessing nested tibbles in
    # nb column with purrr::map*() or lapply();
    # or through rowwise grouping:
    nc_nested %>%
      select(NAME, nb_names, nb) %>%
      rowwise() %>%
      mutate(nb_bir74_mean = mean(nb$BIR74),
             nb_bir74_sum = sum(nb$BIR74),
             nb_area_sum = sum(nb$AREA))
    #> # A tibble: 100 × 6
    #> # Rowwise:
    #>    NAME        nb_names          nb       nb_bir74_mean nb_bir74_sum nb_area_sum
    #>    <chr>       <chr>             <list>           <dbl>        <dbl>       <dbl>
    #>  1 Ashe        Ashe,Alleghany,S… <tibble>         1946.        13625       0.784
    #>  2 Alleghany   Ashe,Alleghany,S… <tibble>         2092.        14643       0.839
    #>  3 Surry       Ashe,Alleghany,S… <tibble>         3111.        27997       1.06
    #>  4 Currituck   Currituck,Camden… <tibble>          607          4856       0.576
    #>  5 Northampton Northampton,Hert… <tibble>         2047.        18420       1.22
    #>  6 Hertford    Northampton,Hert… <tibble>         1293.        12933       1.05
    #>  7 Camden      Currituck,Hertfo… <tibble>          784.         8622       0.953
    #>  8 Gates       Currituck,Northa… <tibble>          920.         8284       0.813
    #>  9 Warren      Northampton,Warr… <tibble>         2366.        18925       1.08
    #> 10 Stokes      Surry,Stokes,Roc… <tibble>         5660.        45276       0.998
    #> # ℹ 90 more rows

    # or we could unnest nb column to lengthen our dataset ...
    nc_unnested <- nc_nested %>%
      unnest(nb, names_sep = ".") %>%
      select(-(within_dist:nb_names))
    print(nc_unnested, n = 14)
    #> # A tibble: 1,004 × 8
    #>    NAME       AREA BIR74 BIR79 nb.NAME   nb.AREA nb.BIR74 nb.BIR79
    #>    <chr>     <dbl> <dbl> <dbl> <chr>       <dbl>    <dbl>    <dbl>
    #>  1 Ashe      0.114  1091  1364 Ashe        0.114     1091     1364
    #>  2 Ashe      0.114  1091  1364 Alleghany   0.061      487      542
    #>  3 Ashe      0.114  1091  1364 Surry       0.143     3188     3616
    #>  4 Ashe      0.114  1091  1364 Wilkes      0.199     3146     3725
    #>  5 Ashe      0.114  1091  1364 Watauga     0.081     1323     1775
    #>  6 Ashe      0.114  1091  1364 Avery       0.064      781      977
    #>  7 Ashe      0.114  1091  1364 Caldwell    0.122     3609     4249
    #>  8 Alleghany 0.061   487   542 Ashe        0.114     1091     1364
    #>  9 Alleghany 0.061   487   542 Alleghany   0.061      487      542
    #> 10 Alleghany 0.061   487   542 Surry       0.143     3188     3616
    #> 11 Alleghany 0.061   487   542 Wilkes      0.199     3146     3725
    #> 12 Alleghany 0.061   487   542 Watauga     0.081     1323     1775
    #> 13 Alleghany 0.061   487   542 Yadkin      0.086     1269     1568
    #> 14 Alleghany 0.061   487   542 Iredell     0.155     4139     5400
    #> # ℹ 990 more rows

    # ... and use group_by() / summarise() or just summarise(..., .by):
    nc_unnested %>%
      summarise(nb_bir74_mean = mean(nb.BIR74),
                nb_bir74_sum = sum(nb.BIR74),
                nb_area_sum = sum(nb.AREA), .by = NAME)
    #> # A tibble: 100 × 4
    #>    NAME        nb_bir74_mean nb_bir74_sum nb_area_sum
    #>    <chr>               <dbl>        <dbl>       <dbl>
    #>  1 Ashe                1946.        13625       0.784
    #>  2 Alleghany           2092.        14643       0.839
    #>  3 Surry               3111.        27997       1.06
    #>  4 Currituck            607          4856       0.576
    #>  5 Northampton         2047.        18420       1.22
    #>  6 Hertford            1293.        12933       1.05
    #>  7 Camden               784.         8622       0.953
    #>  8 Gates                920.         8284       0.813
    #>  9 Warren              2366.        18925       1.08
    #> 10 Stokes              5660.        45276       0.998
    #> # ℹ 90 more rows
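The comments above mention purrr::map*() as a third way to reach into the nested tibbles. A minimal sketch of that variant follows. It uses a small made-up tibble named nested (hypothetical data, shaped like nc_nested) so that it runs on its own.

```r
library(dplyr, warn.conflicts = FALSE)
library(purrr)
library(tibble)

# hypothetical stand-in for nc_nested: one row per unit,
# with a list-column nb holding that unit's neighbour table
nested <- tibble(
  NAME = c("A", "B"),
  nb = list(
    tibble(NAME = c("A", "B"), BIR74 = c(100, 300)),
    tibble(NAME = "B", BIR74 = 300)
  )
)

# map_dbl() / map_int() iterate over the list-column and return one value
# per row, so neither rowwise() nor unnest() is needed
res <- nested %>%
  mutate(nb_bir74_mean = map_dbl(nb, ~ mean(.x$BIR74)),
         nb_bir74_sum  = map_dbl(nb, ~ sum(.x$BIR74)),
         nb_n          = map_int(nb, nrow))
res
```

The typed map_*() variants fail if a summary does not return exactly one value of the expected type, which can be a useful check when neighbour tables may be empty.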

Created on 2023-08-01 with reprex v2.0.2

huangapple
  • Posted on 2023-07-31 23:43:45
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76805181.html