Extracting coordinate data from a FeatureCollection in csv within R

Question

My data is currently in a CSV file, with a column called "journeyroute". Each value in this column looks like this [truncated due to size]:

    {"type": "FeatureCollection", "features": [{"type": "Feature", "geometry": {"type": "Point", "coordinates": [-4.095772, 50.409393]}, "properties": {"name": "start"}}, {"type": "Feature", "geometry": null, "properties": {"name": "end"}}, {"type": "Feature", "geometry": {"type": "LineString", "coordinates": [[-4.095772, 50.409393], [-4.095781, 50.409397], [-4.095792, 50.409401], [-4.095965, 50.40971], [-4.096064, 50.410069], [-4.09597, 50.410397]]}, "properties": {"distance": 4027.4, "name": "Raw", "times": [1690900467000, 1690900520000, 1690900522000, 1690900539000, 1690900550000, 1690900569000], "duration": 4923.0}}]}

There are 5,000 rows of data. What I'm trying to do is extract out the LineString data to use within R, but I'm getting stuck. Can anyone help please?

I've tried converting to JSON and then unnesting, but it comes up with an error (code adapted from other answers using Google Earth Engine):

    new_df <- df %>%
      mutate(geo = map(Journey.Route, ~ jsonlite::fromJSON(.))) %>%
      as.data.frame() %>%
      unnest(geo) %>%
      filter(geo != "FeatureCollection") %>%
      mutate(coord = rep(c("x", "y"))) %>%
      pivot_wider(names_from = coord, values_from = coordinates)

    Error in `mutate()`:
    In argument: `coord = rep(c("x", "y"))`.
    Caused by error:
    ! `coord` must be size 5000 or 1, not 2.
    Run `rlang::last_trace()` to see where the error occurred.

Expecting an sf geometry column of LineString coordinates.

Answer 1

Score: 2

As we are dealing with GeoJSON strings, they can be parsed with `sf::st_read()`, or with `geojsonsf::geojson_sfc()` for a performance boost (~2x when using `geojson_sfc()` as a drop-in replacement for `st_read()`, ~100x when comparing rowwise `st_read()` to vectorized `geojson_sfc()`).

The example groups rowwise to access one row at a time and keeps only the `LINESTRING` geometries (presumably one per FeatureCollection, as in the provided sample).

``` r
library(dplyr)
library(sf)
#> Linking to GEOS 3.9.3, GDAL 3.5.2, PROJ 8.2.1; sf_use_s2() is TRUE
library(geojsonsf)

json_str <- '{"type": "FeatureCollection", "features": [{"type": "Feature", "geometry": {"type": "Point", "coordinates": [-4.095772, 50.409393]}, "properties": {"name": "start"}}, {"type": "Feature", "geometry": null, "properties": {"name": "end"}}, {"type": "Feature", "geometry": {"type": "LineString", "coordinates": [[-4.095772, 50.409393], [-4.095781, 50.409397], [-4.095792, 50.409401], [-4.095965, 50.40971], [-4.096064, 50.410069], [-4.09597, 50.410397]]}, "properties": {"distance": 4027.4, "name": "Raw", "times": [1690900467000, 1690900520000, 1690900522000, 1690900539000, 1690900550000, 1690900569000], "duration": 4923.0}}]}'

# 100-row test sample
df_100 <- tibble(journey_id = 1:100, journeyroute = rep(json_str, 100))
df_100
#> # A tibble: 100 × 2
#>    journey_id journeyroute
#>         <int> <chr>
#>  1          1 "{\"type\": \"FeatureCollection\", \"features\": [{\"type\": \"Fe…
#>  2          2 "{\"type\": \"FeatureCollection\", \"features\": [{\"type\": \"Fe…
#>  3          3 "{\"type\": \"FeatureCollection\", \"features\": [{\"type\": \"Fe…
#> ...

microbenchmark::microbenchmark(
  sf = {
    # parse GeoJSON strings with sf / GEOS
    routes_sf <- df_100 %>%
      rowwise() %>%
      mutate(geometry = st_read(journeyroute, quiet = TRUE) %>%
               st_geometry() %>%
               `[`(st_geometry_type(.) == "LINESTRING"), .keep = "unused") %>%
      ungroup() %>%
      st_as_sf()
  },
  geojson_sf = {
    # parse GeoJSON strings with geojsonsf
    routes_gj <- df_100 %>%
      rowwise() %>%
      mutate(geometry = geojson_sfc(journeyroute) %>%
               `[`(st_geometry_type(.) == "LINESTRING"), .keep = "unused") %>%
      ungroup() %>%
      st_as_sf()
  }
)
```

Benchmark results and resulting sf object:

``` r
#> Unit: milliseconds
#>        expr      min       lq     mean   median       uq      max neval cld
#>          sf 437.4351 453.1961 476.8028 464.1172 487.9901 628.0495   100   a
#>  geojson_sf 198.3025 207.9465 219.1129 212.6965 221.7101 309.2461   100   b

routes_sf
#> Simple feature collection with 100 features and 1 field
#> Geometry type: LINESTRING
#> Dimension:     XY
#> Bounding box:  xmin: -4.096064 ymin: 50.40939 xmax: -4.095772 ymax: 50.4104
#> Geodetic CRS:  WGS 84
#> # A tibble: 100 × 2
#>    journey_id                                                           geometry
#>         <int>                                                   <LINESTRING [°]>
#>  1          1 (-4.095772 50.40939, -4.095781 50.4094, -4.095792 50.4094, -4.095…
#>  2          2 (-4.095772 50.40939, -4.095781 50.4094, -4.095792 50.4094, -4.095…
#>  3          3 (-4.095772 50.40939, -4.095781 50.4094, -4.095792 50.4094, -4.095…
#>  4          4 (-4.095772 50.40939, -4.095781 50.4094, -4.095792 50.4094, -4.095…
#>  5          5 (-4.095772 50.40939, -4.095781 50.4094, -4.095792 50.4094, -4.095…
#>  6          6 (-4.095772 50.40939, -4.095781 50.4094, -4.095792 50.4094, -4.095…
#>  7          7 (-4.095772 50.40939, -4.095781 50.4094, -4.095792 50.4094, -4.095…
#>  8          8 (-4.095772 50.40939, -4.095781 50.4094, -4.095792 50.4094, -4.095…
#>  9          9 (-4.095772 50.40939, -4.095781 50.4094, -4.095792 50.4094, -4.095…
#> 10         10 (-4.095772 50.40939, -4.095781 50.4094, -4.095792 50.4094, -4.095…
#> # ℹ 90 more rows
```

<sup>Created on 2023-08-04 with reprex v2.0.2</sup>
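
If the raw coordinates are needed rather than an sf geometry column, `sf::st_coordinates()` should return them as a matrix. The following is only a minimal sketch, not part of the original answer: `json_short` is a shortened, made-up stand-in for one "journeyroute" value, and it assumes the same layout as the sample (a Point, a null geometry and one LineString).

```r
library(sf)

# Assumption: shortened stand-in for a single "journeyroute" value
json_short <- '{"type": "FeatureCollection", "features": [{"type": "Feature", "geometry": {"type": "Point", "coordinates": [-4.095772, 50.409393]}, "properties": {"name": "start"}}, {"type": "Feature", "geometry": null, "properties": {"name": "end"}}, {"type": "Feature", "geometry": {"type": "LineString", "coordinates": [[-4.095772, 50.409393], [-4.095781, 50.409397], [-4.095792, 50.409401]]}, "properties": {"name": "Raw"}}]}'

# Parse the GeoJSON string and keep only the LINESTRING geometry
features <- st_read(json_short, quiet = TRUE)
route <- st_geometry(features)[st_geometry_type(features) == "LINESTRING"]

# One row per vertex; the X and Y columns hold longitude and latitude
xy <- st_coordinates(route)
xy
```

The same call on the full `routes_sf` object would stack the vertices of all routes into one matrix, so an id column would be needed to tell the journeys apart.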

Answer 2

Score: 1

`geojsonsf` can read a whole vector of GeoJSON strings at once, so there is no need for any row-wise operations.

  • Create some data

        json <- '{"type": "FeatureCollection", "features": [{"type": "Feature", "geometry": {"type": "Point", "coordinates": [-4.095772, 50.409393]}, "properties": {"name": "start"}}, {"type": "Feature", "geometry": null, "properties": {"name": "end"}}, {"type": "Feature", "geometry": {"type": "LineString", "coordinates": [[-4.095772, 50.409393], [-4.095781, 50.409397], [-4.095792, 50.409401], [-4.095965, 50.40971], [-4.096064, 50.410069], [-4.09597, 50.410397]]}, "properties": {"distance": 4027.4, "name": "Raw", "times": [1690900467000, 1690900520000, 1690900522000, 1690900539000, 1690900550000, 1690900569000], "duration": 4923.0}}]}'
        df <- data.frame(json = rep(json, 3))

  • Convert to an sf object

        sf <- geojsonsf::geojson_sf(df$json)

  • Do any other operations you may want with the data

        ## Remove empty geometries
        sf <- sf[!sf::st_is_empty(sf), ]
        ## Extract just the LINESTRINGs
        sf <- sf[sf::st_geometry_type(sf) == "LINESTRING", ]
        ## Convert to a long data.frame
        df <- sfheaders::sf_to_df(sf = sf, fill = TRUE)
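
The vectorised call returns one geometry per feature, with nothing linking it back to the CSV row it came from. The sketch below is one possible way to reattach an id, and it is not taken from either answer. It assumes every row holds exactly one LineString (as in the sample), so the filtered geometries line up with the rows in order. The temporary CSV, the `journey_id` column and the shortened `json_short` string are all made up for illustration.

```r
library(sf)
library(geojsonsf)

# Assumption: shortened stand-in for one "journeyroute" value
json_short <- '{"type": "FeatureCollection", "features": [{"type": "Feature", "geometry": {"type": "Point", "coordinates": [-4.095772, 50.409393]}, "properties": {"name": "start"}}, {"type": "Feature", "geometry": null, "properties": {"name": "end"}}, {"type": "Feature", "geometry": {"type": "LineString", "coordinates": [[-4.095772, 50.409393], [-4.095781, 50.409397], [-4.095792, 50.409401]]}, "properties": {"name": "Raw"}}]}'

# Hypothetical CSV standing in for the real file (journey_id is an assumed id column)
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(journey_id = 1:3, journeyroute = json_short),
          csv_path, row.names = FALSE)

journeys <- read.csv(csv_path, stringsAsFactors = FALSE)

# Parse all GeoJSON strings in one call, then keep only the LINESTRINGs
geoms <- geojson_sfc(journeys$journeyroute)
lines <- geoms[st_geometry_type(geoms) == "LINESTRING"]

# Guard for the one-LineString-per-row assumption before pairing by position
stopifnot(length(lines) == nrow(journeys))

routes <- st_sf(journey_id = journeys$journey_id, geometry = lines)
routes
```

If some rows can contain zero or several LineStrings, the positional pairing no longer holds and the row-wise approach from Answer 1 is the safer choice.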

huangapple
  • Posted on 2023-08-04 22:20:00
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76836786.html