Extracting coordinate data from a FeatureCollection in a CSV within R
Question
I've got data in a CSV file with a column called "journeyroute". This column contains the following data (truncated due to size):
```json
{"type": "FeatureCollection", "features": [{"type": "Feature", "geometry": {"type": "Point", "coordinates": [-4.095772, 50.409393]}, "properties": {"name": "start"}}, {"type": "Feature", "geometry": null, "properties": {"name": "end"}}, {"type": "Feature", "geometry": {"type": "LineString", "coordinates": [[-4.095772, 50.409393], [-4.095781, 50.409397], [-4.095792, 50.409401], [-4.095965, 50.40971], [-4.096064, 50.410069], [-4.09597, 50.410397]]}, "properties": {"distance": 4027.4, "name": "Raw", "times": [1690900467000, 1690900520000, 1690900522000, 1690900539000, 1690900550000, 1690900569000], "duration": 4923.0}}]}
```
There are 5,000 rows of data. I'm trying to extract the LineString data to use within R, but I'm stuck. Can anyone help?

I've tried parsing the JSON and then unnesting, but it throws an error (code adapted from other answers about Google Earth Engine):
```r
library(dplyr)
library(purrr)
library(tidyr)

new_df <- df %>%
  mutate(geo = map(Journey.Route, ~ jsonlite::fromJSON(.))) %>%
  as.data.frame() %>%
  unnest(geo) %>%
  filter(geo != "FeatureCollection") %>%
  mutate(coord = rep(c("x", "y"))) %>%
  pivot_wider(names_from = coord, values_from = coordinates)
```
```
Error in `mutate()`:
ℹ In argument: `coord = rep(c("x", "y"))`.
Caused by error:
! `coord` must be size 5000 or 1, not 2.
Run `rlang::last_trace()` to see where the error occurred.
```
What I'm expecting to end up with is an sf geometry column of LineString coordinates.
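The error message itself follows from dplyr's size rule: every column created in `mutate()` must have length 1 (which is recycled to every row) or exactly one value per row. `rep(c("x", "y"))` with no `times` argument is simply `c("x", "y")`, length 2, while `df` has 5,000 rows. A minimal illustration on a hypothetical 5-row tibble (the names `ok` and `res` are only for this sketch):

```r
library(dplyr)

# Hypothetical 5-row stand-in for the 5,000-row data
df <- tibble(journey_id = 1:5)

# A length-1 value is recycled to every row, so this works
ok <- mutate(df, coord = "x")

# rep(c("x", "y")) is just c("x", "y"): length 2 is neither 1 nor nrow(df),
# so mutate() signals the same kind of size error as in the question
res <- tryCatch(mutate(df, coord = rep(c("x", "y"))), error = function(e) e)
print(conditionMessage(res))
```

Even with matching lengths, labelling alternate rows "x"/"y" only makes sense if each row holds a single number, which is not the case after unnesting a parsed FeatureCollection.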
Answer 1
Score: 2
<details>
<summary>English:</summary>
As we are dealing with GeoJSON strings, they can be parsed with `sf::st_read()`, or with `geojsonsf::geojson_sfc()` for a performance boost (about 2x when using `geojson_sfc()` as a drop-in replacement for `st_read()`, and about 100x when comparing rowwise `st_read()` to vectorized `geojson_sfc()`).

Rowwise grouping is used to access one row at a time, keeping only the `LINESTRING` geometries (presumably one per FeatureCollection, as in the provided sample).
``` r
library(dplyr)
library(sf)
#> Linking to GEOS 3.9.3, GDAL 3.5.2, PROJ 8.2.1; sf_use_s2() is TRUE
library(geojsonsf)
json_str <- '{"type": "FeatureCollection", "features": [{"type": "Feature", "geometry": {"type": "Point", "coordinates": [-4.095772, 50.409393]}, "properties": {"name": "start"}}, {"type": "Feature", "geometry": null, "properties": {"name": "end"}}, {"type": "Feature", "geometry": {"type": "LineString", "coordinates": [[-4.095772, 50.409393], [-4.095781, 50.409397], [-4.095792, 50.409401], [-4.095965, 50.40971], [-4.096064, 50.410069], [-4.09597, 50.410397]]}, "properties": {"distance": 4027.4, "name": "Raw", "times": [1690900467000, 1690900520000, 1690900522000, 1690900539000, 1690900550000, 1690900569000], "duration": 4923.0}}]}'
# 100-row test sample
df_100 <- tibble(journey_id = 1:100, journeyroute = rep(json_str, 100))
df_100
#> # A tibble: 100 × 2
#> journey_id journeyroute
#> <int> <chr>
#> 1 1 "{\"type\": \"FeatureCollection\", \"features\": [{\"type\": \"Fe…
#> 2 2 "{\"type\": \"FeatureCollection\", \"features\": [{\"type\": \"Fe…
#> 3 3 "{\"type\": \"FeatureCollection\", \"features\": [{\"type\": \"Fe…
#> ...
microbenchmark::microbenchmark(
sf = {
# parse GeoJSON strings with sf / GEOS
routes_sf <- df_100 %>%
rowwise() %>%
mutate(geometry = st_read(journeyroute, quiet = TRUE) %>%
st_geometry() %>%
`[`(st_geometry_type(.) == "LINESTRING"), .keep = "unused") %>%
ungroup() %>%
st_as_sf()
},
geojson_sf = {
# parse GeoJSON strings with geojsonsf
routes_gj <- df_100 %>%
rowwise() %>%
mutate(geometry = geojson_sfc(journeyroute) %>%
`[`(st_geometry_type(.) == "LINESTRING"), .keep = "unused") %>%
ungroup() %>%
st_as_sf()
}
)
```

Benchmark results and the resulting `sf` object:

```
#> Unit: milliseconds
#>        expr      min       lq     mean   median       uq      max neval cld
#>          sf 437.4351 453.1961 476.8028 464.1172 487.9901 628.0495   100  a
#>  geojson_sf 198.3025 207.9465 219.1129 212.6965 221.7101 309.2461   100   b
```

``` r
routes_sf
#> Simple feature collection with 100 features and 1 field
#> Geometry type: LINESTRING
#> Dimension: XY
#> Bounding box: xmin: -4.096064 ymin: 50.40939 xmax: -4.095772 ymax: 50.4104
#> Geodetic CRS: WGS 84
#> # A tibble: 100 × 2
#> journey_id geometry
#> <int> <LINESTRING [°]>
#> 1 1 (-4.095772 50.40939, -4.095781 50.4094, -4.095792 50.4094, -4.095…
#> 2 2 (-4.095772 50.40939, -4.095781 50.4094, -4.095792 50.4094, -4.095…
#> 3 3 (-4.095772 50.40939, -4.095781 50.4094, -4.095792 50.4094, -4.095…
#> 4 4 (-4.095772 50.40939, -4.095781 50.4094, -4.095792 50.4094, -4.095…
#> 5 5 (-4.095772 50.40939, -4.095781 50.4094, -4.095792 50.4094, -4.095…
#> 6 6 (-4.095772 50.40939, -4.095781 50.4094, -4.095792 50.4094, -4.095…
#> 7 7 (-4.095772 50.40939, -4.095781 50.4094, -4.095792 50.4094, -4.095…
#> 8 8 (-4.095772 50.40939, -4.095781 50.4094, -4.095792 50.4094, -4.095…
#> 9 9 (-4.095772 50.40939, -4.095781 50.4094, -4.095792 50.4094, -4.095…
#> 10 10 (-4.095772 50.40939, -4.095781 50.4094, -4.095792 50.4094, -4.095…
#> # ℹ 90 more rows
```

<sup>Created on 2023-08-04 with reprex v2.0.2</sup>
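The text credits vectorized `geojson_sfc()` with the largest speed-up, but both benchmarked variants still run rowwise. A minimal sketch of the fully vectorized route, assuming (like the code above) exactly one LINESTRING per FeatureCollection, and additionally assuming that `geojson_sfc()` returns features in input order so each line can be matched back to its row by position. The names `all_geoms`, `line_geoms` and `routes` are illustrative:

```r
library(sf)
library(geojsonsf)

json_str <- '{"type": "FeatureCollection", "features": [{"type": "Feature", "geometry": {"type": "Point", "coordinates": [-4.095772, 50.409393]}, "properties": {"name": "start"}}, {"type": "Feature", "geometry": null, "properties": {"name": "end"}}, {"type": "Feature", "geometry": {"type": "LineString", "coordinates": [[-4.095772, 50.409393], [-4.095781, 50.409397], [-4.095792, 50.409401], [-4.095965, 50.40971], [-4.096064, 50.410069], [-4.09597, 50.410397]]}, "properties": {"distance": 4027.4, "name": "Raw", "times": [1690900467000, 1690900520000, 1690900522000, 1690900539000, 1690900550000, 1690900569000], "duration": 4923.0}}]}'

# 100-row test sample, as in the benchmark above
df_100 <- data.frame(journey_id = 1:100,
                     journeyroute = rep(json_str, 100),
                     stringsAsFactors = FALSE)

# One call parses every FeatureCollection at once; the result is a flat sfc
# holding all features from all rows (including the Point and null-geometry ones)
all_geoms <- geojson_sfc(df_100$journeyroute)

# Keep only the LINESTRING features
line_geoms <- all_geoms[st_geometry_type(all_geoms) == "LINESTRING"]

# Assumption: one LINESTRING per row, returned in row order, so the i-th line
# belongs to the i-th journey. Stop loudly if the counts do not line up.
stopifnot(length(line_geoms) == nrow(df_100))

routes <- st_sf(journey_id = df_100$journey_id, geometry = line_geoms)
routes
```

The positional matching is the price of vectorizing: if some journeys can have zero or several LineStrings, the rowwise version above keeps the row association explicit and is the safer choice.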
Answer 2
Score: 1
`library(geojsonsf)` can read a vector of GeoJSON strings, so there is no need for any row-wise operations.
- Create some data

```r
json <- '{"type": "FeatureCollection", "features": [{"type": "Feature", "geometry": {"type": "Point", "coordinates": [-4.095772, 50.409393]}, "properties": {"name": "start"}}, {"type": "Feature", "geometry": null, "properties": {"name": "end"}}, {"type": "Feature", "geometry": {"type": "LineString", "coordinates": [[-4.095772, 50.409393], [-4.095781, 50.409397], [-4.095792, 50.409401], [-4.095965, 50.40971], [-4.096064, 50.410069], [-4.09597, 50.410397]]}, "properties": {"distance": 4027.4, "name": "Raw", "times": [1690900467000, 1690900520000, 1690900522000, 1690900539000, 1690900550000, 1690900569000], "duration": 4923.0}}]}'
df <- data.frame(json = rep(json, 3))
```
- Convert to an `sf` object

```r
sf <- geojsonsf::geojson_sf(df$json)
```
- Do any other operations you may want with the data

```r
## Remove empty geometries (e.g. the null "end" feature)
sf <- sf[!sf::st_is_empty(sf), ]
## Extract just the LINESTRINGs
sf <- sf[sf::st_geometry_type(sf) == "LINESTRING", ]
## Convert to a long data.frame, one row per coordinate (requires the sfheaders package)
df <- sfheaders::sf_to_df(sf = sf, fill = TRUE)
```
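If you would rather not depend on sf at all, the LINESTRING coordinates can also be pulled out with jsonlite alone, which is closer to the question's original `fromJSON()` attempt. A minimal sketch, assuming each `journeyroute` string is a FeatureCollection shaped like the sample; `extract_linestring()` is a hypothetical helper, not part of any package:

```r
library(jsonlite)

json <- '{"type": "FeatureCollection", "features": [{"type": "Feature", "geometry": {"type": "Point", "coordinates": [-4.095772, 50.409393]}, "properties": {"name": "start"}}, {"type": "Feature", "geometry": null, "properties": {"name": "end"}}, {"type": "Feature", "geometry": {"type": "LineString", "coordinates": [[-4.095772, 50.409393], [-4.095781, 50.409397], [-4.095792, 50.409401], [-4.095965, 50.40971], [-4.096064, 50.410069], [-4.09597, 50.410397]]}, "properties": {"distance": 4027.4, "name": "Raw", "times": [1690900467000, 1690900520000, 1690900522000, 1690900539000, 1690900550000, 1690900569000], "duration": 4923.0}}]}'
df <- data.frame(journey_id = 1:3, journeyroute = rep(json, 3),
                 stringsAsFactors = FALSE)

# Hypothetical helper: parse one FeatureCollection string and return the
# coordinates of its LineString feature(s) as a data.frame with x/y columns
extract_linestring <- function(txt) {
  # simplifyVector = FALSE keeps the nested GeoJSON structure as plain lists
  fc <- fromJSON(txt, simplifyVector = FALSE)
  pieces <- lapply(fc$features, function(f) {
    g <- f$geometry
    # Skip null geometries (the "end" feature) and anything not a LineString
    if (is.null(g) || g$type != "LineString") return(NULL)
    # Each coordinate is a list(lon, lat); stack them into an n x 2 matrix
    xy <- do.call(rbind, lapply(g$coordinates, unlist))
    data.frame(x = xy[, 1], y = xy[, 2])
  })
  # rbind() drops the NULL entries from skipped features
  do.call(rbind, pieces)
}

# Long data.frame: one row per coordinate, tagged with its journey_id
coords <- do.call(rbind, lapply(seq_len(nrow(df)), function(i) {
  cbind(journey_id = df$journey_id[i], extract_linestring(df$journeyroute[i]))
}))
head(coords)
```

The result has one row per coordinate keyed by `journey_id`, which can be turned back into LINESTRING geometries later if needed; the trade-off is a per-row R loop instead of geojsonsf's vectorized parser.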