2023-08-04 22:20:00

Extracting coordinate data from a FeatureCollection in csv within R

Question

I've got data currently within csv, with a column called "journeyroute." This column has the following data [truncated due to size]:

{"type": "FeatureCollection", "features": [{"type": "Feature", "geometry": {"type": "Point", "coordinates": [-4.095772, 50.409393]}, "properties": {"name": "start"}}, {"type": "Feature", "geometry": null, "properties": {"name": "end"}}, {"type": "Feature", "geometry": {"type": "LineString", "coordinates": [[-4.095772, 50.409393], [-4.095781, 50.409397], [-4.095792, 50.409401], [-4.095965, 50.40971], [-4.096064, 50.410069], [-4.09597, 50.410397]]}, "properties": {"distance": 4027.4, "name": "Raw", "times": [1690900467000, 1690900520000, 1690900522000, 1690900539000, 1690900550000, 1690900569000], "duration": 4923.0}}]}

There are 5,000 rows of data. What I'm trying to do is extract out the LineString data to use within R, but I'm getting stuck. Can anyone help please?

I've tried converting to JSON and then unnesting, but it comes up with an error (code adapted from other answers using Google Earth Engine):

new_df <- df %>%
    mutate(geo = map(Journey.Route, ~ jsonlite::fromJSON(.))) %>%
    as.data.frame() %>%
    unnest(geo) %>%
    filter(geo != "FeatureCollection") %>%
    mutate(coord = rep(c("x", "y"))) %>%
    pivot_wider(names_from = coord, values_from = coordinates)
Error in `mutate()`:
ℹ In argument: `coord = rep(c("x", "y"))`.
Caused by error:
! `coord` must be size 5000 or 1, not 2.
Run `rlang::last_trace()` to see where the error occurred.

Expecting an sf geometry column of LineString coordinates.

Answer 1

Score: 2

Since we are dealing with GeoJSON strings, they can be parsed with `sf::st_read()`, or with `geojsonsf::geojson_sfc()` for a performance boost (about 2x when `geojson_sfc()` is used as a drop-in replacement for `st_read()`, and about 100x when comparing row-wise `st_read()` to vectorized `geojson_sfc()`).

The example below uses row-wise grouping to access one row at a time and keeps only the `LINESTRING` geometries (presumably one per FeatureCollection, as in the provided sample).
``` r
library(dplyr)
library(sf)
#> Linking to GEOS 3.9.3, GDAL 3.5.2, PROJ 8.2.1; sf_use_s2() is TRUE
library(geojsonsf)
json_str <- '{"type": "FeatureCollection", "features": [{"type": "Feature", "geometry": {"type": "Point", "coordinates": [-4.095772, 50.409393]}, "properties": {"name": "start"}}, {"type": "Feature", "geometry": null, "properties": {"name": "end"}}, {"type": "Feature", "geometry": {"type": "LineString", "coordinates": [[-4.095772, 50.409393], [-4.095781, 50.409397], [-4.095792, 50.409401], [-4.095965, 50.40971], [-4.096064, 50.410069], [-4.09597, 50.410397]]}, "properties": {"distance": 4027.4, "name": "Raw", "times": [1690900467000, 1690900520000, 1690900522000, 1690900539000, 1690900550000, 1690900569000], "duration": 4923.0}}]}'
# 100-row test sample
df_100 <- tibble(journey_id = 1:100, journeyroute = rep(json_str, 100))
df_100
#> # A tibble: 100 × 2
#>    journey_id journeyroute
#>         <int> <chr>
#>  1          1 "{\"type\": \"FeatureCollection\", \"features\": [{\"type\": \"Fe…
#>  2          2 "{\"type\": \"FeatureCollection\", \"features\": [{\"type\": \"Fe…
#>  3          3 "{\"type\": \"FeatureCollection\", \"features\": [{\"type\": \"Fe…
#> ...
microbenchmark::microbenchmark(
  sf = {
    # parse GeoJSON strings with sf / GEOS
    routes_sf <- df_100 %>%
      rowwise() %>%
      mutate(geometry = st_read(journeyroute, quiet = TRUE) %>%
                        st_geometry() %>%
                        `[`(st_geometry_type(.) == "LINESTRING"), .keep = "unused") %>%
      ungroup() %>%
      st_as_sf()
  },
  geojson_sf = {
    # parse GeoJSON strings with geojsonsf
    routes_gj <- df_100 %>%
      rowwise() %>%
      mutate(geometry = geojson_sfc(journeyroute) %>%
                        `[`(st_geometry_type(.) == "LINESTRING"), .keep = "unused") %>%
      ungroup() %>%
      st_as_sf()
  }
)
```

Benchmark results and resulting sf object:

``` r
#> Unit: milliseconds
#>        expr      min       lq     mean   median       uq      max neval cld
#>          sf 437.4351 453.1961 476.8028 464.1172 487.9901 628.0495   100  a 
#>  geojson_sf 198.3025 207.9465 219.1129 212.6965 221.7101 309.2461   100   b
routes_sf
#> Simple feature collection with 100 features and 1 field
#> Geometry type: LINESTRING
#> Dimension:     XY
#> Bounding box:  xmin: -4.096064 ymin: 50.40939 xmax: -4.095772 ymax: 50.4104
#> Geodetic CRS:  WGS 84
#> # A tibble: 100 × 2
#>    journey_id                                                           geometry
#>         <int>                                                   <LINESTRING [°]>
#>  1          1 (-4.095772 50.40939, -4.095781 50.4094, -4.095792 50.4094, -4.095…
#>  2          2 (-4.095772 50.40939, -4.095781 50.4094, -4.095792 50.4094, -4.095…
#>  3          3 (-4.095772 50.40939, -4.095781 50.4094, -4.095792 50.4094, -4.095…
#>  4          4 (-4.095772 50.40939, -4.095781 50.4094, -4.095792 50.4094, -4.095…
#>  5          5 (-4.095772 50.40939, -4.095781 50.4094, -4.095792 50.4094, -4.095…
#>  6          6 (-4.095772 50.40939, -4.095781 50.4094, -4.095792 50.4094, -4.095…
#>  7          7 (-4.095772 50.40939, -4.095781 50.4094, -4.095792 50.4094, -4.095…
#>  8          8 (-4.095772 50.40939, -4.095781 50.4094, -4.095792 50.4094, -4.095…
#>  9          9 (-4.095772 50.40939, -4.095781 50.4094, -4.095792 50.4094, -4.095…
#> 10         10 (-4.095772 50.40939, -4.095781 50.4094, -4.095792 50.4094, -4.095…
#> # ℹ 90 more rows
```

<sup>Created on 2023-08-04 with reprex v2.0.2</sup>
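
The vectorized `geojson_sfc()` call mentioned at the top of this answer is not part of the benchmark. A minimal sketch of what it might look like follows, assuming every FeatureCollection holds exactly one LineString so that the filtered geometries line up with the input rows by position. The shortened `json_small` string and the names `df_small`, `geoms`, `lines`, and `routes_vec` are illustrative and do not come from the original answer.

```r
library(sf)
library(geojsonsf)

# Shortened stand-in for the journeyroute strings
# (assumption: same structure as the sample in the question)
json_small <- '{"type": "FeatureCollection", "features": [{"type": "Feature", "geometry": {"type": "Point", "coordinates": [-4.095772, 50.409393]}, "properties": {"name": "start"}}, {"type": "Feature", "geometry": null, "properties": {"name": "end"}}, {"type": "Feature", "geometry": {"type": "LineString", "coordinates": [[-4.095772, 50.409393], [-4.095781, 50.409397], [-4.095792, 50.409401]]}, "properties": {"name": "Raw"}}]}'
df_small <- data.frame(journey_id = 1:5, journeyroute = rep(json_small, 5))

# Parse all strings in one call; the result is a single sfc
# holding every geometry from every FeatureCollection
geoms <- geojson_sfc(df_small$journeyroute)

# Keep only the LINESTRING geometries
lines <- geoms[st_geometry_type(geoms) == "LINESTRING"]

# Guard the one-LineString-per-row assumption before
# re-attaching the ids by position
stopifnot(length(lines) == nrow(df_small))
routes_vec <- st_sf(journey_id = df_small$journey_id, geometry = lines)
```

If some rows contain no LineString, or more than one, the positional match breaks and the `stopifnot()` guard fails; in that case the row-wise version above is the safer choice.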

Answer 2

Score: 1

`geojsonsf` can read a vector of GeoJSON strings, so there is no need for any row-wise operations.

Create some data:

json <- '{"type": "FeatureCollection", "features": [{"type": "Feature", "geometry": {"type": "Point", "coordinates": [-4.095772, 50.409393]}, "properties": {"name": "start"}}, {"type": "Feature", "geometry": null, "properties": {"name": "end"}}, {"type": "Feature", "geometry": {"type": "LineString", "coordinates": [[-4.095772, 50.409393], [-4.095781, 50.409397], [-4.095792, 50.409401], [-4.095965, 50.40971], [-4.096064, 50.410069], [-4.09597, 50.410397]]}, "properties": {"distance": 4027.4, "name": "Raw", "times": [1690900467000, 1690900520000, 1690900522000, 1690900539000, 1690900550000, 1690900569000], "duration": 4923.0}}]}'
df <- data.frame(json = rep(json, 3))

Convert to an sf object:

sf <- geojsonsf::geojson_sf(df$json)

Then do any other operations you may want with the data:

## Remove empty geometries
sf <- sf[ !sf::st_is_empty(sf), ]
## Extract just the LINESTRINGs
sf <- sf[sf::st_geometry_type(sf) == "LINESTRING", ]
## Convert to a long data.frame
df <- sfheaders::sf_to_df(sf = sf, fill = TRUE)
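
As a possible alternative for the last step, a similar long table of coordinates can be produced with sf alone. This is a hedged sketch that swaps `sfheaders::sf_to_df()` for `sf::st_coordinates()`; it uses a shortened sample string, and the names `json_small`, `routes`, and `coords` are illustrative. For LINESTRING geometries, `st_coordinates()` returns the columns `X`, `Y`, and `L1`, where `L1` indexes the feature each point belongs to.

```r
library(sf)
library(geojsonsf)

# Shortened stand-in for the GeoJSON strings
# (assumption: same structure as the sample in the question)
json_small <- '{"type": "FeatureCollection", "features": [{"type": "Feature", "geometry": {"type": "Point", "coordinates": [-4.095772, 50.409393]}, "properties": {"name": "start"}}, {"type": "Feature", "geometry": null, "properties": {"name": "end"}}, {"type": "Feature", "geometry": {"type": "LineString", "coordinates": [[-4.095772, 50.409393], [-4.095781, 50.409397], [-4.095792, 50.409401]]}, "properties": {"name": "Raw"}}]}'

# Parse three copies at once, then keep only the non-empty LINESTRING features
routes <- geojson_sf(rep(json_small, 3))
routes <- routes[!st_is_empty(routes), ]
routes <- routes[st_geometry_type(routes) == "LINESTRING", ]

# One row per vertex: X and Y are the coordinates, L1 is the feature index
coords <- as.data.frame(st_coordinates(routes))
```

Unlike `sf_to_df(fill = TRUE)`, this does not carry the feature properties along; they would have to be joined back using `L1` as the row index of `routes`.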
