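Question

I'm currently testing the r5r package, which is described as using parallel computation by default. Given the large number of origin-destination pairs I want to analyze quickly with its detailed_itineraries function, I wanted to see whether it could be sped up using any of the other parallelization tools in R.

I am trying the following code: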

library(r5r)
library(sf)
library(tigris)
library(future.apply)
library(rJava)
library(tidyverse)

r5r_core <- setup_r5(data_path = path, verbose = FALSE)

tracts2 <- tracts(state = "PA", county = "Philadelphia", year = 2019) %>%
  select(GEOID) %>%
  st_centroid() %>%
  st_transform("EPSG:4326") %>%
  arrange(GEOID) %>%
  rename(id = GEOID) %>%
  mutate(lon = unlist(map(geometry, 1)),
         lat = unlist(map(geometry, 2))) %>%
  st_set_geometry(NULL) %>%
  as.data.frame()

mode <- c("WALK", "TRANSIT")
max_walk_time <- 30 # minutes
departure_datetime <- as.POSIXct("14-06-2023 8:30:00",
                                 format = "%d-%m-%Y %H:%M:%S")

plan(multicore)

fn <- function(x, y){
  detailed_itineraries(r5r_core = r5r_core,
                       origins = x,
                       destinations = y,
                       mode = mode,
                       departure_datetime = departure_datetime,
                       max_walk_time = max_walk_time,
                       walk_speed = 4.5,
                       max_trip_duration = 60,
                       shortest_path = TRUE,
                       all_to_all = FALSE,
                       drop_geometry = TRUE,
                       progress = TRUE)
}

future_mapply(fn, tracts2, tracts2)

But I am getting this error:

Error in assign_points_input(origins, "origins") : 
  'origins' must be either a 'data.frame' or a 'POINT sf'.

What is going wrong here? Alternatively, am I barking up the wrong tree by trying to gain speed this way?


Answer 1

Score: 2

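{r5r} functions cannot be sped up using parallelization from R. We've tested this several times, and every parallelization approach we tried in R was less efficient than the parallelization already implemented in Java. To control the number of threads used when routing, use the n_threads parameter.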
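As a rough sketch of the n_threads suggestion, the routing can be done in a single call, leaving the parallel work to r5r itself. The helper name route_pairs and the value 4 are illustrative only (not part of r5r), and the routing object is passed by position on the assumption that the name of that first argument may differ between r5r versions.

```r
# Hypothetical helper (not part of r5r): route row-wise origin-destination
# pairs in one call and let r5r's Java backend handle the parallelism.
route_pairs <- function(core, origins, destinations, n_threads = Inf) {
  r5r::detailed_itineraries(
    core,                      # routing object returned by setup_r5()
    origins = origins,
    destinations = destinations,
    mode = c("WALK", "TRANSIT"),
    departure_datetime = as.POSIXct("14-06-2023 8:30:00",
                                    format = "%d-%m-%Y %H:%M:%S"),
    max_walk_time = 30,
    max_trip_duration = 60,
    shortest_path = TRUE,
    all_to_all = FALSE,        # pair row i of origins with row i of destinations
    n_threads = n_threads,     # upper limit on threads used when routing
    drop_geometry = TRUE
  )
}

# Usage, assuming r5r_core and tracts2 exist as in the question:
# itineraries <- route_pairs(r5r_core, tracts2, tracts2, n_threads = 4)
```

Defining the helper does not require a routing network; calling it does.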

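PS: The error you're seeing comes from incorrect use of future_mapply(). Under the hood, a data.frame is a list with some additional attributes and methods, so when you pass a data.frame to future_mapply(), the function iterates over its columns. In effect, your code passes each column of tracts2 to detailed_itineraries(), not tracts2 itself.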
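The column-wise iteration can be seen with base R's mapply(), which future_mapply() is designed to mirror. This is a minimal illustration: pts is a made-up stand-in for tracts2, and the anonymous functions only report what they receive instead of calling detailed_itineraries().

```r
# Made-up stand-in for tracts2: one row per point, columns id / lon / lat.
pts <- data.frame(id  = c("a", "b", "c"),
                  lon = c(-75.16, -75.17, -75.18),
                  lat = c(39.95, 39.96, 39.97),
                  stringsAsFactors = FALSE)

# A data.frame is a list whose elements are its columns.
stopifnot(is.list(pts), length(pts) == ncol(pts))

# mapply() walks over those elements, so the function is called once per
# column and receives a plain vector each time, never a data.frame.
col_classes <- mapply(function(x, y) class(x), pts, pts)
got_df      <- mapply(function(x, y) is.data.frame(x), pts, pts)
print(col_classes)
print(got_df)

# Wrapping the data.frame in list() makes it a single element, so the
# function is called once and receives the whole table.
whole <- mapply(function(x, y) is.data.frame(x), list(pts), list(pts))
print(whole)
```

In the question's code, the first call therefore hands the id column (a character vector) to detailed_itineraries() as origins, which is what triggers the "must be either a 'data.frame' or a 'POINT sf'" error.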

