Parallel Processing with r5r
Question
I'm currently testing the r5r package, which is documented as using parallel computation by default. I have a large number of origin-destination pairs that I want to analyze quickly with its detailed_itineraries function, so I wanted to see whether it could be sped up further using any of the other parallelization tools in R.
This is the code I am trying:
library(r5r)
library(sf)
library(tigris)
library(future.apply)
library(rJava)
library(tidyverse)

r5r_core <- setup_r5(data_path = path, verbose = FALSE)

# Census tract centroids for Philadelphia as a plain data.frame (id, lon, lat)
tracts2 <- tracts(state = "PA", county = "Philadelphia", year = 2019) %>%
  select(GEOID) %>%
  st_centroid() %>%
  st_transform("EPSG:4326") %>%
  arrange(GEOID) %>%
  rename(id = GEOID) %>%
  mutate(lon = unlist(map(geometry, 1)),
         lat = unlist(map(geometry, 2))) %>%
  st_set_geometry(NULL) %>%
  as.data.frame()

mode <- c("WALK", "TRANSIT")
max_walk_time <- 30 # minutes
departure_datetime <- as.POSIXct("14-06-2023 8:30:00",
                                 format = "%d-%m-%Y %H:%M:%S")

plan(multicore)

fn <- function(x, y) {
  detailed_itineraries(r5r_core = r5r_core,
                       origins = x,
                       destinations = y,
                       mode = mode,
                       departure_datetime = departure_datetime,
                       max_walk_time = max_walk_time,
                       walk_speed = 4.5,
                       max_trip_duration = 60,
                       shortest_path = TRUE,
                       all_to_all = FALSE,
                       drop_geometry = TRUE,
                       progress = TRUE)
}

future_mapply(fn, tracts2, tracts2)
And I am getting this error:
Error in assign_points_input(origins, "origins") :
'origins' must be either a 'data.frame' or a 'POINT sf'.
What is going wrong here? Alternatively, am I barking up the wrong tree trying to gain any speed this way?
Answer 1
Score: 2
{r5r} functions cannot be sped up using parallelization from R. We've tested it a few times already, and any parallelization we tried doing in R was less efficient than the current parallelization implemented in Java. To control the number of threads used when routing, please use the n_threads parameter.
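As a rough sketch of that advice, the routing could be done in one call on the whole data frames, leaving the parallelism to the Java backend and only setting n_threads. The wrapper name route_pairs is hypothetical, and the routing settings are simply copied from the question; this is an illustration under those assumptions, not a tuned setup.

```r
# Sketch: route every origin-destination pair in a single call and let the
# Java backend parallelize. `route_pairs` is a hypothetical helper name.
route_pairs <- function(r5r_core, origins, destinations,
                        departure_datetime, n_threads = Inf) {
  r5r::detailed_itineraries(
    r5r_core           = r5r_core,
    origins            = origins,       # whole data.frame: id, lon, lat
    destinations       = destinations,  # whole data.frame: id, lon, lat
    mode               = c("WALK", "TRANSIT"),
    departure_datetime = departure_datetime,
    max_walk_time      = 30,            # minutes, as in the question
    walk_speed         = 4.5,
    max_trip_duration  = 60,
    shortest_path      = TRUE,
    all_to_all         = FALSE,         # pair row i of origins with row i of destinations
    n_threads          = n_threads,     # Inf lets the router use all available threads
    drop_geometry      = TRUE,
    progress           = TRUE
  )
}

# Example call, assuming r5r_core, tracts2 and departure_datetime exist as in
# the question (4 threads is an arbitrary choice):
# itineraries <- route_pairs(r5r_core, tracts2, tracts2,
#                            departure_datetime, n_threads = 4)
```

Lowering n_threads is mainly useful when the machine has to stay responsive for other work; otherwise leaving it at the default should be fine.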
PS: The error you're seeing results from incorrect future_mapply() usage. Under the hood, a data.frame is a list with some additional attributes and methods, so when you pass a data.frame to future_mapply() the function iterates over each of its columns. Effectively, what you're doing with your code is passing tracts2's columns to detailed_itineraries(), not tracts2 itself.
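The column-wise iteration can be seen with base mapply() on a small stand-in data frame. The values below are made up for illustration, and the sketch assumes future_mapply() iterates the same way as mapply(), which it is designed to mirror.

```r
# A small stand-in for tracts2 (made-up values, for illustration only).
pts <- data.frame(
  id  = c("a", "b", "c"),
  lon = c(-75.16, -75.17, -75.18),
  lat = c(39.95, 39.96, 39.97),
  stringsAsFactors = FALSE
)

# mapply() treats the data.frame as a list, so the function is called once
# per column; each call receives that column as a plain vector.
seen <- mapply(function(x, y) class(x), pts, pts)
print(seen)

# None of the calls ever receives a data.frame, which is why
# detailed_itineraries() rejects its 'origins' argument.
is_df <- mapply(function(x, y) is.data.frame(x), pts, pts)
print(is_df)
```

The function runs three times here, once each for id, lon and lat, rather than once for the whole table.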