英文:
Identify all children of a certain node for very large data
问题
I am trying to find all downstream nodes in a large graph object (~1 million edges).
在尝试查找大型图对象(约1百万个边)中的所有下游节点。
In this example, I want to see all nodes that are downstream from 4:
在此示例中,我想查看从4开始的所有下游节点:
The below code is adapted from this answer using igraph
but it ran for over 20 minutes before I ended it. I saw the hR
package (see here) but it also has trouble with the size of the data.
下面的代码是从此回答中使用igraph
适应的,但在我结束之前运行了20多分钟。我看到了hR
包(参见此处),但它也无法处理数据的大小。
I'm new to graph analyses and not sure if there are better methods.
我是图分析的新手,不确定是否有更好的方法。
Bonus question:
Is there a purrr
method to replace the sapply()
here? I didn't have any luck with map()
.
奖励问题:
是否有purrr
方法可以替代这里的sapply()
?我尝试使用map()
没有成功。
英文:
I am trying to find all downstream nodes in a large graph object (~1 million edges).
In this example, I want to see all nodes that are downstream from 4:
The below code is adapted from this answer using igraph
but it ran for over 20 minutes before I ended it. I saw the hR
package (see here) but it also has trouble with the size of the data.
I'm new to graph analyses and not sure if there are better methods
library(tidyverse)
library(igraph)
graph_df <- # real example has 1.2M connections
make_tree(100000, children = 3)
leafnodes <- # bottleneck
sapply(
X = V(graph_df),
FUN = \(x) length(neighbors(graph_df, x)) == 0
)
find_all_children <- function(root) {
paths <-
get.all.shortest.paths(
graph = graph_df,
from = V(graph_df)[root],
to = leafnodes
)$res
tibble(
path = map_chr(paths, ~paste(.x, collapse = " -> "))
)
}
# desired output
find_all_children(12)
#> 12 -> 35 -> 104 -> 311 -> 932 -> 2795 -> 8384
#> 12 -> 35 -> 104 -> 311 -> 932 -> 2795 -> 8385
#> 12 -> 35 -> 104 -> 311 -> 932 -> 2795 -> 8386
find_all_children(1234)
#> 1234 -> 3701 -> 11102 -> 33305 -> 99914
#> 1234 -> 3701 -> 11102 -> 33305 -> 99915
#> 1234 -> 3701 -> 11102 -> 33305 -> 99916
Bonus question:
Is there a purrr
method to replace the sapply()
here? I didn't have any luck with map()
答案1
得分: 1
以下是您要翻译的内容:
"Probably you can try subgraph
+ subcomponent
to prune your tree first and then find all_simple_paths
, e.g.,
n <- 100000
g <- make_tree(n, children = 3) %>%
set_vertex_attr(name = "name", value = seq(vcount(.)))
find_all_children <- function(g, root) {
g %>%
subgraph(subcomponent(., root, "out")) %>%
all_simple_paths(1, V(.)[degree(., mode = "out") == 0])
}
and run the following lines for example
r12 <- find_all_children(g, 12)
r1234 <- find_all_children(g, 1234)
which might be done in seconds even with n <- 100000
.
Toy Example
n <- 20
g <- make_tree(n, children = 2) %>%
set_vertex_attr(name = "name", value = seq(vcount(.)))
plot(g)
and we can obtain
> find_all_children(g, 3)
[[1]]
+ 3/7 vertices, named, from dab4822:
[1] 3 6 12
[[2]]
+ 3/7 vertices, named, from dab4822:
[1] 3 6 13
[[3]]
+ 3/7 vertices, named, from dab4822:
[1] 3 7 14
[[4]]
+ 3/7 vertices, named, from dab4822:
[1] 3 7 15
英文:
Probably you can try subgraph
+ subcomponent
to prune your tree first and then find all_simple_paths
, e.g.,
n <- 100000
g <- make_tree(n, children = 3) %>%
set_vertex_attr(name = "name", value = seq(vcount(.)))
find_all_children <- function(g, root) {
g %>%
subgraph(subcomponent(., root, "out")) %>%
all_simple_paths(1, V(.)[degree(., mode = "out") == 0])
}
and run the following lines for example
r12 <- find_all_children(g, 12)
r1234 <- find_all_children(g, 1234)
which might be done in seconds even with n <- 100000
.
Toy Example
n <- 20
g <- make_tree(n, children = 2) %>%
set_vertex_attr(name = "name", value = seq(vcount(.)))
plot(g)
and we can obtain
> find_all_children(g, 3)
[[1]]
+ 3/7 vertices, named, from dab4822:
[1] 3 6 12
[[2]]
+ 3/7 vertices, named, from dab4822:
[1] 3 6 13
[[3]]
+ 3/7 vertices, named, from dab4822:
[1] 3 7 14
[[4]]
+ 3/7 vertices, named, from dab4822:
[1] 3 7 15
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论