识别特定节点的所有子节点,适用于非常大的数据。

huangapple go评论58阅读模式
英文:

Identify all children of a certain node for very large data

问题

I am trying to find all downstream nodes in a large graph object (~1 million edges).

在尝试查找大型图对象(约1百万个边)中的所有下游节点。

In this example, I want to see all nodes that are downstream from 4:

在此示例中,我想查看从4开始的所有下游节点:

The below code is adapted from this answer using igraph but it ran for over 20 minutes before I ended it. I saw the hR package (see here) but it also has trouble with the size of the data.

下面的代码是从此回答中使用igraph适应的,但在我结束之前运行了20多分钟。我看到了hR包(参见此处),但它也无法处理数据的大小。

I'm new to graph analyses and not sure if there are better methods.

我是图分析的新手,不确定是否有更好的方法。

Bonus question:

Is there a purrr method to replace the sapply() here? I didn't have any luck with map().

奖励问题:

是否有purrr方法可以替代这里的sapply()?我尝试使用map()没有成功。

英文:

I am trying to find all downstream nodes in a large graph object (~1 million edges).

In this example, I want to see all nodes that are downstream from 4:

识别特定节点的所有子节点,适用于非常大的数据。

The below code is adapted from this answer using igraph but it ran for over 20 minutes before I ended it. I saw the hR package (see here) but it also has trouble with the size of the data.

I'm new to graph analyses and not sure if there are better methods

library(tidyverse)
library(igraph)

graph_df <- # real example has 1.2M connections
  make_tree(100000, children = 3)


leafnodes <- # bottleneck
  sapply(
    X = V(graph_df), 
    FUN = \(x) length(neighbors(graph_df, x)) == 0
  )


find_all_children <- function(root) {
  paths <- 
    get.all.shortest.paths(
      graph = graph_df, 
      from = V(graph_df)[root], 
      to = leafnodes
    )$res
  
  tibble(
    path = map_chr(paths, ~paste(.x, collapse = " -> "))
  )
}

# desired output
find_all_children(12)
#> 12 -> 35 -> 104 -> 311 -> 932 -> 2795 -> 8384
#> 12 -> 35 -> 104 -> 311 -> 932 -> 2795 -> 8385
#> 12 -> 35 -> 104 -> 311 -> 932 -> 2795 -> 8386

find_all_children(1234)
#> 1234 -> 3701 -> 11102 -> 33305 -> 99914
#> 1234 -> 3701 -> 11102 -> 33305 -> 99915
#> 1234 -> 3701 -> 11102 -> 33305 -> 99916

Bonus question:

Is there a purrr method to replace the sapply() here? I didn't have any luck with map()

答案1

得分: 1

以下是您要翻译的内容:

"Probably you can try subgraph + subcomponent to prune your tree first and then find all_simple_paths, e.g.,

n <- 100000

g <- make_tree(n, children = 3) %>%
    set_vertex_attr(name = "name", value = seq(vcount(.)))

find_all_children <- function(g, root) {
    g %>%
        subgraph(subcomponent(., root, "out")) %>%
        all_simple_paths(1, V(.)[degree(., mode = "out") == 0])
}

and run the following lines for example

r12 <- find_all_children(g, 12)
r1234 <- find_all_children(g, 1234)

which might be done in seconds even with n <- 100000.

Toy Example

n <- 20

g <- make_tree(n, children = 2) %>%
    set_vertex_attr(name = "name", value = seq(vcount(.)))

plot(g)

识别特定节点的所有子节点,适用于非常大的数据。

and we can obtain

> find_all_children(g, 3)
[[1]]
+ 3/7 vertices, named, from dab4822:
[1] 3  6  12

[[2]]
+ 3/7 vertices, named, from dab4822:
[1] 3  6  13

[[3]]
+ 3/7 vertices, named, from dab4822:
[1] 3  7  14

[[4]]
+ 3/7 vertices, named, from dab4822:
[1] 3  7  15
英文:

Probably you can try subgraph + subcomponent to prune your tree first and then find all_simple_paths, e.g.,

n <- 100000

g <- make_tree(n, children = 3) %>%
    set_vertex_attr(name = "name", value = seq(vcount(.)))

find_all_children <- function(g, root) {
    g %>%
        subgraph(subcomponent(., root, "out")) %>%
        all_simple_paths(1, V(.)[degree(., mode = "out") == 0])
}

and run the following lines for example

r12 <- find_all_children(g, 12)
r1234 <- find_all_children(g, 1234)

which might be done in seconds even with n <- 100000.

Toy Example

n <- 20

g <- make_tree(n, children = 2) %>%
    set_vertex_attr(name = "name", value = seq(vcount(.)))

plot(g)

识别特定节点的所有子节点,适用于非常大的数据。

and we can obtain

> find_all_children(g, 3)
[[1]]
+ 3/7 vertices, named, from dab4822:
[1] 3  6  12

[[2]]
+ 3/7 vertices, named, from dab4822:
[1] 3  6  13

[[3]]
+ 3/7 vertices, named, from dab4822:
[1] 3  7  14

[[4]]
+ 3/7 vertices, named, from dab4822:
[1] 3  7  15

huangapple
  • 本文由 发表于 2023年5月26日 08:13:29
  • 转载请务必保留本文链接:https://go.coder-hub.com/76336914.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定