2023年6月19日 11:50:16go评论96阅读模式

英文:

Writing a BFS Search Algorithm by Hand

问题

我正在尝试了解有关网络图的搜索算法。为了说明这一点，我创建了以下示例。

步骤1： 假设有100个国家（country_1....country_100）彼此随机连接：

set.seed(123)
library(igraph)
countries <- paste0("country_", 1:100)
g <- make_empty_graph(100)
num_edges <- 200
edge_list <- sample(countries, size = num_edges * 2, replace = TRUE)
edge_list <- matrix(edge_list, ncol = 2, byrow = TRUE)
g <- graph_from_edgelist(edge_list, directed = FALSE)
V(g)$name <- countries
plot(g, vertex.label.cex = 0.7, vertex.label.color = "black", vertex.label.dist = 2)

步骤2： 现在，假设有20个人（person_A...person_T）住在这些国家（每个国家最多只能有一个人 - 80个国家将为空）：

edge_list <- as_edgelist(g)
df <- as.data.frame(edge_list)
colnames(df) <- c("from", "to")
people <- paste0("person_", LETTERS[1:20])
assignment <- sample(countries, size = length(people), replace = FALSE)
names(assignment) <- people
df2 <- data.frame(country = countries)
df2$person <- ifelse(df2$country %in% assignment, names(assignment)[match(df2$country, assignment)], "empty")

步骤3： 作为可选步骤，我们可以可视化结果：

library(visNetwork)
df2$color <- ifelse(df2$person == "empty", "grey", "red")
df2$label <- ifelse(df2$person == "empty", df2$country, paste0(df2$person, "\n", df2$country))
nodes <- data.frame(id = df2$country, label = df2$label, color = df2$color)
edges <- df
visNetwork(nodes, edges) %>%
    visInteraction(navigationButtons = TRUE)

我的问题： 假设我们选择 "person_A" - 我想找出离 "person_A" 最近的人是谁，以及这个人住在哪个国家。 我有兴趣学习如何手动编写一个BFS算法来解决这个问题 - 例如：选择 person_A 并搜索半径为degree1的所有人 - 如果找不到任何人，现在搜索半径为degree2的所有人...一直继续，直到找到第一个人。

我知道如何使用预先构建的算法实现这个算法：

adj_matrix <- as_adjacency_matrix(g)
diag(adj_matrix) <- 0
shortest_paths <- shortest.paths(g)
df2_filtered <- subset(df2, person != "empty")
selected_countries <- intersect(rownames(shortest_paths), df2_filtered$country)
filtered_paths <- shortest_paths[selected_countries, selected_countries]
item = df2[df2$person %in% c("person_A"), ]
#答案（排除距离= 0，即相同国家本身）
sort(filtered_paths[rownames(filtered_paths) == item$country, ])[2]

请问有人能否展示给我如何编写一个（手动）搜索算法来完成这个任务，从一个人的名字开始，然后在每一步打印搜索结果，直到找到一个人为止？

英文:

I am trying to learn more about search algorithms for network graphs. To illustrate this, I created the following example.

Step 1: Suppose there are 100 countries (country_1....country_100) that are randomly connected to each other

set.seed(123)
library(igraph)
countries &lt;- paste0(&quot;country_&quot;, 1:100)
g &lt;- make_empty_graph(100)
num_edges &lt;- 200
edge_list &lt;- sample(countries, size = num_edges * 2, replace = TRUE)
edge_list &lt;- matrix(edge_list, ncol = 2, byrow = TRUE)
g &lt;- graph_from_edgelist(edge_list, directed = FALSE)
V(g)$name &lt;- countries
plot(g, vertex.label.cex = 0.7, vertex.label.color = &quot;black&quot;, vertex.label.dist = 2)

Step 2: Now, suppose 20 people (person_A...person_T) live in these countries (each country can only have at most one person - 80 of the countries will be empty):

edge_list &lt;- as_edgelist(g)
df &lt;- as.data.frame(edge_list)
colnames(df) &lt;- c(&quot;from&quot;, &quot;to&quot;)
people &lt;- paste0(&quot;person_&quot;, LETTERS[1:20])
assignment &lt;- sample(countries, size = length(people), replace = FALSE)
names(assignment) &lt;- people
df2 &lt;- data.frame(country = countries)
df2$person &lt;- ifelse(df2$country %in% assignment, names(assignment)[match(df2$country, assignment)], &quot;empty&quot;)

Step 3: As an optional step, we can visualize the results:

library(visNetwork)
df2$color &lt;- ifelse(df2$person == &quot;empty&quot;, &quot;grey&quot;, &quot;red&quot;)
df2$label &lt;- ifelse(df2$person == &quot;empty&quot;, df2$country, paste0(df2$person, &quot;\n&quot;, df2$country))
nodes &lt;- data.frame(id = df2$country, label = df2$label, color = df2$color)
edges &lt;- df
visNetwork(nodes, edges) %&gt;% 
    visInteraction(navigationButtons = TRUE)

My Problem: Suppose we take "person_A" - I want to find out who is the nearest person to "person_A" and which country this person lives in. I am interested in learning how to write a BFS algorithm for this problem by hand - for example: take person_A and search everyone in a radius of degree1 - if no one is found, now search everyone in a radius of degree2 ... continue until you find the first person(s).

I know how to use a pre-built implementation of this algorithm :

adj_matrix &lt;- as_adjacency_matrix(g)
diag(adj_matrix) &lt;- 0
shortest_paths &lt;- shortest.paths(g)
df2_filtered &lt;- subset(df2, person != &quot;empty&quot;)
selected_countries &lt;- intersect(rownames(shortest_paths), df2_filtered$country)
filtered_paths &lt;- shortest_paths[selected_countries, selected_countries]
item = df2[df2$person %in% c(&quot;person_A&quot;), ]
#answer (exclude distance = 0, i.e. the same country itself)
sort(filtered_paths[rownames(filtered_paths) == item$country, ])[2]

Can someone please show me how I could write a search algorithm (by hand) to accomplish this task which starts with a person's name - and then prints the results of the search at each step until a person is found?

答案1

得分: 3

背景/概述

广度优先搜索的大致思想是从图中的一个点开始（我们称之为a），将所有邻居添加到未探索点的列表中（我们称之为frontier）。然后，逐个遍历列表中的点，对于每个点，将该点的未见邻居添加到队列的末尾，以此类推，直到找到要查找的点（可能是特定点b，或者只是符合您设置的某些条件的任何点），或者直到没有地方可探索（因为已经探索了所有地方）。

整理数据

首先，我将对数据进行一些清理。我创建一个数据框，其中只包含存在的人（没有空值）：

people_df <- df2 %>%
    filter(person != "empty") %>%
    select(person, country)

然后，我将国家连接数据框df转换为数据框neighbours_df，其中给出了每个点的邻居。数据框的结构是这样的，例如有一行：

>           from         to
>     country_31 country_79

但没有相反的情况，即：

>           from         to
>     country_79 country_31

因此，我交换了列，将反向的列添加到第一列的末尾，并将每个点的邻居分组到一个列表中，以使其更整洁：

reversed_df <- df %>%
    mutate(new_from = to, to = from, from = new_from) %>%
    select(from, to)
neighbours_df <- df %>%
    bind_rows(reversed_df) %>%
    filter(from != to) %>%
    group_by(from) %>%
    summarise(to = list(to))

实现

breadth_first_search <- function(person, neighbours_df, people_df) {
  # 获取该人的国家
  starting_country <- people_df$country[people_df$person == person]
  # 初始化已访问列表
  visited <- c()
  # 初始化前沿，以起始点为起点
  frontier <- list()
  frontier[[starting_country]] <- 0
  # 初始化距离起点的变量（以便我们可以打印距离起点有多远）
  distance_from_start <- 0
  # 当前前沿不为空时
  while (length(frontier) > 0) {
    # 获取前沿的第一个元素
    current <- names(frontier)[1]
    # 获取从当前点到起点的距离
    distance_from_start <- frontier[[1]]
    print(paste0("当前点：", current, "（距离起点", distance_from_start, "步）"))
    # 移除前沿的第一个元素（我们刚刚选择的那个）
    frontier <- frontier[-1]
    # 如果当前点在我们的`people_df`的国家列中（即有人居住在那里），并且不是起始国家，则返回居住在那里的人
    if (current %in% people_df$country && current != starting_country) {
      found_person <- people_df$person[people_df$country == current]
      print(paste0("找到的人：", found_person, "，距离起点", distance_from_start, "步，在国家", current))
      return(found_person)
    }
    # 将当前点添加到已访问列表
    visited <- c(visited, current)
    # 获取当前点的邻居
    neighbs <- neighbours_df$to[neighbours_df$from == current][[1]]
    # 添加邻居到前沿，如果它们还没有被访问过
    neighbs <- neighbs[!neighbs %in% visited]
    frontier <- c(frontier, setNames(rep(distance_from_start + 1, length(neighbs)), neighbs))
  }
  # 如果搜索了所有点，但没有找到任何人，则返回NA
  return(NA)
}
print(breadth_first_search("person_R", neighbours_df, people_df))
# [1] "person_J"

参考资料/更多信息

我在很大程度上参考了这篇文章，作者是Red Blob Games（这是另一篇文章的伴随文章，介绍了广度优先搜索（以及其他类似的图搜索算法，如A*）的工作原理）。如果您想要更全面地了解它们的工作原理、BFS的优缺点，和/或想要玩一些交互式的东西，我建议您查看这些文章！

英文:

Background/Overview

The big picture idea of Breadth First Search is that you start at a point in a graph (let's call it a), and you add all of the neighbours to a list of unexplored points (which we'll call the frontier). You then go through the list, one by one, and for each point, you add the unseen neighbours of that point to the end of the queue, and so on, until you either find the point you are looking for (which could be a specific point b, or just any point which meets a certain criteria you've set), or alternatively you run out of places (because you've explored everywhere).

Tidying the data

First, I'm going to clean up the data a bit. I create a dataframe which is just the people that exist (no empties):

people_df &lt;- df2 %&gt;% 
    filter(person != &quot;empty&quot;) %&gt;%
    select(person, country)

I then turn the country connections dataframe df into a dataframe neighbours_df which gives me the neighbours of each point. The way the dataframe was structured, it has (for example) a row:

> from to
> country_31 country_79
but none for the reverse, i.e.

> from to
> country_79 country_31

So I switch the columns, add the reversed ones onto the end of the first one, and group each point's neighbours into a list, to make it tidier:

reversed_df &lt;- df %&gt;% 
    mutate(new_from = to, to = from, from = new_from) %&gt;% 
    select(from, to)
neighbours_df &lt;- df %&gt;% 
    bind_rows(reversed_df) %&gt;% 
    filter(from != to) %&gt;%
    group_by(from) %&gt;%
    summarise(to = list(to))
 	
#          from                            to
# 1    country_1   c(&quot;country_8&quot;, &quot;country_92&quot;)

Implementation

breadth_first_search &lt;- function(person, neighbours_df, people_df) {
  # get the country of the person
    starting_country &lt;- people_df$country[people_df$person == person]
  # initialise the visited list 
    visited &lt;- c()
  # initialise the frontier with the starting point
    frontier &lt;- list()
    frontier[[starting_country]] &lt;- 0
  # initialise distance from start variable (so we can print how far we are from the start)
    distance_from_start &lt;- 0
  # while the frontier is not empty
    while (length(frontier) &gt; 0) {
      # get the first element of the frontier
        current &lt;- names(frontier)[1]
      # get the distance from current to start
        distance_from_start &lt;- frontier[[1]]
      
      print(paste0(&quot;Current point: &quot;, current, &quot; (&quot;, distance_from_start, &quot; steps from start)&quot;))
      # remove the first element of the frontier (the one we just selected)
        frontier &lt;- frontier[-1]
        # if the current point is in the country column of our `people_df` (i.e. someone lives there), and it&#39;s not the starting country, return the person who lives there
        if (current %in% people_df$country &amp;&amp; current != starting_country) {
          found_person &lt;- people_df$person[people_df$country == current]
          print(paste0(&quot;Found person: &quot;, found_person, &quot;, &quot; , distance_from_start , &quot; steps from start, in country &quot;, current))
            return(found_person)
        }
        # add the current point to the visited list
        visited &lt;- c(visited, current)
        # get the neighbors of the current point
        neighbs &lt;- neighbours_df$to[neighbours_df$from == current][[1]]
        # add the neighbors to the frontier if they haven&#39;t been visited already
        neighbs &lt;- neighbs[!neighbs %in% visited]
        frontier &lt;- c(frontier, setNames(rep(distance_from_start + 1, length(neighbs)), neighbs))
        }
    # if we search through all the points, and didn&#39;t find anyone, return NA
    return(NA)
}
print(breadth_first_search(&quot;person_R&quot;, neighbours_df, people_df))
# [1] &quot;person_J&quot;

References / More Information

I'm cribbed fairly extensively from this article by Red Blob Games (the companion to this other piece, which provides a great introduction to how breadth first search (and other similar graph search algorithms, like A* work). If you would like a more comprehensive take on how they work, the drawbacks and strengths of BFS, and/or want to play around with some interactive things, I would recommend checking that out!

答案2

得分: 2

更新

我认为你不是在寻找广度优先搜索（BFS），而是寻找由最近有效邻居定义的“ego”网络。可能你可以尝试下面的代码

f <- function(p, df = df2, graph = g) {
    v <- df %>%
        filter(person == p) %>%
        select(country) %>%
        pluck(1)
    nulls <- df2 %>%
        filter(person == "empty") %>%
        select(country) %>%
        pluck(1)
    pers <- df2 %>%
        filter(person != "empty")
    d <- distances(graph, v, setdiff(names(V(graph)), c(v, nulls)))
    split(pers$person[match(colnames(d), pers$country)], d)
}

如此

> f("person_A")
$`2`
[1] "person_N" "person_K"
$`3`
[1] "person_C" "person_E" "person_I"
$`4`
[1] "person_P" "person_M" "person_Q"
$`5`
[1] "person_J" "person_S" "person_O" "person_T" "person_G" "person_F" "person_R"
[8] "person_H"
$`6`
[1] "person_L"
$`7`
[1] "person_B"
> f("person_B")
$`2`
[1] "person_F"
$`3`
[1] "person_S" "person_G"
$`4`
[1] "person_J" "person_P" "person_H"
$`5`
[1] "person_M" "person_N" "person_K" "person_T" "person_R" "person_E" "person_L"
$`6`
[1] "person_Q" "person_O" "person_C" "person_I"
$`7`
[1] "person_A"
> f("person_C")
$`1`
[1] "person_N"
$`3`
[1] "person_P" "person_M" "person_A" "person_K"
$`4`
[1] "person_J" "person_Q" "person_O" "person_G" "person_R" "person_H" "person_E"
[8] "person_I"
$`5`
[1] "person_S" "person_T" "person_F" "person_L"
$`6`
[1] "person_B"

之前

不确定你自己实现BFS算法的动机和/或必要性，但我猜值得尝试一下igraph中的bfs（因为你在你的问题中已经在使用igraph）

f <- function(p, df = df2, graph = g) {
    v <- df %>%
        filter(person == p) %>%
        select(country) %>%
        pluck(1)
    df %>%
        arrange(match(country, names(bfs(graph, v)$order))) %>%
        filter(person != "empty") %>%
        slice(2) %>%
        select(person) %>%
        pluck(1)
}

如此

> f("person_A")
[1] "person_N"
> f("person_B")
[1] "person_F"
> f("person_C")
[1] "person_N"

英文:

Update

I don't think you are looking for BFS but just the ego networks defined by the nearest valid neighbors. Probably you can try the code below

f &lt;- function(p, df = df2, graph = g) {
    v &lt;- df %&gt;%
        filter(person == p) %&gt;%
        select(country) %&gt;%
        pluck(1)
    nulls &lt;- df2 %&gt;%
        filter(person == &quot;empty&quot;) %&gt;%
        select(country) %&gt;%
        pluck(1)
    pers &lt;- df2 %&gt;%
        filter(person != &quot;empty&quot;)
    d &lt;- distances(graph, v, setdiff(names(V(graph)), c(v, nulls)))
    split(pers$person[match(colnames(d), pers$country)], d)
}

such that

&gt; f(&quot;person_A&quot;)
$`2`
[1] &quot;person_N&quot; &quot;person_K&quot;
$`3`
[1] &quot;person_C&quot; &quot;person_E&quot; &quot;person_I&quot;
$`4`
[1] &quot;person_P&quot; &quot;person_M&quot; &quot;person_Q&quot;
$`5`
[1] &quot;person_J&quot; &quot;person_S&quot; &quot;person_O&quot; &quot;person_T&quot; &quot;person_G&quot; &quot;person_F&quot; &quot;person_R&quot;
[8] &quot;person_H&quot;
$`6`
[1] &quot;person_L&quot;
$`7`
[1] &quot;person_B&quot;
&gt; f(&quot;person_B&quot;)
$`2`
[1] &quot;person_F&quot;
$`3`
[1] &quot;person_S&quot; &quot;person_G&quot;
$`4`
[1] &quot;person_J&quot; &quot;person_P&quot; &quot;person_H&quot;
$`5`
[1] &quot;person_M&quot; &quot;person_N&quot; &quot;person_K&quot; &quot;person_T&quot; &quot;person_R&quot; &quot;person_E&quot; &quot;person_L&quot;
$`6`
[1] &quot;person_Q&quot; &quot;person_O&quot; &quot;person_C&quot; &quot;person_I&quot;
$`7`
[1] &quot;person_A&quot;
&gt; f(&quot;person_C&quot;)
$`1`
[1] &quot;person_N&quot;
$`3`
[1] &quot;person_P&quot; &quot;person_M&quot; &quot;person_A&quot; &quot;person_K&quot;
$`4`
[1] &quot;person_J&quot; &quot;person_Q&quot; &quot;person_O&quot; &quot;person_G&quot; &quot;person_R&quot; &quot;person_H&quot; &quot;person_E&quot;
[8] &quot;person_I&quot;
$`5`
[1] &quot;person_S&quot; &quot;person_T&quot; &quot;person_F&quot; &quot;person_L&quot;
$`6`
[1] &quot;person_B&quot;

Not sure about your motivation and/or necessity of implementing the BFS algorithm by yourself, but I guess it is worthy to try bfs from igraph (since you are using igraph anyway in your question)

f &lt;- function(p, df = df2, graph = g) {
    v &lt;- df %&gt;%
        filter(person == p) %&gt;%
        select(country) %&gt;%
        pluck(1)
    df %&gt;%
        arrange(match(country, names(bfs(graph, v)$order))) %&gt;%
        filter(person != &quot;empty&quot;) %&gt;%
        slice(2) %&gt;%
        select(person) %&gt;%
        pluck(1)
}