英文:
Algorithm for finding all closed walks in a graph from a given vertex
问题
我有一个无向图,没有多重/平行边,每条边都带有权重(距离)。我想要在图中找到一定最小和最大总距离之间的闭合路径。我还希望能够设置路径中重复距离的限制,例如,如果一条边的权重为10,被遍历两次,那么重复距离就是10。将重复距离设置为0应该可以找到图中的闭合路径(而不是路径)。最后,虽然找到所有闭合路径会很好,但我不确定对于具有成千上万个潜在路径的较大图来说是否可行,因为我认为这将需要指数时间。因此,我目前正在尝试找到一些根据启发式函数最佳的闭合路径,以便算法能够在合理的时间内完成。
编辑:近似性能要求-我用来测试的图当前具有17,070个节点和23,026条边。生成的循环包含大约50条边,需要大约20秒的时间来找到,但我希望在时间范围内找到更长距离(~100条边)的循环,理想情况下也是在<1分钟的时间内。
这是我目前的方法:
graph.go:
package graph
import (
"fmt";
"github.com/bgadrian/data-structures/priorityqueue"
)
// Graph结构是一个由节点和边组成的集合,表示为邻接表。
// 每条边都有一个权重(float64)。
type Edge struct {
HeuristicValue int // 较低的启发式值=较高的优先级
Distance float64
}
type Graph struct {
Edges map[int]map[int]Edge // 邻接表,通过Edges[fromId][toId]访问
}
func (g *Graph) FindAllLoops(start int, minDistance float64, maxDistance float64, maxRepeated float64, routesCap int) []Route {
// 找到从节点start开始并以start结束的所有循环。
// 循环是图中的闭合路径。
loops := []Route{}
loopCount := 0
queue, _ := priorityqueue.NewHierarchicalHeap(100, 0, 100, false)
queue.Enqueue(*NewRoute(start), 0)
size := 1
distanceToStart := getDistanceToStart(g, start)
for size > 0 {
// 从栈中弹出最后一个元素
temp, _ := queue.Dequeue()
size--
route := temp.(Route)
// 获取路径的最后一个节点
lastNode := route.LastNode()
// 如果路径太长或者重复距离太多,
// 就不再继续探索它
if route.Distance+distanceToStart[route.LastNode()] > maxDistance || route.RepeatedDistance > maxRepeated {
continue
}
// 现在我们需要高效地检查总重复距离是否太长
// 如果最后一个节点是起始节点并且路径足够长,将其添加到循环列表中
if lastNode == start && route.Distance >= minDistance && route.Distance <= maxDistance {
loops = append(loops, route)
loopCount++
if loopCount >= routesCap {
return loops
}
}
// 将最后一个节点的所有邻居添加到栈中
for neighbour := range g.Edges[lastNode] {
// newRoute将是当前路径的副本,但添加了新节点
newRoute := route.WithAddedNode(neighbour, g.Edges[lastNode][neighbour])
queue.Enqueue(newRoute, newRoute.Heuristic())
size++
}
}
return loops
}
route.go:
package graph
import (
"fmt";
)
// route是一个轻量级的结构,描述了一条路径
// 它以节点序列(int,即节点ID)和距离(float64)的形式存储
// 它还计算了重复距离的总和(两次经过相同边)
// 为此,它存储了一系列已经遍历过的边
type Route struct {
Nodes []int
Distance float64
Repeated []IntPair // 两个int对应节点,按(最小,最大)排序
RepeatedDistance float64
Heuristic int // 用于生成所需类型的Routes的启发式函数
LastWay int
}
type IntPair struct {
A int
B int
}
// WithAddedNode向路径添加一个节点
func (r *Route) WithAddedNode(node int, edge Edge) Route {
newRoute := Route{Nodes: make([]int, len(r.Nodes)+1), Distance: r.Distance, Repeated: make([]IntPair, len(r.Repeated), len(r.Repeated)+1), RepeatedDistance: r.RepeatedDistance, Ways: r.Ways, LastWay: r.LastWay}
copy(newRoute.Nodes, r.Nodes)
copy(newRoute.Repeated, r.Repeated)
newRoute.Nodes[len(r.Nodes)] = node
newRoute.Distance += edge.Distance
// 获取一个IntPair,其中A是较小的节点ID,B是较大的节点ID
var pair IntPair
if newRoute.Nodes[len(newRoute.Nodes)-2] < node {
pair = IntPair{newRoute.Nodes[len(newRoute.Nodes)-2], node}
} else {
pair = IntPair{node, newRoute.Nodes[len(newRoute.Nodes)-2]}
}
// 检查边是否已经遍历过
found := false
for _, p := range newRoute.Repeated {
if p == pair {
newRoute.RepeatedDistance += edge.Distance
found = true
break
}
}
if !found {
newRoute.Repeated = append(newRoute.Repeated, pair)
}
// 当前启发式函数将边的启发式值相加,但这可能会在将来更改
newRoute.Heuristic += edge.HeuristicValue
return newRoute
}
func (r *Route) Heuristic() int {
return r.Heuristic
}
func (r *Route) LastNode() int {
return r.Nodes[len(r.Nodes)-1]
}
目前我已经采取了以下措施来加快算法的速度:
- 不再找到所有循环,而是根据给定的启发式函数找到前几个循环。
- 修剪图以尽可能删除大部分度为2的节点,用权重等于原来两条边权重之和的单条边替换它们。
- 使用切片来存储重复边,而不是使用映射。
- 使用Dijkstra算法找到从起始节点到每个其他节点的最短距离。如果一条路径无法在这个距离内返回,立即丢弃它。(算法实现的代码未包含在帖子中,因为它只占运行时间的约4%)
根据Go分析器的结果,在route.go中的WithAddedNode方法占用了68%的运行时间,该方法的时间大致均分在runtime.memmove和runtime.makeslice(调用runtime.mallocgc)之间。基本上,我认为大部分时间都花在复制Routes上。非常感谢您对如何改进此算法的运行时间或任何可能的替代方法的任何反馈。如果有任何其他信息我可以提供,请告诉我。非常感谢!
英文:
I have graph which is undirected, no multiple/parallel edges, and each edge is weighted with a distance. I am looking to find closed walks in the graph between a certain minimum and maximum total distance. I would also like to be able to set a limit on the amount of repeated distance in the walk, for example, if an edge with 10 is traversed twice, there is a repeated distance of 10. Setting the repeated distance to 0 should find closed paths in the graph (as opposed to walks). Lastly, although it would be nice to find all closed walks, I'm not sure if this is feasible for larger graphs where there are thousands/millions of potential routes, since I believe this would take exponential time. So, I am currently trying to find a few closed walks that are the best according to a heuristic function, in order for the algorithm to finish in a reasonable amount of time.
Edit: approximate performance requirements - the graph I am using to test currently has 17,070 nodes and 23,026 edges. Loops that are generated contain approximately 50 edges, taking on the order of 20 seconds to find, however I would like to find loops of longer distances (~100 edges) ideally also in the time frame of <1 minute.
Here is my current approach:
graph.go:
package graph
import (
"fmt"
"github.com/bgadrian/data-structures/priorityqueue"
)
// A Graph struct is a collection of nodes and edges represented as an adjacency list.
// Each edge has a weight (float64).
type Edge struct {
HeuristicValue int // Lower heuristic = higher priority
Distance float64
}
type Graph struct {
Edges map[int]map[int]Edge // adjacency list, accessed like Edges[fromId][toId]
}
func (g *Graph) FindAllLoops(start int, minDistance float64, maxDistance float64, maxRepeated float64, routesCap int) []Route {
// Find all loops that start and end at node start.
// A loop is a closed walk through the graph.
loops := []Route{}
loopCount := 0
queue, _ := priorityqueue.NewHierarchicalHeap(100, 0, 100, false)
queue.Enqueue(*NewRoute(start), 0)
size := 1
distanceToStart := getDistanceToStart(g, start)
for size > 0 {
// Pop the last element from the stack
temp, _ := queue.Dequeue()
size--
route := temp.(Route)
// Get the last node from the route
lastNode := route.LastNode()
// If the route is too long or has too much repeated distance,
// don't bother exploring it further
if route.Distance + distanceToStart[route.LastNode()] > maxDistance || route.RepeatedDistance > maxRepeated {
continue
}
// Now we need to efficiently check if the total repeated distance is too long
// If the last node is the start node and the route is long enough, add it to the list of loops
if lastNode == start && route.Distance >= minDistance && route.Distance <= maxDistance {
loops = append(loops, route)
loopCount++
if loopCount >= routesCap {
return loops
}
}
// Add all the neighbours of the last node to the stack
for neighbour := range g.Edges[lastNode] {
// newRoute will be a copy of the current route, but with the new node added
newRoute := route.WithAddedNode(neighbour, g.Edges[lastNode][neighbour])
queue.Enqueue(newRoute, newRoute.Heuristic())
size++
}
}
return loops
}
route.go:
package graph
import (
"fmt"
)
// A route is a lightweight struct describing a route
// It is stored as a sequence of nodes (ints, which are the node ids)
// and a distance (float64)
// It also counts the total distance that is repeated (going over the same
// edge twice)
// To do this, it stores a list of edges that have been traversed before
type Route struct {
Nodes []int
Distance float64
Repeated []IntPair // two ints corresponding to the nodes, orderedd by (smallest, largest)
RepeatedDistance float64
Heuristic int // A heuristic for the type of Routes we would like to generate
LastWay int
}
type IntPair struct {
A int
B int
}
// WithAddedNode adds a node to the route
func (r *Route) WithAddedNode(node int, edge Edge) Route {
newRoute := Route{Nodes: make([]int, len(r.Nodes) + 1), Distance: r.Distance, Repeated: make([]IntPair, len(r.Repeated), len(r.Repeated) + 1), RepeatedDistance: r.RepeatedDistance, Ways: r.Ways, LastWay: r.LastWay}
copy(newRoute.Nodes, r.Nodes)
copy(newRoute.Repeated, r.Repeated)
newRoute.Nodes[len(r.Nodes)] = node
newRoute.Distance += edge.Distance
// Get an IntPair where A is the smaller node id and B is the larger
var pair IntPair
if newRoute.Nodes[len(newRoute.Nodes)-2] < node {
pair = IntPair{newRoute.Nodes[len(newRoute.Nodes)-2], node}
} else {
pair = IntPair{node, newRoute.Nodes[len(newRoute.Nodes)-2]}
}
// Check if the edge has been traversed before
found := false
for _, p := range newRoute.Repeated {
if p == pair {
newRoute.RepeatedDistance += edge.Distance
found = true
break
}
}
if !found {
newRoute.Repeated = append(newRoute.Repeated, pair)
}
// Current heuristic sums heuristics of edges but this may change in the future
newRoute.Heuristic += edge.HeuristicValue
return newRoute
}
func (r *Route) Heuristic() int {
return r.Heuristic
}
func (r *Route) LastNode() int {
return r.Nodes[len(r.Nodes)-1]
}
Here are the things I have currently done to speed up the algorithm so far:
- Rather than finding all loops, find the first few loops according to a given heuristic.
- Trimmed the graph to remove almost all nodes of degree 2 where possible, replacing them with a single edge with weight = sum of the original two edges.
- Used a slice to store repeated edges as opposed to a map.
- Used Dijkstra's algorithm to find the shortest distance from the start node to every other node. If a route is unable to make it back in this amount of distance, discard it immediately. (Code for the algorithm implementation is not included in the post because it only accounts for ~4% of runtime)
According to the Go profiler, the WithAddedNode method in route.go is accounting for 68% of the runtime, with time in that method approximately evenly split between runtime.memmove and runtime.makeslice (calling runtime.mallocgc). Essentially, I believe most of the time is spent copying Routes. I would greatly appreciate any feedback on how to improve this algorithm's runtime or any possible alternative approaches. Let me know if there is any additional information I can provide. Thank you so much!
答案1
得分: 1
寻找两个顶点之间所有路径的标准算法如下:
- 应用Dijkstra算法找到最便宜的路径
- 增加路径中的链接成本
- 重复上述步骤,直到没有新的路径被找到
在你的情况下,你希望路径返回到起始顶点。这个算法可以很容易地修改如下:
- 将起始顶点S分成两个顶点S1和S2。
- 遍历与S相连的顶点V。
- 在S1和V之间添加成本为零的边
- 在S2和V之间添加成本为零的边
- 移除S及其边。
- 运行标准算法,找到S1和S2之间的所有路径。
你需要修改Dijkstra算法的代码以考虑其他约束条件,如最大长度和"启发式"。
性能
虽然你提到你关心大图上的性能,但没有提供你所需的性能指标。我将尝试提供一个估计。
我不了解GO的性能情况,所以我将假设它与我熟悉的C++相同。
以下是C++的一些性能测量数据:
在从https://dyngraphlab.github.io/下载的大图上运行Dijkstra算法,找到从一个随机选择的顶点到其他每个顶点的最便宜路径的运行时间。使用graphex应用程序进行3次运行的最长结果。
顶点数 | 边数 | 运行时间(秒) |
---|---|---|
10,000 | 50,000 | 0.3 |
145,000 | 2,200,000 | 40 |
假设你希望找到10条闭合路径-将上述运行时间乘以10。
英文:
The standard algorithm to find all paths between two vertices is:
- Apply Dijkstra to find cheapest path
- Increment cost of links in path
- Repeat until no new path found
In your case you want the paths to return to the starting vertex. The algorithm can easily be modified to do this
- Split the starting vertex, S, into two, S1 and S2.
- LOOP V over vertices connected to S.
- Add zero cost edge between S1 and V
- Add zero cost edge between S2 and V
- Remove S and its edges.
- Run standard algorithm to find all paths between S1 and S2
You will need to modify the Dijkstra code to take account of your other constraints - maximum length and the "heuristic"
Performance
Although you say you are concerned about performance on large graphs, you give no indication of your required performance. I will try to provide an estimate.
I do not know anything about the performance of GO, so I will assume it is the same as C++ with which I am familiar.
Here are some performance measurements for C++:
Run time for running the Dijkstra algorithm to find the cheapest paths from one randomly selected vertex to every other vertex on large graphs downloaded from https://dyngraphlab.github.io/. Longest result from 3 runs using the graphex application.
Vertex Count | Edge Count | Run time ( secs ) |
---|---|---|
10,000 | 50,000 | 0.3 |
145,000 | 2,200,000 | 40 |
Suppose that you wish to find 10 closed paths - multiply the above run times by 10.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论