Optimization of mgo for pathfinding

Question
I have implemented an A* algorithm in Go to find the path between two coordinates on a map. The map data is fetched with mgo from a MongoDB collection.
It is, however, very slow: around 4 seconds for a 1000 meter route. I have timed the different parts of the algorithm and concluded that the bottleneck is in the fetching from the database. Or, to be precise: in the conversion from binary data to a data structure that Go understands.
I try to do as few requests as possible and multi-thread the requests to make them faster, which helps to a certain extent. But it's not nearly as fast as I wish it was.
It seems like I'm doing something fundamentally wrong. Any suggestions at all would be helpful.
Data structure in MongoDB (nodes fetched from OSM):

{
    "_id": NumberLong(194637483),
    "lat": 55.7079899,
    "lon": 13.3756414,
    "neighbours": [NumberLong(1566264689), NumberLong(1566264806)]
}
Data structure in Go:

type Node struct {
    ID         int64   `bson:"_id" json:"id"`
    Lat        float64 `bson:"lat" json:"lat"`
    Lon        float64 `bson:"lon" json:"lon"`
    Neighbours []int64 `bson:"neighbours" json:"neighbours"`
}
Code for getting a piece of data:

func addToBufferLong(buffer *WriteLockMap, session *mgo.Session, at, goal geo.Node, lowerLat, higherLat, lowerLon, higherLon float64) {
    c := session.DB("collectionName").C("nodes")
    query := c.Find(bson.M{"lat": bson.M{"$gt": lowerLat, "$lt": higherLat}, "lon": bson.M{"$gt": lowerLon, "$lt": higherLon}})
    var nodes []geo.Node
    if err := query.All(&nodes); err != nil {
        log.Println("query failed:", err)
        return
    }
    for _, node := range nodes {
        tmp := PathNode{0, node.DistanceTo(goal), 0, node}
        buffer.Lock()
        buffer.m[node.ID] = tmp
        buffer.Unlock()
    }
}
Edit:
The multi-threading strategy is based on splitting the area I wish to query into 4 different squares, quadrants if you will, and querying each of them separately with addToBufferLong(...).
Most recent print-outs:
> time GOMAXPROCS=8 ./toystar
Starting A star
Thread A, count: 19657, db-fetch: 0.122792104s, parsing: 0.934650055s
Thread B, count: 19574, db-fetch: 0.274384302s, parsing: 1.196350664s
Thread C, count: 4197, db-fetch: 0.388197823s, parsing: 0.930109241s
Thread D, count: 9900, db-fetch: 0.540008325s, parsing: 0.93963728s
Total database fetch: 1.534268099s
Total A star (with fetches): 1.854748244s
real 0m1.907s
where db-fetch measures the time it takes to execute the line that starts with query := c.Find(...), and parsing measures the time it takes to execute query.All(&nodes).
Edit 2:
I've managed to drop the execution time significantly, with the help of fellow Stack Overflow users. Current print-outs:
> time GOMAXPROCS=8 ./toystar
Starting A star
Thread A: 0.210783141s
Thread B: 0.380938949s
Thread C: 0.441447793s
Thread D: 0.507361847s
Total database fetch: 0.507543875s
number of fetches: 1
Total A star: 0.826343287s
real 0m0.860s
The main differences are the multi-threading strategy and using *mgo.Iter instead of query.All(&nodes).
Answer 1

Score: 2
From the information available, this is what I can deduce:

- Your machine seems to be slow, maybe an embedded device?
  The stage you called db-fetch doesn't actually access the database. The only thing c.Find(...) does is build a *mgo.Query value. The method is 6 lines long and shouldn't take more than a millisecond, unless there's contention on the database object's internal session state, which doesn't seem to be the case since you're using only 4 goroutines.
- The bottleneck seems to be network and/or reflection.
  query.All(&nodes) is where the query is actually executed on your database. In addition, this phase allocates the slice of nodes it needs and then iteratively decodes bson into your struct definition via reflection.
- One thing you could try: *mgo.Iter
  Instead of query.All(...) you could use query.Iter() to obtain a *mgo.Iter and iterate batch-wise over your data set. This might improve performance by better distributing network load over time.
- Use geospatial indexing; MongoDB supports it.
  See the documentation. Maybe you're already doing this. If not, it might improve lookup time.
- Split up your quadrants into subquadrants, recursively.
  I think this one is obvious. Divide and conquer, right?
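The quadrant split the question describes, and the recursive sub-quadrant split suggested above, can both be expressed with one small pure function: split a bounding box into four equal boxes, and for the recursive variant apply the same function again to each result. A sketch (the `BBox` type and function name are illustrative, not from the question):

```go
package main

import "fmt"

// BBox is an axis-aligned lat/lon bounding box, matching the
// lowerLat/higherLat/lowerLon/higherLon parameters in the question.
type BBox struct {
	LowerLat, HigherLat, LowerLon, HigherLon float64
}

// Quadrants splits a bounding box into four equal sub-boxes, the
// "quadrants" the question queries in parallel. Calling it again on
// each result yields the recursive sub-quadrant split.
func Quadrants(b BBox) [4]BBox {
	midLat := (b.LowerLat + b.HigherLat) / 2
	midLon := (b.LowerLon + b.HigherLon) / 2
	return [4]BBox{
		{b.LowerLat, midLat, b.LowerLon, midLon},     // south-west
		{b.LowerLat, midLat, midLon, b.HigherLon},    // south-east
		{midLat, b.HigherLat, b.LowerLon, midLon},    // north-west
		{midLat, b.HigherLat, midLon, b.HigherLon},   // north-east
	}
}

func main() {
	box := BBox{55.70, 55.72, 13.37, 13.39}
	for _, q := range Quadrants(box) {
		fmt.Println(q)
	}
}
```

Each sub-box can then be fetched by its own goroutine, as in addToBufferLong(...); note that finer splits trade fewer documents per query against more round-trips, so the optimum depth depends on network latency.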
Comments