递归链接对象的算法

huangapple go评论113阅读模式
英文:

Algorithm for recursively linked objects

问题

我正在维护一个小程序,该程序遍历Neo4j数据库中的文档,并将JSON编码的对象转储到文档数据库中。在Neo4j中,出于性能原因,我想象中没有真正的数据,只有ID。

想象一下类似于这样的结构:

posts:
    post:
        id: 1
        tags: 1, 2
        author: 2
        similar: 1, 2, 3

我不知道为什么要这样做,但这就是我要处理的情况。然后,程序使用这些ID来获取每个数据结构的信息,从而得到正确的结构。author不再只是一个int,而是一个Author对象,包含姓名、电子邮件等信息。

这个方法在添加similar功能之前运行良好。similar由引用其他帖子的ID组成。由于在我的循环中我正在构建实际的帖子对象,我该如何以高效的方式引用它们呢?我能想到的唯一办法就是创建一个缓存,用于存储我已经“转换”过的帖子,如果引用的ID不在缓存中,就将当前帖子放在列表底部。最终,它们都将被处理。

英文:

I'm maintaining a small program that goes through documents in a Neo4j database and dumps a JSON-encoded object to a document database. In Neo4j—for performance reasons, I imagine—there's no real data, just ID's.

Imagine something like this:

posts:
    post:
        id: 1
        tags: 1, 2
        author: 2
        similar: 1, 2, 3

I have no idea why it was done like this, but this is what I have to deal with. The program then uses the ID's to fetch information for each data structure, resulting in a proper structure. Instead of author being just an int, it's an Author object, with name, email, and so on.

This worked well until the similar feature was added. Similar consists of ID's referencing other posts. Since in my loop I'm building the actual post objects, how can I reference them in an efficient manner? The only thing I could imagine was creating a cache with the posts I already "converted" and, if the referenced ID is not in the cache, put the current post on the bottom of the list. Eventually, they will all be processed.

答案1

得分: 1

你提出的方法在存在循环的“相似”关系时是行不通的,而这种情况很可能存在。

例如,你展示了一个帖子 1,它与帖子 2 相似。假设你首先遇到了帖子 1。它引用了尚未在缓存中的帖子 2,所以你将帖子 1 推回队列的末尾。现在你到达了帖子 2。它引用了尚未在缓存中的帖子 1,所以你将帖子 2 推回队列的末尾。这个过程会一直进行下去。

你可以通过两次遍历来解决这个问题。在第一次遍历中,你创建 Post 对象并填充所有信息,除了“相似”引用之外。同时,你建立一个将 ID 数字映射到帖子的 map[int]*Post。在第二次遍历中,对于每个帖子,你遍历“相似”ID 数字,在映射中查找每个数字,并使用得到的 *Post 值填充一个 []*Post 类型的相似帖子切片。

英文:

The approach you're proposing won't work if there are cycles of similar relationships, which there probably are.

For example, you've shown a post 1 that is similar to a post 2. Let's say you come across post 1 first. It refers to post 2, which isn't in the cache yet, so you push post 1 back onto the end of the queue. Now you get to post 2. It refers to post 1, which isn't in the cache yet, so you push post 2 back onto the end of the queue. This goes on forever.

You can solve this problem by building the post objects in two passes. During the first pass, you make Post objects and fill them with all the information except for the similar references, and you build up a map[int]*Post that maps ID numbers to posts. On the second pass, for each post, you iterate over the similar ID numbers, look up each one in the map, and use the resulting *Post values to fill a []*Post slice of similar posts.

huangapple
  • 本文由 发表于 2015年9月19日 03:10:48
  • 转载请务必保留本文链接:https://go.coder-hub.com/32659414.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定