英文:
Algorithm for recursively linked objects
问题
我正在维护一个小程序,该程序遍历Neo4j数据库中的文档,并将JSON编码的对象转储到文档数据库中。在Neo4j中,出于性能原因,我想象中没有真正的数据,只有ID。
想象一下类似于这样的结构:
posts:
post:
id: 1
tags: 1, 2
author: 2
similar: 1, 2, 3
我不知道为什么要这样做,但这就是我要处理的情况。然后,程序使用这些ID来获取每个数据结构的信息,从而得到正确的结构。author
不再只是一个int
,而是一个Author
对象,包含姓名、电子邮件等信息。
这个方法在添加similar
功能之前运行良好。similar
由引用其他帖子的ID组成。由于在我的循环中我正在构建实际的帖子对象,我该如何以高效的方式引用它们呢?我能想到的唯一办法就是创建一个缓存,用于存储我已经“转换”过的帖子,如果引用的ID不在缓存中,就将当前帖子放在列表底部。最终,它们都将被处理。
英文:
I'm maintaining a small program that goes through documents in a Neo4j database and dumps a JSON-encoded object to a document database. In Neo4j—for performance reasons, I imagine—there's no real data, just ID's.
Imagine something like this:
posts:
post:
id: 1
tags: 1, 2
author: 2
similar: 1, 2, 3
I have no idea why it was done like this, but this is what I have to deal with. The program then uses the ID's to fetch information for each data structure, resulting in a proper structure. Instead of author
being just an int
, it's an Author
object, with name, email, and so on.
This worked well until the similar
feature was added. Similar
consists of ID's referencing other posts. Since in my loop I'm building the actual post objects, how can I reference them in an efficient manner? The only thing I could imagine was creating a cache with the posts I already "converted" and, if the referenced ID is not in the cache, put the current post on the bottom of the list. Eventually, they will all be processed.
答案1
得分: 1
你提出的方法在存在循环的“相似”关系时是行不通的,而这种情况很可能存在。
例如,你展示了一个帖子 1
,它与帖子 2
相似。假设你首先遇到了帖子 1
。它引用了尚未在缓存中的帖子 2
,所以你将帖子 1
推回队列的末尾。现在你到达了帖子 2
。它引用了尚未在缓存中的帖子 1
,所以你将帖子 2
推回队列的末尾。这个过程会一直进行下去。
你可以通过两次遍历来解决这个问题。在第一次遍历中,你创建 Post
对象并填充所有信息,除了“相似”引用之外。同时,你建立一个将 ID 数字映射到帖子的 map[int]*Post
。在第二次遍历中,对于每个帖子,你遍历“相似”ID 数字,在映射中查找每个数字,并使用得到的 *Post
值填充一个 []*Post
类型的相似帖子切片。
英文:
The approach you're proposing won't work if there are cycles of similar
relationships, which there probably are.
For example, you've shown a post 1
that is similar to a post 2
. Let's say you come across post 1
first. It refers to post 2
, which isn't in the cache yet, so you push post 1
back onto the end of the queue. Now you get to post 2
. It refers to post 1
, which isn't in the cache yet, so you push post 2
back onto the end of the queue. This goes on forever.
You can solve this problem by building the post objects in two passes. During the first pass, you make Post
objects and fill them with all the information except for the similar
references, and you build up a map[int]*Post
that maps ID numbers to posts. On the second pass, for each post, you iterate over the similar
ID numbers, look up each one in the map, and use the resulting *Post
values to fill a []*Post
slice of similar posts.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论