2015年9月19日 03:10:48go评论142阅读模式

英文:

Algorithm for recursively linked objects

问题

我正在维护一个小程序，该程序遍历Neo4j数据库中的文档，并将JSON编码的对象转储到文档数据库中。在Neo4j中，出于性能原因，我想象中没有真正的数据，只有ID。

想象一下类似于这样的结构：

posts:
    post:
        id: 1
        tags: 1, 2
        author: 2
        similar: 1, 2, 3

我不知道为什么要这样做，但这就是我要处理的情况。然后，程序使用这些ID来获取每个数据结构的信息，从而得到正确的结构。author不再只是一个int，而是一个Author对象，包含姓名、电子邮件等信息。

这个方法在添加similar功能之前运行良好。similar由引用其他帖子的ID组成。由于在我的循环中我正在构建实际的帖子对象，我该如何以高效的方式引用它们呢？我能想到的唯一办法就是创建一个缓存，用于存储我已经“转换”过的帖子，如果引用的ID不在缓存中，就将当前帖子放在列表底部。最终，它们都将被处理。

英文:

I'm maintaining a small program that goes through documents in a Neo4j database and dumps a JSON-encoded object to a document database. In Neo4j—for performance reasons, I imagine—there's no real data, just ID's.

Imagine something like this:

posts:
    post:
        id: 1
        tags: 1, 2
        author: 2
        similar: 1, 2, 3

I have no idea why it was done like this, but this is what I have to deal with. The program then uses the ID's to fetch information for each data structure, resulting in a proper structure. Instead of author being just an int, it's an Author object, with name, email, and so on.

This worked well until the similar feature was added. Similar consists of ID's referencing other posts. Since in my loop I'm building the actual post objects, how can I reference them in an efficient manner? The only thing I could imagine was creating a cache with the posts I already "converted" and, if the referenced ID is not in the cache, put the current post on the bottom of the list. Eventually, they will all be processed.

答案1

得分: 1

你提出的方法在存在循环的“相似”关系时是行不通的，而这种情况很可能存在。

例如，你展示了一个帖子 1，它与帖子 2 相似。假设你首先遇到了帖子 1。它引用了尚未在缓存中的帖子 2，所以你将帖子 1 推回队列的末尾。现在你到达了帖子 2。它引用了尚未在缓存中的帖子 1，所以你将帖子 2 推回队列的末尾。这个过程会一直进行下去。

你可以通过两次遍历来解决这个问题。在第一次遍历中，你创建 Post 对象并填充所有信息，除了“相似”引用之外。同时，你建立一个将 ID 数字映射到帖子的 map[int]*Post。在第二次遍历中，对于每个帖子，你遍历“相似”ID 数字，在映射中查找每个数字，并使用得到的 *Post 值填充一个 []*Post 类型的相似帖子切片。

英文:

The approach you're proposing won't work if there are cycles of similar relationships, which there probably are.

For example, you've shown a post 1 that is similar to a post 2. Let's say you come across post 1 first. It refers to post 2, which isn't in the cache yet, so you push post 1 back onto the end of the queue. Now you get to post 2. It refers to post 1, which isn't in the cache yet, so you push post 2 back onto the end of the queue. This goes on forever.

You can solve this problem by building the post objects in two passes. During the first pass, you make Post objects and fill them with all the information except for the similar references, and you build up a map[int]*Post that maps ID numbers to posts. On the second pass, for each post, you iterate over the similar ID numbers, look up each one in the map, and use the resulting *Post values to fill a []*Post slice of similar posts.

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

递归链接对象的算法

问题

答案1

为什么在尝试将此JSON编组时，Go会出错？

proto: 无法解析无效的线格式数据

调用中的参数过多

编写长正则表达式时，嵌入注释的惯用语是什么？

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。