2023-03-07 20:18:55

Duplication of vertices in Amazon Neptune

Question

I want to create some logic that does the following in Amazon Neptune using Gremlin:

1. Load a row of data that contains customer_id and postcode columns

2. Check if the postcode value from that row already exists in the database:

A. If it does, then create a new vertex for the row's customer_id value and then create a new edge that makes a connection from the customer_id vertex that has just been created to the pre-existing postcode vertex

B. Else, if it does not, then create a new vertex for the row's customer_id value, create a new vertex for the row's postcode value and then create a new edge that makes a connection from the customer_id vertex that has just been created to the postcode vertex that has just been created

The purpose of this is to avoid creating duplicate vertices.
I am open to different approaches if you can see flaws in my logic.
I have tried a few methods but I've been unable to get a single piece of logic to perform all of the above.
I am using Gremlin.

Answer 1

Score: 1

First, every vertex and edge in a Neptune graph must have a unique ID, so it is good practice to use the IDs themselves to enforce uniqueness. Deterministic IDs also make lookups fast, since a lookup by vertex or edge ID is the fastest operation in Neptune. If you don't supply an ID when creating a vertex or edge, Neptune generates a UUID for it.
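As a rough illustration of what "deterministic" means here, the sketch below derives a vertex ID purely from a label and the row value, so the same input always yields the same ID. The helper name vertex_id and the normalisation rules (collapsing whitespace, upper-casing) are assumptions made for this example, not part of Neptune or Gremlin; adjust them if your customer IDs are case-sensitive.

```python
def vertex_id(label: str, value: str) -> str:
    """Build a deterministic ID of the form '<label>-<value>'.

    Hypothetical helper: the normalisation below is an assumption so that
    ' ab1  2cd ' and 'AB1 2CD' map to the same postcode vertex.
    """
    # Collapse runs of whitespace, trim the ends, and upper-case the value.
    normalised = " ".join(str(value).split()).upper()
    return f"{label}-{normalised}"
```

The resulting string would then be passed as the vertex ID when the vertex is created, instead of letting Neptune assign a UUID.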

After that you'll want to consider using a conditional write pattern. In Gremlin, you can follow the pattern documented in Practical Gremlin [1].

So the pattern, for your use case, would follow something like:

g.V().hasLabel('customer').has('customer_id',<id>).
    fold().coalesce(
        unfold(),
        addV('customer').property('customer_id',<id>)
    ).aggregate('c').
    V().hasLabel('postcode').has('postcode',<postcode>).
        fold().coalesce(
            unfold(),
            addV('postcode').property('postcode',<postcode>)
        ).
    addE('hasPostCode').from(select('c').unfold())

Note: aggregate() is used above because the customer vertex has to be referenced again after a later fold(), which is a collapsing barrier step. A label set with as() does not persist past a collapsing barrier step, whereas the side effect stored by aggregate() does.
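To make the control flow of the traversal above easier to follow, here is a plain-Python model of the same get-or-create logic. It is only a sketch: the TinyGraph class, its dictionary storage, and its method names are hypothetical stand-ins for the graph and do not call Neptune or any Gremlin client.

```python
from typing import Dict, List, Tuple

VertexKey = Tuple[str, str]  # (label, value), e.g. ('postcode', 'AB1 2CD')


class TinyGraph:
    """Hypothetical in-memory stand-in for the graph, for illustration only."""

    def __init__(self) -> None:
        self.vertices: Dict[VertexKey, dict] = {}
        self.edges: List[Tuple[VertexKey, str, VertexKey]] = []

    def get_or_create(self, label: str, key: str, value: str) -> VertexKey:
        # Mirrors fold().coalesce(unfold(), addV(...)):
        # reuse the vertex if it exists, otherwise create it.
        vertex = (label, value)
        if vertex not in self.vertices:
            self.vertices[vertex] = {key: value}
        return vertex

    def load_row(self, customer_id: str, postcode: str) -> None:
        customer = self.get_or_create("customer", "customer_id", customer_id)
        post = self.get_or_create("postcode", "postcode", postcode)
        # Mirrors addE('hasPostCode').from(<customer>): customer -> postcode.
        self.edges.append((customer, "hasPostCode", post))


graph = TinyGraph()
graph.load_row("c1", "AB1 2CD")
graph.load_row("c2", "AB1 2CD")  # postcode vertex is reused, not duplicated
```

Like the traversal, this sketch always adds an edge, so replaying the same row would produce a second edge between the same two vertices.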

If using deterministic IDs, this could be simplified. Say we use an ID nomenclature of "customer-id" for customer vertices and "postcode-code" for postcode vertices:

g.V(<customer_id>).
    fold().coalesce(
        unfold(),
        addV('customer').property(id,<customer_id>)
    ).
    V(<postcode_id>).
        fold().coalesce(
            unfold(),
            addV('postcode').property(id,<postcode_id>)
        ).
    addE('hasPostCode').from(V(<customer_id>))

[1] https://kelvinlawrence.net/book/Gremlin-Graph-Guide.html#upsert
