Modeling relationships in neo4j when they aren't known initially


Question


I currently have some code that looks through various datasets and models electronic relationships between them, such as JSESSIONID.

I would like to model each user's interactions with an application where they have to submit a unique identifier, such as an email address.

In processing the application's logs, I see emailA@host.com use the application with JSESSIONID asdfghjkl. I then see emailB@host.com also use the application with JSESSIONID asdfghjkl. Finally, I see emailB@host.com use JSESSIONID qwertyuiop.

In my Go code, it's easy for me to process the logs, write out both emailA@host.com and emailB@host.com as nodes, and then write the JSESSIONID relationship between them.

MERGE (a:EMAIL {label:'userA@host.com'})
MERGE (b:EMAIL {label:'userB@host.com'})
MERGE (a)-[:asdfghjkl]-(b)

However, I don't know the best way to do this at scale (i.e., when the application logs are 1TB in size). The limitation is memory: I can't find all email addresses that use asdfghjkl as a session ID without processing all the data, so I can't hold them in memory long enough to write out the relationships between them.

What I would really like to do is write out something like the following, but this obviously fails:

MERGE (a:EMAIL {label:userA@host.com}) (a)-[:asdfghjkl]

Then later:
MERGE (b:EMAIL {label:userB@host.com}) (b)-[:asdfghjkl]

Can I create these relationships with a query after the fact?

Answer 1

Score: 1


It sounds like you should model each JSESSIONID as a node rather than as a relationship. That lets you link one JSESSIONID to multiple email addresses, and you can add a uniqueness constraint on the id property for fast lookups.

MERGE (a:EMAIL {label:'userA@host.com'})
MERGE (b:EMAIL {label:'userB@host.com'})
MERGE (jsid:JSESSIONID {id:'asdfghjkl'})
MERGE (a)-[:jsid]->(jsid)
MERGE (b)-[:jsid]->(jsid)
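
With this model, each statement needs only one email address and one session ID, so the logs could be processed one record at a time without remembering which other emails used the same session. A minimal Go sketch of that idea follows. The `Record` type, the `Params` helper, and the parameter names are hypothetical, and the sketch only builds the parameterized statement and its parameters; sending them through a Neo4j driver is left out.

```go
package main

import "fmt"

// linkQuery is run once per log record. It needs only the email and the
// session ID from that record, so no cross-record state is kept in memory.
const linkQuery = `MERGE (e:EMAIL {label: $email})
MERGE (s:JSESSIONID {id: $sessionID})
MERGE (e)-[:jsid]->(s)`

// Record is a hypothetical parsed log line.
type Record struct {
	Email     string
	SessionID string
}

// Params builds the parameter map for linkQuery from a single record.
func Params(r Record) map[string]interface{} {
	return map[string]interface{}{
		"email":     r.Email,
		"sessionID": r.SessionID,
	}
}

func main() {
	// Sample records taken from the question; real input would be streamed.
	records := []Record{
		{"emailA@host.com", "asdfghjkl"},
		{"emailB@host.com", "asdfghjkl"},
		{"emailB@host.com", "qwertyuiop"},
	}
	fmt.Println(linkQuery)
	for _, r := range records {
		// In real code, linkQuery and Params(r) would be sent to Neo4j here.
		fmt.Println(Params(r))
	}
}
```

Because MERGE matches an existing node before creating a new one, repeating the statement for the same email or session ID should not create duplicates, which is what makes the record-by-record approach workable.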

Queries that find all :EMAIL nodes using a specific JSESSIONID should then be quite fast:

MATCH (email:EMAIL)-[:jsid]->(jsid:JSESSIONID {id:'asdfghjkl'})
RETURN email
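
If direct email-to-email relationships are still wanted, they could in principle be derived after loading, by matching pairs of :EMAIL nodes that point at the same :JSESSIONID node. The Go sketch below holds one possible follow-up statement in `sharedQuery`. The relationship type `SHARED_SESSION`, the `Visit` type, and the `SharedPairs` helper are all assumptions, not part of the original answer. `SharedPairs` only mirrors the pairing in memory on a small sample to show which pairs the statement is meant to link; it is not intended for the 1TB case.

```go
package main

import (
	"fmt"
	"sort"
)

// sharedQuery is a hypothetical statement to run once after loading.
// SHARED_SESSION is an assumed relationship type. The label comparison
// keeps each pair of emails from being matched twice.
const sharedQuery = `MATCH (a:EMAIL)-[:jsid]->(:JSESSIONID)<-[:jsid]-(b:EMAIL)
WHERE a.label < b.label
MERGE (a)-[:SHARED_SESSION]-(b)`

// Visit is a hypothetical (email, session ID) pair taken from the logs.
type Visit struct {
	Email     string
	SessionID string
}

// SharedPairs returns every pair of distinct emails seen with the same
// session ID, formatted as "first|second" with first < second, sorted.
func SharedPairs(visits []Visit) []string {
	// Group the distinct emails under each session ID.
	bySession := map[string]map[string]bool{}
	for _, v := range visits {
		if bySession[v.SessionID] == nil {
			bySession[v.SessionID] = map[string]bool{}
		}
		bySession[v.SessionID][v.Email] = true
	}
	// Collect each ordered pair once, even if several sessions are shared.
	seen := map[string]bool{}
	for _, emails := range bySession {
		for a := range emails {
			for b := range emails {
				if a < b {
					seen[a+"|"+b] = true
				}
			}
		}
	}
	pairs := make([]string, 0, len(seen))
	for p := range seen {
		pairs = append(pairs, p)
	}
	sort.Strings(pairs)
	return pairs
}

func main() {
	visits := []Visit{
		{"emailA@host.com", "asdfghjkl"},
		{"emailB@host.com", "asdfghjkl"},
		{"emailB@host.com", "qwertyuiop"},
	}
	fmt.Println(sharedQuery)
	fmt.Println(SharedPairs(visits))
}
```

On a large graph a single statement like this may touch many nodes in one transaction, so it would probably need to be run in batches.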

huangapple
  • Posted on February 6, 2017, 23:47:20
  • When reposting, please keep the link to this article: https://go.coder-hub.com/42071805.html