Modeling relationships in Neo4j when they aren't known initially
Question
I currently have some code that looks through various datasets and models electronic relationships between them, such as a shared JSESSIONID.
I would like to model each user's interactions with an application in which they have to submit a unique identifier, such as an email address.
In processing the application's logs, I see emailA@host.com use the application with JSESSIONID asdfghjkl. I then see emailB@host.com also use the application with JSESSIONID asdfghjkl. Finally, I see emailB@host.com use JSESSIONID qwertyuiop.
In my Go code, it's easy for me to process the logs, write out both emailA@host.com and emailB@host.com as nodes, and then write the JSESSIONID relationship between them.
MERGE (a:EMAIL {label:'userA@host.com'})
MERGE (b:EMAIL {label:'userB@host.com'})
MERGE (a)-[:asdfghjkl]-(b)
However, I don't know the best way to do this at scale (i.e., when the application logs are 1 TB in size). The limitation is memory: I can't find all the email addresses that use asdfghjkl as a session ID without processing all the data, so memory constraints prevent me from writing out the relationships between them.
What I would really like to do is write out something like the following, but this obviously fails:
MERGE (a:EMAIL {label:'userA@host.com'}) (a)-[:asdfghjkl]
Then later:
MERGE (b:EMAIL {label:'userB@host.com'}) (b)-[:asdfghjkl]
Can I create these relationships with a query after the fact?
Answer 1
Score: 1
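It sounds like you should model each JSESSIONID as a node rather than as a relationship. That lets you link one JSESSIONID to any number of email addresses, and you can add a unique constraint on the id property for fast lookups. Each log record then only needs to MERGE its own email node, its session node, and the relationship between them, so you never have to hold all the emails for a session in memory.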
MERGE (a:EMAIL {label:'userA@host.com'})
MERGE (b:EMAIL {label:'userB@host.com'})
MERGE (jsid:JSESSIONID {id:'asdfghjkl'})
MERGE (a)-[:jsid]->(jsid)
MERGE (b)-[:jsid]->(jsid)
Queries that find all :EMAIL nodes using a specific JSESSIONID should then be quite fast:
MATCH (email:EMAIL)-[:jsid]->(jsid:JSESSIONID {id:'asdfghjkl'})
RETURN email