Creating a Graph DB in Neo4j from large CSV file

Question

I am relatively new to Neo4j, so I apologize if this seems trivial.
I am trying to import data from a CSV file with around 2.5 million rows, using the Neo4j Desktop app and running the query in Neo4j Browser.
The contents of the file follow this format:

entity1, relation, entity2
entity1, relation, entity3
...
entityN, relation, entityM

I have tried using the query:

LOAD CSV WITH HEADERS FROM 'file:///all_triplets.csv' AS row
MERGE (entity1:Entity {name:row.entity1} )
MERGE (entity2:Entity {name:row.entity2} )
MERGE (entity1) - [:RELATION {name:row.relation}] -> (entity2)

but I got a MemoryPoolOutOfMemoryError after it had been running for an hour, so I modified my query to run in 'batches' to free up memory while running:

:auto LOAD CSV WITH HEADERS FROM 'file:///all_triplets.csv' AS row
CALL {
WITH row
MERGE (entity1:Entity {name:row.entity1} )
MERGE (entity2:Entity {name:row.entity2} )
MERGE (entity1) - [:RELATION {name:row.relation}] -> (entity2)
} IN TRANSACTIONS

but the query runs for hours, so I don't think I have implemented it correctly. What I need is to store this information in a database so that I can extract the node embeddings afterwards (I don't need to be able to visualize the graph). Is there a better way to load a large list like this? Importing 2.5M records should not take that long. Any help is appreciated.

Answer 1

Score: 1

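Make sure you have an index or a uniqueness constraint (which also creates an index as a by-product) on :Entity(name), so that the MERGEs of your nodes are more efficient. Without one, each node MERGE has to scan the existing :Entity nodes to look for a match, which gets slower as the graph grows. For instance: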

CREATE CONSTRAINT Entity_name FOR (e:Entity) REQUIRE e.name IS UNIQUE
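
Putting the constraint together with the batched load from the question, the whole import might look roughly like the sketch below. It is not a tested recipe for this particular file: the batch size of 10000 rows is an arbitrary assumption to tune, and IF NOT EXISTS is added only so that the first statement can be re-run safely. In Neo4j Browser, run each statement on its own, and create the constraint before starting the load.

```cypher
// 1) Create the uniqueness constraint (and its backing index) first,
//    so every node MERGE below becomes an index lookup.
CREATE CONSTRAINT Entity_name IF NOT EXISTS
FOR (e:Entity) REQUIRE e.name IS UNIQUE;

// 2) Load the triplets in batches. ":auto" is the Neo4j Browser command
//    that lets CALL { ... } IN TRANSACTIONS manage its own transactions.
:auto LOAD CSV WITH HEADERS FROM 'file:///all_triplets.csv' AS row
CALL {
  WITH row
  MERGE (entity1:Entity {name: row.entity1})
  MERGE (entity2:Entity {name: row.entity2})
  MERGE (entity1)-[:RELATION {name: row.relation}]->(entity2)
} IN TRANSACTIONS OF 10000 ROWS  // assumed batch size, adjust to available memory
```

Each batch is committed separately, so memory use should stay bounded by the batch size rather than by the full 2.5 million rows.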

huangapple
  • Posted on 2023-05-11 01:28:00
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76221146.html