Creating a Graph DB in Neo4j from a large CSV file
Question
I am relatively new to Neo4j, so I apologize if this seems trivial.
I am trying to import data from a CSV file with around 2.5 million rows using the Neo4j Desktop app, running the query in Neo4j Browser.
The contents of the file follow the format:
entity1, relation, entity2
entity1, relation, entity3
...
entityN, relation, entityM
I have tried using the query:
LOAD CSV WITH HEADERS FROM 'file:///all_triplets.csv' AS row
MERGE (entity1:Entity {name:row.entity1} )
MERGE (entity2:Entity {name:row.entity2} )
MERGE (entity1) - [:RELATION {name:row.relation}] -> (entity2)
but I get a MemoryPoolOutOfMemoryError after it has run for an hour, so I modified my query to run in 'batches' to free up memory while running:
:auto LOAD CSV WITH HEADERS FROM 'file:///all_triplets.csv' AS row
CALL {
WITH row
MERGE (entity1:Entity {name:row.entity1} )
MERGE (entity2:Entity {name:row.entity2} )
MERGE (entity1) - [:RELATION {name:row.relation}] -> (entity2)
} IN TRANSACTIONS
but the query runs for hours, so I don't think I have implemented it correctly. What I need is to store this information in a DB so that I can extract the node embeddings afterwards (I don't need to be able to visualize the graph). Is there a better way to load a large list like this? Importing 2.5M records should not take that long. Any help is appreciated.
Answer 1
Score: 1
Make sure you have an index or uniqueness constraint (which also creates an index as a by-product) on :Entity(name), to make the MERGEs of your nodes more efficient. For instance:
CREATE CONSTRAINT Entity_name FOR (e:Entity) REQUIRE e.name IS UNIQUE
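As a minimal sketch, the constraint could be combined with the batched load from the question roughly as follows. This assumes Neo4j 4.4 or later (where CALL { ... } IN TRANSACTIONS is available) and that the CSV has a header row naming the columns entity1, relation and entity2. The batch size of 10000 rows is an illustrative assumption, not a tuned value, and IF NOT EXISTS is added only so the first statement can be re-run safely.

```cypher
// Step 1: run once, before loading any data.
// The uniqueness constraint also creates the index that MERGE uses for lookups.
CREATE CONSTRAINT Entity_name IF NOT EXISTS
FOR (e:Entity) REQUIRE e.name IS UNIQUE

// Step 2: run as a separate statement in Neo4j Browser.
// Each batch of rows is committed in its own transaction, which bounds memory use.
// The batch size below is an assumed value for illustration.
:auto LOAD CSV WITH HEADERS FROM 'file:///all_triplets.csv' AS row
CALL {
  WITH row
  MERGE (entity1:Entity {name: row.entity1})
  MERGE (entity2:Entity {name: row.entity2})
  MERGE (entity1)-[:RELATION {name: row.relation}]->(entity2)
} IN TRANSACTIONS OF 10000 ROWS
```

Without an index on :Entity(name), each node MERGE has to scan the existing :Entity nodes to check for a match, so the load gets slower as the graph grows. With the index in place, each lookup becomes an index seek.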