2023-05-11 01:28:00

Creating a Graph DB in Neo4j from large CSV file

Question

I am relatively new to Neo4j, so I apologize if this seems trivial.
I am trying to import data from a CSV file with a large number of rows (around 2.5 million) using the Neo4j Desktop app, running the query in the Neo4j Browser.
The contents of the file follow this format:

entity1, relation, entity2
entity1, relation, entity3
...
entityN, relation, entityM

I have tried using the query:

LOAD CSV WITH HEADERS FROM 'file:///all_triplets.csv' AS row
MERGE (entity1:Entity {name:row.entity1} )
MERGE (entity2:Entity {name:row.entity2} )
MERGE (entity1) - [:RELATION {name:row.relation}] -> (entity2)

but I got a MemoryPoolOutOfMemoryError after an hour of running, so I modified my query to run in 'batches' to free up memory as it goes:

:auto LOAD CSV WITH HEADERS FROM 'file:///all_triplets.csv' AS row
CALL {
WITH row
MERGE (entity1:Entity {name:row.entity1} )
MERGE (entity2:Entity {name:row.entity2} )
MERGE (entity1) - [:RELATION {name:row.relation}] -> (entity2)
} IN TRANSACTIONS

But the query has been running for hours, so I don't think I have implemented it correctly. What I need is to store this information in a DB so that I can extract the node embeddings afterwards (I don't need to be able to visualize the graph). Is there a better way to load a large list like this? Importing 2.5M records should not take that long. Any help is appreciated.

Answer 1

Score: 1

Make sure you have an index or uniqueness constraint (which also creates an index as a by-product) on :Entity(name), so that each node MERGE can find existing nodes through the index instead of scanning every :Entity node. For instance:

CREATE CONSTRAINT Entity_name FOR (e:Entity) REQUIRE e.name IS UNIQUE
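
Putting the constraint together with the batched import from the question, the whole load might look roughly like the sketch below. It assumes Neo4j 4.4 or later (for both the FOR ... REQUIRE constraint syntax and CALL { ... } IN TRANSACTIONS) and that the two statements are run separately in the Neo4j Browser. The IF NOT EXISTS clause and the batch size of 10000 rows are additions for illustration, not tuned values; the file name, label, and property names are taken from the question.

```cypher
// Run once, before the import. IF NOT EXISTS (added here) makes it safe to re-run.
CREATE CONSTRAINT Entity_name IF NOT EXISTS
FOR (e:Entity) REQUIRE e.name IS UNIQUE;

:auto LOAD CSV WITH HEADERS FROM 'file:///all_triplets.csv' AS row
CALL {
  WITH row
  // With the constraint in place, both node MERGEs are index lookups.
  MERGE (entity1:Entity {name: row.entity1})
  MERGE (entity2:Entity {name: row.entity2})
  MERGE (entity1)-[:RELATION {name: row.relation}]->(entity2)
} IN TRANSACTIONS OF 10000 ROWS  // assumed batch size; omit "OF ... ROWS" for the default
```

If the import was already started without the constraint, it is probably simplest to stop it, create the constraint, and run the load again. MERGE will match the nodes and relationships that already exist rather than duplicating them.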
