英文:
Is there a simpler version of this cypher query?
问题
我已经构建了一个查询,用于查找互相关注且阅读相同流派书籍的人。以下是查询:
MATCH (u1:User)-[:READ]->(b1:Book)
WITH collect(DISTINCT b1.genre) AS genres, u1 AS user1
MATCH (u2:User)-[:READ]->(b2:Book)
WHERE (user1)<-[:FOLLOWS]->(u2) AND b2.genre IN genres
RETURN DISTINCT user1.username AS user1, u2.username AS user2
这个查询的思路是,首先收集其中一个用户阅读的所有书籍的流派,然后在另一个用户的书籍中查找是否有书籍的流派与前者相同,并且他们互相关注。这个查询似乎有效:它返回一组不同的个体对。不过,我想知道是否有更快捷的方法来执行此操作?我的解决方案似乎有些繁琐,但我发现尝试指定它们阅读相同流派的书籍而不获取所有书籍对和重复的个体有些麻烦。例如,我最初编写了以下查询:
MATCH (b1:Book)<-[:READ]-(u1:User)-[:FOLLOWS]-(u2:User)-[:READ]->(b2:Book)
WHERE b1.genre = b2.genre
RETURN DISTINCT u1.username AS user1, u2.username AS user2
这个查询看起来更简单,但实际上会返回所有阅读相同流派书籍的书籍对的重复姓名。我的解决方案是否最简单,还是有更简单的方法?
英文:
I have constructed a query to find the people who follow each other and who have read books in the same genre. Here it is:
MATCH (u1:User)-[:READ]->(b1:Book)
WITH collect(DISTINCT b1.genre) AS genres,u1 AS user1
MATCH (u2:User)-[:READ]->(b2:Book)
WHERE (user1)<-[:FOLLOWS]->(u2) AND b2.genre IN genres
RETURN DISTINCT user1.username AS user1,u2.username AS user2
The idea is that we collect all the book genres for one of them, and if a book read by the other is in that list of genres (and they follow each other), then we return those users. This seems to work: we get a list of distinct pairs of individuals. I wonder, though, if there a quicker way to do this? My solution seems somewhat clumsy, but I found it surprisingly finicky trying to specify that they have read a book in the same genre without getting back all the pairs of books and duplicating individuals. For example, I
first wrote the following:
MATCH (b1:Book)<-[:READ]-(u1:User)-[:FOLLOWS]-(u2:User)-[:READ]->(b2:Book)
WHERE b1.genre = b2.genre
RETURN DISTINCT u1.username AS user1, u2.username AS user2
Which seems simpler, but in fact it returned repeated names for all the books that were read in the same genre. Is my solution the simplest, or is there a simpler one?
答案1
得分: 4
这是重新编写查询的一种方式:
MATCH (n1:用户)-[:关注]-(n2:用户)
MATCH (n1)-[:阅读]->(书), (n2)-[:阅读]->(书2)
WHERE 书.类型 = 书2.类型
RETURN n1.用户名, n2.用户名, count(*)
这是收集每个用户类型的另一种方式:
MATCH (n1:用户)-[:关注]-(n2:用户)
WITH n1, n2,
[(n1)-[:阅读]->(书) | 书.类型] AS g1,
[(n2)-[:阅读]->(书) | 书.类型] AS g2
WHERE ANY(x IN g1 WHERE x IN g2)
RETURN n1, n2, count(*)
请注意,有时候查询越长并不一定更好,因为数据检索的方式需要符合你自己的理解。
然而,你的模型清楚地显示出从图形重构中获益,将类型提取到自己的节点中,例如:
MATCH (n:书)
MERGE (g:类型 {名称: n.类型})
MERGE (n)-[:有类型]->(g)
这将是利用图模型的新查询:
PROFILE
MATCH (n1:用户)-[:关注]-(n2:用户)
WHERE (n1)-[:阅读]->()-[:有类型]->()<-[:有类型]-()<-[:阅读]-(n2)
RETURN n1.用户名, n2.用户名, count(*)
英文:
This is one way of rewriting the query
MATCH (n1:User)-[:FOLLOWS]-(n2:User)
MATCH (n1)-[:READ]->(book), (n2)-[:READ]->(book2)
WHERE book.genre = book2.genre
RETURN n1.username, n2.username, count(*)
Here is another collecting genres for each user
MATCH (n1:User)-[:FOLLOWS]-(n2:User)
WITH n1, n2,
[(n1)-[:READ]->(book) | book.genre] AS g1,
[(n2)-[:READ]->(book) | book.genre] AS g2
WHERE ANY(x IN g1 WHERE x IN g2)
RETURN n1, n2, count(*)
Note that sometimes longer queries are not especially better in the sense that the ways the data are retrieved need to make sense to yourself.
Your model however clearly shows that you would benefit from a bit of graph refactoring, extracting the genre into its own node, for eg
MATCH (n:Book)
MERGE (g:Genre {name: n.genre})
MERGE (n)-[:HAS_GENRE]->(g)
And this would be the new query which leverages a graph model
PROFILE
MATCH (n1:User)-[:FOLLOWS]-(n2:User)
WHERE (n1)-[:READ]->()-[:HAS_GENRE]->()<-[:HAS_GENRE]-()<-[:READ]-(n2)
RETURN n1.username, n2.username, count(*)
答案2
得分: 1
根据@ChristopheWillemsen的建议,您应该考虑创建唯一的“Genre”节点,并在每个“Book”和其“Genre”之间添加关系。
这不仅会使您的数据模型“更具图形性”,通过直接存储(并使关系可见)书籍和流派之间的关系,还可以优化您的用例。
以下是一个快速查询,返回阅读了至少一本具有相同流派的书籍的所有唯一用户对的名称。
MATCH (u1:User)-[:FOLLOWS]-(u2:User)
WHERE
ID(u1) < ID(u2) AND
(u1)-[:READ]->()-[:HAS_GENRE]->()<-[:HAS_GENRE]-()<-[:READ]-(u2)
RETURN DISTINCT u1.username, u2.username
解释(这里有很多内容):
-
上述的
MATCH
关系模式是“无方向”的(不指定关系方向),因此它将匹配任何方向的关系。这很好,但是一个_对称_的无方向关系模式(其中两个端点具有相同的节点模式,或者至少一个节点模式只是()
)将导致相同的节点对被返回两次(除非以相反的顺序)。对于您的用例,假定您不希望将'Alice'/'Bob'和'Bob'/'Alice'视为不同的用户对。-
(A) 修复此问题的一种潜在方法是使用有向关系模式。这对于所有用例都不适用,但对于您的用例将有效。但是,如果
u2
也FOLLOWS
u1
,那么您仍然会获得重复的用户对。只有在可能存在多个相同方向的FOLLOWS
关系时,才需要在这里使用DISTINCT
选项。MATCH (u1:User)-[:FOLLOWS]->(u2:User) WHERE (u1)-[:READ]->()-[:HAS_GENRE]->()<-[:HAS_GENRE]-()<-[:READ]-(u2) RETURN DISTINCT u1.username, u2.username
-
(B) 此答案顶部呈现的查询使用了一种不同的方法来强制返回节点的顺序——即通过本地ID。对于该查询,可能明智使用
DISTINCT
选项,因为它将消除相同两个节点之间存在的多个FOLLOWS
关系,无论方向如何。当使用我的测试数据对(A)和(B)进行分析时,(B) 使用了稍微较少的DB查询。但是,您应该对(A)和(B)进行自己的分析,以查看在您自己的实际数据中哪个更好,如果(A)在任何情况下都可以接受的话。
-
-
WHERE
表达式(u1)-[:READ]->()-[:HAS_GENRE]->()<-[:HAS_GENRE]-()<-[:READ]-(u2)
是一个路径模式。WHERE
子句中的路径模式
表达式是一个_predicate_,只要模式至少出现一次就评估为true
。一旦找到模式的一个实例,就不再继续查找,因此如果路径模式
具有多个匹配,它将非常高效。
英文:
As was suggested by @ChristopheWillemsen, you should consider creating unique Genre
nodes and adding a relationship between each Book
and its Genre
.
Not only would that make your data model "more graphy" by directly storing (and making visible) the relationships between genres and books, but it can optimize your use case as well.
Here is a fast query that returns the names of all unique pairs of users who read at least one book with the same genre.
MATCH (u1:User)-[:FOLLOWS]-(u2:User)
WHERE
ID(u1) < ID(u2) AND
(u1)-[:READ]->()-[:HAS_GENRE]->()<-[:HAS_GENRE]-()<-[:READ]-(u2)
RETURN DISTINCT u1.username, u2.username
Explanations (there is a lot going on):
-
The above
MATCH
relationship pattern is "undirected" (specifies no relationship direction), so it will match a relationship in either direction. That is well and good, but a symmetrical undirected relationship pattern (where both end nodes have the same node pattern, or at least one node pattern is just()
) causes the same pair of nodes to be returned twice (except in opposite order). For your use case, presumably you do not want to treat 'Alice'/'Bob' and 'Bob'/'Alice' as different pairs of users.-
(A) One potential way to fix this is to use a directional relationship pattern instead. This is not acceptable for all use cases, but will work for yours. (However, if
u2
alsoFOLLOWS
u1
, then you will still get duplicate pairs.) TheDISTINCT
option is only needed here if it is possible for there to be multipleFOLLOWS
relationships in the same direction.MATCH (u1:User)-[:FOLLOWS]->(u2:User) WHERE (u1)-[:READ]->()-[:HAS_GENRE]->()<-[:HAS_GENRE]-()<-[:READ]-(u2) RETURN DISTINCT u1.username, u2.username
-
(B) The query presented at the top of this answer uses a different method of enforcing the order of the returned nodes -- that is, by native ID. With that query the
DISTINCT
option is probably wise to keep, since it will eliminate duplicate pairs no matter how manyFOLLOWS
relationships exist between the same 2 nodes, in either direction. When profiling (A) and (B) with my own test data, (B) turned out to use slightly fewer DB hits. But you should profile (A) and (B) yourself to see which one is better with your own actual data, if (A) is at all acceptable.
-
-
The
WHERE
expression(u1)-[:READ]->()-[:HAS_GENRE]->()<-[:HAS_GENRE]-()<-[:READ]-(u2)
is a path pattern. Apath pattern
expression in aWHERE
clause is a predicate that evaluates totrue
iff the pattern is found at least once. As soon as a single instance of the pattern is found no effort is made to look further, so it is very efficient if thepath pattern
can have multiple matches.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论