2023年5月10日 17:56:10go评论90阅读模式

英文:

Efficient Node Grouping Technique for Centrality Calculations in Graphs

问题

I'm working on optimizing some code that involves frequently accessing values associated with the centrality of each node in a graph. Since nodes with the same neighbors have the same value in my calculations, I thought it might be beneficial to group nodes with identical neighbor sets together. This way, I can reduce the number of calculations by only performing them once per group instead of for every individual node. However, I'm not sure how to efficiently implement this grouping method. Any suggestions or advice on the best way to approach this problem would be greatly appreciated.

This is the implementation I did, but it's pretty basic. As you can see, the fact that I have 2 loops, the code is about O(n^2), which is quite a lot when used on a big graph.

Here is a MRE if you want to test the code:

import networkx as nx
def group_similar_nodes(G):
    node_groups = []
    lonely_node = []
    nbNodes = 0
    neighbors = {}
    for node in G.nodes():
        neighbors[node] = set(G.neighbors(node))
        lonely_node.append(node)
        nbNodes += 1
    for node in range(nbNodes):
        if node in lonely_node:
            group = [node]
            lonely_node.remove(node)
            for node2 in range(node + 1, nbNodes):
                if (node != node2) and (node2 in lonely_node):
                    diff = neighbors[node] ^ neighbors[node2]
                    diff.discard(node)
                    diff.discard(node2)
                    if (not diff):
                        group.append(node2)
                        lonely_node.remove(node2)
            node_groups.append(group)
    return node_groups
i = 4  # number of communities
n = 5  # number of nodes per community
p_in = 1  # probability of an edge within a community
p_out = 0.1  # probability of an edge between communities
G = nx.planted_partition_graph(i, n, p_in, p_out, seed=42)
print(group_similar_nodes(G))
# expected result
# [[0, 1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12], [13, 14], [15], [16], [17], [18], [19]]

(Note: This is a translation of your provided code and description.)

英文:

This is the implementation i did but it's pretty basic. As you can see the fact that I have 2 loop the code is about O(n^2) which is quite a lot when used on big graph.
As @ravenspoint and @PaulBrodersen correctly pointed out, my loop wasnt working correctly so i changed the implementation.

def group_similar_nodes(G):
    node_groups = []
    lonely_node = []
    nbNodes = 0
    neighbors = {}
    for node in G.nodes():
        neighbors[node] = set(G.neighbors(node))
        lonely_node.append(node)
        nbNodes+=1
    for node in range(nbNodes):
        if node in lonely_node:
            group = [node]
            lonely_node.remove(node)
            for node2 in range(node+1,nbNodes):
                if(node !=node2) and (node2 in lonely_node):
                    diff = neighbors[node] ^ neighbors[node2]
                    diff.discard(node)
                    diff.discard(node2)
                    if(not diff):
                        group.append(node2)
                        lonely_node.remove(node2)
            node_groups.append(group)
    return node_groups

Here is a MRE if you want to test the code

import networkx as nx
def group_similar_nodes(G):
    node_groups = []
    lonely_node = []
    nbNodes = 0
    neighbors = {}
    for node in G.nodes():
        neighbors[node] = set(G.neighbors(node))
        lonely_node.append(node)
        nbNodes+=1
    for node in range(nbNodes):
        if node in lonely_node:
            group = [node]
            lonely_node.remove(node)
            for node2 in range(node+1,nbNodes):
                if(node !=node2) and (node2 in lonely_node):
                    diff = neighbors[node] ^ neighbors[node2]
                    diff.discard(node)
                    diff.discard(node2)
                    if(not diff):
                        group.append(node2)
                        lonely_node.remove(node2)
            node_groups.append(group)
    return node_groups
i = 4 #number of communitues
n = 5 # number of nodes per communitues
p_in = 1 # probability of an edge within a community
p_out = 0.1 # probability of an edge between communities
G = nx.planted_partition_graph(i, n, p_in, p_out, seed=42) 
print(group_similar_nodes(G))
#expected result
# [[0, 1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12], [13, 14], [15], [16], [17], [18], [19]]

答案1

得分: 1

for node2 in lonely_node:

这个循环会导致每一对节点都进行两次检查（一次是X,Y，另一次是Y,X）。

我不知道如何在Python中修复这个问题。在C++中，可以像这样实现：

for (node_index = 0;
     node_index < lonely_node.size();
     node_index++)
   for (node_index2 = node_index + 1;
        node_index2 < lonely_node.size();
        node_index2++)

由于您关心性能，您应该考虑从Python转向编译语言，如C++。这将为您提供性能提升，有时可以使相同的算法运行速度提高50倍。

英文:

   for node2 in lonely_node

This loop causes a check to be done on each pair of nodes twice ( once with X,Y and again with Y,X )

I don't know how to to fix that in python. In c++ it would be something like this

for( node_index = 0;
     node_index &lt; lonely_node.size();
     node_index++ )
   for( node_index2 = node_index+1;
        node_index2 &lt; lonely_node.size();
        node_index++ )

Since you care about performance you should consider moving from python to a compiled language such as C++. This will give you a performance boost, sometimes running the same algorithm 50 times faster.

答案2

得分: 1

以下是您要翻译的代码部分：

After finally understanding the question correctly, the answer is pretty straightforward and can be accomplished in linear time once the `nx.Graph` object is created. The `nx.Graph` object at its core represents the graph as its adjacency list i.e. a mapping of nodes to their (unordered list of) neighbours. We want a mapping of neighbours to nodes. All we hence need to do is normalize the representation of neighbour sets and then invert the mapping:
import networkx as nx
def invert_dict(d):
    # https://stackoverflow.com/a/485368/2912349
    inverted = dict()
    for k, v in d.items():
        inverted[v] = inverted.get(v, []) + [k]
    return inverted
if __name__ == '__main__':
    G = nx.Graph([(0, 1), (1, 2), (2, 3), (3, 0)]) # square graph
    d = dict()
    for node in G:
        d[node] = tuple(sorted(G.neighbors(node)))
    inverted = invert_dict(d)
    print(list(inverted.values()))
    # [[0, 2], [1, 3]]

英文:

After finally understanding the question correctly, the answer is pretty straightforward and can be accomplished in linear time once the nx.Graph object is created. The nx.Graph object at its core represents the graph as its adjacency list i.e. a mapping of nodes to their (unordered list of) neighbours. We want a mapping of neighbours to nodes. All we hence need to do is normalize the representation of neighbour sets and then invert the mapping:

import networkx as nx
def invert_dict(d):
    # https://stackoverflow.com/a/485368/2912349
    inverted = dict()
    for k, v in d.items():
        inverted[v] = inverted.get(v, []) + [k]
    return inverted
if __name__ == &#39;__main__&#39;:
    G = nx.Graph([(0, 1), (1, 2), (2, 3), (3, 0)]) # square graph
    d = dict()
    for node in G:
        d[node] = tuple(sorted(G.neighbors(node)))
    inverted = invert_dict(d)
    print(list(inverted.values()))
    # [[0, 2], [1, 3]]

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

Efficient Node Grouping Technique for Centrality Calculations in Graphs

问题

答案1

答案2

Selenium选择选项无法使用列表元素。

使用ImageField进行Django ModelForm的单元测试，测试显示表单无效。

Failed to parse the problem — EOL while scanning string literal (, line 43)

为什么我的使用NumPy数组的排序算法比使用列表慢？

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。