Efficient Node Grouping Technique for Centrality Calculations in Graphs

huangapple go评论57阅读模式
英文:

Efficient Node Grouping Technique for Centrality Calculations in Graphs

问题

I'm working on optimizing some code that involves frequently accessing values associated with the centrality of each node in a graph. Since nodes with the same neighbors have the same value in my calculations, I thought it might be beneficial to group nodes with identical neighbor sets together. This way, I can reduce the number of calculations by only performing them once per group instead of for every individual node. However, I'm not sure how to efficiently implement this grouping method. Any suggestions or advice on the best way to approach this problem would be greatly appreciated.

This is the implementation I did, but it's pretty basic. As you can see, the fact that I have 2 loops, the code is about O(n^2), which is quite a lot when used on a big graph.

Here is a MRE if you want to test the code:

import networkx as nx

def group_similar_nodes(G):
    node_groups = []
    lonely_node = []
    nbNodes = 0
    neighbors = {}
    for node in G.nodes():
        neighbors[node] = set(G.neighbors(node))
        lonely_node.append(node)
        nbNodes += 1
    for node in range(nbNodes):
        if node in lonely_node:
            group = [node]
            lonely_node.remove(node)
            for node2 in range(node + 1, nbNodes):
                if (node != node2) and (node2 in lonely_node):
                    diff = neighbors[node] ^ neighbors[node2]
                    diff.discard(node)
                    diff.discard(node2)
                    if (not diff):
                        group.append(node2)
                        lonely_node.remove(node2)
            node_groups.append(group)
    return node_groups

i = 4  # number of communities
n = 5  # number of nodes per community
p_in = 1  # probability of an edge within a community
p_out = 0.1  # probability of an edge between communities

G = nx.planted_partition_graph(i, n, p_in, p_out, seed=42)
print(group_similar_nodes(G))
# expected result
# [[0, 1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12], [13, 14], [15], [16], [17], [18], [19]]

(Note: This is a translation of your provided code and description.)

英文:

I'm working on optimizing some code that involves frequently accessing values associated with the centrality of each node in a graph. Since nodes with the same neighbors have the same value in my calculations, I thought it might be beneficial to group nodes with identical neighbor sets together. This way, I can reduce the number of calculations by only performing them once per group instead of for every individual node. However, I'm not sure how to efficiently implement this grouping method. Any suggestions or advice on the best way to approach this problem would be greatly appreciated.

This is the implementation i did but it's pretty basic. As you can see the fact that I have 2 loop the code is about O(n^2) which is quite a lot when used on big graph.
As @ravenspoint and @PaulBrodersen correctly pointed out, my loop wasnt working correctly so i changed the implementation.

def group_similar_nodes(G):
    node_groups = []
    lonely_node = []
    nbNodes = 0
    neighbors = {}
    for node in G.nodes():
        neighbors[node] = set(G.neighbors(node))
        lonely_node.append(node)
        nbNodes+=1
    for node in range(nbNodes):
        if node in lonely_node:
            group = [node]
            lonely_node.remove(node)
            for node2 in range(node+1,nbNodes):
                if(node !=node2) and (node2 in lonely_node):
                    diff = neighbors[node] ^ neighbors[node2]
                    diff.discard(node)
                    diff.discard(node2)
                    if(not diff):
                        group.append(node2)
                        lonely_node.remove(node2)
            node_groups.append(group)
    return node_groups

Here is a MRE if you want to test the code

import networkx as nx

def group_similar_nodes(G):
    node_groups = []
    lonely_node = []
    nbNodes = 0
    neighbors = {}
    for node in G.nodes():
        neighbors[node] = set(G.neighbors(node))
        lonely_node.append(node)
        nbNodes+=1
    for node in range(nbNodes):
        if node in lonely_node:
            group = [node]
            lonely_node.remove(node)
            for node2 in range(node+1,nbNodes):
                if(node !=node2) and (node2 in lonely_node):
                    diff = neighbors[node] ^ neighbors[node2]
                    diff.discard(node)
                    diff.discard(node2)
                    if(not diff):
                        group.append(node2)
                        lonely_node.remove(node2)
            node_groups.append(group)
    return node_groups

i = 4 #number of communitues
n = 5 # number of nodes per communitues
p_in = 1 # probability of an edge within a community
p_out = 0.1 # probability of an edge between communities

G = nx.planted_partition_graph(i, n, p_in, p_out, seed=42) 
print(group_similar_nodes(G))
#expected result
# [[0, 1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12], [13, 14], [15], [16], [17], [18], [19]]

答案1

得分: 1

for node2 in lonely_node:

这个循环会导致每一对节点都进行两次检查(一次是X,Y,另一次是Y,X)。

我不知道如何在Python中修复这个问题。在C++中,可以像这样实现:

for (node_index = 0;
     node_index < lonely_node.size();
     node_index++)
   for (node_index2 = node_index + 1;
        node_index2 < lonely_node.size();
        node_index2++)

由于您关心性能,您应该考虑从Python转向编译语言,如C++。这将为您提供性能提升,有时可以使相同的算法运行速度提高50倍。

英文:
   for node2 in lonely_node

This loop causes a check to be done on each pair of nodes twice ( once with X,Y and again with Y,X )

I don't know how to to fix that in python. In c++ it would be something like this

for( node_index = 0;
     node_index &lt; lonely_node.size();
     node_index++ )
   for( node_index2 = node_index+1;
        node_index2 &lt; lonely_node.size();
        node_index++ )

Since you care about performance you should consider moving from python to a compiled language such as C++. This will give you a performance boost, sometimes running the same algorithm 50 times faster.

答案2

得分: 1

以下是您要翻译的代码部分:

After finally understanding the question correctly, the answer is pretty straightforward and can be accomplished in linear time once the `nx.Graph` object is created. The `nx.Graph` object at its core represents the graph as its adjacency list i.e. a mapping of nodes to their (unordered list of) neighbours. We want a mapping of neighbours to nodes. All we hence need to do is normalize the representation of neighbour sets and then invert the mapping:

import networkx as nx

def invert_dict(d):
    # https://stackoverflow.com/a/485368/2912349
    inverted = dict()
    for k, v in d.items():
        inverted[v] = inverted.get(v, []) + [k]
    return inverted

if __name__ == '__main__':

    G = nx.Graph([(0, 1), (1, 2), (2, 3), (3, 0)]) # square graph

    d = dict()
    for node in G:
        d[node] = tuple(sorted(G.neighbors(node)))

    inverted = invert_dict(d)
    print(list(inverted.values()))
    # [[0, 2], [1, 3]]
英文:

After finally understanding the question correctly, the answer is pretty straightforward and can be accomplished in linear time once the nx.Graph object is created. The nx.Graph object at its core represents the graph as its adjacency list i.e. a mapping of nodes to their (unordered list of) neighbours. We want a mapping of neighbours to nodes. All we hence need to do is normalize the representation of neighbour sets and then invert the mapping:

import networkx as nx

def invert_dict(d):
    # https://stackoverflow.com/a/485368/2912349
    inverted = dict()
    for k, v in d.items():
        inverted[v] = inverted.get(v, []) + [k]
    return inverted

if __name__ == &#39;__main__&#39;:

    G = nx.Graph([(0, 1), (1, 2), (2, 3), (3, 0)]) # square graph

    d = dict()
    for node in G:
        d[node] = tuple(sorted(G.neighbors(node)))

    inverted = invert_dict(d)
    print(list(inverted.values()))
    # [[0, 2], [1, 3]]

huangapple
  • 本文由 发表于 2023年5月10日 17:56:10
  • 转载请务必保留本文链接:https://go.coder-hub.com/76217073.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定