英文:
Efficient Node Grouping Technique for Centrality Calculations in Graphs
问题
I'm working on optimizing some code that involves frequently accessing values associated with the centrality of each node in a graph. Since nodes with the same neighbors have the same value in my calculations, I thought it might be beneficial to group nodes with identical neighbor sets together. This way, I can reduce the number of calculations by only performing them once per group instead of for every individual node. However, I'm not sure how to efficiently implement this grouping method. Any suggestions or advice on the best way to approach this problem would be greatly appreciated.
This is the implementation I did, but it's pretty basic. As you can see, the fact that I have 2 loops, the code is about O(n^2), which is quite a lot when used on a big graph.
Here is a MRE if you want to test the code:
import networkx as nx
def group_similar_nodes(G):
node_groups = []
lonely_node = []
nbNodes = 0
neighbors = {}
for node in G.nodes():
neighbors[node] = set(G.neighbors(node))
lonely_node.append(node)
nbNodes += 1
for node in range(nbNodes):
if node in lonely_node:
group = [node]
lonely_node.remove(node)
for node2 in range(node + 1, nbNodes):
if (node != node2) and (node2 in lonely_node):
diff = neighbors[node] ^ neighbors[node2]
diff.discard(node)
diff.discard(node2)
if (not diff):
group.append(node2)
lonely_node.remove(node2)
node_groups.append(group)
return node_groups
i = 4 # number of communities
n = 5 # number of nodes per community
p_in = 1 # probability of an edge within a community
p_out = 0.1 # probability of an edge between communities
G = nx.planted_partition_graph(i, n, p_in, p_out, seed=42)
print(group_similar_nodes(G))
# expected result
# [[0, 1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12], [13, 14], [15], [16], [17], [18], [19]]
(Note: This is a translation of your provided code and description.)
英文:
I'm working on optimizing some code that involves frequently accessing values associated with the centrality of each node in a graph. Since nodes with the same neighbors have the same value in my calculations, I thought it might be beneficial to group nodes with identical neighbor sets together. This way, I can reduce the number of calculations by only performing them once per group instead of for every individual node. However, I'm not sure how to efficiently implement this grouping method. Any suggestions or advice on the best way to approach this problem would be greatly appreciated.
This is the implementation i did but it's pretty basic. As you can see the fact that I have 2 loop the code is about O(n^2) which is quite a lot when used on big graph.
As @ravenspoint and @PaulBrodersen correctly pointed out, my loop wasnt working correctly so i changed the implementation.
def group_similar_nodes(G):
node_groups = []
lonely_node = []
nbNodes = 0
neighbors = {}
for node in G.nodes():
neighbors[node] = set(G.neighbors(node))
lonely_node.append(node)
nbNodes+=1
for node in range(nbNodes):
if node in lonely_node:
group = [node]
lonely_node.remove(node)
for node2 in range(node+1,nbNodes):
if(node !=node2) and (node2 in lonely_node):
diff = neighbors[node] ^ neighbors[node2]
diff.discard(node)
diff.discard(node2)
if(not diff):
group.append(node2)
lonely_node.remove(node2)
node_groups.append(group)
return node_groups
Here is a MRE if you want to test the code
import networkx as nx
def group_similar_nodes(G):
node_groups = []
lonely_node = []
nbNodes = 0
neighbors = {}
for node in G.nodes():
neighbors[node] = set(G.neighbors(node))
lonely_node.append(node)
nbNodes+=1
for node in range(nbNodes):
if node in lonely_node:
group = [node]
lonely_node.remove(node)
for node2 in range(node+1,nbNodes):
if(node !=node2) and (node2 in lonely_node):
diff = neighbors[node] ^ neighbors[node2]
diff.discard(node)
diff.discard(node2)
if(not diff):
group.append(node2)
lonely_node.remove(node2)
node_groups.append(group)
return node_groups
i = 4 #number of communitues
n = 5 # number of nodes per communitues
p_in = 1 # probability of an edge within a community
p_out = 0.1 # probability of an edge between communities
G = nx.planted_partition_graph(i, n, p_in, p_out, seed=42)
print(group_similar_nodes(G))
#expected result
# [[0, 1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12], [13, 14], [15], [16], [17], [18], [19]]
答案1
得分: 1
for node2 in lonely_node:
这个循环会导致每一对节点都进行两次检查(一次是X,Y,另一次是Y,X)。
我不知道如何在Python中修复这个问题。在C++中,可以像这样实现:
for (node_index = 0;
node_index < lonely_node.size();
node_index++)
for (node_index2 = node_index + 1;
node_index2 < lonely_node.size();
node_index2++)
由于您关心性能,您应该考虑从Python转向编译语言,如C++。这将为您提供性能提升,有时可以使相同的算法运行速度提高50倍。
英文:
for node2 in lonely_node
This loop causes a check to be done on each pair of nodes twice ( once with X,Y and again with Y,X )
I don't know how to to fix that in python. In c++ it would be something like this
for( node_index = 0;
node_index < lonely_node.size();
node_index++ )
for( node_index2 = node_index+1;
node_index2 < lonely_node.size();
node_index++ )
Since you care about performance you should consider moving from python to a compiled language such as C++. This will give you a performance boost, sometimes running the same algorithm 50 times faster.
答案2
得分: 1
以下是您要翻译的代码部分:
After finally understanding the question correctly, the answer is pretty straightforward and can be accomplished in linear time once the `nx.Graph` object is created. The `nx.Graph` object at its core represents the graph as its adjacency list i.e. a mapping of nodes to their (unordered list of) neighbours. We want a mapping of neighbours to nodes. All we hence need to do is normalize the representation of neighbour sets and then invert the mapping:
import networkx as nx
def invert_dict(d):
# https://stackoverflow.com/a/485368/2912349
inverted = dict()
for k, v in d.items():
inverted[v] = inverted.get(v, []) + [k]
return inverted
if __name__ == '__main__':
G = nx.Graph([(0, 1), (1, 2), (2, 3), (3, 0)]) # square graph
d = dict()
for node in G:
d[node] = tuple(sorted(G.neighbors(node)))
inverted = invert_dict(d)
print(list(inverted.values()))
# [[0, 2], [1, 3]]
英文:
After finally understanding the question correctly, the answer is pretty straightforward and can be accomplished in linear time once the nx.Graph
object is created. The nx.Graph
object at its core represents the graph as its adjacency list i.e. a mapping of nodes to their (unordered list of) neighbours. We want a mapping of neighbours to nodes. All we hence need to do is normalize the representation of neighbour sets and then invert the mapping:
import networkx as nx
def invert_dict(d):
# https://stackoverflow.com/a/485368/2912349
inverted = dict()
for k, v in d.items():
inverted[v] = inverted.get(v, []) + [k]
return inverted
if __name__ == '__main__':
G = nx.Graph([(0, 1), (1, 2), (2, 3), (3, 0)]) # square graph
d = dict()
for node in G:
d[node] = tuple(sorted(G.neighbors(node)))
inverted = invert_dict(d)
print(list(inverted.values()))
# [[0, 2], [1, 3]]
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论