2016-08-14 307 views
1

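I am trying to generate a network matrix with the code below. Using this matrix, I want to find the 20 highest-weighted edges that are not on the diagonal (that is, entries with i != j). I also want to get the names of the nodes (as pairs) that make up these edges. How can I find the n largest edge weights in a network matrix (networkx)?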

import heapq 
def mapper_network(self, _, info): 
    G = nx.Graph() #create a graph 
    for i in range(len(info)):
        edge_from = info[0] # edge from
        edge_to = info[1] # edge to
        weight = info[2] # edge weight
        G.add_edge(edge_from, edge_to, weight=weight) # insert the edge into the graph
    A = nx.adjacency_matrix(G) # create an adjacency matrix 
    A_square = A * A # find the product of the matrix 
    print heapq.nlargest(20, A_square) # to print out the 20 highest weighted edges 

However, this code does not give me the 20 highest-weighted edges. Instead I get raise ValueError("The truth value of an array with more than one " ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all().

If I use this instead:

print heapq.nlargest(20, range(len(A_square)), A_square.take) 

it gives me:

raise TypeError("sparse matrix length is ambiguous; use getnnz()" 
    TypeError: sparse matrix length is ambiguous; use getnnz() or shape[0] 

With

def mapper_network(self, _, info): 
    G = nx.Graph() 
    for i in range(len(info)):
        edge_from = info[0]
        edge_to = info[1]
        weight = info[2]
        G.add_edge(edge_from, edge_to, weight=weight)
    A = nx.adjacency_matrix(G) 
    A_square = A * A #can print (A_square.todense()) 
    weight = nx.get_edge_attributes(A_square, weight) 
    edges = A_square.edges(data = True) 
    s = sorted(G.edges(data=True), key=lambda (source, target, data): data['weight']) 
    print s 

I get

File   "/tmp/MRQ7_trevor.vagrant.20160814.040827.770006/job_local_dir/1/mapper/0/mrjob.tar.gz/mrjob/job.py", line 433, in run 
mr_job.execute() 
File "/tmp/MRQ7_trevor.vagrant.20160814.040827.770006/job_local_dir/1/mapper/0/mrjob.tar.gz/mrjob/job.py", line 442, in execute 
self.run_mapper(self.options.step_num) 
File "/tmp/MRQ7_trevor.vagrant.20160814.040827.770006/job_local_dir/1/mapper/0/mrjob.tar.gz/mrjob/job.py", line 507, in run_mapper 
for out_key, out_value in mapper(key, value) or(): 
File "MRQ7_trevor.py", line 90, in mapper_network 
weight = nx.get_edge_attributes(A_square, weight) 
File "/home/vagrant/anaconda/lib/python2.7/site-packages/networkx/classes/function.py", line 428, in get_edge_attributes 
if G.is_multigraph(): 
File "/home/vagrant/anaconda/lib/python2.7/site-packages/scipy/sparse/base.py", line 499, in __getattr__ 
raise AttributeError(attr + " not found") 

AttributeError: is_multigraph not found

Can someone help me solve this? Thank you very much!

Answers

1

The problem with this line:

heapq.nlargest(20, A_square) 

is that nlargest does not expect an iterable of iterables, but an iterable of numbers.

So you can do this instead:

heapq.nlargest(20, itertools.chain.from_iterable(A_square)) 

itertools.chain.from_iterable takes an iterable of iterables and creates a new iterable containing the contents of all the inner iterables.
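To illustrate the flattening step on its own, here is a minimal sketch. It assumes the matrix is available as plain nested lists (the `rows` data is made up for illustration); a SciPy sparse matrix does not iterate like this, so it would probably need converting first, for example with `.toarray()`.

```python
import heapq
import itertools

# Hypothetical 3x3 weight matrix stored as a list of rows.
rows = [
    [0, 5, 2],
    [5, 0, 9],
    [2, 9, 0],
]

# chain.from_iterable turns the rows into one flat stream of numbers,
# which is the kind of input heapq.nlargest can compare directly.
flat = itertools.chain.from_iterable(rows)
top3 = heapq.nlargest(3, flat)
print(top3)
```

Note that this only returns the weights: the row and column positions are lost, so it cannot tell you which nodes an edge connects.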


However, this will not entirely solve your original problem, for two reasons:

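  1. Why are you taking the square of the adjacency matrix? Doing so only gives you the highest weighted sums over paths of length 2 in the graph, which is quite different from what you asked for. Just use the adjacency matrix itself.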

  2. Nowhere in your code do you remove the diagonal. You could do it like this: for n in range(G.number_of_nodes()): A[n, n] = 0
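As a rough sketch of the second point, the diagonal can also be skipped while collecting candidates instead of being zeroed out, which keeps the node names attached to each weight. This assumes the matrix has been converted to nested lists (for a SciPy sparse matrix, something like `A.toarray().tolist()`) and that `nodes` lists the names in row order; networkx uses the order of `G.nodes()` unless a `nodelist` is passed. The helper name `top_off_diagonal` and the data are hypothetical.

```python
import heapq

# Hypothetical node names, in the same order as the matrix rows and columns.
nodes = ["A", "B", "C", "D"]
# Hypothetical symmetric weight matrix; the diagonal holds self-loop weights.
matrix = [
    [7, 3, 0, 1],
    [3, 0, 8, 0],
    [0, 8, 2, 5],
    [1, 0, 5, 0],
]

def top_off_diagonal(matrix, names, n):
    """Return the n heaviest (weight, name_i, name_j) entries with i < j."""
    size = len(matrix)
    candidates = (
        (matrix[i][j], names[i], names[j])
        for i in range(size)
        for j in range(i + 1, size)  # j > i skips the diagonal and mirrored duplicates
        if matrix[i][j] != 0         # zero means "no edge" in this sketch
    )
    return heapq.nlargest(n, candidates, key=lambda entry: entry[0])

top = top_off_diagonal(matrix, nodes, 3)
print(top)
```

Only the upper triangle is scanned because the graph is undirected, so each pair is reported once. For a directed graph the inner loop would need to cover every j != i.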


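Thank you very much for your help. I took the square of the adjacency matrix because I wanted to know which nodes are 'closest through a shared node'. For example, A-B and A-C exist, but not B-C, so B and C are connected through A. I want to know which pairs of nodes are most strongly connected in this way. I tried for n in G.number_of_edges(): A[n][n] = 0, but it complained about the attribute number_of_edges. I would also like to know the node names for each edge (e.g. A-C), but I don't know how to get them from the sparse csr matrix. Thanks! – achimneyswallow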


Sorry, I meant 'number_of_nodes()' (it is the size of the matrix) –
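For the 'connected through a shared node' idea in the first comment, one possible sketch works from the edge list directly, so the node names never get lost. For each pair it sums the product of the two edge weights over every shared neighbour, which is what the off-diagonal entries of the squared adjacency matrix contain. The function name `top_two_step_pairs` and the edge data are assumptions for illustration, and pairs that are also directly connected are not filtered out here.

```python
import heapq
from collections import defaultdict
from itertools import combinations

# Hypothetical undirected weighted edges: (from, to, weight).
edges = [("A", "B", 2), ("A", "C", 3), ("B", "D", 1), ("C", "D", 4)]

def top_two_step_pairs(edge_list, n):
    """Return the n node pairs with the largest total two-step weight."""
    neighbors = defaultdict(dict)
    for u, v, w in edge_list:
        neighbors[u][v] = w
        neighbors[v][u] = w  # undirected: store both directions

    scores = defaultdict(int)
    for shared, adjacent in neighbors.items():
        # Every pair of neighbours of `shared` is linked by a path of length 2.
        # Sorting keeps each pair key in a consistent (smaller, larger) order.
        for (b, wb), (c, wc) in combinations(sorted(adjacent.items()), 2):
            scores[(b, c)] += wb * wc

    return heapq.nlargest(n, scores.items(), key=lambda item: item[1])

pairs = top_two_step_pairs(edges, 2)
print(pairs)
```

Here B and C score 2*3 through A plus 1*4 through D, and A and D score 2*1 through B plus 3*4 through C.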