Python: dictionary of lists of lists

def makecounter(): 
    return collections.defaultdict(int) 

class RankedIndex(object): 
    def __init__(self): 
    self._inverted_index = collections.defaultdict(list) 
    self._documents = [] 
    self._inverted_index = collections.defaultdict(makecounter) 


def index_dir(self, base_path): 
    num_files_indexed = 0 
    allfiles = os.listdir(base_path) 
    self._documents = os.listdir(base_path) 
    num_files_indexed = len(allfiles) 
    docnumber = 0 
    self._inverted_index = collections.defaultdict(list) 

    docnumlist = [] 
    for file in allfiles: 
      self.documents = [base_path+file] #list of all text files 
      f = open(base_path+file, 'r') 
      lines = f.read() 

      tokens = self.tokenize(lines) 
      docnumber = docnumber + 1 
      for term in tokens: 
       if term not in sorted(self._inverted_index.keys()): 
        self._inverted_index[term] = [docnumber] 
        self._inverted_index[term][docnumber] +=1           
       else: 
        if docnumber not in self._inverted_index.get(term): 
         docnumlist = self._inverted_index.get(term) 
         docnumlist = docnumlist.append(docnumber) 
      f.close() 
    print '\n \n' 
    print 'Dictionary contents: \n' 
    for term in sorted(self._inverted_index): 
     print term, '->', self._inverted_index.get(term) 
    return num_files_indexed 
    return 0

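I get an index error when I run this code: list index out of range.

The code above builds a dictionary index that stores each term as a key, with the list of document numbers in which that term occurs as its value. For example, if the term "cat" appears in 1.txt, 5.txt and 7.txt, the dictionary will hold: cat <- [1,5,7]

Now I want to modify it to add the term frequency, so if the word cat appears twice in document 1, three times in document 5 and once in document 7, the expected result is: term <- [[docnumber, term freq], [docnumber, term freq]] <- a list of lists in a dictionary! cat <- [[1,2],[5,3],[7,1]]

I have played around with the code, but nothing works. I don't know how to modify this data structure to achieve the above.

Thanks in advance.

Source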

2010-10-05 csguy11

First, use a factory. Start with:

def makecounter(): 
    return collections.defaultdict(int)

and later use

self._inverted_index = collections.defaultdict(makecounter)

and, for the for term in tokens: loop,

for term in tokens:
    self._inverted_index[term][docnumber] += 1

This makes each self._inverted_index[term] a dict such as

{1:2,5:3,7:1}

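in your example. Since you want each self._inverted_index[term] to be a list of lists instead, add this right after the loading loop ends:

self._inverted_index = dict((t, [[d, v[d]] for d in sorted(v)])
                            for t, v in self._inverted_index.items())

Once built (this way or any other; I'm just showing one simple way to construct it!), this data structure will of course be as awkward to use as you have made it unnecessarily hard to construct (a dict of dicts is much more useful, and easier to use and to construct), but hey, one man's meat, etc. ;-).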
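Putting the pieces above together, a minimal self-contained sketch might look like the following. The build_index function, the pre-tokenized docs input and the sample words are assumptions made up for illustration; they stand in for index_dir and self.tokenize from the question.

```python
import collections


def makecounter():
    # Factory: each new term gets its own {docnumber: count} mapping.
    return collections.defaultdict(int)


def build_index(docs):
    """Build {term: [[docnumber, freq], ...]} from a list of token lists.

    Hypothetical helper: `docs` stands in for the tokenized files.
    """
    inverted_index = collections.defaultdict(makecounter)
    # Document numbers start at 1, as in the question.
    for docnumber, tokens in enumerate(docs, 1):
        for term in tokens:
            inverted_index[term][docnumber] += 1
    # Convert each inner dict into a sorted list of [docnumber, freq] pairs.
    return dict((t, [[d, v[d]] for d in sorted(v)])
                for t, v in inverted_index.items())


# Made-up sample data: three already-tokenized documents.
docs = [["cat", "dog", "cat"], ["dog"], ["cat"]]
index = build_index(docs)
# index["cat"] is [[1, 2], [3, 1]] and index["dog"] is [[1, 1], [2, 1]]
```

Skipping the final conversion and returning inverted_index directly would leave the dict of dicts, which is the form recommended above.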

Source

2010-10-05 03:14:44

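I have made the changes you suggested. I realize your approach is simpler and cleaner than implementing a list of lists. However, it currently gives me an error; I have edited the code above. – csguy11 2010-10-05 03:37:37

@csguy, in your 'indexdir' method (assuming it **is** one; your indentation as shown above is all wrong), you completely destroy whatever was previously assigned to 'self._inverted_index' by reassigning it to the previous, faulty data structure, which makes your edits to the code completely irrelevant. When you do 'self.a = b', you do realize that it no longer matters in the least what, if anything, was previously assigned to 'self.a', right?! – 2010-10-05 05:10:25

I see where the problem was, but since I don't fully understand your implementation, I decided to stick with my approach, i.e. a dictionary of lists of lists, even though it is overly complicated. – csguy11 2010-10-05 06:42:29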

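Maybe you could create a simple class for (docname, frequency).

Then your dictionary could hold a list of this new data type. You could also use a list of lists, but a separate data type would be cleaner.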
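As a rough sketch of this idea, collections.namedtuple gives a lightweight class with very little code. The name Posting, its fields and the sample counts are assumptions chosen for illustration, not something taken from the question.

```python
import collections

# Hypothetical record type for one (document, frequency) pair.
Posting = collections.namedtuple("Posting", ["docname", "frequency"])

# Made-up per-document counts for a single term.
counts = {"1.txt": 2, "5.txt": 3, "7.txt": 1}

# The index maps each term to a list of Posting records, sorted by name.
index = {"cat": [Posting(name, counts[name]) for name in sorted(counts)]}

first = index["cat"][0]
# first.docname is "1.txt" and first.frequency is 2
```

Because a namedtuple is still a tuple, code that expects plain (docname, frequency) pairs keeps working, while new code can use the readable field names.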

Source

2010-10-05 03:06:17 JoshD

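Here is a general algorithm you can use, though you will need to adapt some of your code. It produces a dictionary containing a word-count dictionary for each file.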

filedicts = {}
for file in allfiles:
    # one word-count dictionary per file
    filedict = filedicts[file] = {}
    # terms: the tokens extracted from this file
    for term in terms:
        filedict.setdefault(term, 0)
        filedict[term] += 1
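For a concrete, runnable variant of the loop above, the sketch below uses pre-tokenized sample data in place of real files; the file names and words are invented for illustration. It also swaps the setdefault idiom for collections.Counter (available from Python 2.7), which is a substitution and not what the answer above uses.

```python
import collections

# Made-up sample data: file name -> tokens already extracted from that file.
tokens_by_file = {
    "1.txt": ["cat", "dog", "cat"],
    "5.txt": ["cat", "cat", "cat"],
}

# One word-count mapping per file.
filedicts = {}
for name, terms in tokens_by_file.items():
    filedicts[name] = collections.Counter(terms)

# filedicts["1.txt"]["cat"] is 2 and filedicts["5.txt"]["cat"] is 3
```

Note that this structure is keyed by file first and term second, the reverse of the inverted index in the question.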

Source

2010-10-05 03:09:32 mikerobi
