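TypeError: list indices must be integers or slices, when using a nested dictionary
I am building an inverted index for locally stored text files using a nested dictionary. The abstract structure of the inverted index is shown below (the values are integers). For every word, the value under key '0' is the idf and the value under key '1' is the tf.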
inverted_index={'word1':{'0':idf_value, '1': 2 , 'filename1': frequency_value, 'filename2': frequency_value},'word2':{'0':idf_value, '1': 2, 'filename1': frequency_value, 'filename2': frequency_value}}
Here is the code:
import textract, math, os
docs=[]
#Read the files and store them in docs
folder = os.listdir("./input/")
for file in folder:
    if file.endswith("txt"):
        docs.append ([file,textract.process("./input/"+file)])
inverted_index={}
for doc in docs:
    words=doc[1].decode()
    words=words.split(" ")
    #loop through and build the inverted index
    for word in words:
        temp={}
        #to remove initial white space
        if (word == " ") or (word==""):
            continue
        if word not in inverted_index:
            temp[doc[0]]=1
            temp['0']=0 #idf
            temp['1']=1 #tf
            inverted_index[word]=temp
        else:
            if doc[0] not in inverted_index[word].keys():
                inverted_index[word][doc[0]]=1
                inverted_index[word]['1']=inverted_index[word]['1']+1
            else:
                inverted_index[word][doc[0]]=inverted_index[word][doc[0]]+1
# to sort and print values while calculating the tf and idf on the fly
for key, value in sorted(inverted_index.items()): # to sort words alphabetically
    inverted_index[key]=sorted(inverted_index[key]) # to sort the filenames where the word occurred.
    inverted_index[key]['0']=math.log2(len(docs)/value['1']) # the error in this line
    print(key, value)
But I get this error on the second-to-last line:
Traceback (most recent call last):
  File "aaaa.py", line 34, in <module>
    inverted_index[key]['0']=math.log2(len(docs)/value['1'])
TypeError: list indices must be integers or slices, not str
Could you please help me fix this bug? Thanks.
Please post the full 'Traceback' – ksai
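Is the value stored in inverted_index a list? If so, inverted_index[key]['1'] may be the problem. Try changing the indices in that line from '1' to 1 and from '0' to 0, without the quotes. If you have a list 'a = [1,2,3]', you access its items with 'a[0]', not 'a['0']'. String indices are not allowed. –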
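To check whether the value really is a list, here is a minimal sketch of what `sorted()` does to one entry of the index. The sample entry and the names `entry`, `as_list` and `error_message` are made up for illustration. Calling `sorted()` on a dict returns a list of its keys, so assigning that result back into the index replaces the inner dict with a list, and a string index on it then fails:

```python
# Hypothetical entry shaped like one value of inverted_index in the question.
entry = {'0': 0, '1': 2, 'b.txt': 1, 'a.txt': 3}

# sorted() iterates over the dict's keys and returns them as a new list;
# the counts stored under those keys are not part of the result.
as_list = sorted(entry)
print(type(as_list).__name__, as_list)

error_message = None
try:
    # Same kind of access as inverted_index[key]['0'] = ... in the question.
    as_list['0'] = 1.0
except TypeError as exc:
    error_message = str(exc)
    print("TypeError:", error_message)
```

If that matches what you see, switching to integer indices would make the error go away but would overwrite one of the key names in the list rather than store the idf.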
I don't know the contents of 'inverted_index', but based on the error, try changing 'value['1']' to 'value[1]' on the second-to-last line. – ksai
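Looking at the posted loop, `value` still refers to the original inner dict, so `value['1']` itself should work; the failing part appears to be `inverted_index[key]['0']`, because the line before it replaces `inverted_index[key]` with the list returned by `sorted()`. One possible fix, sketched below under assumptions, is to leave the inner dict alone and sort the filenames only when printing. The sample index and `num_docs` are made-up stand-ins for the data built from `./input/` and for `len(docs)`:

```python
import math

# Hypothetical sample data in the same layout as the question's index:
# '0' holds the idf, '1' holds the number of documents containing the word.
inverted_index = {
    'banana': {'0': 0, '1': 1, 'b.txt': 2},
    'apple': {'0': 0, '1': 2, 'b.txt': 1, 'a.txt': 3},
}
num_docs = 2  # stands in for len(docs) in the question

for word in sorted(inverted_index):  # words in alphabetical order
    entry = inverted_index[word]     # still a dict, never reassigned
    entry['0'] = math.log2(num_docs / entry['1'])
    # Sort the filenames for display only; the stored dict is left untouched.
    filenames = sorted(name for name in entry if name not in ('0', '1'))
    print(word, entry['0'], entry['1'], [(name, entry[name]) for name in filenames])
```

Keeping the counts in a dict and sorting at print time means the string keys '0' and '1' stay valid for later lookups.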