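My code runs, but my function always outputs 0.0

My code reads .txt files and builds a matrix in which each .txt file is one row, and each word in the file gets its own column in that row. I use a "bag of words" approach to compute the distance: I compare two rows and count how often each word in the union of the two rows occurs in each of them. However, although the code runs, I get the wrong result (0.0).
I thought there might be an error in the matrix I pass to the function, but the matrix looks fine.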
Strangely, if I create two lists by hand:
a = ["a", "b", "c", "d"],
b = ["b", "c", "d", "e"]
it works, but when I change them to:
a = ["word 1", "word 2", "word 3", "word 4"],
b = ["word 2","word 3","word 4","word 5",]
the result is 0.0 again. I'm confused!
My code:
def bow_distance(a, b):
    p = 0
    if len(a) > len(b):
        max_words = len(a)
    else:
        max_words = len(b)
    list_words_ab = list(set(a) | set(b))
    len_bow_matrix = len(list_words_ab)
    bow_matrix = numpy.zeros(shape = (3, len_bow_matrix), dtype = str)
    while p < len_bow_matrix:
        bow_matrix[0, p] = str(list_words_ab[p])
        p = p+1
    p = 0
    while p < len_bow_matrix:
        bow_matrix[1, p] = a.count(bow_matrix[0, p])
        bow_matrix[2, p] = b.count(bow_matrix[0, p])
        p = p+1
    p = 0
    overlap = 0
    while p < len_bow_matrix:
        abs_difference = abs(float(bow_matrix[1, p]) - float(bow_matrix[2, p]))
        overlap = overlap + abs_difference
        p = p+1
    return (overlap/2)/max_num_parts
# Calculate the distances
i = 1
j = 1
while i < num_of_txt + 1:
    print(i)
    newfile = open("TXT_distance_" + str(i)+".txt", "w")
    while j < num_of_txt + 1:
        newfile.write(str(bow_distance(text_word_matrix[i-1], text_word_matrix[j-1])) + " ")
        j = j+1
    newfile.close()
    j = 1
    i = i+1
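
A likely cause, offered as a sketch rather than a confirmed diagnosis: numpy.zeros(..., dtype = str) allocates an array of one-character strings, so "word 1" is stored as "w". a.count("w") and b.count("w") then return 0 for every column, which gives 0.0, while single-letter words such as "a" survive intact. The version below avoids the string matrix by counting with collections.Counter. Dividing by the length of the longer list is an assumption: the posted code divides by max_num_parts, which is not defined in the snippet.

```python
from collections import Counter


def bow_distance(a, b):
    """Bag-of-words distance between two lists of words (illustrative sketch)."""
    counts_a = Counter(a)
    counts_b = Counter(b)
    # Every word that occurs in either list.
    vocabulary = set(counts_a) | set(counts_b)
    # Counter returns 0 for a missing word, so no special-casing is needed.
    difference = sum(abs(counts_a[word] - counts_b[word]) for word in vocabulary)
    # Assumption: normalise by the longer list, in place of the
    # undefined max_num_parts used in the posted code.
    max_words = max(len(a), len(b))
    if max_words == 0:
        return 0.0
    return (difference / 2) / max_words


a = ["word 1", "word 2", "word 3", "word 4"]
b = ["word 2", "word 3", "word 4", "word 5"]
print(bow_distance(a, b))  # 0.25
```

Because the words are never written into a fixed-width array, multi-character words and counts above 9 are not truncated.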
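There is also an extra comma after "word 5" that needs to be removed. – Harrison
True, thank you. – turkus
The comma after "word 5" doesn't matter, because a list literal may end with a trailing comma. However, the comma after the list definition (where 'a' is defined) makes 'a' a tuple with a single value (the list itself), which may throw off your logic. – Riaz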
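
The difference between the two commas can be checked directly. This is a minimal illustration using the lists from the question, not part of the original code:

```python
# A comma AFTER the closing bracket wraps the list in a one-element tuple.
a = ["word 1", "word 2", "word 3", "word 4"],
# A comma INSIDE the brackets, after the last item, changes nothing.
b = ["word 2", "word 3", "word 4", "word 5",]

print(type(a).__name__, len(a))  # tuple 1
print(type(b).__name__, len(b))  # list 4
```

With 'a' defined this way, len(a) is 1 and a.count("word 1") is 0, since the only element of the tuple is the list itself.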